Peter P. Ware
Thomas W. Page, Jr.
Barry L. Nelson
Peter P. Ware
Department of Computer and Information Science,
The Ohio State University
Columbus, OH 43210, U.S.A.
ware@cis.ohio-state.edu
http://www.cis.ohio-state.edu/~ware
Barry L. Nelson
Department of Industrial Engineering and Management Sciences,
Northwestern University
Evanston, IL 60208, U.S.A.
nelsonb@random.iems.nwu.edu
Thomas W. Page, Jr.
Department of Computer and Information Science,
The Ohio State University
Columbus, OH 43210, U.S.A.
page@cis.ohio-state.edu
ACM Transactions on Modeling and Computer Simulation
vol. 8, no. 3 (July 1998)
Paper (PostScript, 588 KB)
Paper (GZipped PostScript, 136 KB)
Papers are available only to TOMACS subscribers and others authorized to access the ACM Digital Library.
This paper describes a method for analyzing, modeling and simulating a two-level arrival-counting process.
This method is particularly appropriate when the number of independent processes is large, as is the case in our
motivating application, which requires analyzing and representing computer file system trace data for activity on
nearly 8,000 files. The method is also applicable to network trace data characterizing communication patterns
between pairs of computers.
We apply cluster analysis to separate the arrival process into groups or bursts of activity on a file. We then
characterize the arrival process in terms of the time between bursts of activity on a file, the time between file
events within bursts, and the number of events in a burst. Finally, we model these three components individually,
then reassemble the results to produce a synthetic trace generator. To
gauge the effectiveness of this method, we use synthetic (simulated) trace data produced
in this way to drive a discrete-event simulation of a distributed, replicated file system.
We compare the results of the simulation driven by the synthetic trace with those of the same simulation driven by the original
trace data, and conclude that the synthetic data captures the essential characteristics of the empirical trace.
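
The decomposition described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: a fixed gap threshold stands in for the cluster analysis, empirical resampling stands in for the fitted models of each component, and the time between bursts is assumed to run from the last event of one burst to the first event of the next. The names split_into_bursts, burst_components, synthetic_trace and gap_threshold are hypothetical.

```python
import random


def split_into_bursts(times, gap_threshold):
    """Group the sorted event times of one file into bursts.

    Assumption: a gap larger than gap_threshold starts a new burst.
    This simple rule is a stand-in for the paper's cluster analysis.
    """
    if not times:
        return []
    bursts = [[times[0]]]
    for prev, t in zip(times, times[1:]):
        if t - prev > gap_threshold:
            bursts.append([t])      # gap too long: open a new burst
        else:
            bursts[-1].append(t)    # same burst continues
    return bursts


def burst_components(bursts):
    """Return the three components named in the abstract."""
    # Time between bursts (assumed: last event of one to first of the next).
    between = [b[0] - a[-1] for a, b in zip(bursts, bursts[1:])]
    # Time between consecutive events inside the same burst.
    within = [t2 - t1 for b in bursts for t1, t2 in zip(b, b[1:])]
    # Number of events in each burst.
    sizes = [len(b) for b in bursts]
    return between, within, sizes


def synthetic_trace(between, within, sizes, n_bursts, rng):
    """Reassemble a synthetic trace by resampling each component.

    Assumption: the components are drawn independently from their
    empirical values rather than from fitted distributions.
    """
    if n_bursts > 1 and not between:
        raise ValueError("need at least one between-burst gap to resample")
    t = 0.0
    trace = []
    for i in range(n_bursts):
        if i > 0:
            t += rng.choice(between)
        trace.append(t)
        for _ in range(rng.choice(sizes) - 1):
            t += rng.choice(within)
            trace.append(t)
    return trace


# Toy event times for a single file (illustrative values, not real trace data).
times = [0, 1, 2, 10, 11, 30]
bursts = split_into_bursts(times, gap_threshold=5)
between, within, sizes = burst_components(bursts)
trace = synthetic_trace(between, within, sizes, n_bursts=4, rng=random.Random(0))
```

For the toy input, the three bursts are [0, 1, 2], [10, 11] and [30]. In a setting like the one described, the same steps would be applied per file, and the synthetic trace would then drive the simulation in place of the original trace.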
clustering, trace data, two-level arrival process, file access patterns
C.4 Computer Systems Organization, Performance of Systems, Modeling techniques
G.3 Mathematics of Computing, Probability and Statistics
I.6.m Simulation and Modeling, Miscellaneous
replication, file system, synthetic traces