The eXecutable Micro-Architectural Specification (xMAS) based Quality-of-Service (QoS) analysis developed in recent years offers an effective way to model on-chip communication fabrics and enables performance bound analysis at the micro-architectural level. For such performance analysis of Networks-on-Chip (NoCs), model validation is essential to ensure correctness and accuracy. To facilitate xMAS modeling and the validation of the corresponding analysis, this work presents a unified platform based on xMAS in Simulink. The platform provides a user-friendly graphical interface for xMAS modeling and parameter setup by taking advantage of the Simulink modeling environment. Hierarchical model construction and Verilog-HDL code generation are supported to manage complex models and to conduct cycle-accurate, bit-accurate simulations. Moreover, heuristics-aided model validation is integrated with the Verilog-HDL simulations to evaluate the analysis results. We demonstrate the application and the workflow of the xMAS tool through a two-agent communication example.
Large-scale agent-based traffic simulation is computationally intensive, which creates a need for parallel computing techniques. Agent-based traffic simulations are generally parallelized by decomposing the road network into spatial subregions. The agents in each subregion are executed by a Logical Process (LP). LPs must be synchronized because of the data dependencies between them. Synchronization protocols can be synchronous or asynchronous. An asynchronous protocol allows LPs to progress asynchronously and to communicate individually only when there are data dependencies. LPs use lookahead to indicate the time at which they must synchronize with other LPs. The larger the lookahead values, the less frequently synchronization operations are required. High synchronization overhead is still a major performance issue in large-scale parallel agent-based traffic simulations. In this paper, we develop two heuristics to increase the lookahead of LPs for an asynchronous protocol. Taking advantage of the intrinsic uncertainties in traffic simulation, lookahead is increased by allowing certain violations of data dependencies. In the first heuristic, some synchronization operations are simply skipped. In the second heuristic, the temporal resolution of the models of those agents that cause data dependencies is reduced. Skipping synchronization operations or decreasing the temporal resolution of models should not alter the simulation results statistically. The efficiency of the proposed heuristics is investigated in the parallel agent-based traffic simulator SEMSim Traffic. Experimental results show that, compared to existing methods, the heuristics reduce the running time of the parallel simulation without sacrificing the statistical accuracy of the simulation.
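The relation between lookahead and synchronization frequency stated above can be illustrated with a toy sketch. It is not the protocol used in SEMSim Traffic: the function name count_syncs, the fixed time step, and the rule "synchronize once the clock reaches the horizon granted by the last synchronization" are assumptions made only for illustration.

```python
def count_syncs(end_time: int, step: int, lookahead: int) -> int:
    """Count the synchronizations one LP performs while advancing to end_time.

    Hypothetical model: after each synchronization the LP may advance freely
    for `lookahead` time units; once its clock reaches that horizon it must
    synchronize again before continuing.
    """
    if step <= 0 or lookahead <= 0:
        raise ValueError("step and lookahead must be positive")
    clock = 0
    horizon = lookahead  # the LP may run without synchronizing until this time
    syncs = 0
    while clock < end_time:
        if clock >= horizon:
            syncs += 1                    # synchronize with neighboring LPs
            horizon = clock + lookahead   # a new safe window is granted
        clock += step
    return syncs


if __name__ == "__main__":
    for lookahead in (5, 20):
        print(lookahead, count_syncs(end_time=100, step=1, lookahead=lookahead))
```

In this toy model, raising the lookahead from 5 to 20 time units cuts the number of synchronizations over 100 time units from 19 to 4, which is the effect the two heuristics aim for.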
Simulation cloning is an efficient way to analyze multiple configurations in a parameter exploration task. A simulation model usually contains a set of tunable parameters for exploring different configurations of a system. To evaluate different design alternatives, multiple simulation instances need to be launched, each evaluating a different parameter configuration. Executing these simulation instances usually takes a considerable amount of time. Simulation cloning has been proposed to reuse computations among simulation instances and thereby shorten the overall execution time. Designing cloning strategies that explore computation sharing among simulation instances while maintaining the correctness of execution is challenging. In this paper, we propose two cloning strategies for agent-based simulation (ABS): the top-down cloning strategy and the bottom-up cloning strategy. The top-down cloning strategy was designed first and can be applied only to limited scenarios. The bottom-up cloning strategy is an improved strategy that overcomes the limitation of the top-down cloning strategy. The effectiveness of the two strategies is analyzed experimentally. A large-scale ABS parameter exploration task is performed to show the performance advantages and generality of the bottom-up cloning strategy.
The domain-specific modeling and simulation language ML-Rules is aimed at facilitating the description of cell biological systems at different levels of organization. Model states are chemical solutions that consist of dynamically nested, attributed entities. The model dynamics are described by rules that are constrained by arbitrary functions, which can operate on the entities' attributes, (nested) solutions, and the reaction kinetics. Thus, ML-Rules supports expressive, hierarchical, variable-structure modeling of cell biological systems. The formal syntax and semantics of ML-Rules show that it is firmly rooted in continuous-time Markov chains. In addition to a generic stochastic simulation algorithm for ML-Rules, we introduce several specialized algorithms that are able to handle subclasses of ML-Rules more efficiently. The algorithms are compared in a performance study, leading to conclusions on the relation between expressive power and computational complexity of rule-based modeling languages.
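For readers unfamiliar with stochastic simulation over continuous-time Markov chains, the following is a minimal sketch of the classic direct-method stochastic simulation algorithm for a flat (non-nested) reaction system. It is not the ML-Rules simulator and ignores nesting and attributes; the function name gillespie, the reaction encoding, and the assumption that each reactant species appears once per reaction are illustrative choices.

```python
import random


def gillespie(state, reactions, t_end, rng):
    """Direct-method stochastic simulation of a flat reaction system.

    state:     dict mapping species name -> molecule count
    reactions: list of (rate_constant, reactant_species, product_species),
               where both species entries are lists of names
               (assumption: each reactant species is consumed once)
    Returns a list of (time, state snapshot) pairs.
    """
    state = dict(state)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        # Mass-action propensity: rate constant times the reactant counts.
        propensities = []
        for rate, reactants, _products in reactions:
            a = rate
            for species in reactants:
                a *= state.get(species, 0)
            propensities.append(a)
        total = sum(propensities)
        if total <= 0:
            break  # no reaction can fire any more
        # The waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its propensity.
        r = rng.random() * total
        acc = 0.0
        chosen = None
        for reaction, a in zip(reactions, propensities):
            if a <= 0:
                continue
            chosen = reaction
            acc += a
            if r < acc:
                break
        _rate, reactants, products = chosen
        for species in reactants:
            state[species] -= 1
        for species in products:
            state[species] = state.get(species, 0) + 1
        trajectory.append((t, dict(state)))
    return trajectory


if __name__ == "__main__":
    # Hypothetical example: irreversible conversion A -> B with rate constant 1.0.
    run = gillespie({"A": 10, "B": 0}, [(1.0, ["A"], ["B"])], 5.0, random.Random(0))
    print(run[-1])
```

The generic algorithm recomputes every propensity after each event; specialized algorithms of the kind mentioned in the abstract typically gain efficiency by exploiting restrictions on the model class.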
As parallel and distributed systems evolve towards extreme scale (for example, high-performance computing systems involving millions of cores and billion-way parallelism, and high-capacity storage systems requiring efficient access to petabytes or exabytes of data), many new challenges arise in designing and deploying the next-generation interconnection networks for these systems. Fat-tree networks have been widely used in both data centers and HPC systems in the past decades and are promising candidates for next-generation extreme-scale networks. In this paper, we present FatTreeSim, a simulation framework that supports modeling and simulation of extreme-scale fat-tree networks, with the goal of understanding the design constraints of next-generation HPC and distributed systems and aiding the design and performance optimization of the applications running on these systems. We have systematically experimented with FatTreeSim on Emulab and Blue Gene/Q, and analyzed the scalability and fidelity of FatTreeSim with various network configurations. On the Blue Gene/Q Mira, FatTreeSim can achieve a peak performance of 305 million events per second using 16,384 cores. Finally, we have applied FatTreeSim to simulate several large-scale Hadoop YARN applications to demonstrate its usability.
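To give a sense of how quickly fat-tree networks grow, the sketch below computes component counts for the three-level k-ary fat-tree commonly used in data centers (k pods built from k-port switches). The abstract does not state which fat-tree variant FatTreeSim models, so this topology, the class FatTreeSize, and the function fat_tree_size are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTreeSize:
    hosts: int
    edge_switches: int
    aggregation_switches: int
    core_switches: int

    @property
    def switches(self) -> int:
        """Total number of switches across all three levels."""
        return self.edge_switches + self.aggregation_switches + self.core_switches


def fat_tree_size(k: int) -> FatTreeSize:
    """Component counts of a three-level k-ary fat-tree built from k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return FatTreeSize(
        hosts=k * half * half,          # k pods x (k/2 edge switches) x (k/2 hosts)
        edge_switches=k * half,         # k/2 edge switches per pod
        aggregation_switches=k * half,  # k/2 aggregation switches per pod
        core_switches=half * half,      # (k/2)^2 core switches
    )


if __name__ == "__main__":
    for k in (4, 48):
        size = fat_tree_size(k)
        print(k, size.hosts, size.switches)
```

Under this assumed topology the host count grows as k^3/4, so 48-port switches already connect 27,648 hosts through 2,880 switches.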
Time Warp is the reference synchronization protocol for parallel speculative processing of discrete event simulation models. Although PDES platforms relying on the baseline Time Warp specification already allow for exploiting parallelism, several techniques have been proposed to further improve performance. Among them are optimized approaches for state recoverability, and techniques for load balancing or for (dynamically) controlling the degree of speculation, the latter being specifically targeted at reducing the incidence of causality errors that lead to wasted computation. However, in state-of-the-art Time Warp systems, events are processed like batch jobs, that is, without any preemption support, which may prevent the system from promptly reacting to the injection of higher-priority (that is, lower-timestamp) events. Delaying the processing of these events may, in turn, give rise to a higher incidence of incorrect speculation. In this article we present the design and realization of a time-sharing Time Warp system, to be run on multi-core Linux machines, which makes systematic use of event preemption in order to dynamically reassign the CPU to higher-priority events/tasks. Our proposal is based on a truly dual-mode execution, application vs. platform, which includes timer-interrupt-based support for bringing control back to platform mode for possible CPU reassignment at very fine-grained periods. The latter facility is offered by an ad hoc timer-interrupt management module for Linux, which we release, together with the overall time-sharing support, within the open-source ROOT-Sim platform. An experimental assessment is presented showing the effectiveness of our approach.
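The difference between batch-style and time-shared event processing can be sketched with a toy single-CPU scheduler. This is not ROOT-Sim code and does not model rollbacks or the Linux timer-interrupt module; the function run, the tick-based notion of time, and the fixed event cost are assumptions for illustration. At every tick the "platform" regains control and, in preemptive mode, hands the CPU to the pending event with the lowest timestamp.

```python
import heapq


def run(arrivals, slices_per_event, preemptive=True):
    """Return event timestamps in completion order on a single CPU.

    arrivals:         dict mapping tick -> list of event timestamps injected then
    slices_per_event: CPU ticks each event needs (hypothetical fixed cost)
    preemptive:       if True, re-evaluate at every tick which event runs
    """
    pending = []      # min-heap of (timestamp, remaining_slices)
    current = None    # event currently holding the CPU
    completed = []
    tick = 0
    last_arrival = max(arrivals, default=-1)
    while current is not None or pending or tick <= last_arrival:
        for timestamp in arrivals.get(tick, []):
            heapq.heappush(pending, (timestamp, slices_per_event))
        if preemptive and current is not None:
            # Platform mode: return the running event to the queue so that
            # the lowest-timestamp event is selected for the next slice.
            heapq.heappush(pending, current)
            current = None
        if current is None and pending:
            current = heapq.heappop(pending)
        if current is not None:
            timestamp, remaining = current
            remaining -= 1  # application mode: run one time slice
            if remaining == 0:
                completed.append(timestamp)
                current = None
            else:
                current = (timestamp, remaining)
        tick += 1
    return completed


if __name__ == "__main__":
    # An event with timestamp 10 starts first; timestamp 5 is injected one tick later.
    arrivals = {0: [10], 1: [5]}
    print(run(arrivals, slices_per_event=3, preemptive=False))
    print(run(arrivals, slices_per_event=3, preemptive=True))
```

In the batch-style run the event with timestamp 10 finishes before the later-injected event with timestamp 5, whereas the preemptive run completes them in timestamp order. Prompt reaction of this kind is what the time-sharing design relies on to reduce incorrect speculation.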