In the context of bi-objective ranking and selection (R&S), we derive (1) the asymptotically optimal simulation budget allocation, which we characterize as the solution to a bi-level optimization problem, and (2) an easily implementable SCORE (Sampling Criteria for Optimization using Rate Estimators) sampling framework that approximates the optimal allocation when the number of design points, or systems, is large. Notably, our allocations account for dependence between the objectives and balance the probabilities associated with two types of misclassification error: misclassification by exclusion (MCE) and misclassification by inclusion (MCI). Like much of the R&S literature, our focus is on the case in which the simulation observations are bivariate normal. Toward (2), and assuming normality, we show that in a certain asymptotic regime, the optimal allocations to non-Pareto systems in (1) are inversely proportional to a single intuitive measure called the score. Perhaps surprisingly, in this asymptotic regime, the optimal allocation exclusively controls for MCI events. We also provide an iterative algorithm that repeatedly estimates the score to determine how the available simulation budget should be expended across the competing systems. Our numerical experience with the resulting SCORE framework is promising: for problems with up to ten thousand systems, we are able to identify the optimal allocation to negligible error within a few seconds on a personal computer.
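To make the allocation idea concrete, the following is a minimal sketch of one iteration of a SCORE-style budget split. The score used here (squared distance from a non-Pareto system's sample mean to the nearest Pareto sample mean, standardized by its total sampling variance) and the fixed 80/20 split between non-Pareto and Pareto systems are hypothetical stand-ins; the paper derives the actual rate-based score and the Pareto systems' allocation.

```python
import numpy as np

def allocation_weights(xbars, variances, pareto_idx):
    """One iteration of a SCORE-style budget split (illustrative only).

    xbars[i]      -- bivariate sample-mean vector of system i
    variances[i]  -- per-objective sample variances of system i
    pareto_idx    -- indices of the current estimated Pareto systems

    The 'score' here -- squared distance to the nearest Pareto sample mean,
    standardized by total sampling variance -- and the 80/20 split are
    hypothetical stand-ins for the paper's rate-based quantities."""
    front = [xbars[i] for i in pareto_idx]
    w = np.zeros(len(xbars))
    for i, (xb, v) in enumerate(zip(xbars, variances)):
        if i in pareto_idx:
            continue
        score = min(float(np.sum((xb - p) ** 2)) for p in front) / (v[0] + v[1])
        w[i] = 1.0 / max(score, 1e-12)      # allocation ~ 1 / score
    w *= 0.8 / w.sum()                      # most budget to non-Pareto systems
    for i in pareto_idx:
        w[i] = 0.2 / len(pareto_idx)
    return w                                # fractions of the next batch
```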
Integrating emulation and simulation in virtual time requires that emulated execution bursts be ascribed a duration in virtual time, and that emulated and simulated executions be coordinated within this common virtual time basis. This paper shows how TimeKeeper, an open-source tool for coordinating emulations in virtual time, can be integrated with four existing software emulators/simulators: CORE, ns3, S3F, and Emane. For each of these we describe the modifications made to the tools to support this integration, and we examine experiments designed to assess the accuracy of the combined models.
This paper focuses on evaluating the probability that both components of a two-dimensional stochastic process will ever, but not necessarily at the same time, exceed some large level u. An important application is in determining the probability of large delays occurring in two correlated queues. Since exact analysis of this probability seems prohibitive, we focus on deriving asymptotics and on developing efficient simulation techniques. Large deviations theory is used to characterise logarithmic asymptotics. The second part of this paper focuses on efficient simulation techniques. Using the ``nearest-neighbour random walk'' as an example, we first show that a ``naive'' implementation of importance sampling, based on the decay rate, is not asymptotically efficient. A different approach, which we call partitioned importance sampling, is developed and shown to be asymptotically efficient. The results are illustrated through various simulation experiments.
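As a point of reference for the decay-rate-based scheme mentioned above, here is a minimal sketch, in one dimension and under assumed Gaussian increments, of importance sampling based on the decay rate: Siegmund-style exponential tilting for the level-crossing probability of a negative-drift random walk. The bivariate and partitioned schemes developed in the paper are substantially more involved.

```python
import numpy as np

rng = np.random.default_rng(42)

def p_level_crossing(u, mu=-1.0, sigma=1.0, n_runs=10_000):
    """Estimate P(sup_n S_n >= u) for a random walk S_n with i.i.d.
    N(mu, sigma^2) increments and negative drift, via exponential tilting.

    The tilting parameter theta solves E[exp(theta * X)] = 1, i.e.
    mu*theta + sigma^2*theta^2/2 = 0, giving theta = -2*mu/sigma^2 > 0.
    Under the tilted measure the drift flips to -mu > 0, so level u is hit
    almost surely; the likelihood ratio of a path stopped at first passage
    S_tau is exp(-theta * S_tau)."""
    theta = -2.0 * mu / sigma ** 2
    lrs = np.empty(n_runs)
    for r in range(n_runs):
        s = 0.0
        while s < u:
            s += rng.normal(-mu, sigma)     # tilted increment: drift -mu > 0
        lrs[r] = np.exp(-theta * s)
    return lrs.mean(), lrs.std(ddof=1) / np.sqrt(n_runs)

print(p_level_crossing(10.0))   # of order exp(-theta * u) = exp(-20)
```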
RCR Report for "ProPPA: Probabilistic Programming for Stochastic Dynamical Systems"
Simulation is commonly used to study the random behavior of large-scale stochastic systems with correlated inputs. Since unknown input models are often estimated from finite real-world data, we introduce a factor-based Bayesian framework that can reduce the input estimation uncertainty and improve the assessment of system risk performance. Specifically, we develop a flexible multivariate input model that can capture important properties of the real-world data. A nonparametric Bayesian approach is used to model the marginal distributions, capturing properties such as multi-modality and skewness. Since the correlation among components is often induced by latent common factors, we explore the factor structure of the underlying generative processes for the dependence. Both input and simulation estimation uncertainty are characterized by posterior distributions. In addition, we interpret the latent factors and estimate their effects on the system performance, which can support diagnostics and decision making for large-scale stochastic systems. Our approach is supported by both asymptotic theory and an empirical study.
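As a minimal illustration of how latent common factors induce dependence, the sketch below draws correlated inputs from a linear Gaussian factor model; the loadings and noise levels are made up, and the paper's nonparametric Bayesian marginals and posterior updating are not shown.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_inputs(n, loadings, noise_sd):
    """Draw n input vectors from a linear latent-factor model:
        X = loadings @ Z + eps,  Z ~ N(0, I_k),  eps ~ N(0, diag(noise_sd^2)).
    Dependence across the d components enters only through the k shared
    factors Z, so the implied covariance is L @ L.T + diag(noise_sd**2)."""
    Z = rng.standard_normal((n, loadings.shape[1]))
    eps = rng.standard_normal((n, loadings.shape[0])) * noise_sd
    return Z @ loadings.T + eps

L = np.array([[0.9, 0.0],
              [0.8, 0.3],
              [0.0, 0.7]])   # 3 correlated inputs driven by 2 common factors
X = sample_inputs(100_000, L, noise_sd=np.array([0.4, 0.5, 0.7]))
print(np.corrcoef(X, rowvar=False).round(2))
```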
This article considers how a formal language and toolset for modelling Collective Adaptive Systems (CAS) can be applied to the mesoscopic modelling of pedestrian movement along paths expressed as a directed graph. The formal language used is Carma (Collective Adaptive Resource-sharing Markovian Agents), and the model is expressed in the Carma Specification Language (CaSL) to make use of the Carma Eclipse Plug-in for simulation and analysis. Carma and its tools have been developed to model CAS, which are systems consisting of interacting components that cooperate to achieve their shared goals or compete to achieve their individual goals. Our modelling approach is mesoscopic in nature because it does not consider the movement of individuals in 2-dimensional space, nor does it consider the density of individuals using partial differential equations. In the models presented, pedestrians move along path segments at rates determined by the presence of other pedestrians, and make choices at the intersections of paths. The Carma modelling framework makes it straightforward to change the topology of the paths and other aspects of the model without rewriting the model. Pedestrians are expressed as Carma components, and the information about the topology of the path network and the topography of the landscape can be expressed as separate functional and spatial aspects of the model.
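For readers unfamiliar with the mesoscopic style, the following plain-Python sketch (not Carma/CaSL syntax) simulates the same kind of dynamics as a continuous-time Markov chain: pedestrians traverse segments of a small hypothetical directed graph at occupancy-dependent rates and choose uniformly among successor paths at junctions.

```python
import random

random.seed(2)

# Hypothetical path network: segment -> successor segments (a directed graph).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
occupancy = {s: 0 for s in graph}
occupancy["A"] = 20                          # 20 pedestrians start on segment A

def departure_rate(seg):
    # Total departure rate of a segment: per-pedestrian speed degrades with
    # congestion (the functional form is illustrative only).
    return occupancy[seg] / (1.0 + 0.1 * occupancy[seg])

t, horizon = 0.0, 50.0
while t < horizon:
    active = [s for s in graph if occupancy[s] > 0 and graph[s]]
    if not active:                           # everyone has reached segment D
        break
    rates = [departure_rate(s) for s in active]
    t += random.expovariate(sum(rates))      # Gillespie: time to next event
    seg = random.choices(active, weights=rates)[0]
    occupancy[seg] -= 1                        # one pedestrian finishes seg ...
    occupancy[random.choice(graph[seg])] += 1  # ... and picks a next path
print(round(t, 1), occupancy)
```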
Interactive, distributed, and embedded systems often behave stochastically, for example when inputs, message delays, or failures conform to a probability distribution. However, reasoning analytically about the behavior of complex stochastic systems is generally not scalable. While simulations of systems are commonly used in engineering practice, they have not traditionally been used to reason about formal specifications. Statistical model checking (SMC) addresses this weakness by using a simulation-based approach to reason about precise properties specified in a probabilistic temporal logic. A specification for a communication system may state that within some time bound, the probability that the number of messages in a queue will be greater than 5 must be less than 0.01. Using SMC, executions of a stochastic system are first sampled, after which statistical techniques are applied to determine whether such a property holds. While the outputs of sample-based methods are not always correct, statistical inference can quantify the confidence in the result produced. In effect, SMC provides an alternative to analyzing properties of stochastic systems with precise numerical methods, one that can accommodate large state spaces. SMC techniques have been successfully applied to analyze large-scale systems in areas such as computer networking, security, and systems biology. In this paper, we survey SMC algorithms, techniques, and tools, while emphasizing current limitations and trade-offs between precision and scalability.
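A minimal sketch of the queueing example above, assuming an M/M/1 queue with made-up rates: executions are sampled by simulation, and a Hoeffding bound fixes the number of samples needed to decide the probability threshold with stated confidence. Production SMC tools typically use sequential tests (e.g., the SPRT) rather than a fixed sample size.

```python
import math
import random

def queue_exceeds(threshold=5, horizon=10.0, lam=0.2, mu=1.0):
    """One sampled execution of an M/M/1 queue, started empty: does the
    queue length ever exceed `threshold` within `horizon` time units?"""
    t, n = 0.0, 0
    while True:
        rate = lam + (mu if n > 0 else 0.0)
        t += random.expovariate(rate)
        if t >= horizon:
            return False
        if random.random() < lam / rate:    # next event is an arrival
            n += 1
            if n > threshold:
                return True
        else:                               # next event is a departure
            n -= 1

# SMC by fixed-size sampling: Hoeffding's inequality gives the number of
# runs needed for a +/- eps estimate at confidence 1 - delta.
eps, delta = 0.005, 0.01
n_runs = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
p_hat = sum(queue_exceeds() for _ in range(n_runs)) / n_runs
print(n_runs, p_hat, "holds" if p_hat + eps < 0.01 else "violated/undecided")
```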
We introduce Path-ZVA: an efficient simulation technique for estimating the probability of reaching a rare goal state before a renewal state in a Markov chain. Standard (Monte Carlo) simulation techniques do not work well for rare events, so we use importance sampling; i.e., we change the probability measure governing the Markov chain such that transitions `towards' the goal state become more likely. To do this we need an idea of distance to the goal state, so some level of knowledge of the Markov chain is required. In this paper, we use graph analysis to obtain this knowledge. In particular, we focus on knowledge of the shortest paths (in terms of `rare' transitions) to the goal state. We show that only a subset of the (possibly huge) state space of the Markov chain needs to be considered. We compare our results to well-known importance sampling methods from the literature and demonstrate the large potential gains of our method.
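The following is a toy sketch in the spirit of the method, not the actual Path-ZVA algorithm: graph analysis assigns each state a distance measured in orders of rarity of its transitions, and that distance drives the change of measure so that transitions towards the goal become more likely. The dictionary representation of the chain and the rarity parameter eps are assumptions of the sketch.

```python
import heapq
import math
import random

def rarity_distances(chain, goal, eps=0.1):
    """Backward Dijkstra over the chain's transition graph.  A transition of
    probability p counts as round(log_eps p) 'rare' steps, so d[x] is the
    rarity order of the dominant path from x to the goal, and eps**d[x]
    roughly approximates the reachability probability from x."""
    incoming = {}
    for x, succs in chain.items():
        for y, p in succs.items():
            incoming.setdefault(y, []).append((x, p))
    d, pq = {goal: 0}, [(0, goal)]
    while pq:
        dist, y = heapq.heappop(pq)
        if dist > d.get(y, math.inf):
            continue
        for x, p in incoming.get(y, []):
            nd = dist + max(round(math.log(p, eps)), 0)
            if nd < d.get(x, math.inf):
                d[x] = nd
                heapq.heappush(pq, (nd, x))
    return d

def biased_step(chain, d, x, eps=0.1):
    """Importance-sampling step: re-weight each transition out of x by
    eps**d[y], pushing the walk towards the goal; returns the next state
    and the per-step likelihood ratio to multiply into the estimator."""
    ys = list(chain[x])
    w = [chain[x][y] * eps ** d.get(y, math.inf) for y in ys]
    probs = [wi / sum(w) for wi in w]
    i = random.choices(range(len(ys)), weights=probs)[0]
    return ys[i], chain[x][ys[i]] / probs[i]
```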
Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed-memory computing platforms, due to the dynamic nature of the computational load and the complex dependencies spanning the clone tree. We present the design and implementation of CloneX, our system for scalable cloning that efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. We describe the conceptual framework, the algorithmic foundations, and a prototype interface and implementation of the CloneX runtime scaled to supercomputing systems. The system is illustrated with example applications whose aggregate concurrency and memory demands for ensemble runs, relative to a single run, are increased by over two orders of magnitude. Our scalable cloning implementation (a) demonstrates a novel way to exploit very large scale computing systems for applications that fall short in scaling potential in the traditional sense, and (b) offers a two to three orders of magnitude reduction in aggregate memory requirement relative to replicated runs. We evaluate the performance of our approach with three benchmarks -- heat diffusion, forest fire, and disease propagation models -- and present a performance study on up to 1024 GPUs, varying the key cloning parameters, namely, branching factors, branching levels, and the fraction of common state between clones. Experiments show a speedup of over two orders of magnitude compared to replicated runs.
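The core idea of logical copies without physical duplication can be conveyed with a few lines of copy-on-write state sharing; this is a deliberately minimal sketch, not CloneX's actual runtime or interface.

```python
class CloneState:
    """Minimal copy-on-write sketch: a clone logically owns the full state
    but physically stores only the keys it has overwritten, reading all
    other keys through its parent.  A tree of clones therefore shares the
    common portion of memory -- the effect exploited at scale above."""
    def __init__(self, parent=None, base=None):
        self.parent, self.base = parent, base or {}
        self.delta = {}

    def get(self, key):
        if key in self.delta:
            return self.delta[key]
        return self.parent.get(key) if self.parent else self.base[key]

    def set(self, key, value):
        self.delta[key] = value          # a write touches only this clone

    def clone(self):
        return CloneState(parent=self)   # O(1): no state is copied

root = CloneState(base={"temp": 20.0, "wind": 5.0})
what_if = root.clone()
what_if.set("wind", 12.0)                # diverge in the what-if branch
print(root.get("wind"), what_if.get("wind"), what_if.get("temp"))
```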
Wireless Sensor Networks (WSNs) are important examples of Collective Adaptive Systems (CAS), consisting of a set of motes that are spatially distributed in an indoor or outdoor space. Each mote monitors its surrounding conditions, such as humidity, light intensity, temperature, and vibration, but also gathers complex information such as images or short videos, and cooperates with the other motes forming the WSN to enable the routing process. The traffic in the WSN consists of packets which contain the data harvested by the motes and can be classified according to the type of information they carry. One pivotal problem in WSNs is bandwidth allocation among the motes. The problem is known to be challenging due to the limited computational capacity of the motes, their energy consumption constraints, and the fully decentralised network architecture. In this paper we introduce a novel algorithm to allocate the WSN bandwidth among the motes by taking into account the type of traffic they aim to send. The algorithm has a simple implementation and does not introduce extra control packets. Under the assumption of a mesh network and Poisson-distributed harvested packets, we propose an analytical model for its performance evaluation that allows a designer to study the optimal configuration parameters. Through an extensive set of simulations, we show that the analytical model accurately approximates the performance of networks that do not satisfy these assumptions. The algorithm is studied with respect to the achieved throughput and fairness.
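The paper's allocation algorithm is not reproduced here, but the evaluation metrics it targets are easy to illustrate. The sketch below generates Poisson-distributed harvested packets (matching the analytical model's assumption), applies a hypothetical per-round service cap, and reports per-mote throughput together with Jain's fairness index.

```python
import math
import random

random.seed(3)

def poisson(lam):
    # Knuth's method; adequate for small lam.
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def jain_fairness(xs):
    # Jain's index: 1.0 for perfectly equal throughputs, down to 1/n.
    return sum(xs) ** 2 / (len(xs) * sum(x * x for x in xs))

# Hypothetical allocator: each mote may send at most `cap` packets per round.
rates, cap, rounds = [0.5, 1.0, 2.0, 4.0], 2, 20_000
served = [0.0] * len(rates)
for _ in range(rounds):
    for i, lam in enumerate(rates):
        served[i] += min(poisson(lam), cap)
throughput = [s / rounds for s in served]
print([round(x, 3) for x in throughput], round(jain_fairness(throughput), 3))
```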
Collective adaptive systems (CAS) often adopt cooperative operating strategies in order to run distributed decision-making mechanisms. Sometimes, their effectiveness relies heavily on the collaborative nature of individuals' behavior. Stimulating cooperation while preventing selfish and malicious behaviors is the main objective of trust and reputation models. These models are widely used in distributed, peer-to-peer environments and therefore represent an ideal framework for improving the robustness, as well as the security, of CAS. In this paper, we propose a formal framework for modeling and verifying trusted CAS. From the modeling perspective, mobility, adaptiveness, and trust-based interaction are the main ingredients used to define a flexible and easy-to-use paradigm. Concerning analysis, formal automated techniques based on equivalence and model checking support the prediction of CAS behavior and the verification of the underlying trust and reputation models, with the specific aim of estimating robustness with respect to typical attacks conducted against webs of trust.
Formal languages like process algebras have been shown to be effective tools in modelling a wide range of dynamic systems, providing a high-level description that is readily transformed into an executable model. However, their application is sometimes hampered because the quantitative details of many real-world systems of interest are not fully known. In contrast, in machine learning there has been work to develop probabilistic programming languages, which provide system descriptions that incorporate uncertainty and leverage advanced statistical techniques to infer unknown parameters from observed data. Unfortunately, current probabilistic programming languages are typically too low-level to be suitable for complex modelling. In this paper we present ProPPA, the first instance of the probabilistic programming paradigm being applied to a high-level, formal language, together with its supporting tool suite. We explain the semantics of the language in terms of a quantitative generalisation of Constraint Markov Chains and describe the implementation of the language, discussing in some detail the different inference algorithms available and their domains of applicability. We conclude by illustrating the use of the language on simple but non-trivial case studies: here ProPPA is shown to combine the elegance and simplicity of high-level formal modelling languages with an effective way of incorporating data, making it a promising tool for modelling studies.
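The kind of inference being automated here can be sketched generically (this is not ProPPA syntax, and the model, prior, and data below are invented for illustration): an uncertain rate in a simple stochastic model is inferred from an observation by rejection-based approximate Bayesian computation over simulations.

```python
import random

random.seed(11)

def simulate_births(k, t_end=5.0, n0=1):
    # Gillespie simulation of a pure birth process: X -> X + 1 at rate k * X.
    t, n = 0.0, n0
    while True:
        t += random.expovariate(k * n)
        if t >= t_end:
            return n
        n += 1

observed = 30          # invented observation of the population at t = 5

# ABC rejection: draw the uncertain rate k from its prior, simulate, and keep
# draws whose simulated outcome lands within a tolerance of the observation.
accepted = []
while len(accepted) < 500:
    k = random.uniform(0.1, 1.0)            # prior over the unknown rate
    if abs(simulate_births(k) - observed) <= 3:
        accepted.append(k)
print(sum(accepted) / len(accepted))        # posterior-mean estimate of k
```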
The demand for provisioning, using, and maintaining distributed computational resources is growing hand in hand with the quest for ubiquitous services. Centralized infrastructures such as cloud computing systems provide suitable solutions for many applications, but their scalability is limited by the server-side resources: an increase in clients and demands may compromise the cloud's ability to provide a satisfactory QoS, e.g., in the case of latency-dependent applications. The volunteer cloud paradigm overcomes this limitation by encouraging clients to offer their own spare, perhaps unused, computational resources. Volunteer clouds are thus complex, large-scale, dynamic systems, which demand self-* capabilities to offer effective services, as well as modeling and analysis techniques to predict their behavior. We propose a novel holistic approach to improve the QoS of volunteer clouds supporting collaborative task execution services. We instantiate our approach by extending a recently proposed ant colony optimization algorithm for distributed task execution with a workload-based partitioning of the overlay network of the volunteer cloud. Finally, we evaluate our approach using simulation-based statistical analysis techniques on a workload benchmark provided by Google. Our results show that the proposed approach outperforms some traditional distributed task scheduling algorithms.
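For readers unfamiliar with ant colony optimization in a scheduling setting, here is a bare-bones sketch (node names, speeds, and parameters are invented, and the paper's workload-based partitioning is not shown): ants assign tasks to nodes with probability proportional to pheromone, and pheromone evaporates everywhere while the chosen node is reinforced inversely to the completion time.

```python
import random

random.seed(5)

nodes = ["n1", "n2", "n3"]
speed = {"n1": 1.0, "n2": 1.5, "n3": 3.0}   # hypothetical node capacities
pheromone = {n: 1.0 for n in nodes}
rho, Q = 0.1, 1.0                           # evaporation rate, deposit scale

def pick_node():
    # An ant picks a node with probability proportional to its pheromone.
    return random.choices(nodes, weights=[pheromone[n] for n in nodes])[0]

for _ in range(2000):
    task = random.uniform(1.0, 5.0)         # task size
    node = pick_node()
    completion_time = task / speed[node]
    for n in nodes:                         # pheromone evaporates everywhere,
        pheromone[n] *= 1.0 - rho
    pheromone[node] += Q / completion_time  # ... fast nodes get reinforced

print({n: round(p, 2) for n, p in pheromone.items()})   # n3 dominates
```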