The risk posed by insider threats has usually been approached by analyzing the behavior of users solely in the cyber domain. In this paper, we show the viability of using physical movement logs, collected via a building access control system, together with an understanding of the layout of the building housing the system's assets, to detect malicious insider behavior that manifests itself in the physical domain. In particular, we propose a systematic framework that uses contextual knowledge about the system and its users, learned from historical data gathered from a building access control system, to select suitable models for representing movement behavior. We suggest two different models of movement behavior in this paper and evaluate their ability to represent normal user movement. We then explore the online usage of the learned models, together with knowledge about the layout of the building being monitored, to detect malicious insider behavior. Finally, we show the effectiveness of the developed framework using real-life data traces of user movement in railway transit stations.
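The abstract leaves the concrete movement models unspecified; as a minimal illustrative sketch only (not the paper's framework), one of the simplest behavioural models one could learn from building access logs is a first-order Markov model over zone-to-zone transitions, flagging movements whose learned probability is unusually low. All zone names and the threshold below are hypothetical.

```python
from collections import defaultdict

def learn_transitions(histories):
    """Estimate first-order transition probabilities between building zones
    from historical access-control logs (one list of zone IDs per trace)."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in histories:
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
    probs = {}
    for a, succ in counts.items():
        total = sum(succ.values())
        probs[a] = {b: c / total for b, c in succ.items()}
    return probs

def flag_anomalies(trace, probs, threshold=0.05):
    """Return transitions in an observed trace whose learned probability
    falls below the threshold (unseen transitions get probability 0)."""
    flags = []
    for a, b in zip(trace, trace[1:]):
        p = probs.get(a, {}).get(b, 0.0)
        if p < threshold:
            flags.append((a, b, p))
    return flags

histories = [["lobby", "office", "lab", "office", "lobby"]] * 20
histories += [["lobby", "office", "lobby"]] * 10
model = learn_transitions(histories)
print(flag_anomalies(["lobby", "server_room"], model))  # unseen move is flagged
```

A real deployment would condition on context (user role, time of day) as the paper's framework does; this sketch only shows the basic learn-then-score loop.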
As a distributed system, Hadoop relies heavily on the network to complete data processing jobs. While the traffic generated by Hadoop jobs is critical for job execution performance, the actual behaviour of Hadoop network traffic is still poorly understood. This lack of understanding greatly complicates research relying on Hadoop workloads. In this paper, we explore Hadoop traffic through empirical traces. We analyse the traffic generated by multiple types of MapReduce jobs with varying input sizes and cluster configuration parameters. We present Keddah, a toolchain for capturing, modelling and reproducing Hadoop traffic, for use with network simulators to better capture the behaviour of Hadoop. By imitating the Hadoop traffic generation process and accounting for YARN resource allocation, Keddah can be used to create Hadoop traffic workloads, enabling reproducible Hadoop research in more realistic scenarios.
The goal of ranking and selection (R&S) procedures is to identify the best among a finite set of alternative systems evaluated by stochastic simulations, providing a probability guarantee on the quality of the solution. In order to solve large-scale R&S problems, especially in parallel computing platforms where variable numbers of cores might be used, it is helpful to be able to predict the running time of a given procedure for a given problem. Non-trivial issues arise due to the need to estimate the system configuration. We propose a set of methods for predicting the running time. Numerical results compare our predictions for several leading R&S procedures to actual running times.
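The abstract does not spell out the prediction methods, so the following is only an illustrative sketch of the quantity being predicted: it counts the simulation replications a naive fixed-precision procedure consumes per system, then converts the total into a wall-clock estimate for a given core count. The `eps`, `z`, and cost parameters are hypothetical.

```python
import random
import statistics

def replications_needed(sample, eps=0.1, z=1.96, n0=20, n_max=10_000):
    """Sample one system until the half-width of a z-based confidence
    interval on its mean drops below eps; return the replication count."""
    xs = [sample() for _ in range(n0)]
    while len(xs) < n_max:
        half_width = z * statistics.stdev(xs) / len(xs) ** 0.5
        if half_width <= eps:
            break
        xs.append(sample())
    return len(xs)

def predict_wall_time(systems, cost_per_rep, cores):
    """Crude running-time prediction: total replications over all systems,
    divided evenly across the available cores."""
    total = sum(replications_needed(s) for s in systems)
    return total * cost_per_rep / cores

random.seed(0)
systems = [lambda m=m: random.gauss(m, 1.0) for m in (0.0, 0.2, 0.5)]
print(predict_wall_time(systems, cost_per_rep=0.01, cores=4))
```

Real R&S procedures allocate samples adaptively, which is exactly why their running time must be predicted rather than computed in closed form.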
Continuous-time Markov chains with alarms (ACTMCs) allow for alarm events that can be non-exponentially distributed. In parametric ACTMCs, the parameters of alarm-event distributions are not given explicitly and can be the subject of parameter synthesis. We present an algorithm that solves the ε-optimal parameter synthesis problem for parametric ACTMCs with long-run average optimization objectives. The approach provided in this paper is based on a reduction of the problem to finding long-run average optimal policies in semi-Markov decision processes (semi-MDPs) and on a sufficiently fine discretization of the parameter (i.e., action) space. Since the set of actions in the discretized semi-MDP can be very large, a straightforward approach based on an explicit action-space construction fails to solve even simple instances of the problem. The presented algorithm uses an enhanced policy iteration on symbolic representations of the action space. Soundness of the algorithm is established for parametric ACTMCs with alarm-event distributions that satisfy four mild assumptions, fulfilled by many kinds of distributions. Illustrative proofs of these requirements are provided for Dirac, uniform, exponential, Erlang, and Weibull distributions in particular. An experimental implementation shows that the symbolic technique substantially improves the efficiency of the synthesis algorithm and makes it possible to solve instances of realistic size.
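The paper's contribution is a symbolic, enhanced policy iteration; the plain average-reward policy iteration it builds on can be sketched for a small unichain MDP (unit sojourn times assumed, so gain equals mean payoff per step). This is the textbook procedure, not the paper's symbolic variant, and the two-state example is invented for illustration.

```python
import numpy as np

def evaluate(P, r):
    """Evaluate a unichain policy for average reward: solve
    g + h[s] = r[s] + sum_t P[s, t] * h[t], fixing the bias h[0] = 0."""
    n = len(r)
    A = np.zeros((n, n))
    A[:, 0] = 1.0                          # column 0 carries the gain g
    for j in range(1, n):
        A[:, j] = (np.arange(n) == j).astype(float) - P[:, j]
    x = np.linalg.solve(A, r)
    return x[0], np.concatenate(([0.0], x[1:]))   # gain, bias vector

def policy_iteration(actions):
    """actions[s] = list of (reward, transition-probability row) pairs."""
    n = len(actions)
    policy = [0] * n
    while True:
        P = np.array([actions[s][policy[s]][1] for s in range(n)], dtype=float)
        r = np.array([actions[s][policy[s]][0] for s in range(n)], dtype=float)
        g, h = evaluate(P, r)
        improved = list(policy)
        for s in range(n):
            scores = [rw + float(np.dot(row, h)) for rw, row in actions[s]]
            if max(scores) > scores[policy[s]] + 1e-9:  # strict improvement only
                improved[s] = int(np.argmax(scores))
        if improved == policy:
            return g, policy
        policy = improved

# State 0: stay (reward 1) or move to state 1 (reward 0);
# state 1: reward 2, returns to state 0 with probability 1/2.
actions = [[(1.0, [1.0, 0.0]), (0.0, [0.0, 1.0])],
           [(2.0, [0.5, 0.5])]]
gain, policy = policy_iteration(actions)
print(gain, policy)  # gain 4/3 under the policy "move, stay"
```

The discretized semi-MDP of the paper has the same structure but an action set far too large to enumerate like this, which motivates the symbolic representation.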
We propose a new method for estimating rare event probabilities when independent samples are available. It is assumed that the underlying probability measures satisfy a large deviations principle with a scaling parameter ε, which we call the temperature. We show that by combining samples at different temperatures, one can construct an estimator with greatly reduced variance. Although, as presented here, the method is not as broadly applicable as other rare event simulation methods, such as splitting or importance sampling, it does not require the problem-dependent constructions that can lead to worse performance than ordinary Monte Carlo if done incorrectly.
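For context on why variance reduction matters here: a plain Monte Carlo estimator needs on the order of 1/p samples just to observe one occurrence of an event of probability p, so its relative error explodes as the event becomes rarer. A minimal illustration (deliberately not the multi-temperature estimator of the paper):

```python
import random

def crude_mc(indicator, n):
    """Plain Monte Carlo estimate of an event probability."""
    return sum(indicator() for _ in range(n)) / n

random.seed(1)
# Event {X > 3} for X ~ N(0, 1); the true probability is about 1.35e-3,
# so roughly 740 samples are needed on average per observed hit.
est = crude_mc(lambda: random.gauss(0.0, 1.0) > 3.0, 100_000)
print(est)
```

Pushing the threshold from 3 to, say, 6 makes hits so rare that any feasible plain-MC sample size returns 0, which is the regime the proposed estimator targets.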
Using computer simulation to analyze large-scale discrete event systems requires repeated executions with various scenarios or parameters. Such repeated executions can induce significant redundancy in event processing when the modification from a prior scenario to a new scenario is relatively minor, and when the altered scenario influences only a small part of the simulation. For example, in a city-scale traffic simulation, an altered scenario that blocks one junction may affect only a small part of the city for a considerable length of time. However, traditional simulation approaches would still repeat the simulation for the whole city even when the changes are minor. In this paper, we propose a new redundancy reduction technique for large-scale discrete event simulations, called exact-differential simulation, which simulates only the altered portions of scenarios and their influences in repeated executions while still achieving the same results as re-executing the entire simulation. The paper presents the main concepts of exact-differential simulation, the design of its algorithm, and a method to build an exact-differential simulation middleware that supports multiple discrete event simulation applications. We also evaluate our approach with two case studies: the PHOLD benchmark and a traffic simulation of Tokyo.
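Exact-differential simulation reuses work from the conventional event-scheduling loop that underlies any DES engine. As a reference point for that baseline, a minimal event loop (a time-ordered heap of pending events) looks like this; event names and delays are invented for illustration.

```python
import heapq

class Simulator:
    """Minimal discrete event simulator: a time-ordered heap of events."""
    def __init__(self):
        self.clock, self._heap, self._seq = 0.0, [], 0
        self.log = []

    def schedule(self, delay, name, action=None):
        # The sequence number breaks timestamp ties deterministically.
        heapq.heappush(self._heap, (self.clock + delay, self._seq, name, action))
        self._seq += 1

    def run(self, until):
        while self._heap and self._heap[0][0] <= until:
            self.clock, _, name, action = heapq.heappop(self._heap)
            self.log.append((self.clock, name))
            if action:
                action(self)   # an event may schedule further events

sim = Simulator()
sim.schedule(2.0, "arrive", lambda s: s.schedule(1.5, "depart"))
sim.schedule(5.0, "arrive")
sim.run(until=10.0)
print(sim.log)  # [(2.0, 'arrive'), (3.5, 'depart'), (5.0, 'arrive')]
```

A re-execution of this loop repeats every event; the paper's technique instead replays only events whose causes or effects differ between scenarios, while guaranteeing the same final log.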
In this paper we present Three-Valued Spatio-Temporal Logic (TSTL), which enriches the available spatio-temporal analysis of properties expressed in Signal Spatio-Temporal Logic (SSTL) to give further insight into the dynamic behaviour of systems. Our novel analysis starts from the estimation of satisfaction probabilities of given SSTL properties and allows the analysis of their temporal and spatial evolution. Moreover, in our verification procedure, we use a three-valued approach to include the intrinsic and unavoidable uncertainty related to the simulation-based statistical evaluation of the estimates; this can also be used to assess the appropriate number of simulations depending on the analysis needs. We present the syntax and three-valued semantics of TSTL and a specific extended monitoring algorithm to check the validity of TSTL formulas. We introduce a reliability requirement for TSTL monitoring and an automatic procedure to verify it. Two case studies demonstrate how TSTL broadens the application of spatio-temporal logics in realistic scenarios, enabling analysis of threat monitoring and privacy preservation based on spatial stochastic population models.
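The three-valued idea can be illustrated independently of TSTL's full semantics: when a satisfaction probability is only estimated from simulations, a verdict about "probability above a threshold" should be *unknown* whenever the confidence interval straddles the threshold. The sketch below uses a Hoeffding interval; TSTL's actual monitoring algorithm and semantics differ.

```python
import math

def three_valued_verdict(successes, n, threshold, delta=0.05):
    """Decide 'satisfaction probability > threshold' as True, False, or
    'unknown', using a Hoeffding confidence interval at level 1 - delta."""
    p_hat = successes / n
    half_width = math.sqrt(math.log(2 / delta) / (2 * n))
    if p_hat - half_width > threshold:
        return True
    if p_hat + half_width < threshold:
        return False
    return "unknown"   # the estimate is too uncertain to decide

print(three_valued_verdict(900, 1000, 0.8))  # clearly above: True
print(three_valued_verdict(790, 1000, 0.8))  # interval straddles 0.8: 'unknown'
```

The 'unknown' verdict is also what drives the sample-size assessment mentioned in the abstract: more simulations shrink the interval until a definite verdict emerges.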
Predicting the performance of an application running on parallel computing platforms is increasingly important because of its influence on development time and resource management. However, predicting performance with respect to parallel processes is complex for iterative, multi-stage applications. This research proposes a performance approximation approach, FiM, to predict the calculation time with FiM-Cal and the communication time with FiM-Com for an application running on a master-compute distributed framework. FiM-Cal consists of two key components that are coupled with each other: 1) a stochastic Markov model to capture non-deterministic runtime that often depends on parallel resources, e.g., the number of processes, and 2) a machine learning model that extrapolates the parameters for calibrating our Markov model when application parameters, such as the dataset, change. Along with the parallel calculation time, these platforms also incur data transfer time to communicate between master and compute nodes. FiM-Com consists of a simulation queuing model to quickly estimate communication time. Our new modeling approach considers different design choices along multiple dimensions, namely (i) process-level parallelism, (ii) distribution of cores on a multi-processor platform, (iii) application-related parameters, and (iv) characteristics of datasets. The major contribution of our prediction approach is that FiM is able to provide an accurate prediction of parallel computation time for datasets that are much larger than the training datasets.
Discrete event simulation (DES) is traditionally used as an offline tool to help users carry out analyses of complex systems. As real-time sensor data becomes more and more available, there is increasing interest in assimilating real-time data into DES to achieve online simulation in support of real-time decision making. This paper presents a data assimilation framework that works with DES models. Solutions are proposed to address the unique challenges associated with data assimilation for DES. A tutorial example of discrete event road traffic simulation is developed to demonstrate the framework as well as the principles of data assimilation in general. This paper contributes to the DES community a new data assimilation framework and a concrete example that helps readers grasp the details of data assimilation for DES.
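The abstract does not name the paper's specific solution, so the following is a generic sketch of one common data assimilation scheme, a bootstrap particle filter, applied to a toy queue-length model: each particle is a candidate simulation state that is advanced, weighted by a sensor likelihood, and resampled. The step model and likelihood are invented for illustration.

```python
import random

def assimilate(particles, observation, step, likelihood):
    """One bootstrap-filter cycle: advance each particle with the simulation
    model, weight it by the sensor likelihood, then resample."""
    advanced = [step(p) for p in particles]
    weights = [likelihood(observation, p) for p in advanced]
    total = sum(weights)
    if total == 0:                       # degenerate case: keep the prior
        return advanced
    return random.choices(advanced, weights=weights, k=len(particles))

random.seed(2)
# Toy model: a queue length does a lazy random walk; the sensor reads it
# with additive noise, modelled here by a distance-decaying likelihood.
step = lambda q: max(0, q + random.choice((-1, 0, 1)))
likelihood = lambda obs, q: 1.0 / (1.0 + abs(obs - q))
particles = [random.randint(0, 20) for _ in range(500)]
for obs in (12, 13, 13, 14):
    particles = assimilate(particles, obs, step, likelihood)
print(sum(particles) / len(particles))  # posterior mean is pulled toward the data
```

The DES-specific challenges the paper addresses (discrete, event-driven state changes rather than smooth dynamics) are exactly what make the `step` function non-trivial in practice.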
Accurately modelling a real-world system with probabilistic behavior is a difficult task. Sensor noise and statistical estimation, among other sources of error, make the exact probability values impossible to obtain. In this paper, we consider interval Markov decision processes (IMDPs), which generalise classical MDPs by having interval-valued transition probabilities. They provide a powerful modelling tool for probabilistic systems with an additional variation or uncertainty that prevents knowledge of the exact transition probabilities. We investigate the problem of robust multi-objective synthesis for IMDPs and the analysis of its Pareto curve. We study how to find a robust (randomized) strategy that satisfies multiple objectives involving rewards, reachability, and omega-regular properties against all possible resolutions of the transition probability uncertainties, and how to generate an approximate Pareto curve providing an explicit view of the trade-offs between the objectives. We show that the multi-objective synthesis problem is PSPACE-hard, and we provide a value-iteration-based decision algorithm to approximate the Pareto set of achievable points. Finally, we demonstrate the practical effectiveness of our proposed approaches by applying them to several case studies using a prototypical tool.
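The core robust value-iteration step can be sketched for single-objective reachability (the paper's multi-objective algorithm generalises this): maximise over actions while an adversary resolves each interval distribution to minimise the value, which reduces to a greedy mass-shifting problem. The tiny IMDP below is invented for illustration.

```python
def worst_case_value(intervals, values):
    """Adversarial resolution of interval transition probabilities:
    minimise the expected value over all distributions p with
    lo_s <= p[s] <= up_s and sum(p) = 1, greedily filling cheap states."""
    p = {s: lo for s, (lo, up) in intervals.items()}
    slack = 1.0 - sum(p.values())
    for s in sorted(intervals, key=lambda t: values[t]):  # cheapest first
        lo, up = intervals[s]
        add = min(up - lo, slack)
        p[s] += add
        slack -= add
    return sum(p[s] * values[s] for s in intervals)

def robust_reach(imdp, target, iters=100):
    """Value iteration for maximal reachability probability under the
    worst-case resolution: max over actions, min over feasible distributions."""
    V = {s: (1.0 if s in target else 0.0) for s in imdp}
    for _ in range(iters):
        for s in imdp:
            if s in target or not imdp[s]:
                continue
            V[s] = max(worst_case_value(iv, V) for iv in imdp[s].values())
    return V

imdp = {"s0": {"a": {"goal": (0.3, 0.7), "sink": (0.3, 0.7)}},
        "goal": {}, "sink": {}}
print(robust_reach(imdp, {"goal"})["s0"])  # ≈ 0.3: adversary shifts mass to sink
```

The greedy inner step is valid because the feasible set is a polytope whose minimising vertex assigns as much mass as possible to the lowest-valued successors.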
Statistical Model Checking (SMC) is an approximate verification method that overcomes the state space explosion problem for probabilistic systems through Monte Carlo simulation. Simulation may, however, be costly if many samples are required. It is thus necessary to implement efficient algorithms that reduce the sample size while preserving precision and accuracy. In the literature, sequential schemes have been proposed for estimating the probability that a property holds, based on a predefined confidence and an absolute or relative error. Nevertheless, these algorithms remain conservative and may result in huge sample sizes if the required precision standards are demanding. In this article, we compare several useful bounds and sequential methods. We propose outperforming and rigorous alternative schemes based on Massart bounds and robust confidence intervals. Our theoretical and empirical analyses show that our proposal reduces the sample size while providing guarantees on the error bounds.
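The gain over the distribution-free Okamoto/Hoeffding bound comes from exploiting knowledge about the estimated probability p. The abstract does not restate the Massart inequality, so the sketch below contrasts the Okamoto bound with a Bernstein-type variance-aware bound in the same spirit, assuming an a-priori upper bound `p_max` ≤ 1/2 on p (so that the variance p(1-p) is bounded by `p_max`(1-`p_max`)).

```python
import math

def okamoto_n(eps, delta):
    """Okamoto/Hoeffding bound: worst-case sample size guaranteeing
    |p_hat - p| <= eps with confidence 1 - delta, independent of p."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def bernstein_n(eps, delta, p_max):
    """Bernstein-style variance-aware bound: if p <= p_max <= 1/2, then
    Var <= p_max * (1 - p_max), which shrinks the required sample size."""
    var = p_max * (1 - p_max)
    return math.ceil(2 * (var + eps / 3) * math.log(2 / delta) / eps ** 2)

eps, delta = 0.01, 0.05
print(okamoto_n(eps, delta))          # 18445 samples in the worst case
print(bernstein_n(eps, delta, 0.05))  # far fewer when p is known to be small
```

Sequential schemes of the kind the article studies refine this further by updating the knowledge about p as samples accumulate, instead of fixing `p_max` in advance.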
Existing performance evaluation methods for discrete-state stochastic models such as Petri nets either generate the reachability graph followed by a numerical solution of equations, or use some variant of simulation. Both methods have characteristic advantages and disadvantages depending on the size of the reachability graph and the type of performance measure. This paper proposes a hybrid performance evaluation algorithm for Generalized Stochastic Petri Nets that integrates elements of both methods. It automatically adapts its behavior depending on the available main memory and the number of model states. As such, the algorithm unifies simulation and numerical analysis in a joint framework. It is proved to yield an unbiased estimator whose variance tends to zero with increasing simulation time. The paper extends earlier results with an algorithm variant that starts with a small maximum number of particles and increases it during the run, improving efficiency in cases that regular simulation solves quickly. The algorithm's applicability is demonstrated through case studies.