Monte Carlo Simulation — a practical guide | by Robert Kwiatkowski | Towards Data Science

Monte Carlo Simulation — a practical guide

A versatile method for parameters estimation. Exemplary implementation in Python programming language.

Robert Kwiatkowski
Towards Data Science

--

Image by 15299 from Pixabay modified by author

Monte Carlo Simulation (or Method) is a probabilistic numerical technique used to estimate the outcome of a given, uncertain (stochastic) process. This means it’s a method for simulating events that cannot be modelled implicitly. This is usually a case when we have a random variables in our processes.

In this article, I give you a brief background of this technique, I show what steps you have to follow to implement it and, at the end, there will be two examples of a problems solved using Monte Carlo in Python programming language.

  1. A bit of history

Monte Carlo Simulation, as many other numerical methods, was invented before the advent of modern computers — it was developed during World War II — by two mathematicians: Stanisław Ulam and John von Neumann. At that time, they both were involved in the Manhattan project, and they came up with this technique to simulate a chain reaction in highly enriched uranium. Simply speaking they were simulating an atomic explosion.

Solving the “neutron diffusion” model was too complex to describe and to solve explicitly, especially keeping in mind they had only IBM punch-card machines or later a computer called ENIAC. During his stay in hospital Stanisław Ulam was killing boredom by playing cards and then a new idea struck him. After returning to work he shared his novel idea with a colleague from the laboratory John von Neumann. The development of a new solving method got a codename “Monte Carlo”. Method this was based on random sampling and statistics. Thanks to it both mathematicians were able to speed up the calculation process, make incredibly good predictions and deliver useful and highly needed at that time results to the project.

Stanisław Ulam (left) and John von Neumann (right) worked together during the Manhattan project; Images by AlexAntropov86 from Pixabay and from Wikimedia (credits: LANL) modified by author

While working in Los Alamos National Laboratory Stanisław Ulam published in 1949 the first unclassified document describing Monte Carlo Simulation.

2. Applications now

Currently, due to the ease of implementation and available high computing power this technique is widely used across various industries. Let us look on some documented use cases.

Health:

Finance:

Production:

Transport:

Engineering/Science:

3. Backbone of the Monte Carlo Method

The core concept behind the Monte Carlo Simulation is a multiple random sampling from a given set of probability distributions. These can be of any type, e.g.: normal, continuous, triangular, Beta, Gamma, you name it.

To use this technique, you have to follow four main steps:

  1. Identify all input components of the process and how do they interact e.g., do they sum up or subtract?
  2. Define parameters of the distributions.
  3. Sample from each of the distributions and integrate the results based on point 1.
  4. Repeat the process as many times as you want.

During the running of this simulation, your resultant parameter (e.g. cost or risk) will converge toward the Normal Distribution even the source distributions can be different. This is the effect of the Central Limit Theorem and that is one of the reasons why this technique became immensely popular in various industries.

4. Implementation in Python — basics

Monte Carlo Simulation can be easily implemented using any programming language. In this case we will use Python. NumPy library will be very handy here as it has multiple most popular probability distributions implemented. For example:

5. Example 1

Let’s assume we have a process constructed from 3 stages (X1, X2, X3). Each one has an average duration (5, 10 and 15 minutes) which vary following the normal distribution and we know their variance (all 1 minute).

Notation of the normal distribution with parameters
Process components; Image by author

We want to know what is the probability that the process will exceed 34 minutes?

This example is so trivial that it can be solved manually what we do later to validate the Monte Carlo result.

We know all the individual components so let’s define the relationship between them (it’s additive):

Now we can start coding. The single-component can be represented with a short function:

The Monte Carlo simulation code shown below uses this function as a basic block. The number of iterations for this use case is set at 10 000 but you can change it. The last section of a code checks the probability of exiting the limit of 34 minutes (once again it uses the sampling technique).

After running the following code we get the following answer but it will vary every time you run the code:

Probability of exceeding the time limit:  1.035 %

Now we can plot the histogram of the estimated parameter (time). We clearly see it follows the normal distribution.

Total time histogram; Image by author

Let’s verify our results with hand calculations.

As a result the total time follows the normal equation with parameters:

In order to calculate the probability we have to find the z-score first.

P-value we read now from the z-score table. A right tailed probability we calculate as:

As you remember our Monte Carlo simulation gave us the result of 1.05% which is pretty close.

6. Example 2

In this example let’s assume we want to assemble three blocks inside a container of a given width. Nominal dimensions are shown on the picture below. We see that by design there is a nominal gap of 0.5mm.

Three blocks inside a container; Image by author

However, the real dimensions of these three blocks and a container can vary due to technological reasons. For the sake of demonstration let’s assume that none of these variations follow the normal distribution. Three blocks will follow triangular distributions shown below and a container’s dimensions spread will follow an uniform distribution in a range of +/-0.1 mm.

Dimensions distributions for three blocks X1, X2 and X3; Image by author

Now, by simply calculating the extreme values we can see that in the worst scenario blocks have 17mm and a container has a width of only 16.4mm meaning, in this case, we cannot fit them all together.

The question is: what is the probability that we won’t be able to fit all the blocks into a container?

In this case relationships between blocks look like this:

By modifying the previous code we obtain a function to sample the triangular distribution. The same we can do to obtain the uniform distribution sampling function.

A modified core code for the MC simulation:

After running above code we get the answer in order of:

Probability of not fitting the blocks:  5.601 %

After checking the mean and standard deviation we can say even more about the expected gap dimension:

The mean gap width will be 0.33mm with a standard deviation of 0.2mm.

Now let’s plot the histogram of the estimated parameter (gap width) to show that it follows the normal distribution even though none of the input distribution is of this type.

Gap width histogram following the normal distribution; Image by author

You may now wonder how the result will vary with increase number of samples. To check that see the graph below showing the gap width estimation with 95% confidence interval, depending on the sample size (from 100 to 7000 samples):

An estimated gap width with 95% confidence interval; Image by author

From this graph it’s evident that the mean of the estimated value doesn’t change significantly but the spread decreases with the number of samples. This means that with a new run of the simulation bigger samples give you smaller results spread. However, at some point adding more samples doesn’t help anymore.

7. Summary

As you’ve seen Monte Carlo is basically a very simple idea yet very powerful. After reading this article I hope you understand the core concept of the this method, when to use it and how to implement it in Python programming language.

--

--

Machine Learning Engineer with a background in the Aerospace Industry www.linkedin.com/in/robertkwiatkowski01

Recommended from Medium

Lists

See more recommendations