# Monte Carlo Simulation: An Introduction With An Objection To The Imperial COVID Model

We have just spent £337bn on the COVID response in the UK.  Much of this cost was generated by the virus itself, but much of it was generated merely by the lockdown.  A primary motivation for the lockdown was the set of results produced by the Imperial model, which is a Monte Carlo simulation.  I explain here briefly what a Monte Carlo simulation is and bring out one objection (among many) to the Imperial approach.

Monte Carlo simulations exist to address a class of problems which do not have an analytical answer.  Imagine I am in the pub and my home is 40 paces away.  If I walk at two paces a second, I will arrive home in 20s.  That’s an analytical question which has an exact answer.  Here is a non-analytical question.  I have drunk a bottle of tequila in the pub.  The probability that I take a pace forward is only 50%; there is an equal probability that I take a pace backward.  This does not appear to be analytical.  You can’t say immediately what the probability is that I get home.

This is where Monte Carlo simulation comes in.  What you can do is simulate my journey home from the pub and see if I get home.  (It’s called Monte Carlo because the method is based on random numbers, as we will see next.)

Here’s a really simple Python script called Randwalk that simulates the walk home.  It’s called that because this is a random walk simulation.  This sort of thing might be familiar to you from Brownian motion.
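
A minimal sketch of what such a script might look like, assuming a fair six-sided die and 1000 steps (the variable names here are my own):

```python
import random

# Randwalk (sketch): a drunken walk home, one die roll per step.
# A roll of 1-3 is a step back (away from home), 4-6 a step forward (towards home).
steps = 1000
position = 0          # paces north of the pub
history = [position]  # keep the whole path so we can plot it later

for _ in range(steps):
    roll = random.randint(1, 6)
    if roll <= 3:
        position -= 1
    else:
        position += 1
    history.append(position)
```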

You can see that all it does is roll a dice at each step and check whether it shows three or less.   That’s the 50/50 part.  If the dice shows three or less, I take a step back.  If it shows more than three, I take a step forward.  This is repeated 1000 times, meaning I take 1000 steps in this version.

The entire script is just a conditional statement inside a loop.  It’s extremely simple.

We can then plot the output and we will see something like the below.
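
With matplotlib, for instance, the path can be plotted straight from the history list built up in the sketch above:

```python
import matplotlib.pyplot as plt

# Plot the walk: paces north of the pub against step number.
# `history` is the list of positions from the Randwalk sketch above.
plt.plot(history)
plt.axhline(40, linestyle="--", color="grey", label="home (40 paces)")
plt.xlabel("step number")
plt.ylabel("paces from the pub")
plt.legend()
plt.show()
```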

As you can see, a jagged pattern is generated.  On this occasion, plenty of my steps were unfortunately in the wrong direction, and I never got home.  But that won’t happen every time.  Below is a simulation in which I was luckier, or maybe drank less tequila.

As you can see, here I was more than 40 paces north in this simulation.  So I got home.  (I haven’t bothered to clean up the code to stop running when I “arrive home” in order to keep it simple, but that could easily be done.)

So now we see how we can use Monte Carlo simulation to answer the original question.  What we need to do is run scenarios like the above a large number of times and see how often I get home.

Here is some only slightly more complicated Python called SimpMC which does that.
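
Again as a sketch, keeping the variable name itot for the number of scenarios and counting a scenario as a success if I reach 40 paces north at any point:

```python
import random

# SimpMC (sketch): wrap the random walk in an outer loop and count
# how often I get home (reach 40 paces north at some point).
itot = 10       # number of simulated walks (scenarios)
steps = 1000    # steps per walk
home = 40       # home is 40 paces north of the pub

got_home = 0
for _ in range(itot):
    position = 0
    reached_home = False
    for _ in range(steps):
        if random.randint(1, 6) <= 3:
            position -= 1
        else:
            position += 1
        if position >= home:
            reached_home = True
    if reached_home:
        got_home += 1

print(got_home / itot)  # fraction of scenarios in which I got home
```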

This just puts the whole of the previous code inside another loop, so we run the simulation multiple times; the number of scenarios is set by the variable itot, which is 10 in the code above.  We then calculate the fraction of scenarios in which I get home.

This generates an answer of 0.2, but it is different every time I run it: sometimes I get 0.3 and sometimes 0.4.  That happens because ten scenarios give inadequate statistics.  So let’s set the run number to 100.

Now I get: 0.14, 0.17, 0.21, 0.19, 0.15.  Better but still not stable.  Let’s set the run number to 1000.

Now I get: 0.195, 0.191, 0.208, 0.192, 0.205.  That’s starting to get there.  I am clearly converging on a probability estimate here.  If I ran overnight, I would get a good answer.
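
To see this convergence directly, the same logic can be wrapped in a function and run with increasing scenario counts (still a sketch; fraction_home is a name of my own):

```python
import random

def fraction_home(itot, steps=1000, home=40):
    """Return the fraction of itot simulated walks that reach home."""
    got_home = 0
    for _ in range(itot):
        position = 0
        reached = False
        for _ in range(steps):
            position += 1 if random.randint(1, 6) > 3 else -1
            if position >= home:
                reached = True
        if reached:
            got_home += 1
    return got_home / itot

# More scenarios -> a steadier estimate of the probability of getting home.
for itot in (10, 100, 1000, 10000):
    print(itot, fraction_home(itot))
```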

Finally, to the objection to the Imperial model.  Their code was observed to be unstable when run on multiple cores.  Their response to this was “it’s a stochastic model so it will give different answers each time.”  That response does not fly, as I will now explain.

Saying it is a stochastic model just means it uses this random-number Monte Carlo approach.  However, that does not mean it should produce different outcomes when run on multiple cores.  It should not be unstable at all.  Bear in mind that the reported instability in the Imperial model is 80,000 deaths (in other words, merely the error bar on the Imperial result is larger than the current total number of COVID deaths), and that should not happen.  To claim otherwise is to mix up the randomnesses.  (I just made that word up, but that seems fine.)

For sure, we saw randomness in the Randwalk code, but that was just one run.  When we did lots of runs in the SimpMC code, we started to converge.  In other words, once we did enough runs we got essentially the same result every time.  The Imperial model is getting different results each time a large number of scenarios is run through it with the same parameters.  That is equivalent to me getting different answers on the 1000-run version of SimpMC.  If that happens, it doesn’t mean I wrote a stochastic model.  It means I wrote a buggy model.  Imperial have potentially cost us a lot of money here.