Monte Carlo Simulation: An Introduction With An Objection To The Imperial COVID Model


We have just far spent £337bn on the COVID response in the UK. This is reported by the FT at: The virus itself generated much of this spend, but much of it was generated merely by the lockdown.  The results of the Imperial model were a primary motivation for lockdown. That model was a Monte Carlo Simulation.  I explain here briefly what a Monte Carlo simulation is and bring out one objection (among many) to the Imperial approach.

What is Monte Carlo Simulation?

Monte Carlo simulations exist to address a class of problems which do not have an analytical answer.  Imagine I am in the pub and my home is 40 paces away.  If I walk at two paces a second, I will arrive home in 20s.  That’s an analytical question which has an exact answer.  Here is a non-analytical question.  I have drunk a bottle of tequila in the pub.  The probability that I take a pace forward is only 50%; there is an equal probability that I take a pace backward.  This does not appear to be analytical.  You can’t say immediately what the probability is that I get home.

This is where Monte Carlo Simulation comes in.  What you can do is simulate my journey home from the pub and see if I get home.  (It’s called Monte Carlo because the method is based on random numbers, as we will see next.)

Sample Python Code

A very simple Monte Carlo

Here’s a really simple Python script called Randwalk that simulates the walk home.  It’s called that because this is a random walk simulation.  This sort of thing might be familiar to you from Brownian motion.

You can see that all it does is roll a dice 100 times and check to see if the dice shows three or less.   That’s the 50/50 part.  If the dice does show three or less, I take a step back.  If the dice shows more than three, I take a step forward.  This is repeated 1000 times, meaning I take 1000 steps in this version.

This entire code consists of a conditional statement in a loop.  It’s extremely simple.

Output from Simple Python

We can then plot the output and we will see something like the below.

Output from a random walk simulation

As you can see, a jagged pattern is generated.  On this occasion, plenty of my steps were unfortunately in the wrong direction, and I never got home.  But that won’t happen every time.  I was luckier in the run below, or maybe I drank less tequila.

Random walk output in a better scenario

As you can see, here I was more than 40 paces north in this simulation.  So I got home.  (I haven’t bothered to clean up the code to stop running when I “arrive home” in order to keep it simple, but that could easily be done.)

Using Monte Carlo on the Random Walk Problem

So now we see how we can use Monte Carlo Simulation to answer the original question.  What we need to do is run scenarios like the above a large number of times and see how often I get home.

A more complex piece of Python to loop over scenarios

Here is some only slightly more complicated Python called SimpMC which does that.

This just puts the whole of the previous code in another loop.  So we do the simulation multiple times — that variable is itot = 10 in the code above.  We then calculate the fraction of scenarios in which I get home.

Monte Carlo Results

This generates an answer of 0.2. But it is different every time I run it.  Sometimes I get 0.3 and sometimes 0.4.  That is happening because I have inadequate statistics.  So let’s set the run number to 100.

Now I get: 0.14, 0.17, 0.21, 0.19, 0.15.  Better but still not stable.  Let’s set the run number to 1000.

Now I get: 0.195, 0.191, 0.208, 0.192, 0.205.  That’s starting to get there.  I am clearly converging on a probability estimate here.  If I ran overnight, I would get a good answer.

Why is this an Objection to the Imperial Model

Finally to the objection to the Imperial model.  Their code was unstable on multiple cores.  Their response to this was “it’s a stochastic model so it will give different answers each time.”  That response does not fly, as I will now explain.

Saying it is a stochastic model just means it uses this random number Monte Carlo approach.  However — that does not mean it should produce different outcomes when run on multiple cores.  It should not be unstable at all.  The reported instability in the Imperial model is 80,000 deaths. This means that merely the error bar in the Imperial result is larger than the current total number of COVID deaths! — and that should not happen.  To claim otherwise is to mix up the randomnesses.  (I just made that word up but that seems fine.)

For sure, we saw randomness in the randwalk code — but that was just one run.  When we did lots of runs in the SimpMC code, we started t0 converge.  We got the same result every time in other words when we did enough runs.  The Imperial model produces different results each time you run a large number of scenarios through it with the same parameters.  That is equivalent to me getting different answers on the 1000 run version of SimpMC.  If that happens, it doesn’t mean I wrote a stochastic model.  It means I wrote a buggy model.  Imperial have potentially cost us a lot of money here.

By Tim Short

I am a former investment banking and securitisation specialist, having spent nearly a decade on the trading floor of several international investment banks. Throughout my career, I worked closely with syndicate/traders in order to establish the types of paper which would trade well and gained significant and broad experience in financial markets.
Many people have trading experience similar to the above. What marks me out is what I did next. I decided to pursue my interest in philosophy at Doctoral level, specialising in the psychology of how we predict and explain the behaviour of others, and in particular, the errors or biases we are prone to in that process. I have used my experience to write The Psychology of Successful Trading. In this book, I combine the above experience and knowledge to show how biases can lead to inaccurate predictions of the behaviour of other market participants, and how remedying those biases can lead to better predictions and major profits. Learn more on the About Me page.

9 replies on “Monte Carlo Simulation: An Introduction With An Objection To The Imperial COVID Model”

I have some criticisms of the use of that Imperial model, but there is a problem with the objection here. In your simple pub walk example, what happens when you increase the distance of your home from the pub? It requires a larger number of runs to converge on a stable answer about what fraction of the time you will make it home. If you lived 4000 paces away instead of 40, you would find it will take considerably longer to converge.

Now, the UK population is 65 million, but let’s say it is sufficient to model only one city of about 1 million and scale up the expectations appropriately. You will need to generate a very long sequence of random numbers to sample that very large phase space. It could potentially be many times the population size. A single run will therefore 1) not contribute much statistical significance and 2) take a long time because random number generation is expensive. You could run that model for a very long time to get the statistical uncertainty below an arbitrarily small number, but 10% or so is probably ok if that’s what you get from a few days of running. This is especially true if you are trying to rush results out and can’t wait weeks to get the uncertainty down to 1% or whatever. 80,000 out of a prediction of 500,000 is ok – the main thing is you *know* what your statistical precision is.

People have got the wrong end of the stick with the multi-core thing. Multi-threaded programming is quite difficult and not to be embarked upon lightly by amateur coders like the Imperial group, but that is an orthogonal issue to the stochasticity. Running in a multi-threaded mode will change the random number sequence, and therefore even with the same random seed you are likely to get a different sampling point.

I think the model was probably badly coded, and was used inappropriately, but the statistical uncertainty of 80,000 is not the smoking gun here. There are much worse practices going on I think.

That’s a fair objection about distance. But I was running on a laptop. IC presumably had more or could have asked for more CPU power.

Ok on your stats point.

I guess I am using the 80,000 error as an illustration of the problems rather than as a smoking gun. My main objection is to the IC response to the Edinburgh objection “your code is unstable on multiple cores.” The IC response was “it’s going to be unstable because it’s stochastic.” Objecting to that response is my only objective in this post.

How do we know the Imperial Model is pure Monte Carlo? it would seem just as likely that it has elements akin to atmospheric modelling – with the latter you can do hundreds/thousands of runs and you don’t exactly converge on an answer – there are often outliers, there’s that whole fractal dimension to this (butterfly effect) and the modellers/forecasters have years of experience to know when the model is throwing out nonsense. Not sure what the crashes on multiple processors means either – as I said on FB (I guess you don’t read the comments there) – need to see the link to a couple of your statements.

As for Monte Carlo, was that Geant? I remember that fucking thing – ultimately just a big 3D lump of probability innit? Doubt epidemiological modelling is like that.

Ok we don’t know that. But the Imperial response to the Edinburgh Objection was that “the model is unstable because it is stochastic.” They did not say “the model is unstable because it is like atmospheric modelling.” They also did not say “ you need years of experience to use this model.“ Geant was indeed a Monte Carlo program. If you put the same starting seeds in it, you would always get the same answers. Exactly. And you would get very similar answers if you change the seeds. Not getting that was a sign that you’d written code which was producing results driven by bugs and not by randomness. My post is aimed solely at refuting the Imperial claim that Monte Carlo explains instability.

Yeh, but you are still equating ‘stochastic’ with Monte Carlo, which is not valid – stochastic has a much wider general meaning – including things like atmospheric models which have nothing to do with Monte Carlo.

No beef with Geant, I’m sure it’s very mature, just that it was a great big black box and quite unsatisfying (to me) as a research technique.

OK then maybe Imperial should have said “our model is unstable because it is stochastic but not stochastic like a Monte Carlo, it is stochastic like an atmospheric model which produces utterly different output every time you run it even with the same parameters but it is still a good basis for deciding to incur expenditure of £337bn.”

The code is available so you can look at it and decide whether you think it is a Monte Carlo or an atmospheric model.

I would bet a considerable amount of money – perhaps even £375bn – that GEANT4 is a much higher quality piece of code, with much stronger testing and validation practices, than that Imperial epidemiological model.

That is the real problem with their model. What testing have they done? I don’t need a complex model to tell me that if there is a disease that kills roughly 1% of those it infects, 100% of the population is susceptible to catching it, and it is spreading fast so that well over 50% of the population will catch it then we are looking at north of 300,000 deaths in the UK alone. The question I need to ask though is what are the chances that those input assumptions about a new and unstudied disease are true? How do I know they are true? A model cannot tell me that. Well, it has told me that too late because the fact the model has deviated so badly from reality tells me that at least one of those assumptions is false, but we don’t know which one. This is why it was inappropriate to appeal to that model in the first place – they should instead have appealed to the intrinsic uncertainty and risk involved with a new a dangerous disease. Then the lockdown in the UK would have felt very different I think, perhaps more like here in Denmark.

Where a model would be useful is – if you trust that you actually understand the parameters and behaviour of the disease – to guide you in which policies are effective. Is closing schools more or less important that banning large gatherings? Do we need to worry about people meeting in parks? To trust those sorts of details though you need to thoroughly validate the model against data, which they don’t have because it’s a new disease. So not only is it wrong, it is also largely useless. At the very least they could have sought out data on epidemics of other coronaviruses (there is some) to see if it models that correctly. The bad coding practices make bugs and mistakes more likely, but even well coded models have bugs, and this is why you need to validate and test. Since there was no data on the behaviour of SARSCoV2, they can’t have tested it.

I fully agree with all of this.

The fact that the model had 450 parameters is already enough to tell me it was wrong before we even get to the discussion as to whether we know any of those parameters — including the ludicrous “probability that a key worker lives with another key worker.”

Even had we done Denmark lockdown only in the UK, it would have been infinitely easier for those of us like myself who have currently seen zero people in three months. And the economic effects will be catastrophic for health for decades.

Leave a Reply