We have just spent £337bn on the COVID response in the UK. Much of this was generated by the virus itself, but much of it was generated merely by the lockdown. A primary motivation for the lockdown were the results of the Imperial model, which was a Monte Carlo simulation. I explain here briefly what a Monte Carlo simulation is and bring out one objection (among many) to the Imperial approach.

Monte Carlo simulations exist to address a class of problems which do not have an analytical answer. Imagine I am in the pub and my home is 40 paces away. If I walk at two paces a second, I will arrive home in 20s. That’s an analytical question which has an exact answer. Here is a non-analytical question. I have drunk a bottle of tequila in the pub. The probability that I take a pace forward is only 50%; there is an equal probability that I take a pace backward. This does not appear to be analytical. You can’t say immediately what the probability is that I get home.

This is where Monte Carlo simulation comes in. What you can do is simulate my journey home from the pub and see if I get home. (It’s called Monte Carlo because the method is based on random numbers, as we will see next.)

Here’s a really simple Python script called Randwalk that simulates the walk home. It’s called that because this is a random walk simulation. This sort of thing might be familiar to you from Brownian motion.

You can see that all it does is roll a dice 100 times and check to see if the dice shows three or less. That’s the 50/50 part. If the dice does show three or less, I take a step back. If the dice shows more than three, I take a step forward. This is repeated 1000 times, meaning I take 1000 steps in this version.

This entire code consists of a conditional statement in a loop. It’s extremely simple.

We can then plot the output and we will see something like the below.

As you can see, a jagged pattern is generated. On this occasion, plenty of my steps were unfortunately in the wrong direction, and I never got home. But that won’t happen every time. Below is a simulation in which I was luckier, or maybe drank less tequila.

As you can see, here I was more than 40 paces north in this simulation. So I got home. (I haven’t bothered to clean up the code to stop running when I “arrive home” in order to keep it simple, but that could easily be done.)

So now we see how we can use Monte Carlo simulation to answer the original question. What we need to do is run scenarios like the above a large number of times and see how often I get home.

Here is some only slightly more complicated Python called SimpMC which does that.

This just puts the whole of the previous code in another loop. So we do the simulation multiple times — that variable is itot = 10 in the code above. We then calculate the fraction of scenarios in which I get home.

This generates an answer of 0.2. But it is different every time I run it. Sometimes I get 0.3 and sometimes 0.4. That is happening because I have inadequate statistics. So let’s set the run number to 100.

Now I get: 0.14, 0.17, 0.21, 0.19, 0.15. Better but still not stable. Let’s set the run number to 1000.

Now I get: 0.195, 0.191, 0.208, 0.192, 0.205. That’s starting to get there. I am clearly converging on a probability estimate here. If I ran overnight, I would get a good answer.

Finally to the objection to the Imperial model. Their code was observed to be unstable on multiple cores. Their response to this was “it’s a stochastic model so it will give different answers each time.” That response does not fly, as I will now explain.

Saying it is a stochastic model just means it uses this random number Monte Carlo approach. However — that does not mean it should produce different outcomes when run on multiple cores. It should not be unstable at all. Bear in mind that the reported instability in the Imperial model is 80,000 deaths — i.e. merely the error bar in the Imperial result is larger than the current total number of COVID deaths! — and that should not happen. To claim otherwise is to mix up the randomnesses. (I just made that word up but that seems fine.) For sure, we saw randomness in the randwalk code — but that was just one run. When we did lots of runs in the SimpMC code, we started t0 converge. We got the same result every time in other words when we did enough runs. The Imperial model is getting different results each time you run a large number of scenarios through it with the same parameters. That is equivalent to me getting different answers on the 1000 run version of SimpMC. If that happens, it doesn’t mean I wrote a stochastic model. It means I wrote a buggy model. Imperial have potentially cost us a lot of money here.

I have some criticisms of the use of that Imperial model, but there is a problem with the objection here. In your simple pub walk example, what happens when you increase the distance of your home from the pub? It requires a larger number of runs to converge on a stable answer about what fraction of the time you will make it home. If you lived 4000 paces away instead of 40, you would find it will take considerably longer to converge.

Now, the UK population is 65 million, but let’s say it is sufficient to model only one city of about 1 million and scale up the expectations appropriately. You will need to generate a very long sequence of random numbers to sample that very large phase space. It could potentially be many times the population size. A single run will therefore 1) not contribute much statistical significance and 2) take a long time because random number generation is expensive. You could run that model for a very long time to get the statistical uncertainty below an arbitrarily small number, but 10% or so is probably ok if that’s what you get from a few days of running. This is especially true if you are trying to rush results out and can’t wait weeks to get the uncertainty down to 1% or whatever. 80,000 out of a prediction of 500,000 is ok – the main thing is you *know* what your statistical precision is.

People have got the wrong end of the stick with the multi-core thing. Multi-threaded programming is quite difficult and not to be embarked upon lightly by amateur coders like the Imperial group, but that is an orthogonal issue to the stochasticity. Running in a multi-threaded mode will change the random number sequence, and therefore even with the same random seed you are likely to get a different sampling point.

I think the model was probably badly coded, and was used inappropriately, but the statistical uncertainty of 80,000 is not the smoking gun here. There are much worse practices going on I think.

LikeLike

That’s a fair objection about distance. But I was running on a laptop. IC presumably had more or could have asked for more CPU power.

Ok on your stats point.

I guess I am using the 80,000 error as an illustration of the problems rather than as a smoking gun. My main objection is to the IC response to the Edinburgh objection “your code is unstable on multiple cores.” The IC response was “it’s going to be unstable because it’s stochastic.” Objecting to that response is my only objective in this post.

LikeLike

How do we know the Imperial Model is pure Monte Carlo? it would seem just as likely that it has elements akin to atmospheric modelling – with the latter you can do hundreds/thousands of runs and you don’t exactly converge on an answer – there are often outliers, there’s that whole fractal dimension to this (butterfly effect) and the modellers/forecasters have years of experience to know when the model is throwing out nonsense. Not sure what the crashes on multiple processors means either – as I said on FB (I guess you don’t read the comments there) – need to see the link to a couple of your statements.

As for Monte Carlo, was that Geant? I remember that fucking thing – ultimately just a big 3D lump of probability innit? Doubt epidemiological modelling is like that.

LikeLike

Ok we don’t know that. But the Imperial response to the Edinburgh Objection was that “the model is unstable because it is stochastic.” They did not say “the model is unstable because it is like atmospheric modelling.” They also did not say “ you need years of experience to use this model.“ Geant was indeed a Monte Carlo program. If you put the same starting seeds in it, you would always get the same answers. Exactly. And you would get very similar answers if you change the seeds. Not getting that was a sign that you’d written code which was producing results driven by bugs and not by randomness. My post is aimed solely at refuting the Imperial claim that Monte Carlo explains instability.

LikeLike

Yeh, but you are still equating ‘stochastic’ with Monte Carlo, which is not valid – stochastic has a much wider general meaning – including things like atmospheric models which have nothing to do with Monte Carlo.

No beef with Geant, I’m sure it’s very mature, just that it was a great big black box and quite unsatisfying (to me) as a research technique.

LikeLike

OK then maybe Imperial should have said “our model is unstable because it is stochastic but not stochastic like a Monte Carlo, it is stochastic like an atmospheric model which produces utterly different output every time you run it even with the same parameters but it is still a good basis for deciding to incur expenditure of £337bn.”

The code is available so you can look at it and decide whether you think it is a Monte Carlo or an atmospheric model.

LikeLike

I would bet a considerable amount of money – perhaps even £375bn – that GEANT4 is a much higher quality piece of code, with much stronger testing and validation practices, than that Imperial epidemiological model.

That is the real problem with their model. What testing have they done? I don’t need a complex model to tell me that if there is a disease that kills roughly 1% of those it infects, 100% of the population is susceptible to catching it, and it is spreading fast so that well over 50% of the population will catch it then we are looking at north of 300,000 deaths in the UK alone. The question I need to ask though is what are the chances that those input assumptions about a new and unstudied disease are true? How do I know they are true? A model cannot tell me that. Well, it has told me that too late because the fact the model has deviated so badly from reality tells me that at least one of those assumptions is false, but we don’t know which one. This is why it was inappropriate to appeal to that model in the first place – they should instead have appealed to the intrinsic uncertainty and risk involved with a new a dangerous disease. Then the lockdown in the UK would have felt very different I think, perhaps more like here in Denmark.

Where a model would be useful is – if you trust that you actually understand the parameters and behaviour of the disease – to guide you in which policies are effective. Is closing schools more or less important that banning large gatherings? Do we need to worry about people meeting in parks? To trust those sorts of details though you need to thoroughly validate the model against data, which they don’t have because it’s a new disease. So not only is it wrong, it is also largely useless. At the very least they could have sought out data on epidemics of other coronaviruses (there is some) to see if it models that correctly. The bad coding practices make bugs and mistakes more likely, but even well coded models have bugs, and this is why you need to validate and test. Since there was no data on the behaviour of SARSCoV2, they can’t have tested it.

LikeLike

I fully agree with all of this.

The fact that the model had 450 parameters is already enough to tell me it was wrong before we even get to the discussion as to whether we know any of those parameters — including the ludicrous “probability that a key worker lives with another key worker.”

Even had we done Denmark lockdown only in the UK, it would have been infinitely easier for those of us like myself who have currently seen zero people in three months. And the economic effects will be catastrophic for health for decades.

LikeLike

I wrote about this before the UK’s lockdown began, and unlike certain SPADs I have not edited it after the fact. I think my last paragraph in particular is quite prescient

https://twoev.wordpress.com/2020/03/19/modelling-the-invisible-part-of-the-cross-section/

If anything my main error was only directing this at amateur modellers who were all over the blogs back then, instead of being more critical of the professionals as well

LikeLike