Everyone wants to know what drives stock performance. It seems intuitive that, over the long term, stocks should correlate with GDP, but some have questioned this. In this article, I aim to show that if you look over a long enough time period and don’t cut the data, stocks are indeed correlated with GDP.
The first issue when examining the correlation between two variables is choosing the right dataset. This is highly contentious, because spurious correlations can easily appear in datasets that have been excessively manipulated, either by cherry-picking the date range or by eliminating outliers. For that reason, I will use the largest reasonable dataset and will not trim it or remove outliers.
I think the earliest reasonable starting point is around 1950. Before then, the economy was so different from today’s that it would be inappropriate to draw any conclusions from it. Besides, a model that reaches further back is not much use if, when asked for a prediction, you have to say “this is what happens to stock performance when Hitler invades Poland…”
Here is a plot showing US GDP since 1950.
This immediately looks like some kind of power-law growth, with a kink at the global financial crisis. (We don’t really have enough data to assess COVID properly yet, but I expect it to produce another kink without really disturbing the trend line.)
This second plot shows stock performance: the S&P 500 index over the same period.
Here’s a brief aside about polynomials, which you can skip if you don’t care what curves I fit to the two plots above. Polynomials are curves such as $y = ax^3 + bx^2 + cx + d$. The order of a polynomial is the highest power of x whose coefficient is non-zero, not the number of non-zero coefficients: $y = 2x^3 + 1$ is third order even though only two of its coefficients are non-zero. So a first-order polynomial is just a straight line of the usual form $y = mx + c$. This matters because we want neither to overfit nor to underfit.
If we underfit, we fail to capture information in the curve we are modelling. If we overfit, we capture all the information in the curve, but we may also add features that are not really there. A good way to check for this is to see what happens when you extrapolate the fitted curve beyond your training data. If you have overfitted, your predictions will often go insane as soon as you are off-piste, i.e. outside the training dataset.
Here is an illustration of what underfitting and overfitting might look like.
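To see the effect numerically, here is a minimal sketch in Python with NumPy. The training data are made up (a quadratic trend plus a small deterministic wiggle standing in for noise), not GDP or S&P figures. The sketch fits polynomials of order 1, 2 and 9 and compares each fit’s in-sample error with its prediction at x = 2, well outside the training range [0, 1].

```python
import numpy as np

# Made-up training data on [0, 1]: a quadratic trend plus a small wiggle as "noise".
x_train = np.linspace(0.0, 1.0, 21)
y_train = 2.0 * x_train**2 + 0.05 * np.sin(40.0 * x_train)

def fit_and_score(order, x_new=2.0):
    """Fit a polynomial of the given order; return in-sample MSE and a prediction at x_new."""
    coeffs = np.polyfit(x_train, y_train, order)
    in_sample_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    prediction = np.polyval(coeffs, x_new)  # x_new = 2 lies outside the training range
    return in_sample_mse, prediction

results = {order: fit_and_score(order) for order in (1, 2, 9)}
for order, (mse, pred) in results.items():
    # The underlying trend, 2x^2, equals 8 at x = 2.
    print(f"order {order}: in-sample MSE {mse:.5f}, prediction at x = 2: {pred:.2f}")
```

Order 1 underfits and has by far the largest in-sample error. Order 9 has the smallest, since a higher-order least-squares fit can only match the training data at least as well as a lower-order one; comparing its prediction at x = 2 with the true trend value of 8 shows whether the extra terms have chased the wiggle rather than the trend.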
Fitting a third-order polynomial to US GDP gives me the following plot.
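If you want to know the equation of that curve, it is $y = -5.265\times10^{-11}x^3 + 4.351\times10^{-5}x^2 - 0.2643x + 414.1$, where x is the number of days since the start of 1950 and y is GDP in billions of US dollars.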
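The article doesn’t say which tool produced this fit. As one possibility, here is a hedged Python sketch using NumPy’s `np.polyfit`, which returns least-squares coefficients highest power first, the same order as the equation above. With no GDP series to hand, it generates noise-free synthetic values from that equation (an illustration, not real GDP) and checks that the fit recovers the coefficients; `fit_cubic` is a hypothetical helper name.

```python
import numpy as np

def fit_cubic(days, values):
    """Least-squares cubic fit; returns [a, b, c, d] for a*x^3 + b*x^2 + c*x + d."""
    return np.polyfit(np.asarray(days, dtype=float), np.asarray(values, dtype=float), 3)

# Coefficients of the GDP curve quoted above, highest power first.
GDP_COEFFS = [-5.265e-11, 4.351e-05, -0.2643, 414.1]

# Synthetic, noise-free values on a daily grid from 1950 to today (not real GDP data).
days = np.arange(0, 25_551)
synthetic_gdp = np.polyval(GDP_COEFFS, days)

recovered = fit_cubic(days, synthetic_gdp)
print(recovered)
```

With real data, `days` would be the day offsets of the GDP observations and `values` the GDP figures themselves.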
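If you think the very small cubic coefficient means I don’t need a third-order fit, you are basically right, although the coefficient’s size alone is a poor guide, because it multiplies the cube of x and x runs to about 25,000. At today’s day count the cubic term takes roughly USD 0.9tn off the fitted value, against about USD 28tn from the quadratic term. Either way, I don’t think it matters as long as the curve extrapolates reasonably.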
Let’s return to stock performance. Applying the same manipulations to the S&P 500 (recalibrating to days since the start of 1950 and setting the initial value to zero) gives a different fitted curve, shown below.
Note that in both cases there are around 25,000 days between the start of 1950 and today (70 × 365 = 25,550).
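The recalibration takes two small steps: turn each date into a day offset from 1 January 1950, and subtract the first observation so the series starts at zero. A minimal Python sketch, assuming the observations come as `datetime.date` objects alongside their values; the helper names are hypothetical and the index levels in the example are made up.

```python
from datetime import date

import numpy as np

EPOCH = date(1950, 1, 1)  # start of the dataset

def days_since_1950(dates):
    """Whole days between 1 January 1950 and each date."""
    return np.array([(d - EPOCH).days for d in dates], dtype=float)

def shift_to_zero(values):
    """Subtract the first observation so the series starts at zero."""
    values = np.asarray(values, dtype=float)
    return values - values[0]

x = days_since_1950([date(1950, 1, 1), date(2020, 1, 1)])
y = shift_to_zero([17.0, 20.0, 25.0])  # made-up index levels, for illustration only
print((date(2020, 1, 1) - EPOCH).days)  # 25567 = 70 * 365 + 17 leap days
```

Counting leap days nudges the exact figure slightly above the 25,550 estimate, which makes no practical difference to the fits.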
For the S&P 500, the equation of the fitted curve is $y = 2.726\times10^{-10}x^3 - 3.87\times10^{-6}x^2 + 0.02366x - 10.68$.
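Again, the cubic coefficient is very small, but this time its size is misleading: because it multiplies the cube of x, the cubic term contributes more to today’s fitted value than any other term (about +4,500 index points, against about -2,500 from the quadratic term). The notable point about the S&P 500 series is that it is very noisy, yet it still follows a clear trend line.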
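Because x runs to about 25,000, a coefficient’s size on its own says little; what matters is the size of each term. This short Python sketch evaluates every term of both fitted curves at today’s day count, taken as 70 × 365 = 25,550, using the coefficients quoted in this article. `term_contributions` is a hypothetical helper.

```python
import numpy as np

TODAY = 25_550  # approximate days from the start of 1950 to today (70 * 365)

# Fitted coefficients quoted in this article, highest power first.
FITS = {
    "US GDP": [-5.265e-11, 4.351e-05, -0.2643, 414.1],
    "S&P 500": [2.726e-10, -3.87e-06, 0.02366, -10.68],
}

def term_contributions(coeffs, x):
    """Value of each term (a*x^3, b*x^2, c*x, d) at the point x."""
    powers = np.arange(len(coeffs) - 1, -1, -1)  # [3, 2, 1, 0] for a cubic
    return np.asarray(coeffs, dtype=float) * float(x) ** powers

terms = {name: term_contributions(c, TODAY) for name, c in FITS.items()}
for name, t in terms.items():
    print(f"{name}: cubic {t[0]:,.0f}, quadratic {t[1]:,.0f}, "
          f"linear {t[2]:,.0f}, constant {t[3]:,.2f}")
```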
Now we get to the dangerous part. To look at long-term stock performance, we need to see whether these curves still behave once we extrapolate them. Let’s look at what they do if we roughly double the day count, to 50,000. (We are currently about 70 years on from 1950, so doubling the date range would take us to around 2090; 50,000 days lands in late 2086.)
There are two reasons this is dangerous. Firstly, as I said, if the curves blow up, we have achieved nothing. Secondly, you can’t extrapolate power laws, or growth curves in general, forever. Here is a brief example of that going wrong.
You will recall that in the early stages of COVID, people were plotting case counts and fitting exponentials to them. At first, that seemed to work. But the plot below shows what happens if you fit an exponential to the Florida case count as of yesterday.
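The Florida data aren’t reproduced here, but the failure mode is easy to simulate. In this Python sketch, a made-up logistic curve (a plateau of 1,000 cases; all parameters are illustrative assumptions) stands in for an epidemic. An exponential is fitted to the first 16 days by fitting a straight line to log(cases), then extrapolated.

```python
import numpy as np

# Hypothetical epidemic: logistic growth that saturates at PLATEAU cases.
PLATEAU, RATE, MIDPOINT = 1000.0, 0.3, 30.0

def logistic(t):
    return PLATEAU / (1.0 + np.exp(-RATE * (t - MIDPOINT)))

# Fit y = exp(intercept + slope * t) to the early phase only,
# via a straight-line least-squares fit to log(y).
t_early = np.arange(0, 16, dtype=float)
slope, intercept = np.polyfit(t_early, np.log(logistic(t_early)), 1)

def exponential_fit(t):
    return np.exp(intercept + slope * t)

for t in (10.0, 30.0, 60.0):
    print(f"t = {t:4.0f}: actual {logistic(t):10.1f}, exponential fit {exponential_fit(t):14.1f}")
```

Early on the two curves agree closely; by t = 60 the exponential predicts millions of cases while the logistic curve has levelled off just below 1,000. Fitting a line to log(y) is a common shortcut that weights errors in relative rather than absolute terms; a direct nonlinear fit would be an alternative.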
So what happens if we extrapolate the curves we fitted? This is shown in the plot below.
Firstly, and most importantly, this shows that both curves continue to behave sensibly out to 50,000 days.
We can conclude that if the S&P 500 and US GDP continue to develop as they have done, then by the year 2090 US GDP will have reached USD 90tn and the S&P 500 will have reached 26,000. That would be some great stock performance!
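These headline numbers can be checked directly from the two fitted equations. The Python sketch below evaluates both cubics at 50,000 days, converts that day count back to a calendar date, and confirms that both curves keep rising between today and the horizon. The GDP unit (billions of US dollars) is inferred from the 90tn figure rather than stated alongside the equation.

```python
from datetime import date, timedelta

import numpy as np

# Fitted coefficients from this article, highest power first.
GDP_FIT = [-5.265e-11, 4.351e-05, -0.2643, 414.1]    # output appears to be in USD billions
SP500_FIT = [2.726e-10, -3.87e-06, 0.02366, -10.68]  # index points, rebased to start at zero

HORIZON = 50_000  # days after 1 January 1950
print(date(1950, 1, 1) + timedelta(days=HORIZON))  # 2086-11-23

gdp_at_horizon = np.polyval(GDP_FIT, HORIZON)
sp500_at_horizon = np.polyval(SP500_FIT, HORIZON)
print(f"US GDP:  {gdp_at_horizon:,.0f} bn USD")  # 89,393
print(f"S&P 500: {sp500_at_horizon:,.0f}")       # 25,572

# "Behaving": both fitted curves rise every day from today out to the horizon.
future_days = np.arange(25_550, HORIZON + 1)
both_rising = all(np.all(np.diff(np.polyval(c, future_days)) > 0) for c in (GDP_FIT, SP500_FIT))
print(both_rising)  # True
```

Since 50,000 days falls in late 2086, the figures are already reached a few years before 2090.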
There are various conclusions that I have not argued for. Firstly, I have said nothing about other countries. The above analysis would definitely not work for the FTSE 100, because it does not show sustained long-term growth. Instead, it seems to exhibit a large sawtooth oscillation between about 4,000 and 7,000, with a period of about a decade, and that won’t correlate with anything. Similar points apply to Japan.
Secondly, I have not shown what happens if you fit curves to more recent data only. Those data are very noisy, which explains why a lot of analysis does not show any correlation.
Thirdly, this does not really back up Portnoy when he says “stocks always go up.” Partly that is because of the huge noise in the S&P 500 curve. Partly it is because you might have to wait a long time. And partly it is because the curve only describes the S&P 500 as a whole, not individual stocks. But the trend is your friend here if you wait long enough.