Saturday 23 March 2024

Statistical Shenanigans

The story I will cover today is, in a sense, old news. It first broke about a month ago, in February. It is concerned with statistics: according to Mark Twain, the third of the three categories of lies. And, to borrow a phrase, it gives me furiously to think.

What has happened is that the UK’s Office for National Statistics (ONS) has re-defined the way in which excess deaths are to be calculated, in a way that seems to have greatly reduced the resulting numbers, and thus the apparent size of the ongoing “excess deaths” problem, while breaking the link between their figures and hard evidence from the real world.

Background

Among those who follow health statistics, “excess deaths” has been a big topic of discussion recently. Andrew Bridgen MP, in particular, has brought to public attention that there have been recently, and still are, an awful lot of them. And that this has been going on for some time. Indeed, continuously since the start of the COVID pandemic in early 2020. Not just in the UK, but in many other countries too, both in Europe and outside.

Obviously, a lot of deaths in 2020 and 2021 were caused by COVID itself. But, as the epidemic has waned, the excess deaths have not stopped. In December 2023, deaths in the UK were still being reported for most weeks as significantly above the deaths at the same time of year, averaged over past years. And average annual deaths over the years 2020-2023 as a whole were 9.62% higher than the average deaths per year over the pre-COVID period, 2015-2019.

Calculating excess deaths

The standard way of calculating excess deaths for a period of the year is (has been) as follows. Count the deaths recorded in the period of interest (usually a week, defined by particular start and end days) in the geographical area of interest (such as the UK as a whole). Take a suitable base period, usually 5 years long, and count the deaths recorded over the same period of the year during each of the years in the base period. Divide by the number of years in the base period to give an average. Subtract this average from the deaths recorded in the period, to give the total number of excess deaths in the period. This is usually expressed as a percentage increase (or decrease) relative to the base period.
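For the avoidance of doubt, here is that calculation in code form. This is just a minimal sketch in Python, with invented numbers for illustration; it is not anyone’s official implementation.

```python
# A minimal sketch of the traditional excess deaths calculation.
# The numbers below are invented for illustration; they are not ONS figures.

def excess_deaths(deaths_in_period, deaths_same_period_in_base_years):
    """Excess deaths for one period, against a simple base-period average."""
    baseline = sum(deaths_same_period_in_base_years) / len(deaths_same_period_in_base_years)
    excess = deaths_in_period - baseline
    percent = 100.0 * excess / baseline
    return excess, percent

# One week of the year of interest, against the same week in each of
# five base years (say 2015-2019).
excess, percent = excess_deaths(12_500, [10_900, 11_200, 11_050, 11_400, 11_150])
print(f"Excess deaths: {excess:.0f} ({percent:+.1f}% vs the base-period average)")
```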

There is a simple description of the method on a page from the British Heart Foundation (BHF), here: [[1]]. But plainly, we are talking about the same calculation.

You can do this calculation on filtered data, too. If you have your raw data sufficiently well broken down, you can produce an excess deaths figure – for example – restricted to people aged 35-44, people in England only, people who died of a particular cause or causes, or people who were vaccinated against COVID (or not). If the population of such a cohort has changed significantly since the base period, you might feel a need to re-express the figures in terms of deaths per hundred, thousand or million in the cohort, in order to allow a fairer comparison.
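And here is a sketch of that per-capita adjustment for a cohort. Again, the cohort and all the figures are invented purely for illustration.

```python
# A sketch of the per-capita adjustment, for a cohort whose population has
# changed since the base period. All figures are invented for illustration.

def excess_rate_per_100k(deaths_now, pop_now, deaths_base_avg, pop_base_avg):
    """Compare death rates per 100,000 of the cohort, rather than raw counts."""
    rate_now = 100_000 * deaths_now / pop_now
    rate_base = 100_000 * deaths_base_avg / pop_base_avg
    return rate_now - rate_base, 100.0 * (rate_now - rate_base) / rate_base

# A hypothetical cohort whose population has grown since the base period.
diff, pct = excess_rate_per_100k(deaths_now=1_150, pop_now=7_600_000,
                                 deaths_base_avg=1_000, pop_base_avg=7_200_000)
print(f"Excess death rate: {diff:.1f} per 100,000 ({pct:+.1f}%)")
```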

One vital thing to note about this process: all the data which go into it are real-world data. A death is (was) a death. It happened. To be sure, the data may be incomplete as yet, or subject to later change (for example, if a coroner has still to determine the cause of death). But at some point, the data that go into the calculation become definitive for all time. There are no models involved, no opportunities to massage the data, and no ambiguity in the results, except perhaps in how best to express them.

Base period changes

In principle, you can do this calculation for any base period for which you have data broken down in the same way as for the period of interest to you. In practice, the base period normally used is (has been) the last five years prior to the year you are considering. This base period (2015-2019) was indeed used for the 2020 figures. It was obviously not possible to use 2020 data as part of a base period, because the figures were perturbed by COVID. So, the base period 2015-2019 continued to be used for 2021.

It looks as though, for 2022 and 2023, the base period used was the six years prior to the year of interest, cutting out 2020. I am surprised that 2021 was not cut out as well as 2020. For the deaths due to COVID in England and Wales in 2021 (67,350, 11.5% of all deaths) [[2]] were very much comparable with those in 2020 (73,766, 12.1% of the total) [[3]]. COVID deaths in 2022 were small in comparison to these two years, but still not insignificant.

The ONS proposal

On February 20th, 2024, the ONS published a new proposal for how to calculate excess deaths. It is here: [[4]]. The document does not state the rationale for making the changes, but dives right into their main features.

Estimating expected deaths

Here is the top bullet point. “Trends in population size, ageing and mortality rates are accounted for by the new method for estimating the expected number of deaths used in the calculation of excess mortality (the difference between the actual and expected number of deaths); this is not the case for the current method, which uses a simple five-year average to estimate the number of expected deaths.”

My instant reaction to this was: What’s all this about ‘estimating’ an ‘expected’ number? As soon as an estimate raises its head, we are no longer in the real world of death certificates, doctors and coroners. My bullshit meter is triggered to, at the very least, “orange alert” status.

Now, I am not criticizing government for projecting future deaths. I expect such projections would be important to certain government departments, for example those who calculate budgets for state pensions. What I am criticizing is the substitution, for an “excess deaths” figure based entirely on measured data, of one which is calculated using a “statistical model.” If they felt they needed a new measure, that is one thing. But to give the new measure the same name as the old, particularly in an area which is attracting interest from MPs and from the general public, strikes me as – at best – a serious obfuscation.

Change in the mechanics

There is also a change in the mechanics of the calculations. “Individual weeks and months that were substantially affected by the immediate mortality impact of the coronavirus (COVID-19) pandemic are removed from the data when estimating expected deaths in subsequent periods…”

This acknowledges the problem I brought up above: why was 2021 not dropped from the base period also, when looking at 2022 onwards? It would be interesting to see what the effects would have been under the old method, if this had been done.

The new definition

The new approach amounts not just to a new way to calculate excess deaths, but to a re-definition of the concept of excess deaths. “Excess mortality is the difference between the observed number of deaths in a particular period and the number of deaths that would have been expected in that period, based on historical data.” In place of my understanding, which is: “Excess mortality is the difference between the observed number of deaths in a particular period of a year and the number of deaths recorded for that period of the year in past years, based on an average of historical data.”

Supporting blog post

The document links to a supporting blog post, here: [[5]]. This says: “The weakness of this [the old] approach is that it doesn’t take into account the ageing and growing population of the UK (all else being equal, more people means more deaths, particularly if a greater share of the population are elderly); nor does it reflect recent trends in population mortality rates, which were generally falling until 2011 before levelling off until the onset of the pandemic.”

In my view, the first is only a weakness if you are not “slicing and dicing” your data finely enough. Ageing can be taken into account by considering age group cohorts separately. And population growth can be dealt with by re-expressing the results in terms of percentages for each cohort. As to trends, the way you should look for them is to plot the raw data, then analyze any trends that may appear.

The article also says: “Importantly, this approach moves away from averages drawn from raw numbers…” Leaving unanswered the zillion dollar question: Why do this in the first place? My bullshit meter is now on full red. “Code Red for humanity,” ho ho.

The detail of the algorithm

The original post gives some details of the statistical model. They use a “quasi Poisson regression.” (A technique which, pun intended, is seen by some as a bit fishy). OK, they are trying to model many sets of data all at once, rather than having to do many different calculations for different cohorts, one by one. But I for one am still stuck at the starting-gate. Why did they feel a need to do any modelling at all?
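For those who like to see such things concretely, here is a rough sketch of what an “expected deaths” estimate from a quasi-Poisson regression looks like in practice. To be clear: this is not the ONS’s actual model. The data, the covariates and the structure below are all invented, simply to show the general shape of the technique.

```python
# A rough sketch of estimating "expected" deaths with a quasi-Poisson
# regression. NOT the ONS model: the data, covariates and structure below
# are invented, simply to show the shape of the technique.
# Requires numpy, pandas and statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Invented weekly death counts for the pre-pandemic "training" years.
years = np.arange(2015, 2020)
weeks = np.arange(1, 53)
frame = pd.DataFrame([(y, w) for y in years for w in weeks], columns=["year", "week"])
frame["population"] = 66e6 + 0.2e6 * (frame["year"] - 2015)   # slow population growth
frame["deaths"] = rng.poisson(11_000 + 1_500 * np.cos(2 * np.pi * frame["week"] / 52))

def design(df):
    """Design matrix: a linear trend plus a crude seasonal term."""
    X = pd.DataFrame({
        "trend": df["year"] - 2015,
        "cos_week": np.cos(2 * np.pi * df["week"] / 52),
        "sin_week": np.sin(2 * np.pi * df["week"] / 52),
    })
    return sm.add_constant(X, has_constant="add")

# Poisson family, with the dispersion estimated from the Pearson chi-square
# statistic -- which is what "quasi-Poisson" amounts to in statsmodels.
model = sm.GLM(frame["deaths"], design(frame),
               family=sm.families.Poisson(),
               offset=np.log(frame["population"]))
result = model.fit(scale="X2")

# Modelled "expected" deaths for week 1 of 2024, given an assumed population.
future = pd.DataFrame({"year": [2024], "week": [1]})
expected = result.predict(design(future), offset=np.log(np.array([68e6])))
print(f"Modelled 'expected' deaths, week 1 of 2024: {float(np.asarray(expected)[0]):,.0f}")
```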

Testing, testing…

Like good scientists, the ONS have tested their model against the previously published real-world-based numbers. They say: “On an annual basis, the new method estimates 76,412 excess deaths in the UK in 2020, compared with 84,064 estimated by the current method (Table 2) … In the latest year, 2023, the new method estimates 10,994 excess deaths in the UK, 20,448 fewer than the current method.”

That last is an amazing difference. The new method estimates only one-third of the excess deaths which, according to the old method, have already happened!

And remember, we are talking about 2023 here, in which the deaths ascribed to COVID are minuscule compared with 2020 or 2021, and small even compared to 2022. A 20 or even 30 per cent difference I might have believed, allowing for ageing and population growth. But nearly 200 per cent, no.
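The arithmetic, using the ONS’s own figures quoted above, is simple enough:

```python
# Back-of-envelope arithmetic on the ONS's 2023 figures quoted above.
new_method_2023 = 10_994
gap_vs_old = 20_448
old_method_2023 = new_method_2023 + gap_vs_old         # 31,442 under the old method

share = 100 * new_method_2023 / old_method_2023        # ~35%: roughly one-third
extra = 100 * (old_method_2023 / new_method_2023 - 1)  # ~186%: nearly 200 per cent higher
print(f"Old-method figure: {old_method_2023:,}")
print(f"New method gives {share:.0f}% of the old figure; "
      f"the old figure is {extra:.0f}% higher than the new.")
```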

Anyone worthy of the name scientist always performs one key step of the scientific method before they publish their results. That is, to check the predicted consequences of their hypothesis against what is evident in the real world. And if the two are too far apart (bearing in mind appropriate confidence intervals), the hypothesis must be modified, or even scrapped.

I give kudos to the ONS for doing the check. But I would rate this result a failure. Even climate models can “hindcast” successfully to a certain extent. But this one can’t even get close to the figures for last year!

As an outside observer, I would conclude either that their model is not fit for purpose, or that its actual purpose is something other than what is being put forward. I smell a rat.

The politics

There is, of course, an enormous amount of politicking going on in and around this area. The reactions which Andrew Bridgen received when he first brought up the subject in parliament suggest that someone, or some group, in a very high position does NOT want the full truth about excess deaths in the UK since 2020 brought out into the open.

Now, when I was young, I was told that there were three things you could always trust: the government, the police and the Post Office. Being a strongly evidence-based person, I never believed this. And my 70 years to date have provided strong evidence that I was right in my disbelief. I have seen government, again and again and again, lie, mislead, stonewall, obfuscate or try to suppress discussion when dealing with the people they are supposed to serve. “Three weeks to flatten the curve.” “Saddam Hussein has weapons of mass destruction.” “There’s a climate crisis.” To say there is no climate crisis is “denial,” “misinformation” or “conspiracy theory.” “The Post Office Horizon system is free from faults.” “The COVID vaccines are safe and effective.” There have been many more.

I have become sufficiently cynical that my best guess is that this exercise was intended to suppress the truth over excess deaths, and so to sweep the whole issue under the carpet. My level of trust in government is now below zero. I regard anything they say as a lie or a misdirection, until I find hard evidence otherwise. I know I am not alone in such views.

Where did my data go?

So, it seems that the ONS are not going to publish proper excess deaths numbers for any dates after the end of 2023. The next thing I thought was: what about other data sources?

Well, other UK government departments publish similar data. The Office for Health Improvement and Disparities (OHID), for example, has a page here: [[7]].

OHID’s excess deaths data is being nobbled, too. The page linked to from the green box on that page talks about a change to a “post-pandemic method,” and was first published on – ahem – the same day as the ONS paper. This suggests the change is UK government-wide, not just at the ONS.

So, let’s try the Organisation for Economic Co-operation and Development (OECD) [[8]].

Their old data platform, it turns out, is about to be removed. Convenient, eh? I tried the new one, but it returned hundreds of results for “excess deaths,” the only even partly relevant-looking one being “infant mortality.” For “mortality by week,” it returned over 11,000 results!

Strike two. So, let’s haul out the big guns.

The Our World in Data COVID data feed has never failed me yet (except when it switches columns around without warning). I got the latest file, and was a bit surprised to see that Our World in Data are now getting their excess mortality figures, not from national governments, but from the UN’s World Health Organization (WHO). When I looked at the contents, while there were records in the file for 2024, the excess mortality figures – for ALL countries – cut off precisely on December 31st, 2023. Before then, the more statistically savvy countries, including the UK, had been able to provide these figures on a weekly basis with a lag time of 3 to 4 weeks. Now, niks. For almost three months. Strike three.
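For anyone who wants to check for themselves, here is a minimal sketch of one way to do it, assuming the file is still at its usual location and the excess mortality column is still called “excess_mortality”; both are liable to change without notice.

```python
# A sketch of one way to check where the excess mortality series cuts off,
# per country, in the Our World in Data COVID file. The URL and the column
# name are assumptions; both may have changed since this was written.
import pandas as pd

URL = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
df = pd.read_csv(URL, usecols=["location", "date", "excess_mortality"],
                 parse_dates=["date"])

# Latest date, for each country, with a non-missing excess mortality figure.
latest = (df.dropna(subset=["excess_mortality"])
            .groupby("location")["date"]
            .max()
            .sort_values(ascending=False))
print(latest.head(20))
```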

So, the dog has eaten my data. It looks as if the entire world may have stopped providing any excess mortality figures which are founded on real-world evidence! Cynical me does not think this is a coincidence.

For today, I shall refrain from saying any more.

