My friend Joshua Jay, who is one of the world’s top magicians, emails me from time to time with math questions. Sometimes they’re about card tricks, sometimes other things. Last night he sent me an excellent question about COVID-19, and I imagine that many others have wondered about this too. So I thought I’d share my response, in case it’s helpful to anyone.
JJ: Since the government is predicting between 100k – 240k deaths from COVID-19, let’s for argument’s sake split the difference and call it 170k projected deaths. They’re ALSO telling us they believe the deaths will “peak” something like April 20th. Am I wrong in assuming, then, that if we assume 170k total deaths, and the halfway point is a mere two weeks away, then they’re projecting 85k deaths before (and after) April 20th?
When I start to think about the idea of 85k deaths between now and April 20th, and we’ve only experienced 5k so far, it means that 80k people are projected to die in the next two weeks. Surely that can’t be correct, or else it would be dominating the news cycle, right?
I’m not asking whether you think those projections are accurate… I’m just trying to wrap my head around the relationship between total projected deaths (whatever it is) and the projected peak of the curve.
MB: Excellent question! I’m glad you’re not asking about specific projections, because I have little insight into that — there are so many unknowns due to lack of testing, unpredictable government and community responses, limited hospital capacity, etc. that predicting the number of deaths even with a fairly large margin of error is very difficult. But what you’re asking is: let’s fix a particular model — without worrying too much about how accurate it is — and try to estimate cumulative deaths as a function of time. In particular, how does this behave in terms of where the “peak” happens?
The basic observation which I think is relevant here is that the cumulative number of deaths as a function of time is the integral of the expected daily number of deaths as a function of time. If you’re rusty on your calculus, this means that if f(t) is the number of deaths per day at time t, plotted as a curve, then the cumulative number of deaths from time a to time b is the area under this curve between time a and time b. At a heuristic level, then, your question comes down to how ‘symmetric’ the curve is. If it were a perfectly symmetric bell-type curve, then we’d expect the number of deaths before the peak to be roughly the same as the number of deaths after the peak, and your 50% heuristic would be correct. Is that actually the case, though?
Well, that depends very much on the actual model, and I don’t think there’s a simple answer. But based on the graphs and projections I’ve seen, I’d say that most models are skewed to the right of the peak, meaning that the number of deaths after the peak is bigger than the number of deaths before.
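To make the symmetry point concrete, here is a minimal sketch in Python. The curve is a made-up bell shape that rises and falls at different speeds; the function names and every parameter (peak day, widths, peak height) are my own assumptions for illustration and are not taken from any real epidemiological model. It integrates the daily curve numerically and reports what share of the total area lies before the peak.

```python
import math

def daily_deaths(t, peak_day, rise, fall, peak_value=2600.0):
    # Hypothetical curve: a bell shape that climbs with width `rise` (days)
    # and decays with width `fall` (days). Not a fitted model.
    width = rise if t < peak_day else fall
    return peak_value * math.exp(-0.5 * ((t - peak_day) / width) ** 2)

def area(f, a, b, steps=20_000):
    # Trapezoidal rule: approximates the integral of f from a to b.
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def share_before_peak(rise, fall, peak_day=20.0, last_day=150.0):
    # Fraction of all deaths (area under the curve) that occur before the peak.
    f = lambda t: daily_deaths(t, peak_day, rise, fall)
    before = area(f, 0.0, peak_day)
    after = area(f, peak_day, last_day)
    return before / (before + after)

symmetric = share_before_peak(rise=6.0, fall=6.0)
skewed = share_before_peak(rise=6.0, fall=10.0)
print(f"symmetric curve:    {symmetric:.1%} of deaths occur before the peak")
print(f"right-skewed curve: {skewed:.1%} of deaths occur before the peak")
```

With equal widths the share before the peak comes out at essentially one half, which is the 50% heuristic. Once the decline is slower than the rise, the share before the peak drops well below one half, even though the peak itself has not moved.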
For example, let’s take a look at https://covid19.healthdata.org/
Here’s a snapshot of their projections for the number of U.S. deaths (they project 93,500 by August 4th, so that’s actually on the low end of what the CDC has been saying lately):
At a quick glance, you can see that the curve given by the dashed line is not symmetric. If you integrate this function, here’s what you get:
In the first graph, the peak occurs on April 16th with 2,644 deaths on that day. The cumulative number of deaths by that date is projected to be 35,303. So in this model, only about 37.7% of the total deaths between now and August 4th will occur on or before the peak.
The shaded region in these graphs represents the uncertainty of the model. The upper border of the shaded region represents about 180,000 total deaths by August 4th, which is closer to the particular estimate you were asking about. In that model, the peak occurs on April 21st and the cumulative number of deaths as of April 21st is roughly 67,000 (which is still about 37% of the August 4th number).
So if we assume 170k total deaths, as in your email, then we shouldn’t take 50% of that to see how many will die by April 20th, but rather around 37%. That gives about 63,000 deaths by April 20th. Since we’ve had about 6k deaths so far (there were 1,000 deaths yesterday, so your 5k number is already outdated), that means about 57k U.S. deaths in the next 17 days for the specific estimate that you’re asking about. Better than 80k, but still incredibly grim…
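The arithmetic above is short enough to check in a few lines of Python. This only restates the numbers from the email and the reply (170k total, a 37% share by the peak, about 6k deaths so far, 17 days); the variable names are mine, and the per-day average is a rough derived figure that assumes nothing about how deaths are spread across those days.

```python
total_projected = 170_000   # midpoint of the 100k to 240k range, from the email
share_by_peak = 0.37        # share of deaths by the peak, read off the model above
deaths_so_far = 6_000       # approximate U.S. deaths to date
days_until_peak = 17        # days remaining until April 20th

deaths_by_peak = round(total_projected * share_by_peak)
remaining = deaths_by_peak - deaths_so_far
average_per_day = remaining / days_until_peak

# The original 50% heuristic, using the 5k figure from the email.
naive_remaining = round(total_projected * 0.5) - 5_000

print(f"deaths by the peak:         {deaths_by_peak:,}")
print(f"deaths between now and then: {remaining:,}")
print(f"average per day:             {average_per_day:,.0f}")
print(f"50% heuristic would give:    {naive_remaining:,}")
```

The gap between the two estimates comes entirely from replacing 50% with 37%, plus the small update to the number of deaths so far.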
Here are a couple of other mathematical thoughts about COVID-19 which I thought I’d share.
So if epidemiological models don’t give us certainty—and asking them to do so would be a big mistake—what good are they? Epidemiology gives us something more important: agency to identify and calibrate our actions with the goal of shaping our future. We can do this by pruning catastrophic branches of a tree of possibilities that lies before us.
Epidemiological models have “tails”—the extreme ends of the probability spectrum. They’re called tails because, visually, they are the parts of the graph that taper into the distance. Think of those tails as branches in a decision tree. In most scenarios, we end up somewhere in the middle of the tree—the big bulge of highly probable outcomes—but there are a few branches on the far right and the far left that represent fairly optimistic and fairly pessimistic, but less likely, outcomes. An optimistic tail projection for the COVID-19 pandemic is that a lot of people might have already been infected and recovered, and are now immune, meaning we are putting ourselves through a too-intense quarantine. Some people have floated that as a likely scenario, and they are not crazy: This is indeed a possibility, especially given that our testing isn’t widespread enough to know. The other tail includes the catastrophic possibilities, like tens of millions of people dying, as in the 1918 flu or HIV/AIDS pandemic.
The most important function of epidemiological models is as a simulation, a way to see our potential futures ahead of time, and how that interacts with the choices we make today. With COVID-19 models, we have one simple, urgent goal: to ignore all the optimistic branches and that thick trunk in the middle representing the most likely outcomes. Instead, we need to focus on the branches representing the worst outcomes, and prune them with all our might. Social isolation reduces transmission, and slows the spread of the disease. In doing so, it chops off branches that represent some of the worst futures. Contact tracing catches people before they infect others, pruning more branches that represent unchecked catastrophes…
Sometimes, when we succeed in chopping off the end of the pessimistic tail, it looks like we overreacted. A near miss can make a model look false. But that’s not always what happened. It just means we won. And that’s why we model.
2. My friend Janos Csirik shared this essay with me on how to ‘calibrate’ the risk posed by COVID-19. Again, I encourage you to read the whole thing. The upshot seems to be:
The chance of dying from COVID-19, conditional on contracting the virus, is roughly equal to the chance of dying from any cause over the next year, conditional on not contracting the virus.
(Note added: A slightly simplified but easier to understand version of this statement is that if you get COVID-19, your chance of dying within the next year roughly doubles.)
The striking observation is that this relationship seems to hold uniformly for more or less every age group.
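Here is a small sketch of why "roughly equal" translates into "roughly doubles". The baseline risks below are hypothetical round numbers chosen only to span a wide range; they are not taken from the essay. The sketch also assumes, as a simplification of mine, that dying from COVID-19 and dying from any other cause are independent events.

```python
def chance_of_dying_within_year(baseline, infected):
    # `baseline`: chance of dying from any cause over the next year
    # without the virus. Following the rule of thumb above, the chance
    # of dying from COVID-19 once infected is taken to equal `baseline`.
    covid_risk = baseline if infected else 0.0
    # Assumption: the two risks are independent, so survival chances multiply.
    return 1.0 - (1.0 - baseline) * (1.0 - covid_risk)

# Hypothetical baseline annual risks, for illustration only.
hypothetical_baselines = {
    "lower-risk group": 0.001,
    "middle-risk group": 0.01,
    "higher-risk group": 0.05,
}

ratios = {}
for group, p in hypothetical_baselines.items():
    without_virus = chance_of_dying_within_year(p, infected=False)
    with_virus = chance_of_dying_within_year(p, infected=True)
    ratios[group] = with_virus / without_virus
    print(f"{group}: {without_virus:.3%} -> {with_virus:.3%} "
          f"(ratio {ratios[group]:.2f})")
```

Under these assumptions the ratio works out to 2 - p for a baseline risk p, so it sits just under 2 whenever the baseline risk is small, whatever the group.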