Radoslav Harman, my former statistics professor (and also one of my favorite people), is working on a stochastic model of COVID-19 in Slovakia. Below is a picture from my simulation of the model. The red/orange line is the actual number of new positive cases each day – the real data. The blue dots represent outcomes of the simulation; the darker the dot, the more frequent the outcome. The picture is made out of 200 runs for one particular setting of the free parameters b_{0}, α and prefix (their meaning and exact values aren’t important for now).
Simulation of daily new COVID-19 cases 
On March 21, 41 people tested positive out of 440. The previous day (March 20), 13 out of 367 tested positive. With only 20% more tests on March 21, we got three times more positive results (41 versus 13)! The model and my simulation predicted about 10 to 30 positive tests per day from March 20 to March 22. The simulation is clearly off around this time – it overshoots on March 20 and 22 and undershoots on March 21. How could we fix it? My suspicion is that the population tested on March 21 was qualitatively different; for example, it could have contained a big group of people returning from abroad. This is exactly what happened on April 5, when a group of 35 people returning from Austria tested positive. April 5 is the second-to-last day in the graph, and for this day the model underestimated the number of new positive cases by a huge margin.
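Seen as positivity rates instead of raw counts, the jump is even starker; a quick back-of-the-envelope check using the numbers above:

```python
# Positivity rates on the two days discussed above (real Slovak data).
tested_mar20, positive_mar20 = 367, 13
tested_mar21, positive_mar21 = 440, 41

rate_20 = positive_mar20 / tested_mar20
rate_21 = positive_mar21 / tested_mar21

print(f"March 20: {rate_20:.1%} of tests positive")  # 3.5%
print(f"March 21: {rate_21:.1%} of tests positive")  # 9.3%
print(f"tests grew by {tested_mar21 / tested_mar20 - 1:.0%}, "
      f"positives by a factor of {positive_mar21 / positive_mar20:.1f}")
```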
The model’s results could be improved by providing information about where each tested person traveled and who they interacted with in the last few days (and this is exactly the type of information that needs to be collected for life to get back to normal).
All models are wrong, but some are useful. – George Box

The model is wrong about daily new cases, but being right about daily new cases is not its goal. It’s supposed to describe the behavior of the epidemic in the long term. The following graph is similar to the previous one, but shows cumulative/total cases instead. It also contains two more days. Again, the red line is the reality while the simulation is blue.
Simulation of total COVID-19 cases 
Is such a model useful? Sure, it could be more precise, but at least it gives a range and that’s a start. I’d say that it’s fairly useful. And as always, we need more and better data.
Exponential-growth models are no longer useful
Exponential growth has been mentioned a lot regarding epidemics and the spread of COVID-19. However, in countries implementing social distancing, exponential-growth models are so far from reality that they are no longer useful. And unlike the model above, they cannot be improved with more and better data.

Exponential functions are straight lines on semilog graphs, so the total number of cases should form a straight line on a semilog graph. Below is a picture of total cases in Italy on a semilog graph together with the exponential function 220 ⋅ 1.145^{t}, where t is the number of days since the 200th case.
Total COVID-19 cases in Italy, with the exponential function 220 ⋅ 1.145^{t} 
Total COVID-19 cases in Italy on a semilog graph 
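The straight-line claim is easy to check numerically: on a semilog plot we draw log(cases) against t, and for an exponential the day-to-day differences of the logs all equal the same constant. A minimal sketch using the fitted curve quoted above (220 ⋅ 1.145^t):

```python
import math

# Exponential fit for Italy quoted above: cases(t) = 220 * 1.145**t,
# with t = days since the 200th case.
def cases(t):
    return 220 * 1.145 ** t

# Differences of consecutive log-values; for an exponential they are all
# equal to log10(1.145), i.e. the semilog plot is a straight line.
log_cases = [math.log10(cases(t)) for t in range(30)]
slopes = [b - a for a, b in zip(log_cases, log_cases[1:])]

print(round(min(slopes), 4), round(max(slopes), 4))  # both ≈ 0.0588
```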
According to the paper Fractal kinetics of COVID-19 pandemic by Ziff and Ziff, published in February 2020, the growth of active cases in China was much slower than exponential – roughly 0.0854 ⋅ t^{3.09} ⋅ e^{ − t/8.9}. Notice the exponent of 3.09 being almost the same as the 3 for Italy, though the formula plotted above for Italy didn’t have exponential decay. There is another important difference – the two graphs for Italy contain total confirmed cases, while Ziff and Ziff quantify active cases. In particular, active cases can decrease, while total confirmed cases never decrease, since it’s the number of people who have ever tested positive.
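Unlike a pure exponential, the Ziff and Ziff curve rises, peaks and then decays: for t^p ⋅ e^{−t/T} the derivative vanishes at t = p ⋅ T, here 3.09 ⋅ 8.9 ≈ 27.5 days. A short numerical check of that peak:

```python
import math

def active_cases(t):
    # Fractal-kinetics fit for China from Ziff and Ziff (February 2020).
    return 0.0854 * t ** 3.09 * math.exp(-t / 8.9)

# Setting d/dt [t**p * exp(-t/T)] = 0 gives t_peak = p * T.
t_peak = 3.09 * 8.9
print(round(t_peak, 1))  # 27.5 days

# Confirm on a daily grid: the maximum falls right next to t_peak.
best_day = max(range(1, 120), key=active_cases)
print(best_day)  # 27 or 28 (the curve is nearly flat around the peak)
```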
A few weeks ago, the Slovak mathematicians Katarína Boďová and Richard Kollár looked at multiple countries using the same principles presented by Ziff and Ziff, and derived a generalized formula for the number of active cases N(t) on day t (t = 1 being the first day with 200 active cases).
N(t) = (A/T_{G}) ⋅ (t/T_{G})^{6.23} ⋅ e^{ − t/T_{G}}
 T_{G} is a country-specific parameter – the average number of days in which “the sick are removed from the system” (and stop spreading the disease).
 A is a scaling constant.
 6.23 is not just a constant that fits the data, but a root of an important equation, according to the authors. We’ll have to wait for the manuscript for more details.
| Country | T_{G} | A | Max. cases | Max. date |
| --- | --- | --- | --- | --- |
| USA | 10.2 | 72329 | 1 241 389 | 2020-05-08 |
| Spain | 6.4 | 3665 | 99 978 | 2020-04-12 |
| Italy | 7.8 | 4417 | 99 459 | 2020-04-12 |
| Germany | 6.7 | 3773 | 98 038 | 2020-04-14 |
| United Kingdom | 7.2 | 2719 | 66 082 | 2020-04-21 |
| France | 6.5 | 1961 | 53 060 | 2020-04-12 |
| Iran | 8.7 | 2569 | 51 773 | 2020-04-21 |
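The “Max. cases” and peak-day columns follow directly from the formula: differentiating N(t) = (A/T_{G}) ⋅ (t/T_{G})^{6.23} ⋅ e^{−t/T_{G}} and setting the derivative to zero gives t_peak = 6.23 ⋅ T_{G} and N_max = (A/T_{G}) ⋅ 6.23^{6.23} ⋅ e^{−6.23}. A quick sanity check against the Italy row:

```python
import math

POWER = 6.23  # exponent from the Boďová–Kollár formula

def active_cases(t, A, T_G):
    return (A / T_G) * (t / T_G) ** POWER * math.exp(-t / T_G)

def peak(A, T_G):
    # dN/dt = 0  =>  t_peak = POWER * T_G
    t_peak = POWER * T_G
    return t_peak, active_cases(t_peak, A, T_G)

# Italy row of the table above: T_G = 7.8, A = 4417.
t_peak, n_max = peak(A=4417, T_G=7.8)
print(round(t_peak, 1))  # ≈ 48.6 days after reaching 200 active cases
print(round(n_max))      # ≈ 99 300, close to the table's 99 459
```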
Scoring the predictions
On 2020-04-10, 11 days after the predictions, I looked at the data to see how the predictions did. However, I disqualified France, since they first withheld information for multiple days and then reported all of it on 2020-04-04, making it unfair to anyone predicting trends. Slate Star Codex questions Iran’s data, but I haven’t investigated that, so I’m keeping my rating for now.

| Country | Subjective rating |
| --- | --- |
| Italy | ★★★★★ |
| United Kingdom | ★★★★★ |
| Spain | ★★★★★ |
| Germany | ★★★★ |
| USA | ★★★★ |
| Iran | ★★ |
Italy
The prediction (dashed line) is spot-on. Note that the green zone marks the data available on the day of prediction, i.e. until March 29.

Active cases in Italy until 2020-04-09, together with the predicted trend 
Spain
The original prediction (upper dashed line) was a bit pessimistic, but another curve with a slightly lower T_{G} of 6.2 (versus the original 6.4) fits the data really well so far.

Active cases in Spain until 2020-04-09, together with the predicted trends 
Germany
Germany, like Spain, at first looked like it had a lower T_{G} of 6.3, but recently their patients have been recovering really fast and the number of new cases is steady. It might mean that their T_{G} got even lower, but it could also mean that there is something wrong with the model.

Active cases in Germany until 2020-04-09, together with the predicted trends 
USA
The USA had the most pessimistic prediction, and it’s good that the number of cases is smaller than predicted. I scored it highly (★★★★) because keeping the predicted T_{G} but lowering the value of A by 15% is in line with the trend. However, the USA is still before the first inflection point, so it’s too early to make any confident judgements, and I’m not confident about my exact score either.

Active cases in the USA until 2020-04-09, together with the predicted trend 
For the sake of brevity I skipped the two remaining countries, but you can check their predictions on a web dashboard. The data is updated daily.
Discussion
The classic exponential-growth models have a key assumption that infected and uninfected people mix randomly: every day, you go to the train station or grocery store, where you happily exchange germs with other random people. This assumption no longer holds now that most countries have implemented control measures such as social distancing or contact tracing with quarantine.

You might have heard of the term six degrees of separation: any two people in the world are connected to each other via at most 6 social connections. In a highly connected world, germs also need only very short human-to-human transmission chains to infect a high proportion of the population. The average length of transmission chains is inversely proportional to the parameter R_{0} (which you have probably heard of).
When strict measures are implemented, the random mixing of infected with uninfected that is crucial for exponential growth is almost nonexistent. For example, with social distancing, the average length of the human-to-human transmission chains needed to infect a high proportion of the population is now orders of magnitude longer. The value of R_{0} seems to be decreasing rapidly with time, since you are meeting the same people over and over instead of random strangers. Your few social contacts are most likely the ones who infected you, so there’s almost no one new that you can infect. Similarly for contact tracing and quarantine – it’s really hard to meet an infected person when infected people are quickly quarantined.
The formula N(t) = (A/T_{G}) ⋅ (t/T_{G})^{6.23} ⋅ e^{ − t/T_{G}} has two free parameters, A and T_{G}. One objection to the finding might be that with two parameters you can create many different functions, making it easy to fit arbitrary curves. However, a simple analysis shows that A and T_{G} only scale the graph of the function vertically and horizontally. The observation is left as an exercise for the reader. As a hint, here’s a picture of the three functions t^{6.23}/e^{t}, (t/2)^{6.23}/e^{t/2} and 2t^{6.23}/e^{t}.
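Writing g(x) = x^{6.23} ⋅ e^{−x}, the formula becomes N(t) = (A/T_{G}) ⋅ g(t/T_{G}), so A only stretches the curve vertically, and T_{G} only stretches it horizontally (with a compensating vertical factor of 1/T_{G}). A minimal numerical check of that observation:

```python
import math

POWER = 6.23

def g(x):
    # Shape function: every curve in the family is a scaled copy of g.
    return x ** POWER * math.exp(-x)

def N(t, A, T_G):
    return (A / T_G) * g(t / T_G)

# Doubling A doubles every value (pure vertical scaling).
assert all(math.isclose(N(t, 2.0, 5.0), 2 * N(t, 1.0, 5.0)) for t in range(1, 100))

# Scaling T_G by c relabels the time axis: N(c*t, A, c*T_G) == N(t, A, T_G) / c.
c = 3.0
assert all(math.isclose(N(c * t, 1.0, c * 5.0), N(t, 1.0, 5.0) / c) for t in range(1, 100))

print("A scales vertically, T_G horizontally - the shape never changes")
```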
Further reading
 Richard Kollár on Facebook – he regularly posts updates about their research.
 Fractal kinetics of COVID-19 pandemic by Ziff and Ziff
 Polynomial growth in agedependent branching processes with diverging reproductive number by Alexei Vazquez.
 Martin Niepel’s notes (in Slovak only)
 Martin Niepel’s lecture (in Slovak only)
 Richard Kollár on national TV (in Slovak only)
Back to the first model
The model of Radoslav Harman is an explanatory model: it tries to explain the past. It estimates when the infection arrived in Slovakia and how fast it has been spreading since then. It also estimates how good the testing selection process is – whether we test too many people with a common cold or flu instead of COVID-19 patients.

These three estimation goals are expressed as the following parameters of the model.
 b_{0} is a measure of the overall quality of the testing selection. That is, the greater b_{0}, the better the system is at restricting non-COVID-19 individuals from testing. The smaller b_{0}, the more non-COVID-19 individuals get tested.
 α or γ_{2} describes the growth of infected cases. The model works both with polynomial growth (parameter α) and with exponential growth (parameter γ_{2}).
 α is the exponent in the polynomial growth function t^{α}. It’s very hard to estimate T_{G} for Slovakia, since we have few deaths and recoveries, so there is no support for exponential decay yet.
 γ_{2} is the rate of exponential growth after March 12, when Slovakia implemented restrictive policies. Judging by the mobility report released by Google, March 12 seems to be the right choice.
 t_{max} is the total length of the simulation. (My code instead uses prefix_length as a parameter and calculates t_{max} = prefix_length + days, where days is the number of days for which we have data.)
The code is on GitHub, including visualizations and dashboards running on a web server. If you’d like to help in any way, feel free to get in touch.
Picture time!
Let me end with some pictures from the simulations. They confirm Rado’s claim that we cannot distinguish between a start of the infection in mid-February and a start at the end of February with faster spread. You can also view the visualizations online: polynomial and exponential growth.

The axes of the heat maps in the pictures show b_{0} and the prefix length. The closer the color of the heat map is to yellow, the better a particular configuration matches reality. Notice that the yellow color spans across the pictures, so we have plenty of fairly good configurations. There are two pictures for polynomial growth, with t^{1.24} and t^{1.28}, and one for exponential growth, with 1.04^{t}. So far, the scores of these three setups aren’t that different from each other, but we expect that to change in the coming weeks.
Heat map of errors for different b_{0} and prefix length, with growth of t^{1.24} 
Heat map of errors for different b_{0} and prefix length, with growth of t^{1.28} 
Heat map of errors for different b_{0} and prefix length, with growth of 1.04^{t} 
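One way to see why the three setups still score similarly is to compare their day-over-day growth factors: for t^α the factor ((t+1)/t)^α slowly decays toward 1, while for 1.04^t it is a constant 1.04, and around day 30 they nearly coincide. A small sketch (the real model scores whole simulations, not these raw curves):

```python
# Day-over-day growth factors of the three growth setups from the heat maps.
def daily_factor(t, kind):
    if kind == "poly_1.24":
        return ((t + 1) / t) ** 1.24
    if kind == "poly_1.28":
        return ((t + 1) / t) ** 1.28
    return 1.04  # the exponential setup grows by a constant 4% per day

for t in (10, 30, 90):
    factors = [daily_factor(t, k) for k in ("poly_1.24", "poly_1.28", "exp")]
    print(t, [round(f, 3) for f in factors])
# Around t = 30 all three factors are close to 1.04; later the polynomial
# setups slow down while the exponential one keeps growing at 4% per day.
```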