The present outbreak of COVID-19 disease, caused by the SARS-CoV-2 virus, has put the planet in quarantine. On January 30, 2020, the World Health Organization (WHO) declared the COVID-19 outbreak a “public health emergency of international concern”, and then a pandemic on March 11.

Spain has become the country with the fifth-highest number of infected cases worldwide, officially registering thousands of cases in a short time. Although the authorities have adopted many critical and severe measures to lessen the impact of the outbreak and help flatten the curve, these measures rely on numbers that could be unreliable and could therefore misrepresent the implications of the pandemic.

Because of the protocols used for testing, counts in Spain mainly include individuals with severe symptoms. The authorities have just announced a new protocol with rapid tests, to be implemented in a few days (elpais.com).

Given the nature of our data, the estimated numbers of cases that we find most likely correspond to potentially severe cases, and the size of the infected population, which includes asymptomatic individuals, is presumably even higher.

Accordingly, the current analysis aims to provide a daily update of the COVID-19 situation and, in particular, to quantify the potential under-reporting in the officially registered cases by region in Spain. The results can help to give a more realistic picture of the pandemic in real time and to estimate more accurately essential measures, such as the basic reproduction number or the fatality rate, that practitioners and politicians use to make decisions.

The data for the analysis have been extracted from eldiario.es, where official data are gathered.

Notice that this analysis can be easily reproduced for other countries.

Region | minimum | mean | median | maximum | standard deviation | dispersion index
---|---|---|---|---|---|---
Andalucia | 0 | 45.82 | 12.5 | 176 | 61.99 | 83.86
Aragon | 0 | 12.82 | 5.0 | 67 | 18.37 | 26.34
Asturias | 0 | 13.27 | 2.0 | 50 | 17.62 | 23.39
Baleares | 0 | 7.68 | 1.5 | 57 | 15.19 | 30.02
Canarias | 0 | 10.00 | 5.5 | 39 | 11.81 | 13.95
Cantabria | 0 | 3.77 | 0.0 | 20 | 5.92 | 9.29
Castilla La Mancha | 0 | 43.95 | 8.0 | 166 | 58.25 | 77.20
Castilla Leon | 0 | 39.45 | 9.5 | 237 | 64.78 | 106.37
Extremadura | 0 | 10.95 | 1.0 | 47 | 15.98 | 23.31
Galicia | 0 | 20.59 | 2.0 | 112 | 31.25 | 47.43
La Rioja | 0 | 21.27 | 17.5 | 64 | 20.54 | 19.84
Murcia | 0 | 7.59 | 2.5 | 45 | 11.54 | 17.55
Navarra | 0 | 21.91 | 1.5 | 96 | 31.03 | 43.93
Pais Vasco | 0 | 67.00 | 38.0 | 217 | 79.95 | 95.41
Valencia | 0 | 54.55 | 7.0 | 279 | 93.26 | 159.46
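The dispersion index in the last column appears to be the variance-to-mean ratio: for Andalucia, 61.99² / 45.82 ≈ 83.9. Values far above 1 indicate strong overdispersion relative to a Poisson distribution with constant rate. A minimal Python sketch of these summary statistics follows; the `daily_counts` series is made up for illustration, and the use of the sample (n − 1) variance is an assumption.

```python
import statistics


def summarize(counts):
    """Return the summary statistics reported per region for a series of daily counts."""
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)  # assumption: sample standard deviation (n - 1 denominator)
    return {
        "minimum": min(counts),
        "mean": mean,
        "median": statistics.median(counts),
        "maximum": max(counts),
        "standard deviation": sd,
        # Variance-to-mean ratio; it equals 1 for a Poisson distribution.
        "dispersion index": sd ** 2 / mean,
    }


# Hypothetical daily counts, not taken from the data set.
daily_counts = [0, 0, 1, 3, 2, 8, 13, 21, 40, 66]
summary = summarize(daily_counts)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```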

If the under-reporting is ignored, the daily counts can be modeled appropriately by \(\exp(\alpha_0 + \alpha_1t)\), since the number of daily COVID-19 cases grows exponentially over time, as Figure 1 shows. At the moment, there is no evidence of seasonal behaviour of the SARS-CoV-2 virus, unlike MERS-CoV (Alkhamis, Fernández-Fontelo et al., 2018).
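As an illustration of this log-linear trend, the following sketch estimates \(\alpha_0\) and \(\alpha_1\) by Poisson regression with a log link, fitted with Newton-Raphson iterations. The function name `fit_exponential_trend` and the synthetic series are assumptions made for this example; the source does not state which fitting procedure was used.

```python
import numpy as np


def fit_exponential_trend(counts, n_iter=50):
    """Fit E[count_t] = exp(a0 + a1 * t) by Poisson maximum likelihood (Newton-Raphson)."""
    y = np.asarray(counts, dtype=float)
    t = np.arange(len(y), dtype=float)
    X = np.column_stack([np.ones_like(t), t])  # design matrix: intercept and time
    # Starting values from an ordinary least-squares fit on log(y + 1).
    coef = np.linalg.lstsq(X, np.log(y + 1.0), rcond=None)[0]
    for _ in range(n_iter):
        mu = np.exp(X @ coef)            # fitted means
        score = X.T @ (y - mu)           # gradient of the Poisson log-likelihood
        info = X.T @ (X * mu[:, None])   # Fisher information matrix
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return coef


# Synthetic, noise-free series generated with a0 = 0.3 and a1 = 0.25 (illustrative values).
days = np.arange(20)
synthetic = np.exp(0.3 + 0.25 * days)
alpha0, alpha1 = fit_exponential_trend(synthetic)
print(alpha0, alpha1)
```

With real counts the estimates carry sampling error, and the doubling time implied by the fit is \(\log 2 / \alpha_1\).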

However, if the official number of daily cases does not reflect the total number of cases (e.g., a proportion of the cases is not observed, and thus the data are misreported), the model above is no longer adequate, and a more appropriate alternative should be considered.

We base all the subsequent analysis on the model introduced by Fernández-Fontelo, Cabaña et al. (2016). We have also applied a similar methodology in Fernández-Fontelo, Cabaña et al. (2019) and in other papers submitted for publication (Moriña, Fernández-Fontelo et al., 2020a; Moriña, Fernández-Fontelo et al., 2020b).

In that model, two different processes are considered: \(X_n\), the true but unobserved (latent) process, and \(Y_n\), the observed and potentially under-reported process. In this application, the latent process is assumed to be Poisson distributed with time-dependent rate \(\lambda_t=\exp(\beta_0 + \beta_1t)\). Because of the under-reporting, the observed process is always less than or equal to the latent process: \(Y_n\) equals \(X_n\) (no under-reporting) with probability \(1-\omega\), and \(Y_n\) equals \(q \circ X_n\) with probability \(\omega\), where \(\circ\) denotes the binomial thinning operator. The parameters \(\omega\) and \(q\) quantify the overall frequency and intensity of the under-reporting: roughly speaking, \(\omega\) describes how often the observed counts differ from the real ones, and \(q\) describes the distance between the real and observed processes.
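To make the mechanism concrete, the sketch below simulates one latent and one observed series. It assumes that \(q \circ X_n\) is a binomial thinning (each true case is recorded independently with probability \(q\)) and that latent counts on different days are independent given their rates; the function name and the parameter values are illustrative, not fitted estimates. Under these assumptions, \(E[Y_n] = (1 - \omega(1 - q))\,\lambda_n\).

```python
import numpy as np


def simulate_underreported(beta0, beta1, omega, q, n_days, rng):
    """Simulate latent counts X_n and observed counts Y_n under the assumptions stated above."""
    t = np.arange(n_days)
    lam = np.exp(beta0 + beta1 * t)               # time-dependent Poisson rate
    x = rng.poisson(lam)                          # latent (true) daily counts
    under_reported = rng.random(n_days) < omega   # days affected by under-reporting
    thinned = rng.binomial(x, q)                  # q o X_n: each case kept with probability q
    y = np.where(under_reported, thinned, x)      # observed counts
    return x, y


rng = np.random.default_rng(0)
# Illustrative parameter values only.
x, y = simulate_underreported(beta0=0.3, beta1=0.25, omega=0.85, q=0.6, n_days=22, rng=rng)
print("latent:  ", x)
print("observed:", y)
# Expected share of the true mean that is observed: 1 - omega * (1 - q).
print("expected observed fraction:", 1 - 0.85 * (1 - 0.6))
```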

Region | \(\beta_0\) | \(\beta_1\) | \(\omega\) | \(q\) | AIC
---|---|---|---|---|---
Andalucia | 0.3279 | 0.2479 | 0.8572 | 0.6122 | 212.7
s.e. (Andalucia) | 0.1728 | 0.0087 | 0.0883 | 0.0211 |
Aragon | -0.3013 | 0.2046 | 0.3709 | 0.2282 | 166.5
s.e. (Aragon) | 0.33 | 0.0175 | 0.1347 | 0.0526 |
Asturias | -0.3282 | 0.2051 | 0.4396 | 0.2662 | 134.1
s.e. (Asturias) | 0.3922 | 0.0202 | 0.1492 | 0.0684 |
Baleares | -0.7201 | 0.2237 | 0.8452 | 0.2877 | 120
s.e. (Baleares) | 0.6296 | 0.0294 | 0.1078 | 0.0559 |
Canarias | 0.0592 | 0.1659 | 0.3799 | 0.284 | 121.6
s.e. (Canarias) | 0.4031 | 0.021 | 0.1475 | 0.0848 |
Cantabria | 1.8292 | 0.0344 | 0.6844 | 0.0411 | 101.5
s.e. (Cantabria) | 0.4494 | 0.0244 | 0.1019 | 0.022 |
Castilla La Mancha | -0.1119 | 0.2631 | 0.5525 | 0.478 | 179.6
s.e. (Castilla La Mancha) | 0.1969 | 0.0102 | 0.1294 | 0.0261 |
Castilla Leon | -0.6833 | 0.2945 | 0.7692 | 0.5552 | 164.8
s.e. (Castilla Leon) | 0.2506 | 0.0121 | 0.1235 | 0.0286 |
Extremadura | 0.0904 | 0.173 | 0.518 | 0.0471 | 108.7
s.e. (Extremadura) | 0.4445 | 0.0229 | 0.114 | 0.0292 |
Galicia | -0.8934 | 0.2656 | 0.5915 | 0.4145 | 171.8
s.e. (Galicia) | 0.3198 | 0.0162 | 0.1364 | 0.0478 |
La Rioja | 2.1865 | 0.0921 | 0.5259 | 0.2519 | 208.8
s.e. (La Rioja) | 0.2126 | 0.0118 | 0.1116 | 0.0357 |
Murcia | -2.2174 | 0.2724 | 0.1708 | 0.2922 | 87.9
s.e. (Murcia) | 0.4695 | 0.024 | 0.1459 | 0.1265 |
Navarra | -0.7687 | 0.2661 | 0.7357 | 0.5638 | 186.3
s.e. (Navarra) | 0.262 | 0.0133 | 0.1173 | 0.0353 |
Pais Vasco | 1.2253 | 0.2143 | 0.7241 | 0.6253 | 260.2
s.e. (Pais Vasco) | 0.141 | 0.0072 | 0.1089 | 0.0196 |
Valencia | 4.4696 | 0.0447 | 0.7727 | 0.0572 | 377.4
s.e. (Valencia) | 0.2823 | 0.014 | 0.0893 | 0.0096 |

Using the Viterbi algorithm, the model also makes it possible to reconstruct the most likely sequence of real COVID-19 cases throughout the study period. This gives an estimated time series of true daily cases and allows us to evaluate the impact of under-reporting on measures such as the basic reproduction number. Figure 2 shows the observed and reconstructed series over time by region.
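The following is a simplified sketch of this reconstruction step, not the authors' implementation. It assumes that latent counts on different days are independent given their Poisson rates and that under-reporting acts as a binomial thinning; under those assumptions the most likely sequence can be found one day at a time by maximising \(P(X_n = x)\,P(Y_n = y \mid X_n = x)\) over \(x \ge y\). If consecutive latent counts were dependent, the full Viterbi recursion over the sequence would be needed. The function names and the search bound `max_extra` are hypothetical.

```python
import math


def log_poisson(x, lam):
    """Log of the Poisson probability mass function."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def log_binomial(y, x, q):
    """Log of the binomial probability of y successes out of x trials."""
    log_choose = math.lgamma(x + 1) - math.lgamma(y + 1) - math.lgamma(x - y + 1)
    return log_choose + y * math.log(q) + (x - y) * math.log(1 - q)


def most_likely_latent(y, lam, omega, q, max_extra=500):
    """Return the x >= y that maximises P(X = x) * P(Y = y | X = x)."""
    best_x, best_score = y, -math.inf
    for x in range(y, y + max_extra + 1):
        # Y = X with probability 1 - omega; otherwise Y is a binomial thinning of X.
        emission = omega * math.exp(log_binomial(y, x, q))
        if x == y:
            emission += 1 - omega
        if emission <= 0:
            continue
        score = log_poisson(x, lam) + math.log(emission)
        if score > best_score:
            best_x, best_score = x, score
    return best_x


# Illustrative observed counts and parameter values, not taken from the fitted models.
observed = [3, 2, 9, 6, 20]
beta0, beta1, omega, q = 1.0, 0.5, 0.7, 0.5
rates = [math.exp(beta0 + beta1 * t) for t in range(len(observed))]
reconstructed = [most_likely_latent(y, lam, omega, q) for y, lam in zip(observed, rates)]
print(reconstructed)
```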

Table 3 shows the percentage of the mean counts that is not covered by the official registers. The higher this percentage, the lower the coverage, and therefore the more severe the impact of the under-reporting.

Region | observed mean | true mean | % not covered
---|---|---|---
Andalucia | 29.1579 | 46.0000 | 36.61
Aragon | 9.2105 | 19.5789 | 52.96
Asturias | 9.3158 | 11.7368 | 20.63
Canarias | 6.2632 | 7.8421 | 20.13
Cantabria | 3.0526 | 9.0526 | 66.28
Castilla Leon | 17.5789 | 23.4211 | 24.94
Catalunya | 47.5263 | 57.8421 | 17.83
Extremadura | 5.8421 | 9.0000 | 35.09
Galicia | 12.8947 | 19.5789 | 34.14
La Rioja | 16.4211 | 27.6842 | 40.68
Madrid | 219.2105 | 306.0526 | 28.37
Navarra | 14.4211 | 18.3158 | 21.26
Pais Vasco | 48.1053 | 60.0000 | 19.82
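The last column is consistent with the relative gap between the two means, \(100 \times (\text{true mean} - \text{observed mean}) / \text{true mean}\). A short check of this arithmetic for two rows of the table (the function name is ours):

```python
def percent_not_covered(observed_mean, true_mean):
    """Share of the reconstructed mean that is missing from the official counts, in percent."""
    return 100.0 * (true_mean - observed_mean) / true_mean


# Values for Andalucia and Aragon taken from the table above.
andalucia = round(percent_not_covered(29.1579, 46.0000), 2)
aragon = round(percent_not_covered(9.2105, 19.5789), 2)
print(andalucia, aragon)  # 36.61 52.96
```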

It is instructive to see how the estimated epidemic spread differs when an epidemic model is fitted to the reconstructed series of counts rather than to the observed counts recorded by public agencies. We fit the classic SIR (Susceptible-Infectious-Recovered) model. Table 4 shows the basic reproduction number obtained from the reconstructed series (\(R_{0E}\)) and from the observed series (\(R_{0R}\)).

The dynamics of the spread of the virus in the SIR model is described by the following differential equations:

\(\frac{dS}{dt} = -\beta \frac{IS}{N} \)

\(\frac{dI}{dt} = \beta \frac{IS}{N}- \gamma I \)

\(\frac{dR}{dt} = \gamma I \)

where the parameters are \(\beta\), the infection rate, and \(\gamma\), the recovery rate, and \(N\) is the total population.
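As a minimal sketch of how these equations can be integrated numerically, the code below advances \((S, I, R)\) with a fixed-step fourth-order Runge-Kutta scheme. The integrator, the function names, and all numerical values (\(\beta\), \(\gamma\), the population size, and the initial number of infectious individuals) are assumptions chosen for illustration.

```python
def sir_derivatives(state, beta, gamma, n_pop):
    """Right-hand side of the SIR equations for state = (S, I, R)."""
    s, i, _ = state
    new_infections = beta * i * s / n_pop
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, h, beta, gamma, n_pop):
    """Advance the state by one step of size h with the classical Runge-Kutta scheme."""
    def shifted(k, factor):
        return tuple(x + factor * h * dx for x, dx in zip(state, k))

    k1 = sir_derivatives(state, beta, gamma, n_pop)
    k2 = sir_derivatives(shifted(k1, 0.5), beta, gamma, n_pop)
    k3 = sir_derivatives(shifted(k2, 0.5), beta, gamma, n_pop)
    k4 = sir_derivatives(shifted(k3, 1.0), beta, gamma, n_pop)
    return tuple(
        x + h / 6.0 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(beta, gamma, n_pop, i0, n_days, steps_per_day=10):
    """Return the list of daily (S, I, R) states, starting from i0 infectious individuals."""
    h = 1.0 / steps_per_day
    state = (float(n_pop - i0), float(i0), 0.0)
    trajectory = [state]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            state = rk4_step(state, h, beta, gamma, n_pop)
        trajectory.append(state)
    return trajectory


# Illustrative values only, not estimates from the data.
trajectory = simulate_sir(beta=0.6, gamma=0.4, n_pop=1_000_000, i0=10, n_days=30)
for day in (0, 10, 20, 30):
    s, i, r = trajectory[day]
    print(day, round(s), round(i), round(r))
```

Because the three derivatives sum to zero, \(S + I + R\) stays equal to \(N\) along the trajectory, which gives a quick sanity check on any implementation.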

We seek the values of \(\beta\) and \(\gamma\) that minimize the residual sum of squares (RSS) between the observed number of infected individuals and the corresponding number of cases predicted by the model at each time point.
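The least-squares criterion can be sketched as follows. The source does not state which optimiser was used, so a coarse grid search over \([0, 1] \times [0, 1]\) is swapped in here for simplicity; the explicit Euler integrator, the bounds, the function names, and the synthetic "observed" series are all assumptions made for this example.

```python
import numpy as np


def sir_infected(beta, gamma, n_pop, i0, n_days, steps_per_day=10):
    """Daily number of infectious individuals I(t), using explicit Euler steps."""
    h = 1.0 / steps_per_day
    s, i = float(n_pop - i0), float(i0)
    infected = [i]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            new_infections = beta * i * s / n_pop
            s, i = s - h * new_infections, i + h * (new_infections - gamma * i)
        infected.append(i)
    return np.array(infected)


def rss(beta, gamma, observed, n_pop):
    """Residual sum of squares between observed and model-predicted infectious counts."""
    predicted = sir_infected(beta, gamma, n_pop, observed[0], len(observed) - 1)
    return float(np.sum((observed - predicted) ** 2))


def fit_sir(observed, n_pop, grid_size=21):
    """Grid search for the (beta, gamma) pair with the smallest RSS."""
    grid = np.linspace(0.0, 1.0, grid_size)
    best = min((rss(b, g, observed, n_pop), b, g) for b in grid for g in grid)
    return float(best[1]), float(best[2])


# Synthetic series generated from known parameters, to check that the search recovers them.
n_pop = 10_000
observed = sir_infected(beta=0.6, gamma=0.4, n_pop=n_pop, i0=10, n_days=30)
beta_hat, gamma_hat = fit_sir(observed, n_pop)
print(beta_hat, gamma_hat)
```

During the early exponential phase the data mostly inform the difference \(\beta - \gamma\), so the two parameters are only weakly identified separately; this is why the synthetic example uses a small population in which the depletion of susceptibles is visible.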

Once the values of \(\beta\) and \(\gamma\) are known, we can compute the basic reproduction number, \(R_0=\beta/\gamma\). This number gives an estimate of the average number of susceptible individuals who are infected by each infected individual.
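As a small worked example of this ratio, the sketch below uses the Andalucia estimates reported in Table 4; the function name is ours, and returning infinity when \(\gamma = 0\) is a convention chosen to match the `Inf` entry in that table.

```python
import math


def basic_reproduction_number(beta, gamma):
    """R0 = beta / gamma; returns infinity when the fitted recovery rate is zero."""
    return math.inf if gamma == 0 else beta / gamma


# Andalucia, observed series: beta_R = 0.6382, gamma_R = 0.3618.
r0_observed = basic_reproduction_number(0.6382, 0.3618)
# Andalucia, reconstructed series: beta_E = 0.3036, gamma_E = 0.
r0_reconstructed = basic_reproduction_number(0.3036, 0.0)
print(round(r0_observed, 3), r0_reconstructed)
```

A value of \(R_0\) above 1 means that each case generates more than one new case on average, so the epidemic grows.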

Region | \(\beta_E\) | \(\gamma_E\) | \(R_{0E}\) | \(\beta_R\) | \(\gamma_R\) | \(R_{0R}\)
---|---|---|---|---|---|---
Andalucia | 0.3036 | 0.0000 | Inf | 0.6382 | 0.3618 | 1.7640
Aragon | 0.5000 | 0.5000 | 1.0000 | 0.5000 | 0.5000 | 1.0000
Asturias | 0.5000 | 0.5000 | 1.0000 | 0.5000 | 0.5000 | 1.0000
Canarias | 0.5314 | 0.4686 | 1.1341 | 0.5213 | 0.4787 | 1.0890
Cantabria | 0.5685 | 0.4315 | 1.3174 | 0.5000 | 0.5000 | 1.0000
Castilla Leon | 0.5000 | 0.5000 | 1.0000 | 0.5000 | 0.5000 | 1.0000
Catalunya | 0.6215 | 0.3785 | 1.6420 | 0.6189 | 0.3811 | 1.6243
Extremadura | 0.5000 | 0.5000 | 1.0000 | 0.5000 | 0.5000 | 1.0000
Galicia | 0.5000 | 0.5000 | 1.0000 | 0.5000 | 0.5000 | 1.0000
La Rioja | 1.0000 | 0.8147 | 1.2274 | 0.5000 | 0.5000 | 1.0000
Navarra | 0.5000 | 0.5000 | 1.0000 | 0.5000 | 0.5000 | 1.0000
Pais Vasco | 0.5000 | 0.5000 | 1.0000 | 0.5000 | 0.5000 | 1.0000
Madrid | 1.0000 | 0.6685 | 1.4957 | 0.6509 | 0.3490 | 1.8648