The epidemic volatility index, a new early warning tool to identify new waves of an epidemic
The epidemic volatility index
The EVI is based on the calculation of the rolling standard deviation for a time series of epidemic data (i.e. the number of new cases per day). The number of consecutive observations used for this calculation is the size of the rolling window, \(m\). At each time step, the window of size \(m\) is moved forward over the time series by one observation (Fig. 1).
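The rolling calculation can be illustrated with a minimal sketch. The function name `rolling_sd` and the data are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import stdev

def rolling_sd(cases, m):
    """Standard deviation of each window of m consecutive observations.

    Entry i of the result covers cases[i : i + m].
    Assumption: sample (n - 1) standard deviation.
    """
    if m < 2 or m > len(cases):
        raise ValueError("need 2 <= m <= len(cases)")
    return [stdev(cases[i:i + m]) for i in range(len(cases) - m + 1)]

# Made-up daily case counts, for illustration only.
daily_cases = [10, 12, 11, 15, 20, 28, 40]
sds = rolling_sd(daily_cases, m=3)
```

A series of 7 observations with \(m = 3\) yields 5 windows, one for each position of the window.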
For each rolling window, the standard deviation of the newly reported cases is calculated, and the EVI is then obtained as the relative change in the standard deviation between two consecutive windows. A warning signal is issued if (i) this relative change exceeds a threshold \(c\) (\(c \in [0,1]\)) and (ii) the number of cases observed at present is higher than the average number of cases reported in the previous week.
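As a rough sketch of this rule, the function below computes the relative change in standard deviation between the last two windows and applies both warning conditions at the latest time point. The name `evi_warning`, the sample standard deviation, the handling of a zero standard deviation, and the reading of "the previous week" as the 7 observations before the current one are all assumptions made for illustration.

```python
from statistics import mean, stdev

def evi_warning(cases, m, c):
    """Return (evi, warning) for the latest time point in `cases`.

    evi     : relative change in SD between the last two windows of size m
    warning : True if evi exceeds c and the latest count is higher than
              the mean of the 7 observations before it
    """
    if m < 2 or len(cases) < max(m + 1, 8):
        raise ValueError("not enough observations")
    sd_now = stdev(cases[-m:])          # window ending at the latest point
    sd_prev = stdev(cases[-m - 1:-1])   # window ending one step earlier
    # Assumption: a zero previous SD is treated as "no change".
    evi = 0.0 if sd_prev == 0 else (sd_now - sd_prev) / sd_prev
    rising = cases[-1] > mean(cases[-8:-1])
    return evi, (evi > c and rising)

# Made-up series: flat, then a sudden jump on the last day.
evi, warning = evi_warning([10, 10, 11, 10, 11, 10, 11, 12, 30], m=3, c=0.5)
```

In this made-up series the last observation inflates the standard deviation of the latest window, so both conditions hold and a warning is issued.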
Criterion and desired accuracy
The accuracy of the EVI is measured by its sensitivity \(Se\) (i.e. the probability of correctly issuing an early warning for an upcoming epidemic wave) and its specificity \(Sp\) (i.e. the probability of not issuing a warning when no wave is upcoming). Both depend on the criterion used to define what constitutes a noticeable increase in the number of expected cases, indicating an upcoming epidemic wave. For example, a criterion can be, as in the example application that follows, an increase in the average number of cases between two consecutive weeks greater than 20%.
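One way to turn such a criterion into a true status for each time point is sketched below. The function name `wave_ahead`, the choice to compare the 7 days after \(t\) with the 7 days ending at \(t\), the use of "at least" rather than "greater than", and the handling of a zero baseline are assumptions, not details taken from the text.

```python
from statistics import mean

def wave_ahead(cases, t, r=0.2, w=7):
    """True if the mean of the w days after t is at least (1 + r) times
    the mean of the w days ending at t; None if data are missing."""
    before = cases[max(t - w + 1, 0):t + 1]
    after = cases[t + 1:t + 1 + w]
    if len(before) < w or len(after) < w:
        return None  # not enough observations on one side of t
    base = mean(before)
    # Assumption: with a zero baseline, any later cases count as an increase.
    if base == 0:
        return mean(after) > 0
    return mean(after) >= (1 + r) * base

# Made-up weekly plateaus: 100 cases per day, then 130 or 110 per day.
rise = wave_ahead([100] * 7 + [130] * 7, t=6)
flat = wave_ahead([100] * 7 + [110] * 7, t=6)
```

With these made-up numbers, a 30% increase satisfies the 20% criterion and a 10% increase does not.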
For a specified criterion, the accuracy of the EVI depends on the window size \(m\) and the threshold \(c\), which should be selected so as to achieve a desired accuracy target. One option is to select the \(m\) and \(c\) values that lead to the best combination of \(Se\) and \(Sp\) for the EVI, by maximizing the Youden index (\(J = Se + Sp - 1\))^{12} and, therefore, minimizing false results overall (i.e. the total number of false positive and false negative early warnings). Another approach could be to select \(m\) and \(c\) so that the highest \(Se\) (or \(Sp\)) is reached with \(Sp\) (or \(Se\)) equal to 1, or not falling below a critical value (for example 0.95). Advanced receiver operating characteristic curve analysis can also be performed^{13}, and the selection of critical values can be based on indices that quantify the relative cost of false positive warnings (i.e. falsely predicting an upcoming epidemic wave) versus false negatives (i.e. failing to predict an upcoming epidemic wave), such as the misclassification cost term.
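The Youden index can be computed directly from paired lists of issued warnings and true statuses, as in the following sketch. The function name `se_sp_youden` and the data are hypothetical.

```python
def se_sp_youden(warnings, truth):
    """Sensitivity, specificity and Youden index J = Se + Sp - 1.

    warnings : booleans, True where an early warning was issued
    truth    : booleans, True where the criterion was actually met
    """
    pairs = list(zip(warnings, truth))
    tp = sum(1 for w, s in pairs if w and s)
    fn = sum(1 for w, s in pairs if not w and s)
    tn = sum(1 for w, s in pairs if not w and not s)
    fp = sum(1 for w, s in pairs if w and not s)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one wave and one non-wave time point")
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return se, sp, se + sp - 1

# Made-up example: 2 true positives, 1 false positive, 2 true negatives.
se, sp, j = se_sp_youden(
    [True, True, False, False, True],
    [True, False, False, False, True],
)
```

Here \(Se = 1\) and \(Sp = 2/3\), so \(J = 2/3\).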
Selection of the optimal \(m\) and \(c\) and issuing of an early warning
For a specified criterion and a desired accuracy target, \(m\) and \(c\) are selected by an iterative process. In short, every time a new time point \(t\) is observed:

1. Cases up to \(t\) are analyzed for all possible window sizes \(m\) and thresholds \(c\).
2. For each combination of \(m\) and \(c\), \(Se_{t_{m,c}}\) and \(Sp_{t_{m,c}}\) are estimated for the specified criterion.
3. The \(m'\) and \(c'\) that give the best combination of \(Se_{t_{m',c'}}\) and \(Sp_{t_{m',c'}}\) are selected (i.e. overall minimization of false results).
4. Based on \(m'\) and \(c'\), the EVI is calculated at the new time point \(t\) and a decision is made on whether or not to issue a warning signal.
A graphical representation of the entire process is given in Fig. 2, while the statistical details are described in the “Annex”.
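The four steps can be sketched as a grid search that maximizes the Youden index over past time points whose outcome is already known at \(t\). This is one possible reading of the procedure, not the authors' implementation: all function names, the grid values, the tie-breaking rule (first best combination wins) and the definitions of the warning rule and criterion are assumptions.

```python
from statistics import mean, stdev

def warn_at(cases, t, m, c):
    """Assumed warning rule at index t, using only cases[0..t]."""
    if t < max(m, 7):
        return False
    sd_now = stdev(cases[t - m + 1:t + 1])
    sd_prev = stdev(cases[t - m:t])
    evi = 0.0 if sd_prev == 0 else (sd_now - sd_prev) / sd_prev
    return evi > c and cases[t] > mean(cases[t - 7:t])

def wave_ahead(cases, t, r=0.2, w=7):
    """Assumed criterion: mean of the next w days >= (1 + r) * mean of the last w."""
    base = mean(cases[t - w + 1:t + 1])
    after = mean(cases[t + 1:t + 1 + w])
    return after > 0 if base == 0 else after >= (1 + r) * base

def youden(cases, t, m, c, w=7):
    """Steps 1-2: Youden index of (m, c) over past points with a known outcome."""
    tp = fp = tn = fn = 0
    for s in range(max(m, w), t - w + 1):  # s + w <= t: following week observed
        alarm, wave = warn_at(cases, s, m, c), wave_ahead(cases, s, w=w)
        tp += alarm and wave
        fp += alarm and not wave
        fn += (not alarm) and wave
        tn += (not alarm) and (not wave)
    if tp + fn == 0 or tn + fp == 0:
        return None  # Se or Sp cannot be estimated yet
    return tp / (tp + fn) + tn / (tn + fp) - 1

def select_and_warn(cases, t, m_values, c_values):
    """Steps 3-4: pick the best (m, c), then decide on a warning at t."""
    best = None
    for m in m_values:
        for c in c_values:
            j = youden(cases, t, m, c)
            if j is not None and (best is None or j > best[0]):
                best = (j, m, c)
    if best is None:
        return None
    _, m, c = best
    return m, c, warn_at(cases, t, m, c)

# Made-up series: two flat weeks followed by a steady rise.
series = [10, 11, 10, 12, 11, 10, 11, 10, 12, 11, 10, 11, 10, 12,
          14, 18, 24, 32, 44, 60, 80, 105, 135, 170]
result = select_and_warn(series, len(series) - 1, [3, 5, 7], [0.1, 0.3, 0.5])
```

Only time points whose following week has already been observed contribute to the estimates, so the selection at \(t\) never uses future data.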
Overall accuracy and predictive values
It is possible, at any time point \(t\), to calculate positive and negative predictive values, defined as the probability of observing an increase or decrease in the future number of cases, given that an early warning has or has not been issued, respectively. Finally, once all of the time series data have been observed, the overall \(Se_{EVI}\) and \(Sp_{EVI}\) can be estimated.
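A simple empirical version of the predictive values, computed from warnings and outcomes observed so far, is sketched below. The function name `predictive_values` and the data are hypothetical, and the estimator described in the Annex may differ from this plain proportion.

```python
def predictive_values(warnings, truth):
    """Empirical positive and negative predictive values.

    PPV: share of issued warnings that were followed by an increase.
    NPV: share of non-warnings that were not followed by an increase.
    Returns None for a value whose denominator is zero.
    """
    pairs = list(zip(warnings, truth))
    tp = sum(1 for w, s in pairs if w and s)
    fp = sum(1 for w, s in pairs if w and not s)
    tn = sum(1 for w, s in pairs if not w and not s)
    fn = sum(1 for w, s in pairs if not w and s)
    ppv = tp / (tp + fp) if tp + fp else None
    npv = tn / (tn + fn) if tn + fn else None
    return ppv, npv

# Made-up example: 3 warnings (2 correct) and 2 non-warnings (both correct).
ppv, npv = predictive_values(
    [True, True, False, False, True],
    [True, False, False, False, True],
)
```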
Sensitivity analysis
For each epidemic, the accuracy of the EVI depends on the specified criterion. Ideally, different criterion values should be explored to identify those that are best suited to monitoring the epidemic. In the example application that follows, a sensitivity analysis based on an alternative criterion was performed.
Example of application
The most serious threat to global health and economy today^{14} is the COVID-19 pandemic, which was first reported to the WHO country office in China on December 31, 2019^{15}. Data on confirmed COVID-19 cases were taken from the COVID-19 data repository maintained by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University^{16}. The number of new daily confirmed cases of COVID-19, for each country, from January 22, 2020 to April 13, 2021, was analyzed. Because reported cases vary artificially between working days and weekends, a 7-day moving average was analyzed rather than the actual observed cases. For the analysis, \(m_{max}\) was limited to 30 days, so that potentially higher volatility from previous epidemic waves would not affect the most recent volatility estimates and, with them, the ability of the EVI to predict future, and possibly milder, epidemic waves.
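The smoothing step can be sketched as follows. The function name `moving_average` is hypothetical, and the use of a trailing window (each value averages the current day and the six before it) is an assumption, since the text does not state whether the average is trailing or centered.

```python
def moving_average(values, window=7):
    """Trailing moving average; the first value covers values[0:window]."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out

# Made-up counts with a weekend dip (two zeros) followed by a catch-up day.
smoothed = moving_average([7, 7, 7, 7, 7, 0, 0, 14])
```

The weekend zeros and the catch-up count are spread across the window, so the smoothed series moves from 5.0 to 6.0 instead of swinging between 0 and 14.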
The criterion used was an increase in the mean of expected cases, between two consecutive weeks, equal to or greater than 20%. For the sensitivity analysis, detection of an increase in the mean of expected cases equal to or greater than 50% was considered. The data were analyzed separately for each country and for each state of the United States of America that had recorded more than 20,000 cases in total up to April 13, 2021.
Statistical software
All models were run in R^{17}. The packages readxl^{18}, ggplot2^{19}, cowplot^{18,20} and reader^{21} were used. The EVI is also available as a Stata module (type “ssc install evi” in the command line)^{22} and as an R package (https://github.com/kuawdc/EVI).
Results
Results for Italy, one of the most affected EU countries^{23}, and New York, which was at the epicenter of the pandemic in the United States^{24}, are presented in the main manuscript. Daily updated results for all countries and for each state of the United States are available online at http://83.212.174.99:3838.
The confirmed cases of COVID-19 for Italy and New York State, from January 22, 2020 to April 13, 2021, are shown in Figs. 3 and 4, respectively. Red dots correspond to time points at which an early warning was issued and indicate that, according to the defined criterion, an increase in the mean of expected cases equal to or greater than 20% is expected in the coming week. Gray dots are time points with no early warning. The positive and negative predictive values at each time point are shown in Figs. 5 and 6, respectively.
For Italy, the overall sensitivity of the EVI was 0.82 (95% confidence interval: 0.75; 0.89) and the specificity was 0.91 (0.88; 0.94). For New York, the corresponding values were 0.55 (0.47; 0.64) and 0.88 (0.84; 0.91).
The results of the sensitivity analysis for Italy are shown in Fig. 7. Under the alternative criterion of detecting an increase in the mean of expected cases equal to or greater than 50%, the overall sensitivity and specificity were 0.75 (0.66; 0.85) and 0.93 (0.91; 0.96), respectively.
A consistent finding in the results from all countries was that consecutive early warnings are linked to the start of a new epidemic wave, while the absence of warnings indicates a stable pattern or a future decline in the number of new COVID-19 cases (Figs. 3, 4 and http://83.212.174.99:3838/).