key: cord-0928621-nw2c36fh
authors: Kurita, J.; Sugawara, T.; Ohkusa, Y.
title: Interim evaluation of Google AI forecasting for COVID-19 compared with statistical forecasting by human intelligence in the first week
date: 2020-12-18
journal: nan
DOI: 10.1101/2020.12.16.20248358
sha: f0a32793045e59bfba50b90561c2f9e4f2529021
doc_id: 928621
cord_uid: nw2c36fh

Background: Since June 2020, Google (Alphabet Inc.) has provided artificial intelligence (AI) forecasting of the COVID-19 outbreak in the USA. In Japan, it has provided a similar service since November 2020. Object: We compared Google AI forecasting with a statistical model built by human intelligence. Method: We regressed the number of patients whose onset date was day t on the numbers of patients whose onset dates were up to 14 days earlier, together with traditional surveillance data for common pediatric infectious diseases, including influenza, and prescription surveillance data from 7 days earlier. We predicted the number of onset patients for 7 days, prospectively. Finally, we compared the result with Google's AI-produced forecast. We used the discrepancy rate to evaluate the precision of prediction: the sum of absolute differences between data and prediction divided by the aggregate of data. Results: Google's prediction was significantly negatively correlated with the actual observed data, whereas our model's prediction was slightly, but not significantly, correlated with them. Moreover, the discrepancy rate of Google's prediction was 27.7% for the first week, whereas that of our model was only 3.47%. Discussion and Conclusion: Results show that Google's prediction was negatively correlated with the data and differed from them more than our prediction did. Nevertheless, this result is tentative: the epidemic curve showing newly onset patients was not yet fixed.

In Japan, the COVID-19 outbreak had two peaks up to October 2020. Data provided by the Ministry of Health, Labour and Welfare (MHLW) [1] show that it worsened again in November 2020 (Figure 1). Although the first peak around April is explainable by voluntary restriction of going out (VRG) [2], the second peak around July cannot be explained by any mathematical model to date.

The prescription surveillance (PS) system mainly monitors prescriptions from hospitals or clinics through pharmacies that use an Application Service Provider for medical claims, over a secure internet connection. The collected data represent the numbers of prescriptions given, classified into therapeutic categories as described below; they therefore include no personal information. Data on the numbers of prescriptions are extracted automatically and analyzed daily.

In this system, the numbers of patients were estimated by prefecture each day from the numbers of prescriptions for neuraminidase inhibitors, anti-varicella-zoster virus (VZV) drugs, antibiotic drugs, antipyretic analgesics, and multi-ingredient cold medications. The numbers of patients given neuraminidase inhibitors or anti-VZV drugs were classified into three age groups: children younger than 15 years old, adults aged 15 to 64 years, and elderly people aged 65 years and older. Moreover, antibiotics were classified into five types: penicillin, cephem, macrolide, new quinolone, and others [8]. These drugs were chosen to identify clusters of rash, fever, or digestive symptoms to detect bioterrorism attacks, emerging/re-emerging diseases, and mass food poisoning. In particular, because anti-VZV drugs are used for varicella and zoster, a cluster of their prescriptions in adults with no cluster in children or elderly people represents a signal of smallpox [6]. The following morning, these data are presented on a web page (http://prescription.orca.med.or.jp/syndromic/kanjyasuikei/).

All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted December 18, 2020. https://doi.org/10.1101/2020.12.16.20248358

Information about the COVID-19 outbreak was provided by MHLW [1]. The numbers of patients who received neuraminidase inhibitors, anti-VZV drugs, antibiotic drugs, antipyretic analgesics, multi-ingredient cold medications, and antidiarrheal/intestinal drugs at an external pharmacy are recorded. The PS system estimates the numbers of patients by multiplying these recorded numbers by the reciprocal of the participation rate of pharmacies in PS and by the reciprocal of the proportion of external prescriptions among all prescriptions in the prefecture.
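The scaling step can be illustrated with a minimal sketch. The function name, argument names, and all figures below are hypothetical and are not taken from the PS system; the sketch only assumes that both reciprocals are applied multiplicatively to the recorded count.

```python
def estimate_patients(recorded_count, participation_rate, external_rate):
    """Scale a count recorded at participating pharmacies to a prefecture estimate.

    participation_rate: share of pharmacies taking part in PS (assumed in (0, 1]).
    external_rate: share of prescriptions filled at external pharmacies
                   (assumed in (0, 1]).
    """
    if not (0 < participation_rate <= 1 and 0 < external_rate <= 1):
        raise ValueError("rates must lie in (0, 1]")
    # Multiply by the reciprocal of each rate, as described in the text.
    return recorded_count * (1 / participation_rate) * (1 / external_rate)


# Hypothetical figures: 120 recorded patients, half of the pharmacies
# participating, 80% of prescriptions filled externally.
estimate = estimate_patients(120, 0.5, 0.8)
print(round(estimate, 1))
```

A lower participation rate or a lower proportion of external prescriptions therefore inflates the estimate, which is why both rates must be known by prefecture.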

We examined the short-term prediction of the respective incidences of influenza, respiratory syncytial virus infection (RS), pharyngoconjunctival fever (PCF), group A streptococcal pharyngitis (A-SP), varicella, hand, foot and mouth disease (HFMD), erythema infectiosum (EI), exanthem subitum (ES), herpangina, and mumps in NOSSID. NOSSID provides the numbers of patients per sentinel per week as the incidence of each disease considered in this study, except for RS. For RS, NOSSID provides only the total number of patients per week. This information has been published officially every Friday at noon, reflecting the situation of the prior week. For the procedures explained below, we use the latest NOSSID information from Saturday onward. Therefore, on Saturday and Sunday, we can use NOSSID information for the prior week or earlier weeks.

We first examined the association among the epidemic curve of the COVID-19 outbreak, the incidences from NOSSID, and the information from PS. We regressed the number of patients whose onset date was day t on its lagged values for up to two weeks, t-1 to t-k, and on the PS and NOSSID information available one week before, on day t-7. The length of the autoregressive part, k, was chosen to maximize the adjusted coefficient of determination. As described above, NOSSID data are delayed by cases, which were defined as incremental cumulative confirmed cases at day t.
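The choice of k by the adjusted coefficient of determination can be sketched as follows. This is an illustration only: it assumes ordinary least squares with an intercept, uses synthetic data, and all function and variable names are hypothetical. The estimation details of our actual model are not reproduced here.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fitted(rows, y):
    """Fit y on rows (plus an intercept) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    return [sum(b * v for b, v in zip(beta, d)) for d in design]


def adjusted_r2(y, fitted, n_regressors):
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - (ss_res / (n - n_regressors - 1)) / (ss_tot / (n - 1))


def select_lag(series, exog, max_k):
    """Return (k, adjusted R^2) with the highest adjusted R^2 for k = 1..max_k.

    exog[t] holds the covariates usable for day t, i.e. those observed on day t-7.
    """
    best = None
    start = max_k  # same estimation sample for every k, so fits are comparable
    y = series[start:]
    for k in range(1, max_k + 1):
        # Lags t-1, ..., t-k followed by the lagged covariates.
        rows = [series[t - k:t][::-1] + list(exog[t])
                for t in range(start, len(series))]
        score = adjusted_r2(y, ols_fitted(rows, y), k + len(exog[start]))
        if best is None or score > best[1]:
            best = (k, score)
    return best


# Synthetic series (not real surveillance data) with one synthetic covariate.
rng = random.Random(0)
series = [50.0, 50.0]
for _ in range(150):
    series.append(5.0 + 0.6 * series[-1] + 0.3 * series[-2] + rng.gauss(0, 1))
exog = [[rng.random()] for _ in series]
best_k, best_score = select_lag(series, exog, 5)
print(best_k, round(best_score, 3))
```

Holding the estimation sample fixed across candidate values of k is a design choice of this sketch: it keeps the adjusted coefficients of determination comparable.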

In our model, although the lagged variable of the number of newly onset patients for t-1 to t-k can be predicted recursively, information from PS and NOSSID cannot be predicted. The maximum prediction period is therefore up to 7 days. We updated our prediction every day from November 21, 2020. However, for comparison with the Google prediction produced using the same information, we evaluated the first prediction on November 21 for one week. In other words, we did not use predictions using information related to November 22, 2020 or later. Therefore, we note that our prediction after November 22 was not the same as the prediction on November 21, even for the same day. By contrast, Google provided their prediction for 4 weeks from November 19, 2020.
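The recursive use of predicted values as lags, and the resulting 7-day limit, can be sketched as follows. The coefficients, the history, and the function name are hypothetical; the covariate contribution for each day is assumed to be computed beforehand from PS and NOSSID data observed 7 days earlier.

```python
def forecast_recursive(history, intercept, ar_coefs, exog_terms):
    """Forecast len(exog_terms) days ahead, feeding each prediction back as a lag.

    ar_coefs[i] multiplies the value i+1 days before the target day.
    exog_terms[h] is the covariate contribution for step h, which is known in
    advance only because the covariates enter with a 7-day lag.
    """
    if len(exog_terms) > 7:
        raise ValueError("covariates lagged by 7 days allow at most 7 steps ahead")
    values = list(history)
    forecasts = []
    for term in exog_terms:
        lags = values[-1:-len(ar_coefs) - 1:-1]  # most recent value first
        pred = intercept + sum(c * v for c, v in zip(ar_coefs, lags)) + term
        forecasts.append(pred)
        values.append(pred)  # the prediction becomes a lag for the next step
    return forecasts


# Hypothetical two-lag model with no covariate effect.
forecasts = forecast_recursive([10.0, 20.0], 1.0, [0.5, 0.25], [0.0, 0.0])
print(forecasts)  # [13.5, 12.75]
```

Beyond the seventh step, the covariate term would itself have to be forecast, which this model does not do.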

We evaluated the predictive capability of the two models by the discrepancy rate and by the correlation coefficient between the predictions and the data. The discrepancy rate is the weighted average of the absolute value of the difference between the actual data and the forecast, divided by the actual data: that is, the sum of absolute differences between data and prediction divided by the aggregate of the data.
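The discrepancy rate can be written as a short function. The sketch below also checks, under the assumption that the weights are proportional to the actual data, that the weighted-average form equals the sum of absolute differences divided by the aggregate of the data. The numbers are hypothetical and are not the study data.

```python
def discrepancy_rate(actual, predicted):
    """Sum of absolute differences divided by the aggregate of the actual data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(actual)


def weighted_relative_error(actual, predicted):
    """Weighted average of |a - p| / a with weights a / sum(actual) (assumed)."""
    total = sum(actual)
    return sum((a / total) * (abs(a - p) / a) for a, p in zip(actual, predicted))


# Hypothetical daily counts and forecasts.
actual = [100, 200, 300]
predicted = [110, 190, 330]
print(round(discrepancy_rate(actual, predicted), 4))
print(round(weighted_relative_error(actual, predicted), 4))
```

Both functions return the same value for any positive data, so the two verbal definitions describe one measure.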

Google published their prediction for four weeks from 21 November on 20 November. Prediction by our model started on 21 November. Therefore, it predicted for the period of 22-28 November. The predictions for 22-28 November were those used for evaluation.

As described earlier, the epidemic curve was expected to change over time within 30 days. However, we did not wait for such a long time to evaluate predictions. Instead, we used the epidemic curve as of the end of November, which was three days after the last day of evaluation.

Data from PS were aggregated and de-linked from personal information related to patients, medical institutions, and pharmacies: these are anonymous data. Information from NOSSID and Google was published data. Therefore, no ethical issue was posed by the use of these data for this study.

The adjusted coefficient of determination was highest when the length of the autoregressive part, k, was 14. Figure 2 depicts the epidemic curve and the fitted values based on estimation results of our statistical model within the sample. The fit within the sample was apparently almost perfect: the adjusted coefficient of determination was 0.9896.

Results show that our model was superior to the Google AI prediction in terms of both the discrepancy rate and the correlation coefficient. As shown in Figure 3, the observed epidemic curve lay within the confidence interval (CI) band of our model's prediction: no significant difference was found between the predicted and observed values. Conversely, Google did not provide a CI for its prediction of newly confirmed cases, although it provided a CI for its prediction of cumulative confirmed cases.

Although correlation coefficients are sometimes used for evaluation, they are insufficient to evaluate the gap separating data and prediction. A correlation coefficient simply indicates whether two series vary together linearly, not whether their levels agree. Therefore, it might be inappropriate for evaluating prediction, for which it is most important that the level of the prediction be close to the data. For that reason, we adopted the discrepancy rate for evaluation of prediction.
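A small numerical example illustrates why correlation alone is insufficient. The figures are hypothetical: a forecast that is exactly twice the data is perfectly correlated with the data but has a discrepancy rate of 100%, whereas a forecast that stays close to the data has a small discrepancy rate.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def discrepancy_rate(actual, predicted):
    """Sum of absolute differences divided by the aggregate of the actual data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(actual)


actual = [100, 120, 140, 160]       # hypothetical observed counts
doubled = [2 * a for a in actual]   # perfectly proportional, wrong level
close = [104, 117, 143, 158]        # right level, small errors

for name, forecast in (("doubled", doubled), ("close", close)):
    print(name,
          round(pearson(actual, forecast), 3),
          round(discrepancy_rate(actual, forecast), 3))
```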

Because the Google AI prediction depends on a mathematical model, it probably cannot explain several peaks of the COVID-19 outbreak: the mathematical model implies that the peak is reached through herd immunity, when the proportion of infected persons exceeds 1 - 1/R0. Actually, a study conducted in the US [3] used data from the upward phase before the first peak. In other words, it did not evaluate whether the model can predict the peak. In the upward phase only, the mathematical model can explain the data well, but it does not have predictive power for the peak or re-emergence of the outbreak. In this sense, that study appears to have been too early to evaluate the model.
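The threshold 1 - 1/R0 can be computed directly. The values of R0 below are illustrative only and are not estimates for COVID-19 in Japan; the function name is hypothetical.

```python
def herd_immunity_threshold(r0):
    """Proportion infected at which a simple epidemic model reaches its peak."""
    if r0 <= 1:
        raise ValueError("the threshold is defined here only for R0 > 1")
    return 1 - 1 / r0


# Illustrative basic reproduction numbers.
for r0 in (1.5, 2.0, 2.5):
    print(r0, round(herd_immunity_threshold(r0), 3))
```

Under such a model a peak requires a large share of the population to have been infected, which is why repeated peaks at low cumulative incidence are difficult for it to explain.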

Similarly, regarding forecasting in Japan, although details were not disclosed in the manuscript, the Google AI probably used data after July, when the outbreak was downward sloping or almost stable. In this or a similar phase, any model can probably predict outcomes easily. In this sense, the Google AI forecasting model has not yet been evaluated for its power to predict the peak. Conversely, the statistical model we examined can explain the second peak around the end of July. Therefore, it might predict the third or later peaks.

As described above, the Google AI forecasting model has provided the number of newly confirmed cases rather than the number of onset patients, that is, the epidemic curve. The series of the number of confirmed cases might have little epidemiological meaning because it might be affected by testing capacity or strategy, or by the day of the week. For instance, expanded testing capacity might find a higher proportion of asymptomatic cases and might exaggerate the size of the outbreak. Actually, newly confirmed cases were fewer on weekends and more numerous on the days after holidays or Sundays. This might result from the fact that fewer patients visit a doctor on weekends or holidays. However, the number of newly confirmed cases has always been reported publicly and is very familiar to the general population. Moreover, the figure will not change over time.

Conversely, our model predicted the number of onset patients, meaning that it predicted the epidemic curve. It is therefore more appropriate from the perspective of epidemiology. However, because there are lags from onset to reporting and to distribution of the report, the number of onset patients in the most recent days certainly fluctuates over time for two weeks or for one month. In other words, two weeks to one month is necessary to fix the epidemic curve. Therefore, precise prediction of the epidemic curve is extremely important. However, because fixing it requires a long time, the epidemic curve is not published routinely and is not commonly available to the general public.

By the definition of AI, because the AI can use knowledge or a model used by human intelligence and because human intelligence cannot access the AI model in detail, AI can always provide better prediction than human intelligence. The problem remains of whether differences between predictions by an AI and human intelligence are useful or not.

In general, AI cannot predict a new situation that was not included in its training data [11], because a neural network adjusts its parameters within the range [0,1]: it cannot set a parameter higher than one or lower than zero. In other words, AI output is a kind of average over the training data, although its procedure is complex. Therefore, AI cannot predict outside the set of training data. However, if the situation was included in the training data, then it follows a usual pattern, and prediction should be easy even for a human being. Forecasting matters most in a newly emerging situation that is unusual and not included in the training data. In this case, AI might not predict better than human intelligence.

Although the definitions of the predicted variables for Google's AI and our model were not the same, our predictive model was shown to be more precise than Google's AI model. The present study has some limitations. First, we evaluated the two predictions only for the first week; evaluations might differ for a longer period or a different phase of the outbreak. Therefore, we must continue to evaluate the two predictions over a longer time.

Secondly, the epidemic curve predicted by our model was not fixed until one month later. Therefore, evaluation for the prediction of our model might change up to one month. The obtained result should be regarded as tentative. By contrast, the evaluation of Google's prediction will not change.

We demonstrated that our model was more appropriate than Google's for the first week. However, it is noteworthy that this result is tentative, not conclusive. We must monitor these two predictions' respective performances carefully.


Japan Ministry of Health, Labour and Welfare

Mobility data can explain the entire COVID-19 outbreak course in Japan

Interpretable Sequence Learning for COVID-19 Forecasting

Annual incidence rate of infectious diseases estimated from sentinel surveillance data in Japan

We acknowledge the great efforts of all staff at public health centers, medical institutions, and other facilities who are fighting the spread and destruction associated with COVID-19.

The present study is based on the authors' opinions: it does not reflect any stance or policy of their professionally affiliated bodies.