key: cord-0737407-dl6z8x9h
authors: Dandekar, R.; Rackauckas, C.; Barbastathis, G.
title: A machine learning aided global diagnostic and comparative tool to assess effect of quarantine control in Covid-19 spread
date: 2020-07-24
journal: nan
DOI: 10.1101/2020.07.23.20160697
sha: f345ab104027e5bffaa69b9e220e3e0985de5830
doc_id: 737407
cord_uid: dl6z8x9h

We have developed a globally applicable diagnostic Covid-19 model by augmenting the classical SIR epidemiological model with a neural network module. Our model does not rely upon previous epidemics like SARS/MERS and all parameters are optimized via machine learning algorithms employed on publicly available Covid-19 data. The model decomposes the contributions to the infection timeseries to analyze and compare the role of quarantine control policies employed in highly affected regions of Europe, North America, South America and Asia in controlling the spread of the virus. For all continents considered, our results show a generally strong correlation between strengthening of the quarantine controls as learnt by the model and actions taken by the regions' respective governments. Finally, we have hosted our quarantine diagnosis results for the top $70$ affected countries worldwide, on a public platform, which can be used for informed decision making by public health officials and researchers alike.


The Coronavirus respiratory disease 2019, caused by the virus "SARS-CoV-2" [1, 2], has led to a global pandemic, with 12,552,765 confirmed cases in more than 200 countries as of July 12, 2020 [3]. As the disease began to spread beyond its apparent origin in Wuhan, the responses of local and national governments varied considerably. The evolution of infections has been similarly diverse, in some cases appearing to be contained and in others reaching catastrophic proportions.

Given the spatially and temporally diverse government responses and outcomes, the role played by the varying quarantine measures in different countries in shaping the infection growth curve is still not clear. With Covid-19 data by country and worldwide now publicly available, there is an urgent need to use data-driven approaches to bridge this gap, and to quantitatively estimate and compare the role of the quarantine policy measures implemented in several countries in curtailing the spread of the disease. As of this writing, more than 100 papers have been made available [9], mostly in preprint form. Existing models have one or more of the following limitations:

• Lack of independent estimation: Using parameters based on prior knowledge of SARS/MERS coronavirus epidemiology rather than deriving them independently from the Covid-19 data [10], or fixing parameters such as the rate of detection or the nature of the government response prior to running the model [11].
• Lack of global applicability: Not implemented on a global scale [12].
• Lack of interpretability: Using several free/fitting parameters, which makes the model cumbersome and complicated for policy makers to reproduce and use [13].

In this paper, we propose a globally scalable, interpretable model with completely independent parameter estimation through a novel approach: augmenting a first principles-derived epidemiological model with a data-driven module, implemented as a neural network. We leverage this model to quantify the quarantine strengths and to analyze and compare the role of quarantine control policies employed to control the virus effective reproduction number [13] [14] [15] [16] [17] [18] [19] in the European, North American, South American and Asian continents.

In a classical and commonly used model, known as SEIR [20] [21] [22], the population is divided into the susceptible S, exposed E, infected I and recovered R groups, and their relative growths and competition are represented as a set of coupled ordinary differential equations. The simpler SIR model does not account for the exposed population E. These models cannot capture the large-scale effects of more granular interactions, such as the population's response to social distancing and quarantine policies. Moreover, a major assumption of these models is that the rate of transitions between population states is fixed. In our approach, we relax this assumption by estimating the time-dependent quarantine effect on virus exposure as a neural network that informs the infected variable I in the SIR model. The trained model thus decomposes the effects, and the neural network encodes information about the quarantine strength function in the locale where the model is trained.

In general, neural networks with arbitrary activation functions are universal approximators. [23] [24] [25] Unbounded activation functions in particular, such as the rectified linear unit (ReLU), have been known to be effective in approximating nonlinear functions with a finite set of parameters. [26] [27] [28] Thus, a neural network solution is attractive to approximate quarantine effects in combination with analytical epidemiological models. The downside is that the internal workings of a neural network are difficult to interpret. The recently emerging field of Scientific Machine Learning [29] exploits conservation principles within a universal differential equation [30], SIR in our case, to mitigate overfitting and other related machine learning risks.

In the present work, the neural network is trained from publicly available infection and population data for Covid-19 for a specific region under study; details are in the Experimental Procedures section. Thus, our proposed model is globally applicable and interpretable with parameters learned from the current Covid-19 data, and does not rely upon data from previous epidemics like SARS/MERS.

The classic SIR epidemiological model is a standard tool for basic analysis concerning the outbreak of epidemics. In this model, the entire population is divided into three sub-populations: susceptible S; infected I; and recovered R. The sub-populations' evolution is governed by the following system of three coupled nonlinear ordinary differential equations
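Assuming the usual frequency-dependent incidence term βSI/N, the SIR system consistent with the definitions in this section can be written as below. The equation numbers (1) to (3) are an assumption, inferred from the later reference to the reproduction number as (4).

```latex
\begin{align}
\frac{dS(t)}{dt} &= -\frac{\beta\, S(t)\, I(t)}{N}, \tag{1}\\
\frac{dI(t)}{dt} &= \frac{\beta\, S(t)\, I(t)}{N} - \gamma\, I(t), \tag{2}\\
\frac{dR(t)}{dt} &= \gamma\, I(t). \tag{3}
\end{align}
```

Summing the three right-hand sides gives zero, which is the statement that N = S(t) + I(t) + R(t) stays constant.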

Here, β is the infection rate and γ is the recovery rate; both are assumed to be constant in time. The total population N = S(t) + I(t) + R(t) is seen to remain constant as well; that is, births and deaths are neglected. The recovered population is to be interpreted as those who can no longer infect others, so it also includes individuals deceased due to the infection. The possibility of recovered individuals becoming reinfected is accounted for by SEIS models [31], but we do not use this model here, as the reinfection rate for Covid-19 survivors is considered to be negligible as of now. The reproduction number R_t in the SEIR and SIR models is defined as
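As a minimal sketch of the SIR dynamics described above, the following integrates the three compartments with a classical Runge-Kutta step and checks numerically that the total population is conserved. The incidence term βSI/N and the ratio β/γ used for the reproduction number are the textbook forms and are assumptions here; all function names and numerical values are hypothetical.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR system for state = (S, I, R)."""
    s, i, r = state
    n = s + i + r                      # total population, conserved by the dynamics
    new_infections = beta * s * i / n  # assumed frequency-dependent incidence
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, beta, gamma):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shifted(state, k1, dt / 2), beta, gamma)
    k3 = sir_rhs(shifted(state, k2, dt / 2), beta, gamma)
    k4 = sir_rhs(shifted(state, k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(s0, i0, r0, beta, gamma, days, steps_per_day=10):
    """Return one (S, I, R) tuple per day, starting with day 0."""
    state = (float(s0), float(i0), float(r0))
    dt = 1.0 / steps_per_day
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta, gamma)
        trajectory.append(state)
    return trajectory


def reproduction_number(beta, gamma):
    """Ratio of infection rate to recovery rate (assumed textbook form)."""
    return beta / gamma


if __name__ == "__main__":
    # Hypothetical rates and population, chosen only for illustration.
    path = simulate_sir(s0=1_000_000, i0=500, r0=0, beta=0.3, gamma=0.1, days=60)
    print("reproduction number:", reproduction_number(0.3, 0.1))
    print("total population on day 60:", sum(path[-1]))
```

With a ratio β/γ above 1 the infected count grows at early times, and with a ratio below 1 it decays, which is the threshold behaviour the reproduction number is meant to summarize.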

An important assumption of the SIR models is homogeneous mixing among the subpopulations. Therefore, this model cannot account for social distancing or social network effects. Additionally, the model assumes uniform susceptibility and disease progress for every individual, and that no spreading occurs through animals or other non-human means. Alternatively, the SIR model may be interpreted as quantifying the statistical expectations on the respective mean populations, while deviations from the model's assumptions contribute to statistical fluctuations around the mean.

To study the effect of quarantine control globally, we start with the SIR epidemiological model. Figure 1a shows the schematic of the modified SIR model, the QSIR model, which we consider.

We augment the SIR model by introducing a time-varying quarantine strength rate term Q(t) and a quarantined population T(t), which is prevented from having any further contact with the susceptible population. Thus, the term I(t) denotes the infected population still having contact with the susceptibles, as in the standard SIR model, while the term T(t) denotes the infected population who are effectively quarantined and isolated. Thus, we can write an expression for the quarantined infected population T(t) as

Further we introduce an additional recovery rate δ which quantifies the rate of recovery of the quarantined population. Based on the modified model, we define a Covid spread parameter in a similar way to the reproduction number defined in the SIR model (4) as
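A minimal sketch of how such a spread parameter can be evaluated, assuming it takes the form C_p(t) = β / (γ + δ + Q(t)), that is, the infection rate divided by the total rate at which infected individuals are removed by recovery or quarantine. This form is an assumption chosen to parallel the reproduction number; the function names and numerical values are hypothetical.

```python
def covid_spread_parameter(beta, gamma, delta, q):
    """Assumed form: infection rate over total removal rate (recovery + quarantine)."""
    return beta / (gamma + delta + q)


def days_until_controlled(beta, gamma, delta, q_daily):
    """Return the first day index on which C_p drops below 1, or None if it never does.

    q_daily is a sequence of daily quarantine strengths Q(t), one value per day.
    """
    for day, q in enumerate(q_daily):
        if covid_spread_parameter(beta, gamma, delta, q) < 1.0:
            return day
    return None


if __name__ == "__main__":
    # Hypothetical, linearly increasing quarantine strength.
    q_daily = [0.015 * day for day in range(40)]
    print(days_until_controlled(beta=0.3, gamma=0.05, delta=0.05, q_daily=q_daily))
```

Under this assumed form, a larger Q(t) or a larger recovery rate lowers C_p, which is how a stronger quarantine shortens the time needed to cross below the threshold of 1.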

C_p > 1 indicates that infections are being introduced into the population at a higher rate than they are being removed, leading to rapid spread of the disease. On the other hand, C_p < 1 indicates that the Covid spread has been brought under control in the region of consideration. Since Q(t) does not follow from first principles and is highly dependent on local quarantine policies, we devised a neural network-based approach to approximate it. Recently, it has been shown that neural networks can be used as function approximators to recover unknown constitutive relationships in a system of coupled ordinary differential equations [30, 32]. Following this principle, we represent Q(t) as an n-layer-deep neural network with weights W_1, W_2, ..., W_n, activation function r and the input vector U = (S(t), I(t), R(t)) as

For the implementation, we choose an n = 2 layer densely connected neural network with 10 units in the hidden layer and the ReLU activation function. We chose ReLU because we found that training with sigmoidal activation functions stagnated. The final model is described by 54 tunable parameters. The neural network architecture schematic is shown in figure 1b. The governing coupled ordinary differential equations for the QSIR model are
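The following sketch illustrates the architecture described above and one way the quarantine term can enter the compartmental dynamics. It assumes a dense 3 → 10 → 1 network with biases, which gives 40 + 11 = 51 network parameters; together with β, γ and δ this is one reading consistent with the stated total of 54. The right-hand side is likewise an assumed system consistent with the description (infected individuals leave I at rate γ + Q(t), and the quarantined population T recovers at rate δ), not a transcription of the paper's equations. All names are hypothetical.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def init_network(n_inputs=3, n_hidden=10, seed=0):
    """Randomly initialise a dense n_inputs -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def count_network_parameters(net):
    """Count every scalar weight and bias in the network."""
    return (sum(len(row) for row in net["W1"]) + len(net["b1"])
            + len(net["W2"]) + 1)


def quarantine_strength(net, s, i, r):
    """Forward pass Q = W2 . relu(W1 . U + b1) + b2 with U = (S, I, R).

    The raw output is returned; no sign constraint is imposed in this sketch.
    """
    u = (s, i, r)
    hidden = [relu(sum(w * x for w, x in zip(row, u)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    return sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]


def qsir_rhs(state, beta, gamma, delta, net):
    """Assumed QSIR right-hand side for state = (S, I, R, T)."""
    s, i, r, t = state
    n = s + i + r + t
    q = quarantine_strength(net, s, i, r)
    new_infections = beta * s * i / n
    return (
        -new_infections,                   # dS/dt
        new_infections - (gamma + q) * i,  # dI/dt: recovery plus quarantine outflow
        gamma * i + delta * t,             # dR/dt: recoveries from I and from T
        q * i - delta * t,                 # dT/dt: quarantine inflow minus recovery
    )


if __name__ == "__main__":
    net = init_network()
    n_rates = 3  # beta, gamma, delta
    print("tunable parameters:", count_network_parameters(net) + n_rates)
```

In this assumed form the four right-hand sides sum to zero, so S + I + R + T is conserved in the same way that S + I + R is conserved in the plain SIR model.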

More details about the model initialization and parameter estimation methods are given in the Experimental Procedures section. In all cases considered below, we trained the model using data starting from the date when the 500th infection was recorded in each region and up to June 1, 2020. In each subsequent case study, Q(t) denotes the rate at which infected persons are effectively quarantined and isolated from the remaining population, and thus gives composite information about (a) the effective testing rate of the infected population as the disease progressed and (b) the intensity of the enforced quarantine as a function of time. To understand the nature of the evolution of Q(t), we look at the time point when Q(t) approximately shows an inflection point, or a ramp up point. An inflection point in Q(t) indicates the time when the rate of increase of Q(t), i.e. dQ(t)/dt, was at its peak, while a ramp up point corresponds to a sudden intensification of quarantine policies employed in the region under consideration.
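As a small illustration of the inflection-point criterion, the sketch below locates the day on which a daily-sampled Q(t) rises fastest, using forward differences as a stand-in for dQ(t)/dt. The daily sampling, the function names and the synthetic curve are assumptions made for the example.

```python
import math


def daily_increments(q_daily):
    """Forward differences Q(k + 1) - Q(k), approximating dQ/dt at unit time step."""
    return [later - earlier for earlier, later in zip(q_daily, q_daily[1:])]


def steepest_rise_day(q_daily):
    """Index k such that Q rises fastest between day k and day k + 1."""
    increments = daily_increments(q_daily)
    return max(range(len(increments)), key=increments.__getitem__)


if __name__ == "__main__":
    # Synthetic S-shaped quarantine strength with its midpoint between days 15 and 16.
    q_daily = [1.0 / (1.0 + math.exp(-(day - 15.5))) for day in range(31)]
    print("steepest rise starts on day", steepest_rise_day(q_daily))
```

On noisy learned curves a smoothed derivative would likely be needed before taking the maximum; this sketch leaves that step out.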

We define the quarantine efficiency, Q_eff, as the increase in Q(t) within a month following the detection of the 500th infected case in the region under consideration. Thus
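A minimal sketch of this definition, assuming Q(t) is available as one value per day with index 0 at the detection of the 500th case, and assuming "a month" means 30 days. The choice of the first and last day of the window is an assumption, as are the function name and the example values.

```python
def quarantine_efficiency(q_daily, window_days=30, first_day=0):
    """Increase in Q over `window_days`, starting from index `first_day`."""
    last_day = first_day + window_days
    if last_day >= len(q_daily):
        raise ValueError("Q(t) series is shorter than the requested window")
    return q_daily[last_day] - q_daily[first_day]


if __name__ == "__main__":
    # Hypothetical quarantine strength that rises by 0.01 per day.
    q_daily = [0.02 + 0.01 * day for day in range(61)]
    print("Q_eff:", quarantine_efficiency(q_daily))
```

Because the quantity is a difference, two regions with the same final Q but different starting values receive different Q_eff scores, which matches its reading as a measure of how quickly quarantine was strengthened.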

The magnitude of Q_eff shows how rapidly the infected individuals were prevented from coming into contact with the susceptibles in the first month following the detection of the 500th infected case. It thus contains composite information about the quarantine and lockdown strength, and about the testing and tracing protocols used to identify and isolate infected individuals.

Figure 2 shows the comparison of the model-estimated infected and recovered case counts with actual Covid-19 data for the highest affected European countries as of 1 June 2020, namely: Russia, UK, Spain and Italy, in that order. We find that despite the small set of optimized parameters (note that the contact rate β and the recovery rate γ are fixed, and not functions of time), a reasonably good match is seen in all four cases. The contact and recovery rates are assumed to be constant in our model over the period between the detection of the 500th infected case and June 1, 2020. The average contact rate in Spain and Italy is seen to be higher than in Russia and UK over the considered duration of 2 to 3 months, possibly because Russia and UK were affected relatively late by the virus, which gave sufficient time for the enforcement of strict social distancing protocols prior to widespread outbreak. For Spain and Italy, the quarantine efficiency and the recovery rate are generally higher than for Russia and UK, possibly indicating more efficient testing, isolation, quarantine and hospital practices in Spain and Italy. This agrees well with the ineffectiveness of testing, contact tracing and quarantine practices seen in UK [35]. Although the social distancing strength also varied with time, we do not focus on that aspect in the present study; it will be the subject of future studies. A higher quarantine efficiency combined with a higher recovery rate led Spain and Italy to bring down the Covid spread parameter C_p (defined in (6)) from > 1 to < 1 in 16 and 25 days, respectively, as compared to 32 days for UK and 42 days for Russia (figure 4).

Figure 5 shows Q_eff for the 23 highest affected European countries. We can see that Q_eff in the western European regions is generally higher than in eastern Europe. This can be attributed to the strong lockdown measures implemented in western countries like Spain, Italy, Germany and France after the rise of infections seen first in Italy and Spain [36]. Although countries like Switzerland and Turkey did not enforce a strict lockdown as compared to their west European counterparts, they were generally successful in halting the infection count before it reached catastrophic proportions, due to strong testing and tracing protocols [37, 38]. Subsequently, these countries also managed to identify potentially infected individuals and prevented them from coming into contact with susceptibles, giving them a high Q_eff score as seen in figure 5. In contrast, our study also manages to identify countries like Sweden, which had very limited lockdown measures [39], with a low Q_eff score as seen in figure 5. This strengthens the validity of our model in diagnosing information about the effectiveness of quarantine and isolation protocols in different countries, which agrees well with the actual protocols seen in these countries.

Figure 6 shows a reasonably good match between the model-estimated infected and recovered case counts and actual Covid-19 data for the highest affected North American states (including states from Mexico, the United States, and Canada) as of 1 June 2020, namely: New York, New Jersey, Illinois and California. Q(t) for New York and New Jersey shows a ramp up point immediately in the week following the detection of the 500th case in these regions, i.e. on 19 March for New York and on 24 March for New Jersey (figure 7). This matches well with the actual dates, 22 March in New York and 21 March in New Jersey, when stay at home orders and isolation measures were enforced in these states.
A relatively slower rise of Q(t) is seen for Illinois, while California shows a ramp up a week after detection of the 500th case. Although no significant difference is seen in the mean contact and recovery rates between the different US states, the quarantine efficiency in New York and New Jersey is seen to be significantly higher than that of Illinois and California (figure 16b), indicating the effectiveness of the rapidly deployed quarantine interventions in New York and New Jersey [40]. Owing to the high quarantine efficiency in New York and New Jersey, these states were able to bring down the Covid spread parameter C_p to less than 1 in 19 days (figure 8). On the other hand, although Illinois and California reached close to C_p = 1 after the 30 day and 20 day mark respectively, C_p still remained greater than 1 (figure 8), indicating that these states were still in the danger zone as of June 1, 2020. An important caveat to this result is the reporting of the recovered data.

Compared with Europe, the recovery rates seen in North America are significantly lower (figures 16a,b). It should be noted that accurate reporting of recovery rates is likely to play a major role in this apparent difference. In our study, the recovered data include individuals who cannot further transmit infection, and thus include treated patients who are currently in a healthy state and also individuals who died due to the virus. Since quantification of deaths can be done in a robust manner, the death data are generally reported more accurately. However, there is no clear definition for quantifying the number of people who transitioned from infected to healthy. As a result, accurate and timely reporting of recovered data is seen to vary significantly between countries, with under reporting of the recovered data being a common practice. Since the effective reproduction number calculation depends on the recovered case count, accurate data regarding the recovered count are vital to assess whether the infection has been curtailed in a particular region or not. Thus, our results strongly indicate the need for each country to follow a particular metric for estimating the recovered count robustly, which is vital for data driven assessment of the pandemic spread.

Figure 9a shows the quarantine efficiency for 20 major US states spanning the whole country. Figure 9b shows the comparison between a report published in the Wall Street Journal on May 21 highlighting US states based on their lockdown conditions [41] and the quarantine efficiency magnitude in our study. The size of the circles represents the magnitude of the quarantine efficiency. The blue color indicates the states for which the quarantine efficiency was greater than the mean quarantine efficiency across all US states, while red indicates the opposite.
Our results indicate that the north-eastern and western states were much more responsive in implementing rapid quarantine measures in the month following early detection, as compared to the southern and central states. This matches the on-ground situation: a generally strong correlation is seen between the red circles in our study (states with lower quarantine efficiency) and the yellow regions in the Wall Street Journal report [41] (states with reduced imposition of restrictions), and between the blue circles in our study (states with higher quarantine efficiency) and the blue regions in the Wall Street Journal report [41] (states with a generally higher level of restrictions). This strengthens the validity of our approach, in which the quarantine efficiency is recovered through a trained neural network rooted in fundamental epidemiological equations.

Figure 10 shows a reasonably good match between the model-estimated infected and recovered case counts and actual Covid-19 data for the highest affected Asian countries as of 1 June 2020, namely: India, China and South Korea. Q(t) shows a rapid ramp up in China and South Korea (figure 11), which agrees well with cusps in government interventions that took place in the weeks leading to and after the end of January [4] and February [42] for China and South Korea, respectively. On the other hand, a slow build up of Q(t) is seen for India, with no significant ramp up. This is reflected in the quarantine efficiency comparison (figure 16c), which is much higher for China and South Korea compared to India. South Korea shows a significantly lower contact rate than its Asian counterparts, indicating strongly enforced and followed social distancing protocols [43]. No significant difference in the recovery rate is observed between the Asian countries.
Owing to the high quarantine efficiency in China and a high quarantine efficiency coupled with strongly enforced social distancing in South Korea, these countries were able to bring down the Covid spread parameter C_p from > 1 to < 1 in 21 and 13 days respectively, while it took 33 days in India (figure 12).

Figure 13 shows a reasonably good match between the model-estimated infected and recovered case counts and actual Covid-19 data for the highest affected South American countries as of 1 June 2020, namely: Brazil, Chile and Peru. For Brazil, Q(t) is seen to be approximately constant (≈ 0) initially, with a ramp up around the 20 day mark, after which Q(t) is seen to stagnate (figure 14a). The key difference between the Covid progression in Brazil and in other nations is that the infected and the recovered (recovered healthy + dead in our study) counts are very close to one another as the disease progressed (figure 13). Owing to this, as the disease progressed, the newly infected people introduced in the population were balanced by the infected people removed from the population, either through recovery or death. This higher recovery rate, combined with a generally low quarantine efficiency and contact rate (figure 16d), results in the Covid spread parameter for Brazil being < 1 for almost the entire duration of the disease progression (figure 15a). For Chile, Q(t) is almost constant for the entire duration considered (figure 14b). Thus, although government regulations were imposed swiftly following the initial detection of the virus, leading to a high initial magnitude of Q(t), the government imposition was subsequently relaxed. This may be attributed to several political and social factors outside the scope of the present study [44]. As in Brazil, the infected and recovered counts in Chile remain close to each other compared to other nations.
A generally high quarantine magnitude coupled with a moderate recovery rate (figure 16d) leads to C_p being < 1 for the entire duration of disease progression (figure 15b). In Peru, Q(t) shows a very slow build up (figure 14c) with a very low magnitude. Also, relative to the infected count, the recovered count is lower than in its South American counterparts (figure 13c). A low quarantine efficiency coupled with a low recovery rate (figure 16d) leads Peru to be in the danger zone (C_p > 1) for 48 days after detection of the 500th case (figure 15c).

[Figure residue: US state axis labels NY, NJ, MI, PA, FL, GA, CA, MA, TX, IL, MD, OK, UT, AZ, NE, WA, OH, OR, CO, S…]

Our model captures the infected and recovered counts for highly affected countries in Europe, North America, Asia and South America reasonably well, and is thus globally applicable. Along with capturing the evolution of infected and recovered data, the novel machine learning aided epidemiological approach allows us to extract valuable information regarding the quarantine policies, the evolution of the Covid spread parameter C_p, the mean contact rate (social distancing effectiveness), and the recovery rate. Thus, it becomes possible to compare across different countries, with the model serving as an important diagnostic tool.

Our results show a generally strong correlation between strengthening of the quarantine controls, i.e. increasing Q(t) as learnt by the neural network model; actions taken by the regions' respective governments; and decrease of the Covid spread parameter C_p for all continents considered in the present study.

Based on the Covid-19 data collected (details in the Materials and Methods section), we note that accurate and timely reporting of recovered data is seen to vary significantly between countries, with under reporting of the recovered data being a common practice. In the North American countries, for example, the recovered counts are significantly lower than those of their European and Asian counterparts. Thus, our results strongly indicate the need for each country to follow a particular metric for estimating the recovered count robustly, which is vital for data driven assessment of the pandemic spread. The key highlights of our model are: (a) it is highly interpretable, with few free parameters rooted in an epidemiological model, (b) it relies only on Covid-19 data and not on previous epidemics and (c) it is highly flexible and adaptable to different compartmental modelling assumptions. In particular, our method can be readily extended to more complex compartmental models including hospitalization rates, testing rate and distinction between symptomatic and asymptomatic individuals. Thus, the methodology presented in the study can be readily adapted to any province, state or country globally, making it a potentially useful tool for policy makers in the event of future outbreaks or a relapse in the current one.

Finally, we have hosted our quarantine diagnosis results for the top 70 affected countries worldwide on a public platform (https://covid19ml.org/ or https://rajdandekar.github.io/COVID-QuarantineStrength/), which can be used for informed decision making by public health officials and researchers alike. We believe that such a publicly available global tool will be of significant value for researchers who want to study the correlation between the quarantine strength evolution in a particular region and a wide range of metrics, spanning from mortality rate to the socio-economic impact of Covid-19 in that region. Currently, our model lacks forecasting abilities. In order to do robust forecasting based on prior data available, the model needs to be further augmented through coupling with real-time metrics parameterizing social distancing, e.g. the publicly available Apple mobility data [45]. This could be the subject of future studies.

The starting point t = 0 for each simulation was the day on which the infected case count crossed 500, i.e. I_0 ≈ 500. The number of susceptible individuals was assumed to be equal to the population of the considered region. Also, in all simulations, the number of recovered individuals was initialized from data at t = 0 as defined above. The quarantined population T(t) is initialized to a small number, T(t = 0) ≈ 10.

The time resolved data for the infected, I_data, and recovered, R_data, for each locale considered are obtained from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The neural network-augmented SIR ODE system was trained by minimizing the mean square error loss function

$$L_{NN}(W, \beta, \gamma, \delta) = \left\| \log(I(t) + T(t)) - \log(I_{data}(t)) \right\|^2 + \left\| \log(R(t)) - \log(R_{data}(t)) \right\|^2 \qquad (13)$$

that includes the neural network's weights W. For most of the regions under consideration, W, β, γ, δ were optimized by minimizing the loss function given in (13). Minimization was performed using local adjoint sensitivity analysis [32, 46], following a procedure similar to that outlined in a recent study [30], with the ADAM optimizer [47] and a learning rate of 0.01. The iterations required for convergence varied based on the region considered and generally ranged from 40,000 to 100,000. For regions with a low recovered count (all US states and UK), we employed a two stage optimization procedure to find the optimal W, β, γ, δ. In the first stage, (13) was minimized. For the second stage, we fix the optimal γ, δ found in the first stage and optimize the remaining parameters W, β based on a loss function defined just on the infected count:

$$L(W, \beta) = \left\| \log(I(t) + T(t)) - \log(I_{data}(t)) \right\|^2$$

In the second stage, we do not include the recovered count R(t) in the loss function, since R(t) depends on γ, δ, which have already been optimized in the first stage. By placing more emphasis on fitting the infected count, such a two stage procedure leads to much more accurate model estimates when the recovered data count is low. The iterations required for convergence in both stages varied based on the region considered and generally ranged from 30,000 to 100,000.
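The two loss functions and the optimizer update can be sketched as follows. The losses are written as the mean of squared log-differences over the time samples; whether the original averages or sums is not stated, so the normalization is an assumption. The `adam_step` function is the standard Adam update with its usual default moment coefficients (named `beta1` and `beta2` here, unrelated to the infection rate β). Gradients are supplied by the caller; the adjoint sensitivity computation used to obtain them is not reproduced. All names are hypothetical.

```python
import math


def log_mse(model, data):
    """Mean squared difference between two log-transformed positive series."""
    return sum((math.log(m) - math.log(d)) ** 2
               for m, d in zip(model, data)) / len(data)


def stage_one_loss(i_model, t_model, r_model, i_data, r_data):
    """Loss on both counts; reported infections are compared with I + T."""
    active = [i + t for i, t in zip(i_model, t_model)]
    return log_mse(active, i_data) + log_mse(r_model, r_data)


def stage_two_loss(i_model, t_model, i_data):
    """Loss on the infected count only, used once gamma and delta are frozen."""
    active = [i + t for i, t in zip(i_model, t_model)]
    return log_mse(active, i_data)


def new_adam_state(n_params):
    return {"step": 0, "m": [0.0] * n_params, "v": [0.0] * n_params}


def adam_step(params, grads, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter list."""
    state["step"] += 1
    k = state["step"]
    updated = []
    for idx, (p, g) in enumerate(zip(params, grads)):
        state["m"][idx] = beta1 * state["m"][idx] + (1 - beta1) * g
        state["v"][idx] = beta2 * state["v"][idx] + (1 - beta2) * g * g
        m_hat = state["m"][idx] / (1 - beta1 ** k)  # bias-corrected first moment
        v_hat = state["v"][idx] / (1 - beta2 ** k)  # bias-corrected second moment
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated


if __name__ == "__main__":
    # Toy check of the optimizer on f(x) = (x - 3)^2, whose gradient is 2 (x - 3).
    params, state = [0.0], new_adam_state(1)
    for _ in range(3000):
        params = adam_step(params, [2.0 * (params[0] - 3.0)], state)
    print("minimizer estimate:", params[0])
```

In a two stage fit along these lines, the first stage would pass all of W, β, γ, δ to the optimizer with `stage_one_loss`, and the second stage would pass only W and β with `stage_two_loss` while holding γ and δ at their first-stage values.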

Preliminary versions of this work can be found at medRxiv 2020.04.03.20052084 and arXiv:2004.02752.

Data for the infected and recovered case count in all regions was obtained from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. All code files are available at https://github.com/RajDandekar/MIT-Global-COVID-Modelling-Project-1. All results are publicly hosted at https://covid19ml.org/ or https://rajdandekar.github.io/COVID-QuarantineStrength/.

A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster

Coronavirus Disease 2019 (COVID-19) Situation Summary

Coronavirus disease 2019 (COVID-19) Situation Report - 174

What china's coronavirus response can teach the rest of the world

Whose coronavirus strategy worked best? Scientists hunt most effective policies

First case of 2019 novel coronavirus in the united states

Hidden Outbreaks Spread Through U.S. Cities Far Earlier Than Americans Knew, Estimates Say

Coronavirus in Latin America: What governments are doing to stop the spread

An aggregated dataset of clinical outcomes for covid-19 patients

The effect of travel restrictions on the spread of the 2019 novel coronavirus

Forecasting covid-19 and analyzing the effect of government interventions

The effect of human mobility and control measures on the covid-19 epidemic in china

Impact of nonpharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand

Novel coronavirus 2019-ncov: early estimation of epidemiological parameters and epidemic predictions

Estimation of the transmission risk of the 2019-nCov and its implication for public health interventions

Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia

Nowcasting and forecasting the potential domestic and international spread of the 2019-nCov outbreak originating in Wuhan, China: a modelling study

Early dynamics of transmission and control of covid-19: a mathematical modelling study. The Lancet Infectious Diseases

Modelling the sars epidemic by a lattice-based monte-carlo simulation

Extension and verification of the seir model on the 2009 influenza a (h1n1) pandemic in japan

Forecasting epidemics through nonparametric estimation of time-dependent transmission rates using the seir model

Approximations by superpositions of sigmoidal functions

Approximation capabilities of multilayer feedforward networks

Neural network with unbounded activation functions is universal approximator

Deep sparse rectifier neural networks

Maxout networks. 30th int. conf. mach. learn

Improving deep neural networks for LVCSR using rectified linear units and dropout

Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence

Universal Differential Equations for Scientific Machine Learning

Analysis of a spatially extended nonlinear seis epidemic model with distinct incidence for exposed and infectives

DiffEqFlux.jl - A Julia Library for Neural Differential Equations

Spain orders nationwide lockdown to battle coronavirus. The Guardian

Italy extends coronavirus lockdown to entire country, imposing restrictions on 60 million people

How did Britain get its coronavirus response so wrong? Guardian

Coronavirus: What are the lockdown measures across europe

What Switzerland did right in the battle against coronavirus. MarketWatch

Coronavirus: How Turkey took control of covid-19 emergency

Sweden Has Become the World's Cautionary Tale

These states have some of the most drastic restrictions to combat the spread of coronavirus

A Guide to State Coronavirus Reopenings and Lockdowns

Coronavirus cases have dropped sharply in south korea. what's the secret to its success

What's Behind South Korea's COVID-19 Exceptionalism

Politics and poverty hinder Covid-19 response in Latin America

Mobility trend report

Adjoint sensitivity analysis for differential-algebraic equations: The adjoint DAE system and its numerical solution

Adam: A method for stochastic optimization

This effort was partially funded by the Intelligence Advanced Research Projects Activity (IARPA). We are grateful to Emma Wang for help with some of the simulations, and to Haluk Akay, Hyungseok Kim and Wujie Wang for helpful discussions and suggestions.

The authors declare no conflicts of interest.