key: cord-228736-x1w5pi67 authors: Suryanarayanan, Parthasarathy; Tsou, Ching-Huei; Poddar, Ananya; Mahajan, Diwakar; Dandala, Bharath; Madan, Piyush; Agrawal, Anshul; Wachira, Charles; Samuel, Osebe Mogaka; Bar-Shira, Osnat; Kipchirchir, Clifton; Okwako, Sharon; Ogallo, William; Otieno, Fred; Nyota, Timothy; Matu, Fiona; Barros, Vesna Resende; Shats, Daniel; Kagan, Oren; Remy, Sekou; Bent, Oliver; Mahatma, Shilpa; Walcott-Bryant, Aisha; Pathak, Divya; Rosen-Zvi, Michal title: WNTRAC: Artificial Intelligence Assisted Tracking of Non-pharmaceutical Interventions Implemented Worldwide for COVID-19 date: 2020-09-02 journal: nan DOI: nan sha: doc_id: 228736 cord_uid: x1w5pi67 The Coronavirus disease 2019 (COVID-19) global pandemic has transformed almost every facet of human society throughout the world. Against an emerging, highly transmissible disease with no definitive treatment or vaccine, governments worldwide have implemented non-pharmaceutical intervention (NPI) to slow the spread of the virus. Examples of such interventions include community actions (e.g. school closures, restrictions on mass gatherings), individual actions (e.g. mask wearing, self-quarantine), and environmental actions (e.g. public facility cleaning). We present the Worldwide Non-pharmaceutical Interventions Tracker for COVID-19 (WNTRAC), a comprehensive dataset consisting of over 6,000 NPIs implemented worldwide since the start of the pandemic. WNTRAC covers NPIs implemented across 261 countries and territories, and classifies NPI measures into a taxonomy of sixteen NPI types. NPI measures are automatically extracted daily from Wikipedia articles using natural language processing techniques and manually validated to ensure accuracy and veracity. We hope that the dataset is valuable for policymakers, public health leaders, and researchers in modeling and analysis efforts for controlling the spread of COVID-19. 
The Coronavirus disease 2019 (COVID-19) pandemic has had an unprecedented impact on almost every facet of human civilization, from healthcare systems to economies and governments worldwide. As of August 2020, every country in the world has been affected, with more than 24M confirmed cases of infection and a death toll approaching one million worldwide [1] [2] [3]. The pandemic has triggered a wide range of non-pharmaceutical intervention (NPI) responses across the world. With therapeutic and preventive interventions still in early stages of development, every country has resorted to NPIs as a primary strategy 4, 5 for disease control. Examples of such interventions include community actions (e.g. school closures, restrictions on mass gatherings), individual actions (e.g. mask wearing, self-quarantine), and environmental actions (e.g. public facility cleaning). Such NPIs vary significantly in their implementation based on the maturity of the health infrastructure, the robustness of the economy, and cultural values unique to the region. Public health policymakers worldwide are striving to introduce successful intervention plans that manage the spread of disease while balancing the socio-economic impacts 6, 7. These initiatives will benefit from modeling the efficacy of different intervention strategies. The pandemic has sparked an ongoing surge of discovery and information sharing, resulting in an unprecedented amount of data being published online 8. This includes information about NPI measures, which is available in a wide variety of unstructured data sources, including official government websites 9, 10, press releases, social media, and news articles. However, such modeling requires information about the NPIs to be available in structured form. To address this urgent need, several data collection initiatives have emerged in recent months, resulting in several publicly available datasets with varying degrees of coverage, data freshness, and sparsity.
For example, the CoronaNet dataset 11 contains monadic and dyadic data on policy actions taken by governments across the world; it is manually curated by over 500 researchers, covers sixteen NPI types, and is kept fairly up-to-date. The Complexity Science Hub Vienna enlisted researchers, students, and volunteers to curate the Complexity Science Hub COVID-19 Control Strategies List 12, a dataset of eight different NPI types covering only 57 countries. Similarly, the Oxford COVID-19 Government Response Tracker 13 dataset takes a crowd-sourcing approach and covers 17 NPI types across 186 regions and 52 US states and territories. Because all these datasets are assembled manually, each of them is constrained in one or more respects: geographical scope, taxonomic richness, frequency of updates, granularity of details, or evidential sources. An AI-assisted, semi-automated data collection approach, driven by a rich, extensible taxonomy, can help overcome these issues and may result in a larger, more frequently updated dataset with less manual labor. Wikipedia is one of the main sources of accessible information on the Internet. Since the start of COVID-19, a dedicated global network of volunteers has been creating, updating, and translating Wikipedia articles with vital information about the pandemic 14. Over 5,000 new Wikipedia pages on COVID-19 have been written by more than 71,000 volunteers since the onset of the pandemic, accumulating more than 440M page views by June 2020. Wikipedia articles, even though crowd-sourced, can serve as a reliable source of NPI data through the process of collective validation 15 and through citations of credible sources such as government websites, scientific literature, and news articles. Further, these articles are constantly updated; they have been edited more than 793,000 times as of August 2020, making Wikipedia both a rich and an up-to-date source.
Based on this, we postulated that an approach based on automated information extraction from Wikipedia, followed by human validation to ensure accuracy and veracity, would result in a frequently updated dataset with wider coverage than any of the existing datasets. We present the result of our work, WNTRAC, a comprehensive dataset consisting of over 6,000 NPIs implemented worldwide since the start of the pandemic. WNTRAC covers NPIs implemented across 261 countries and territories, and classifies NPI measures into a taxonomy of sixteen NPI categories. NPI measures are automatically extracted daily from Wikipedia articles using natural language processing (NLP) techniques and manually validated to ensure accuracy and veracity. In what follows, we explain the methods used to create the dataset, outline the challenges and key design choices, describe the format, provide an assessment of its quality, and lay out our vision of how this dataset can be used by policy makers, public health leaders, data scientists, and researchers to support modeling and analysis efforts. We built a semi-automated system to construct the dataset and keep it current. For information extraction purposes, the NPI measures are modeled as events and evidences, as illustrated by the motivating example in Figure 2. Each event corresponds to the imposition or lifting of a particular NPI. An event is defined as a 5-tuple (what, value, where, when, restriction), where:
1. What: the type of NPI that was imposed or lifted. NPIs are grouped into sixteen major types. In the example, the type is school closure.
2. Value: a sub-category or attribute that qualifies the NPI type more specifically. In the example, the associated value is all schools closed. A detailed description of each type and the corresponding possible values is shown in Table 1.
3. Where: the region (country, territory, province, or state) in which the NPI measure has been implemented or withdrawn.
In this example, three distinct regions are identified, namely Punjab, Chhattisgarh, and Manipur, and three separate events will be extracted.
4. When: the date from which the NPI was imposed or lifted. In the example, the date will be 13 March, corresponding to the implementation of the NPI, even though a likely date for the cancellation of the NPI, 31 March, is also indicated.
5. Restriction: a flag indicating whether the event corresponds to the introduction or the withdrawal of the NPI. Note that the lifting of an NPI is treated as a separate event. In the example, the restriction type is imposed.
In addition to the mandatory fields described above, each event contains one or more evidences. An evidence is a span of text extracted from Wikipedia that discusses a particular event. In the example, the evidence is "On 13 March, the Punjab, Chhattisgarh, and Manipur governments declared holidays in all schools and colleges till 31 March." An evidence may support more than one event. Each evidence is accompanied by a source type indicating the type of source of the Wikipedia citation. More details about such additional attributes can be found in the data records section. The system, shown in Figure 3, is designed to be scalable for continuous gathering, extraction, and validation of NPI events. It consists of two subsystems: a data processing pipeline for capturing and extracting potential NPI events from Wikipedia articles, and a tool, WNTRAC Curator, for human validation of the NPI events automatically extracted by the pipeline. In the next section, we describe the system and its components at a high level, focusing on key design choices that have a bearing on the quality of the dataset, starting with a brief description of the data collection. The first step in the data processing is to retrieve the aforementioned list of Wikipedia articles on a periodic basis. The crawler module implements this functionality.
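The event model above can be sketched as a small data structure. This is a minimal illustration, not the system's actual schema; the class and field names are our own, chosen to mirror the 5-tuple definition:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Evidence:
    """A span of Wikipedia text supporting one or more events."""
    text: str
    citation_url: str = ""  # source cited by the Wikipedia sentence, if any

@dataclass
class NPIEvent:
    """The 5-tuple that uniquely identifies an NPI event, plus its evidences."""
    what: str          # NPI type, e.g. "school closure"
    value: str         # sub-category, e.g. "all schools closed"
    where: str         # region, e.g. "Punjab"
    when: str          # date the NPI was imposed or lifted
    restriction: str   # "imposed" or "lifted"
    evidences: List[Evidence] = field(default_factory=list)

# The motivating example mentions three regions, so it yields three events
# that all share the same evidence span.
sentence = ("On 13 March, the Punjab, Chhattisgarh, and Manipur governments "
            "declared holidays in all schools and colleges till 31 March.")
events = [
    NPIEvent("school closure", "all schools closed", region, "2020-03-13",
             "imposed", [Evidence(sentence)])
    for region in ("Punjab", "Chhattisgarh", "Manipur")
]
```

Note how a single evidence span fans out into one event per region, matching the example in the text.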
The crawler uses the MediaWiki API 17 to download the articles. As part of this step, we extract the text content of each article while preserving all the associated citations. This process produces a document for each article; each sentence in a document is a candidate for NPI extraction. As of August 2020, the aggregate crawled data contains over 55,000 sentences, with an average of 213 sentences per document. The second step in the pipeline is the extraction of NPI events from a document. It is broken into the sequence of steps described below.
• Pre-processing: As the first step in processing a document, we use sentence boundary detection algorithms from libraries such as spaCy 18 to identify where sentences begin and end. Although sentences are used as the logical units for extracting NPI events, we preserve the order in which they appear in the source document, for reasons detailed below. At this step, we also extract and retain the citation URL, if available, for each sentence.
• Sentence classification: Next, we classify each sentence into one of the NPI types, such as school closure, to identify potential NPI events. If no NPI is discussed in the sentence, we classify it as discarded. We use multiple learning algorithms, including logistic regression, Support Vector Machines, and Bidirectional Encoder Representations from Transformers (BERT) 19, and employ an ensemble method to obtain better overall predictive performance. A small subset of the data (1,490 sentences) was manually annotated to train the models. Independently, we also categorize each sentence as implying either the introduction or the withdrawal of an NPI (restriction).
• Named entity recognition and named entity disambiguation: After identifying potential events in the previous step, we extract the specific constituent entities of each candidate event from the sentence.
We use state-of-the-art named-entity recognizers (such as spaCy 18) and normalizers to detect and normalize locations (Where: [Punjab, Chhattisgarh, Manipur]) and time expressions (When: March 13). In addition, we link the location entities of type 'GPE' in the Wikipedia article title to the corresponding ISO codes 20, 21. Even though we use the sentence as the logical unit for extracting an NPI event, the sentence itself may not include all the relevant information. For example, the date or location may be available in nearby sentences or in the header of the paragraph to which the sentence belongs. To address this key challenge, we developed a heuristic-based relation detection algorithm that associates one of the dates or locations extracted from the current document with each sentence.
• Value extraction: The last step in NPI event extraction is determining the associated value. We use multiple rule-based algorithms that either operate independently or depend on information extracted in the previous steps. For example, given the sentence "On 13 March, it was announced at an official press conference that a four-week ban on public gatherings of more than 100.", the event type is mass gathering and the associated value is the maximum number of people in a social gathering allowed by the government. The value extraction is performed using parse-based rule engines 18. It is worth noting that the value extraction components must know the actual type (mass gathering) before they can extract the correct value, "100". Similarly, given the sentence "On 1 April, the Government of USA suspended flights from New York to Texas", the event type is domestic flight restriction and the associated value is the name of the state from which the passenger is arriving. To correctly extract the value, the value extraction needs to know both the correct type and the normalized location ("New York").
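Among the steps above, the relation-detection heuristic for dates can be sketched as follows. The text does not specify the exact rules, so this is a simplified assumption: a sentence without an explicit date inherits the nearest preceding date in the document, and the date grammar is reduced to day-month expressions.

```python
import re

# Simplified date grammar: day-month expressions such as "13 March".
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)\b")

def attach_dates(sentences):
    """Pair each sentence with the most recent explicit date seen so far,
    since the date of an NPI may appear only in a nearby sentence."""
    current = None
    paired = []
    for sent in sentences:
        match = DATE_RE.search(sent)
        if match:
            current = match.group(0)
        paired.append((sent, current))
    return paired

doc = ["On 13 March, all schools were closed.",
       "Cinemas and theatres were also shut."]
```

Here the second sentence carries no date of its own, so the heuristic associates it with "13 March" from the first.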
Thus, using the above procedure, we extract the unique 5-tuples that are the candidate NPI events. Once extracted, they are presented to the volunteers for validation to ensure data quality. This process is repeated every day. To minimize manual labor, given the small number of volunteers, we attempt to detect changes since the last time we crawled Wikipedia. We use a combination of syntactic similarity metrics, such as normalized Levenshtein distance, and semantic similarity metrics, such as event attribute matching, to perform this daily change detection for each extracted document. The events automatically extracted by the pipeline are vetted by volunteers using the WNTRAC Curator validation tool. The tool is a simple web application backed by a database, as shown in Figure 3, and is illustrated in Figure 4. At the top, it displays the complete Wikipedia document extracted by the processing pipeline. Below the document, each candidate event is shown to the volunteer in a separate card. The volunteer can adjudge the candidate to be a brand-new NPI event or an evidence for an existing event, or discard the candidate. They can also correct any of the attributes of the event extracted by the pipeline. In addition to the key fields discussed earlier, the dataset also contains a few additional attributes for each event. A complete listing of all fields across event and evidence is shown in Table 2, along with an example for each field. Each version of the dataset consists of two CSV files, named ibm-wntrac-yyyy-mm-dd-events.csv and ibm-wntrac-yyyy-mm-dd-evidences.csv, corresponding to events and evidences respectively. The dataset is available for download in our GitHub repository 22 and is regularly updated; at the time of submission, it is updated as of September 2, 2020. Historical versions of the dataset are made available in the same GitHub repository.
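The daily change detection described earlier in this section can be sketched with a normalized Levenshtein similarity. This is a minimal sketch: the 0.9 threshold is an illustrative assumption, not the system's actual setting, and the real pipeline additionally compares event attributes.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def sentence_changed(old, new, threshold=0.9):
    """Treat a sentence as changed when its normalized similarity to the
    previously crawled version drops below the threshold."""
    if not old and not new:
        return False
    similarity = 1 - levenshtein(old, new) / max(len(old), len(new))
    return similarity < threshold
```

Only sentences flagged as changed need to be re-extracted and re-validated, which is what keeps the daily volunteer workload small.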
Further, a static copy of the dataset containing NPIs recorded as of July 8, 2020, used for the analysis in this paper, has been archived in figshare 23. In the next section, we include some high-level dataset statistics to provide a sense of the distribution of the data. Figure 5 shows the distribution of the NPI measures imposed worldwide. Entertainment/cultural sector closure, confinement, and school closure are the predominant NPIs taken by governments. Figure 6 summarizes the total number of regions that implemented NPIs of each type. As the graph shows, confinement, school closure, and freedom of movement are the most common NPIs imposed worldwide, as expected from Figure 5. Figure 7 shows the breakdown of NPIs within each region for the twenty regions that have implemented the highest number of NPI measures. Figure 8 presents an interactive data browser 26 that uses a chart, map, and histogram to provide a descriptive analysis of NPIs and COVID-19 outcomes such as confirmed cases and deaths. The browser has a control panel used to filter the data being visualized (e.g. cases vs. deaths) as well as how it is visualized (e.g. linear vs. log scale). A play slider can be used to view the temporal evolution of NPIs and COVID-19 outcomes in a given region. The chart illustrates the time points at which a geographical region imposes or lifts an NPI, along with the temporal trends of COVID-19 outcomes. The different types of NPIs are illustrated using specific icons that are described in a legend. Groups of interventions are noted with the star icon. The number of countries/territories and the number of NPIs shown in the chart can be adjusted in the settings. The user can select a specific line on the chart, referring to a territory, to focus on the NPIs imposed and lifted in that location.
The histogram below the chart shows the number of territories that have imposed the different types of NPIs; it can be used to select a subset of NPIs and see on the map the territories that have imposed them. The map illustrates the proportion of NPI categories (out of the 15 NPI categories in the dataset) implemented in each region using a gray-colored bar. Furthermore, when a region is selected, the gray-colored bar in any other region illustrates that region's NPI categories as a proportion of the NPI categories implemented in the selected region. The map is also used to visualize the geographic distribution of the selected COVID-19 outcome using choropleths, spikes, or bubbles. The user can interact with the territories on the map to focus on a location and view its data on the chart. Note that for some countries, such as the United States, the map can be zoomed to reveal finer-grained data for sub-regions such as states. The validation team consisted of a mix of experts, who participated in the design of the taxonomy and/or the pipeline, and IBM volunteers, who completed a brief training session about the annotation schema and tool. Validation was done in two stages. In the first phase, because the WNTRAC tool was still being developed, we used simple CSV files to distribute the data for validation. Each annotator was given a complete document corresponding to a Wikipedia article for a particular region, retrieved as of June 6, 2020, and pre-annotated with the output of the pipeline. Each sentence was displayed on a separate line, with sentences corresponding to candidate events highlighted with a different background color. The attributes extracted by the pipeline were listed next to each sentence. Annotators were asked to verify and correct each of these attributes. If a sentence did not discuss any of the valid event types, they were asked to mark the type as discarded.
If a sentence was incorrectly discarded by the pipeline, annotators were asked to correct the type and fill in the attributes when possible; this was, however, not uniformly enforced. In the second phase, we made the WNTRAC Curator tool available to the annotators. The tool randomly assigns a single document to each annotator for validation. Each document consists of the incremental changes to the underlying Wikipedia article since the last validation of that document. The validation process for the second phase is similar to the first, except that only candidate events, as determined by the pipeline, were shown to the annotators. This time-saving move was based on the observation during the first phase that, when all sentences were presented, human annotators generally agreed with the automated pipeline on discarded sentences. The NLP model used a recall-oriented threshold and only discarded sentences with low scores on all valid NPI types.
Table 3. Inter-annotator agreement between average volunteers (A) and two groups of experienced volunteers (E1 and E2). Region includes both country and state/territories as applicable.
To determine the quality of the dataset post validation, inter-annotator agreement (IAA) was calculated on a randomly sampled subset (2%) of the full set validated by IBM volunteers. Each instance in the subset was independently double-annotated by two experts (randomly selected from a pool of six experts), resulting in three sets of annotations per instance. The IAA was evaluated on all five fields of the 5-tuple that uniquely defines an event. The evaluation was performed at the field level for all fields except the value, which is technically a sub-field of type and does not make sense to analyze on its own. The IAA results are shown in Table 3.
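The text does not name the agreement statistic behind Table 3, but a chance-corrected measure such as Cohen's kappa is a standard choice for field-level comparison between two annotators. A minimal sketch, assuming Cohen's kappa:

```python
from collections import Counter

def cohen_kappa(ann1, ann2):
    """Chance-corrected agreement between two annotators on the same items."""
    assert len(ann1) == len(ann2) and ann1
    n = len(ann1)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Expected agreement under chance, from each annotator's label marginals.
    c1, c2 = Counter(ann1), Counter(ann2)
    expected = sum((c1[label] / n) * (c2[label] / n) for label in c1)
    return (observed - expected) / (1 - expected)

# Toy example: two annotators labeling four candidate events by NPI type.
a = ["school closure", "school closure", "confinement", "confinement"]
b = ["school closure", "school closure", "confinement", "school closure"]
```

On this toy pair the observed agreement is 0.75 but the chance-corrected kappa is lower, illustrating why raw percent agreement can overstate annotation quality.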
Note that the IAA between experts was consistently high in all categories, indicating that the annotation schema is not ambiguous and most sentences can be consistently assigned to one of the NPI types defined in the taxonomy. The IAA between the volunteers and the experts was also good (0.58) at the NPI type level, and the agreement is high (0.81) for the five most frequent NPI types. We plan to expand the taxonomy over time to cover more NPI types. We also plan to improve the accuracy of the pipeline by using end-to-end entity linking techniques for entity normalization and state-of-the-art methods for better temporal alignment, and to expand to other data sources to improve coverage. One of the primary objectives in creating the WNTRAC dataset was to understand what types of NPIs are being implemented worldwide and to facilitate analysis of the efficacy of the different types of NPIs. Specifically, the data supports a variety of studies, such as correlation analysis to understand the relationship between NPIs and outcomes, causal effect and impact analysis of NPIs on regions, incorporation of NPIs into predictive and epidemiological models, and optimal intervention planning. As an example, consider the question: what NPIs were implemented by different countries to contain the spread of COVID-19? This could be answered with Figure 9a, which visualizes the elapsed time between the implementation of a travel-related NPI and the recording of at least 50 cases or at least one death. Travel-related NPIs include domestic flight restrictions, international flight restrictions, freedom of movement (nationality dependent), and the introduction of travel quarantine policies. The visualization shows 9 selected regions, each of which had at least one travel-related NPI among the first set of NPIs imposed in the country; it was generated by combining the WNTRAC dataset with the COVID-19 outcomes dataset from the World Health Organization (WHO) 2.
For each region, the blue bar plot illustrates the number of days before 50 cumulative cases, and the red dot plot shows the number of days before the first death. From the graph, it can be observed that Singapore first imposed a travel-related NPI more than 50 days before its first death, a potentially timelier response than Brazil and New York State, where the first travel-related NPI was imposed about 10 days after the first death. Similarly, Figure 9b visualizes the elapsed time between the implementation of community-related NPIs and the recording of at least 50 cases or at least one death for 9 selected regions. The community-related NPIs include entertainment/cultural sector closure, confinement, school closure, mass gatherings, mask wearing, public services closure, public transportation, work restrictions, and state of emergency. It can be noted that for each of the selected regions, at least one community-related NPI was implemented prior to the first recorded death due to COVID-19. As a second example, we demonstrate how the WNTRAC dataset can be used to visualize the relationship between NPIs and COVID-19 outcomes over time. Figure 10 illustrates this using data from selected countries (Israel, the United Kingdom, and South Africa) and US states (Louisiana, New York, and Texas). In the figure, the blue line shows the trend of the exponentially weighted moving average of new cases per day. The red line is the proportion of NPI types (out of thirteen NPI types, which exclude economic impact, contact tracing, and changes in prison-related policies) that a region has imposed at a given time. We call this the NPI Index. The graph shows that as the rate of new cases rises, more NPI measures tend to be introduced. Interestingly, the chart for Israel suggests that the lowering of the NPI Index (corresponding to the lifting of certain NPI measures) might be associated with the second wave of new cases.
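The two quantities plotted in Figure 10 can be computed directly from the dataset. A minimal sketch follows; the smoothing factor alpha is an illustrative assumption, since the text does not state the weighting used:

```python
def ewma(new_cases, alpha=0.3):
    """Exponentially weighted moving average of daily new cases
    (the blue line in Figure 10)."""
    smoothed, prev = [], None
    for x in new_cases:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

def npi_index(active_npi_types, total_types=13):
    """NPI Index: the proportion of the thirteen tracked NPI types a region
    has imposed at a given time (the red line in Figure 10)."""
    return len(set(active_npi_types)) / total_types

# Example: a region with three distinct NPI types currently in force.
daily_cases = [0, 2, 10, 30, 25]
index = npi_index({"school closure", "confinement", "mass gatherings"})
```

Recomputing both series per day, with NPIs added on their imposition date and removed on their lifting date, reproduces the kind of trend comparison shown for Israel.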
Another important application of this dataset is optimal intervention planning, specifically providing critical, time-sensitive decision support to COVID-19 task force teams as they decide which NPIs should be implemented or lifted over time. Efficiency in this decision-making process is important, as the space of all potential variations of NPIs that can be imposed within a particular region is overwhelming, and NPIs have varying degrees of impact on outcomes for that region. Tools 27 that enable what-if analysis and intervention planning at both national and sub-national levels can be leveraged to meet this need. For such tools to be useful, epidemiological models need to be calibrated in such a way that the resulting forecasts can be trusted as accurate projections of the future. To calibrate these models, it is critical to consider the NPIs that have already been implemented, so that the drivers of the disease spread can be contextualized for a region and the NPI data can be used to estimate model parameter values. In addition to the above examples, the WNTRAC dataset can be used to support other objectives, including estimating the relationships between NPIs and:
• consumer behavior, by, for example, correlating retail data with NPIs;
• environmental changes, such as pollution levels;
• actual compliance by the population.
Naturally, not all the interventions recorded in the dataset are an accurate representation of reality, as some of them capture a governmental request that might not be followed by the entire population. Thus, it might be useful to integrate the WNTRAC dataset with other publicly available data sources that can provide information about the level of compliance with an intervention, such as mobility information 28, 29 and social media forums. Lastly, one other interesting use case is estimating the economic impact of NPIs by, for example, relating unemployment rates and jurisdictional debt to NPIs.
Estimation of the effect of NPIs on non-COVID-19 health problems, such as late cancer detection due to missed screening tests, will also be useful. The source code for the WNTRAC automated NPI curation system, including the data processing pipeline, the WNTRAC Curator tool, and the NPI data browser, is available upon request.
Table A1. List of regions currently supported by the WNTRAC dataset.
References
COVID-19 Dashboard
World Health Organization. Coronavirus disease (COVID-19) Weekly Epidemiological Update and Weekly Operational Update
COVID-19 Coronavirus Pandemic
COVID-19 Healthcare Coalition. Real-time tracking of statewide NPI implementations
Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand
The cost of the COVID-19 crisis: lockdowns, macroeconomic expectations, and consumer spending
India under COVID-19 lockdown
CORD-19: the COVID-19 Open Research Dataset
Global Dashboard on COVID-19 Government Policies
Council of State Governments. COVID-19 Resources for State Leaders
COVID-19 Government Response Event Dataset (CoronaNet v1.0)
A structured open dataset of government interventions in response to COVID-19
Oxford COVID-19 Government Response Tracker
Wikimedia Foundation. Wikipedia and COVID-19
Aggregated trustworthiness: redefining online credibility through social validation
Help:Category
An improved non-monotonic transition system for dependency parsing
BERT: Pre-training of deep bidirectional transformers for language understanding
Wikipedia contributors. ISO 3166-1
Wikipedia contributors. ISO 3166-2
IBM. WNTRAC data repository
Worldwide Non-pharmaceutical Interventions Tracker for COVID-19 (WNTRAC)
A platform for disease intervention planning
COVID-19 Community Mobility Reports
We thank the IBM Research volunteers for the validation and maintenance of the WNTRAC dataset.
The IBM Research Haifa team identified the need for the dataset, defined the taxonomy of NPIs based on requirements for epidemiological modeling, and developed the validation guidelines for volunteers. The IBM Research Yorktown Heights team developed the NLP for NPI extraction, developed the semi-automated system to construct the dataset and keep it current, and built the WNTRAC Curator tool. The IBM Research Nairobi team designed and implemented the graphical user interface of the NPI data browser, which allows end users to browse, query, and visualize the dataset and the associated descriptive statistics. Senior authors Michal Rosen-Zvi, Divya Pathak, and Aisha Walcott-Bryant led the respective teams. The authors declare no competing interests.
Table A2. List of US states and territories currently supported by the WNTRAC dataset.