key: cord-1036738-mqprdpzs
authors: Obeid, Jihad S; Davis, Matthew; Turner, Matthew; Meystre, Stephane M; Heider, Paul M; Lenert, Leslie A
title: An AI approach to COVID-19 infection risk assessment in virtual visits: a case report
date: 2020-05-25
journal: J Am Med Inform Assoc
DOI: 10.1093/jamia/ocaa105
sha: 25b729a6acf745a9c1024fd123cec0806574bd8a
doc_id: 1036738
cord_uid: mqprdpzs

OBJECTIVE: In an effort to improve the efficiency of computer algorithms applied to screening for COVID-19 testing, we used natural language processing (NLP) and artificial intelligence (AI)-based methods with unstructured patient data collected through telehealth visits. METHODS: After segmenting and parsing documents, we conducted analysis of overrepresented words in patient symptoms. We then developed a word embedding-based convolutional neural network for predicting COVID-19 test results based on patients’ self-reported symptoms. RESULTS: Text analytics revealed that concepts such as “smell” and “taste” were more prevalent than expected in patients testing positive. As a result, screening algorithms were adapted to include these symptoms. The deep learning model yielded an AUC of 0.729 for predicting positive results and was subsequently applied to prioritize testing appointment scheduling. DISCUSSION: Informatics tools such as NLP and AI methods can have significant clinical impacts when applied to data streams early in the development of clinical systems for outbreak response.

Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus in a family of highly pathogenic human coronaviruses. 1 This novel coronavirus is a particularly infectious strain, resulting in a global pandemic that reached the U.S. early in the course of the outbreak. 2 One of the linchpins of controlling the spread of COVID-19 is aggressive testing. 3 Testing for SARS-CoV-2 is resource-intensive, as it involves the collection of a nasopharyngeal swab specimen under Biosafety Level 2 conditions and laboratory capacity for reverse transcription polymerase chain reaction (RT-PCR) assay of SARS-CoV-2 RNA. 4 As individual states in the U.S. ramp up testing facilities, prioritizing testing based on risk of exposure, clinical symptoms, and pre-existing risk factors is essential. 5 At the Medical University of South Carolina (MUSC), telehealth providers screen and prioritize patients for testing. Virtual care visits are captured in a telehealth system (Zipnosis, Inc., Minneapolis, Minnesota), which includes patient-entered text information and allows providers to screen patients and prioritize testing at a drive-through testing facility. Because testing was a limited resource, there were significant delays for patients in scheduling tests, even with computer screening. The informatics research team at MUSC, as part of its outbreak response strategy, undertook the task of enhancing access to and use of the data in Zipnosis notes to prioritize and inform testing.

One of the main challenges of this task was that the information piped into the electronic health record (EHR) is not in a structured format but rather in a text "blob" that contains information both from a template-based patient-facing form and from free-text data entered by the patient. The use of EHR data to identify specific clinical phenotypes has gained significant momentum over recent years. [6] [7] [8] Characterizing patients based on EHR data has several useful purposes, including, but not limited to, clinical decision support, [9] [10] [11] [12] population health studies, [13] [14] [15] and identification of participants for research recruitment. 16, 17 As exemplified by the virtual care data feed at MUSC, a good portion of the information within the EHR resides in free-text format contained inside numerous types of clinical notes. 7, 9 In addition to well-established natural language processing (NLP) pipelines that have been developed for extracting information from unstructured data, 18-20 machine learning-based clinical text classification approaches have also been used to characterize patients using EHR data. [21] [22] [23] More recently, deep learning approaches such as convolutional neural networks (CNN) have been used both in predictive modeling in the clinical domain 24 and in phenotyping efforts through clinical text classification. 25 In this case report, we describe the application of text analysis and deep learning methods to improve our testing algorithms.

The virtual urgent care program for COVID-19 was established by MUSC Health based on the Centers for Disease Control and Prevention (CDC) guidelines 26 to screen and evaluate presumptive cases in our region. To minimize exposure and lessen the risk of nosocomial infections, patients are advised to visit MUSC Health virtual urgent care for screening and medical advice from trained MUSC Health care providers via a secure online telehealth virtual care system by Zipnosis, Inc. Referral for testing for patients at high risk or those who need inpatient care is determined based on consultation with the providers. The data from the virtual care system are fed into our EHR system (Epic Systems Corporation, Verona, Wisconsin) via a proprietary API (HL7 v2.x). Data were subsequently extracted from Epic Clarity and moved to a cloud-based "data lake" analytics infrastructure in Azure (Microsoft, Redmond, Washington).

We included patients with virtual care visits with COVID-19 listed as the reason for the visit. Patients without test results within 14 days following the visit were excluded. For patients with multiple test results, only the final result was considered. The total number of patients included in our analysis was 6,813, of whom 498 tested positive and 6,315 tested negative.

The telehealth system notes were pre-processed using a simple Apache UIMA-based NLP application. 27 A pattern matching-based algorithm split the notes into sections and labeled these sections to enable filtering out boilerplate information and instructions from the Zipnosis template while focusing on relevant sections. Examples of such header-demarcated sections included a "Patient Summary:" section where symptoms were reported by the patient and a section labeled "Pertinent COVID-19 information" where travel information was reported.

Simple pattern matching was also used for limited dataset de-identification, replacing patient names, phone numbers and addresses with generic tokens in order to protect patient privacy.

Diagnosis codes that were demarcated by the template were extracted and appended to the end of the clinical note. Stop words were removed prior to tokenization.
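
The pre-processing steps described above can be illustrated with a minimal Python sketch. The study used an Apache UIMA-based application; the regular expressions, the placeholder token, the short stop-word list, and the function names below are illustrative assumptions and not the rules used by the authors. For simplicity, the sketch filters stop words after splitting the text into tokens.

```python
import re

# The two headers named in the text as examples; the full list is not given.
HEADERS = ["Patient Summary:", "Pertinent COVID-19 information"]
# Illustrative pattern for US-style phone numbers (an assumption).
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")
# Tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "i", "have", "my"}


def split_sections(note):
    """Return {header: body} for each known header found in the note."""
    pattern = "|".join(re.escape(h) for h in HEADERS)
    # With a capturing group, re.split yields
    # [prefix, header, body, header, body, ...]; the prefix is discarded.
    parts = re.split(f"({pattern})", note)
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def mask_phone_numbers(text):
    """Replace phone numbers with a generic token."""
    return PHONE_RE.sub("[PHONE]", text)


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


note = (
    "Instructions: call 843-555-0100. "
    "Patient Summary: I have a cough and lost my sense of smell. "
    "Pertinent COVID-19 information No recent travel."
)
sections = split_sections(mask_phone_numbers(note))
symptom_tokens = tokenize(sections["Patient Summary:"])
```

Keeping only the labeled sections drops the boilerplate before the first header, which is the filtering effect the section labels are meant to enable.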

As part of the analysis prior to machine learning, we examined differences in word frequencies between clinical notes from patients with positive test results and notes from patients with known negative results. We performed a chi-square analysis to identify words that are overrepresented across these corpora of text. 28 This analysis provided insight into key words associated with positive COVID-19 test results.
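
A chi-square keyness analysis of this kind can be sketched as follows. This is a simplified illustration and not the authors' code: it assumes each note is already a list of tokens, it counts the number of notes containing a word (document frequency) rather than raw word occurrences, and it applies no continuity correction. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def chi_square_keyness(pos_docs, neg_docs):
    """Score words with a 2x2 chi-square test (1 degree of freedom).

    pos_docs / neg_docs: lists of token lists for notes with positive and
    negative test results. Returns (word, chi2, p_value, over_in_positive)
    tuples sorted by decreasing chi2.
    """
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    pos_df = Counter(w for doc in pos_docs for w in set(doc))
    neg_df = Counter(w for doc in neg_docs for w in set(doc))
    results = []
    for word in set(pos_df) | set(neg_df):
        a = pos_df[word]   # positive notes containing the word
        b = n_pos - a      # positive notes without it
        c = neg_df[word]   # negative notes containing the word
        d = n_neg - c      # negative notes without it
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue       # word in every note or an empty class: no test possible
        n = a + b + c + d
        chi2 = n * (a * d - b * c) ** 2 / denom
        # Survival function of the chi-square distribution with 1 df.
        p_value = math.erfc(math.sqrt(chi2 / 2))
        results.append((word, chi2, p_value, a / n_pos > c / n_neg))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Toy data for illustration only.
pos = [["lost", "smell", "cough"], ["smell", "fever"], ["taste", "smell"]]
neg = [["cough", "fever"], ["cough"], ["fever", "cough"], ["sore", "throat"]]
ranked = chi_square_keyness(pos, neg)
```

In the toy data, "smell" appears in every positive note and in no negative note, so it ranks first and is flagged as overrepresented among positives.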

We used Keras 29 and TensorFlow version 2.0 30 for constructing and training the CNN model. To construct the features for the deep learning models, the text sequences were tokenized and padded with zeros at the end of sequences to match the length of the longest string in the training set. The input layer had a dimension size of 628, slightly exceeding the maximum length of the input sequences of tokens. We used word2vec 31 for the word-embedding layer. The embedding weights were initialized with 200-dimensional word vectors from a word2vec model pre-trained on a PubMed corpus. 32 The embedding layer had a dropout rate of 0.3. This was followed by a convolutional layer with multiple filter sizes (3, 4, and 5) in parallel, with 100 filters in each, ReLU activation, a stride of one, and global max-pooling, which was followed by a merge tensor and then a fully connected 512-node hidden layer with ReLU activation and a dropout rate of 0.3. Finally, the output layer had a single binary node with a sigmoid activation function.

Several hyperparameter configurations were tried, for example: randomly initialized embeddings drawn from uniform distributions with 50, 100, or 200 dimensions in the embedding layer; 50, 80, 100, or 200 filters in the convolutional layer; kernel sizes of (2, 3, 4), (3, 4, 5), or (4, 5, 6); and a variety of learning rates and learning rate reduction factors. These were all tracked with MLflow 33 and the model with the best performance on the hold-out set was selected. The final learning rate used was 4 × 10^-4 with a reduction factor of 0.5 on performance plateau.
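
The architecture described above might be expressed in Keras roughly as follows. This is a sketch under stated assumptions and not the authors' code: the Adam optimizer, the binary cross-entropy loss, the monitored quantity and patience of the learning-rate callback, and the name `build_cnn` are all assumptions, and the pre-trained PubMed word2vec vectors are represented by an optional `embedding_matrix` argument that the caller would have to supply.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(vocab_size, embedding_matrix=None, seq_len=628, embed_dim=200):
    """Word-embedding CNN with parallel kernel sizes 3, 4, and 5."""
    if embedding_matrix is not None:
        # Initialize from pre-trained vectors, shape (vocab_size, embed_dim).
        init = keras.initializers.Constant(embedding_matrix)
    else:
        init = keras.initializers.RandomUniform()

    inputs = keras.Input(shape=(seq_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim, embeddings_initializer=init)(inputs)
    x = layers.Dropout(0.3)(x)

    # One convolution + global max-pooling branch per kernel size.
    branches = []
    for kernel_size in (3, 4, 5):
        conv = layers.Conv1D(100, kernel_size, strides=1, activation="relu")(x)
        branches.append(layers.GlobalMaxPooling1D()(conv))
    merged = layers.Concatenate()(branches)

    hidden = layers.Dense(512, activation="relu")(merged)
    hidden = layers.Dropout(0.3)(hidden)
    outputs = layers.Dense(1, activation="sigmoid")(hidden)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=4e-4),  # optimizer is an assumption
        loss="binary_crossentropy",
        metrics=[keras.metrics.AUC(name="auc")],
    )
    return model


# Halve the learning rate on plateau; monitor and patience are assumptions.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2)

# Small dimensions here only to keep the example quick to build.
model = build_cnn(vocab_size=50, seq_len=20, embed_dim=8)
```

The callback would be passed to `model.fit(..., callbacks=[reduce_lr])` together with the validation data used during training epochs.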

The data were partitioned by random sampling of patients into a training set (60%), a cross-validation set (16%), and a hold-out test set (24%). There was no overlap of patients across the three partitions. The cross-validation set was used for validation during training epochs. The test set was only used after model fitting to assess performance. A logistic regression model using bag-of-words count-based vectors as features was used as a comparator. Performance was evaluated using the area under the receiver operating characteristic curve (AUC). To assess precision, recall, and F1-score, we downsampled the test set to balance the notes to an equal number of positives and negatives. We ran 100 cycles of random selection of 120 cases in each class to calculate the mean AUC, precision, recall, and F1-score. We used a probability threshold of 0.2 to optimize the F1-score. We later examined the output of the model on all patients with virtual care visits with COVID-19 listed as the reason for the visit to assess the discriminant power of the model across three risk categories based on the predicted probability (low if p <= 0.2, medium if 0.2 < p < 0.9, high if p >= 0.9).
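
The balanced resampling evaluation can be sketched in plain Python as follows. This is an illustration and not the authors' code: it assumes sampling without replacement within each class, treats a predicted probability strictly above the threshold as a positive prediction, and omits the AUC calculation for brevity. The function names and the seed are hypothetical.

```python
import random


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def balanced_resampling_eval(labels, probs, n_per_class=120, n_cycles=100,
                             threshold=0.2, seed=0):
    """Mean precision, recall, and F1 over repeated balanced subsamples."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_cycles):
        # Draw an equal number of positive and negative cases.
        idx = rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)
        y_true = [labels[i] for i in idx]
        y_pred = [1 if probs[i] > threshold else 0 for i in idx]
        for k, value in enumerate(precision_recall_f1(y_true, y_pred)):
            totals[k] += value
    return tuple(t / n_cycles for t in totals)
```

With the settings reported in the text, the call would be `balanced_resampling_eval(labels, probs)` on the hold-out test set, which must contain at least 120 cases in each class.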

This project was intended to improve the screening process of our virtual care visit program at MUSC for COVID-19 testing and did not involve a systematic investigation or experimental procedure. Therefore, the project was determined to be quality improvement and was not subject to Institutional Review Board for Human Research approval based on the definition of research pursuant to the Common Rule (45 CFR 46.102(d)). 34, 35

The results from the analysis of overrepresented keywords in clinical notes with positive test results as compared to notes from those with negative test results are shown in Figure 1. All results are highly statistically significant. For example, the words "smell", "taste", "sense", and "lost" are mentioned at a much higher frequency (p < 0.0001) by patients who tested positive for SARS-CoV-2 than by those who did not. The overall rate of positive tests of patients seen via virtual care was 5.6%. In discussions with the telehealth providers, we decided to divide patients into three risk categories with cutoffs at 0.2 and 0.9, which puts the low-risk group at a positive test rate of less than 3% and the high-risk group at a positive test rate of around 60%, resulting in a reasonable follow-up rate of a few dozen calls per day. Even though the accuracy of the model was only acceptable, it was still useful in discriminating patients into these risk categories (Table 2).
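
The three-tier stratification and the per-tier positive test rate can be sketched as follows. The cutoffs follow the text (low if p <= 0.2, high if p >= 0.9); the function names and the toy inputs are hypothetical and do not reproduce the study data.

```python
def risk_category(p, low_cutoff=0.2, high_cutoff=0.9):
    """Map a predicted probability to a risk tier."""
    if p <= low_cutoff:
        return "low"
    if p >= high_cutoff:
        return "high"
    return "medium"


def positive_rate_by_category(probs, labels):
    """Fraction of positive test results within each risk tier."""
    counts = {"low": [0, 0], "medium": [0, 0], "high": [0, 0]}  # [positives, total]
    for p, y in zip(probs, labels):
        tier = counts[risk_category(p)]
        tier[0] += y
        tier[1] += 1
    return {
        name: (positives / total if total else None)
        for name, (positives, total) in counts.items()
    }


# Toy inputs for illustration only.
rates = positive_rate_by_category([0.1, 0.1, 0.5, 0.95, 0.95], [0, 0, 1, 1, 0])
```

Comparing the resulting rates across tiers is one way to check that the chosen cutoffs separate a low-yield group from a high-yield group before using them to schedule follow-up calls.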

Looking across all patients with virtual care visits who were tested, we were able to identify a high-risk group that was potentially useful in prioritizing tests. The text analytics highlighted important symptoms that had not been captured by the screening form, namely lack of smell and taste in affected patients. Anosmia and alteration of the sense of taste have been reported by mildly symptomatic patients with SARS-CoV-2 infection and are often the first noted symptoms. 40 In our hands, the presence of these symptoms as reported by the patients themselves turned out to be the most sensitive predictor of positive testing results. Other words relevant to COVID-19 signs and symptoms (e.g., "temperature", "fever", "cough", and words related to dyspnea) were not as prominent as we expected, likely because such symptoms were captured through the semi-structured template, which could have masked overrepresentation. This finding, along with other published literature, resulted in the alteration of the online screening form to specifically include questions about smell and taste just ahead of the updated CDC guidelines on the "Symptoms of Coronavirus", which include these specific symptoms. 41 This finding demonstrates the value of a data-driven approach for the identification of relevant symptoms in novel infections such as the one at the root of this rapidly evolving pandemic.

Fortunately, the number of positive SARS-CoV-2 test results was low at our institution. As a result, the sample size for training a deep learning model such as the CNN described herein is suboptimal. More data are needed to refine the model and provide better risk stratification. The complete clinical picture should be considered in testing decisions, including the severity of symptoms and history of underlying chronic diseases. 5, 42 Patients with pre-existing or comorbid conditions are at higher risk of mortality 42 and may need to be prioritized for clinical reasons, even if the risk of a positive test is low.

Future work will include more advanced NLP extraction including local context analysis to identify negated terms (e.g., "denies fever") and terms referring to individuals other than the patient (e.g., "spouse has a fever"), term normalization to standard terminologies, and algorithms that generalize to a variety of clinical text notes. Moreover, expanding training sets and developing predictive models that include pre-existing risk factors will provide a more comprehensive tool that informs the decisions of our telehealth providers.

This case report describes our rapid use of AI methods to improve the efficiency of COVID-19 testing. The results from our text analysis identified symptoms that informed the electronic triage process prior to wide publication of these associations and also revealed how AI methods could be used to prioritize patients screening positive for testing.

The work presented in this case report received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Coronavirus Infections-More Than Just the Common Cold

The COVID-19 Pandemic in the US: A Clinical Update

From Containment to Mitigation of COVID-19 in the US

Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China

Risk Factors Associated With Acute Respiratory Distress Syndrome and Death in Patients With Coronavirus Disease

EHR Big Data Deep Phenotyping. Contribution of the IMIA Genomic Medicine Working Group

A review of approaches to identifying patient phenotype cohorts using electronic health records

Clinical phenotyping in selected national networks: demonstrating the need for high-throughput, portable, and computational methods

Extracting information from textual documents in the electronic health record: a review of recent research

The emerging role of electronic medical records in pharmacogenomics

Mining electronic health records: towards better research applications and clinical care

Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records

Validation of the Harvard Cancer Risk Index: a prediction tool for individual cancer risk

Use of International Classification of Diseases, Ninth Revision, Clinical Modification codes and medication use data to identify nosocomial Clostridium difficile infection

An efficient approach for surveillance of childhood diabetes by type derived from electronic health record data: the SEARCH for Diabetes

A survey of practices for the use of electronic health records to support research recruitment

Electronic health records to facilitate clinical research

An overview of MetaMap: historical perspective and recent advances

Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications

CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines

Comparison of machine learning classifiers for influenza detection from emergency department free-text reports

Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records

Identifying child abuse through text mining and machine learning

Scalable and accurate deep learning with electronic health records

Automated detection of altered mental status in emergency department clinical notes: a deep learning approach

Centers for Disease Control and Prevention

UIMA: an architectural approach to unstructured information processing in the corporate research environment

Keyness: Words, parts-of-speech and semantic categories in the character-talk of Shakespeare's Romeo and Juliet

TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems

Efficient Estimation of Word Representations in Vector Space

Deep Relevance Ranking Using Enhanced Document-Query Interactions. arXiv:1809.01682 [cs]

MLflow - A platform for the machine learning lifecycle

Department of Health & Human Services. 45 CFR 46. HHS.gov

Rapid Response to COVID-19: Health Informatics Support for Outbreak Management in an Academic Health System

Use of self-administered surveys through QR code and same center telemedicine in a walk-in clinic in the era of COVID-19

Rapid Design and Implementation of an Integrated Patient Self-Triage and Self-Scheduling Tool for COVID-19

Electronic Personal Protective Equipment: A Strategy to Protect Emergency Department Providers in the Age of

Alterations in Smell or Taste in Mildly Symptomatic Outpatients With SARS-CoV-2 Infection

CDC. Coronavirus Disease 2019 (COVID-19) - Symptoms. Centers for Disease Control and Prevention

Preliminary Estimates of the Prevalence of Selected Underlying Health Conditions Among Patients with Coronavirus Disease 2019 - United States

We would like to thank Rachel McNeely and Grace Neil for their help with programming and data cleaning and Jean Craig, Katie Kirchoff, and Ekaterina Pekar for their help with data extraction.

The authors have no competing interests to declare.

All authors provided substantial input into the conception and design of this work, participated in drafting and revising it critically, and provided final approval of the version to be published.