key: cord-0743547-d0w8oqp8 authors: Banda, J. M.; Singh, G. V.; Alser, O.; PRIETO-ALHAMBRA, D. title: Long-term patient-reported symptoms of COVID-19: an analysis of social media data date: 2020-08-01 journal: nan DOI: 10.1101/2020.07.29.20164418 sha: 5cba3351cab974fd235a0d08848ff8008e15cac6 doc_id: 743547 cord_uid: d0w8oqp8 As the COVID-19 virus continues to infect people across the globe, there is little understanding of the long term implications for recovered patients. There have been reports of persistent symptoms after confirmed infections on patients even after three months of initial recovery. While some of these patients have documented follow-ups on clinical records, or participate in longitudinal surveys, these datasets are usually not publicly available or standardized to perform longitudinal analyses on them. Therefore, there is a need to use additional data sources for continued follow-up and identification of latent symptoms that might be underreported in other places. In this work we present a preliminary characterization of post-COVID-19 symptoms using social media data from Twitter. We use a combination of natural language processing and clinician reviews to identify long term self-reported symptoms on a set of Twitter users. As countries, such as the USA or Spain go through the perceived second wave of acute cases of COVID-19, those who suffered the condition in the preceding months are reporting concerning long-term symptoms. Acute organ injury during primo-infection includes acute kidney injury in 1 in 5 patients 1 , myocardial injury in 20%-30%, and acute respiratory failure. Long-term sequelae are therefore expected to include chronic kidney disease, heart failure and chronic obstructive pulmonary disease. As healthcare systems remain overwhelmed with the management of acute cases, little attention has been given to long-term COVID-19 sequelae. We mined and manually reviewed social media data from a selective set of Twitter feeds (#longcovid, #chroniccovid) to characterize patient reports of long-term COVID-19 symptoms. Using the largest publicly available COVID-19 Twitter chatter dataset 2 , we used precise hashtags (#longcovid and #chroniccovid) to select tweets relevant to discussions related to the post-COVID experiences of Twitter users. We looked at English language Twitter data from 2020-05-21 to 2020-07-10, allowing >60 days after the start of the pandemic. We discarded retweets that did not have user comments. Any tweets from accounts with unusually high tweeting activity (possible bots) or that only shared other tweets were removed. We then annotated tweets using the Social Media Mining Toolkit 3 , Spacy NER annotator and a dictionary created from the Observational Health Data Sciences and Informatics (OHDSI) vocabulary 4 , which allows the annotated terms to tie into clinical conditions and observations. We identified the final set of tweets with clinical concepts for review. Two clinicians (GS, OA) manually reviewed these tweets to identify patients with COVID-19 and their self-reported symptoms, and to attribute ICD-10 codes to them. A third clinician (DPA) reviewed all decisions and resolved disagreements. Number of symptoms per tweet and person, and frequency (%) of symptoms reported of the total were reported. A total of 7,781 unique tweets were identified from 4,607 users. After annotation, 2,603 tweets were manually reviewed, resulting in 150 eligible tweets from 107 users. A total of 192 reports including 34 distinct ICD-10 codes were identified ( Figure 1 ). The 10 most commonly mentioned symptoms were: malaise and fatigue (62%), dyspnea (19%), tachycardia/palpitations (13%), chest pain (13%), insomnia/sleep disorders (10%), cough (9%), headache (7%), and joint pain, fever, and unspecified pain by 6% each (Table 1) . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 1, 2020. . Less common symptoms included ear-nose-throat (tinnitus, anosmia, chronic sinusitis, parageusia, aphonia), neuro-psychological (amnesia, neuralgia/neuropathy, disautonomia, visual disturbance, cognitive impairment, and disorientation), myalgia, and skin pruritus/rash. An average of 1.79 codes were reported per person, and 1.28 codes per tweet. Our analysis of patient-reported long-term COVID-19 symptoms matches clinician-collected data recently reported by Carfi et al 5 . Fatigue and dyspnea were the two most common symptoms in both datasets, and chest pain and cough were also reported in the top 5. Other, less commonly reported symptoms include headache, myalgia and dysgeusia/parageusia. Patients reported additional symptoms, including palpitations and insomnia/sleep disorders potentially attributable to neuro-psychological, cardiac or respiratory sequelae. Skin pruritus and rash are increasingly recognised as COVID-19 manifestations 6 . Additional, less recognized longterm symptoms reported in our data include persistent fever, tinnitus, anosmia, and neuropsychological distress. We have shown that researchers can leverage social media data, specifically Twitter, to conduct long-term post-COVID studies of patient-relevant and self-reported symptoms. By leveraging the #longcovid hashtag movement, we found similar reports to those obtained from clinical adjudication 5 , demonstrating the validity of our methodological framework for future research. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 1, 2020. . Manual validation identified additional, not previously reported symptoms that warrant further research, given their importance to patients. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted August 1, 2020. Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area A large-scale COVID-19 Twitter chatter dataset for open scientific research --an international collaboration Social Media Mining Toolkit (SMMT) Characterizing treatment pathways at scale using the OHDSI network Gemelli Against COVID-19 Post-Acute Care Study Group. Persistent Symptoms in Patients After Acute COVID-19 Cutaneous manifestations in SARS-CoV-2 infection (COVID-19): a French experience and a systematic review of the literature