key: cord-0196209-qwno39qz
authors: Storopoli, Jose; Santos, Andre Luis Marques Ferreira dos; Pellini, Alessandra Cristina Guedes; Baldwin, Breck
title: Simulation-Driven COVID-19 Epidemiological Modeling with Social Media
date: 2021-06-22
sha: 02a1367da5b29b4c235b0dd5c6382cc71a9b8b46
doc_id: 196209
cord_uid: qwno39qz

Modern Bayesian approaches and workflows emphasize how important simulation is in the context of model development. Simulation can help researchers understand how the model behaves in a controlled setting and can be used to stress the model in different ways before it is exposed to any real data. This improved understanding could be beneficial in epidemiological models, especially when dealing with COVID-19. Unfortunately, few researchers perform any simulations. We present a simulation algorithm that implements a simple agent-based model for disease transmission that works with a standard compartment epidemiological model for COVID-19. Our algorithm can be applied in different parameterizations to reflect several plausible epidemic scenarios. Additionally, we model how social media information in the form of daily symptom mentions can be incorporated into COVID-19 epidemiological models. We test our social media COVID-19 model with two experiments. The first uses simulated data from our agent-based simulation algorithm; the second uses real data, with a machine learning tweet classifier that separates tweets mentioning symptoms from noise. Our results show how a COVID-19 model can be (1) used to incorporate social media data and (2) assessed and evaluated with simulated and real data.

Modern approaches to Bayesian modeling emphasize the importance of developing a model before exposing it to actual data, but few researchers actually do so; e.g., Stringhini et al. [2020], Zhang et al. [2020], Roda et al. [2020], for the REMAP-CAP Investigators [2020], Kontis et al. [2020], Niehus et al.
[2020] did not report any sort of simulation or data generating process (DGP) step in their analysis. Simulation of DGPs is especially important for observational studies, where fitting a model to data is trivial given the raw curve-fitting power of modern techniques; developing against simulations is an attempt to separate, at least partly, the model being evaluated from its eventual application to actual data. While lacking the power of a randomized controlled trial (RCT), a model that performs well across a range of plausible simulations increases the confidence that the fit to actual data is robust and usable for important [Ioannidis et al., 2020] tasks like estimating future trends for both observed and unobserved variables.

A secondary benefit of studying simulations is to estimate the impact of model features against simulations that exercise those variables. For example, for an infection rate that varies over time, it is quite easy to determine 1) whether it is recoverable from a simulation that includes it and 2) how accurately the model can recover the actual parameter values, because 3) all parameters are available from the simulating DGP, whether observed or not. These features quickly identify how the model performs in ways unavailable with real data. Another concern addressed with simulations is the fact that even simple compartment models degrade into chaotic systems under reasonable-seeming assumptions such as time-varying infection rates [Barrientos et al., 2017].

The flow of presentation is as follows: 1. We provide background on COVID-19 modeling with an emphasis on Bayesian approaches (Section 2); 2. We present a simulation algorithm, available for download, that implements a simple agent-based model for disease transmission that works with a standard compartment model. The simulation is exercised through a single setting of plausible parameterizations (Section 3); 3. We fit a Bayesian compartment epidemiological model to simulated data and compare internal model states, e.g.
the compartment populations, to those of the DGP simulation (Section 3.1); 4. We also fit the model to actual data from Brazil and discuss the challenges to our use case (Section 3.3); and 5. We summarize our conclusions and also address limitations and opportunities for future studies (Section 4).

Compartmental models, also called population-based models, are used to model the dynamics of an infectious disease at the population scale. Those models simplify the complex reality of an epidemic by subdividing the total population into homogeneous groups, called compartments. Individuals within the same compartment are considered to be in the same state regarding the progression of the disease. Compartmental models originated in the beginning of the 20th century with the Susceptible-Infectious-Recovered (SIR) model [Kermack and McKendrick, 1927], which splits the population into three time-dependent compartments: the susceptible, the infected (and infectious), and the recovered (and not infectious) compartments. When a susceptible individual comes into contact with an infectious individual, the former can become infected for some time, and then recover and become immune. Some infectious diseases are fatal, so in order to differentiate between recovered and deceased, the Susceptible-Infectious-Recovered-Deceased (SIRD) model [Bailey et al., 1975] was developed.

Since COVID-19 can quickly overcome a nation's health system by overloading the need for intensive care unit (ICU) beds [Pinto Neto et al., 2021], we found the need to include a state that represents terminally-ill patients. Our core model also includes a T state for terminally-ill individuals who have been infected and will unfortunately become deceased. The acronym then becomes Susceptible-Infectious-Recovered-Terminally-ill-Deceased (SIRTD), although we will do experiments with simpler and more complex models.
The dynamics of SIRTD are governed by a system of ordinary differential equations (ODE):

\begin{align}
\frac{dS}{dt} &= -\beta S \frac{I}{N} \tag{1} \\
\frac{dI}{dt} &= \beta S \frac{I}{N} - \frac{1}{d_I} I \tag{2} \\
\frac{dR}{dt} &= \frac{1}{d_I} I (1 - \omega) \tag{3} \\
\frac{dT}{dt} &= \frac{1}{d_I} I \omega - \frac{1}{d_T} T \tag{4} \\
\frac{dD}{dt} &= \frac{1}{d_T} T \tag{5}
\end{align}

where:
• S(t) is the number of people susceptible to becoming infected (no immunity);
• I(t) is the number of people currently infected (and infectious);
• T(t) is the number of terminally ill individuals who have been infected and will die;
• R(t) is the number of recovered people (we assume they remain immune indefinitely);
• D(t) is the number of infected people who unfortunately died;
• N = S(t) + I(t) + R(t) + T(t) + D(t) is the constant total number of individuals in the population;
• β is the constant rate of contacts between individuals per unit time that are sufficient to lead to transmission if one of the individuals is infectious and the other is susceptible;
• ω is the constant death rate of infected individuals;
• d_I is the mean time for which individuals are infectious; and
• d_T is the mean time for which individuals are terminally ill.

Susceptible individuals (state S) randomly come into contact with infected individuals (state I) and, as a consequence of this contact, become infected with rate β (equation 1). Once a susceptible individual becomes infected, they can infect other susceptible individuals through random encounters and stay infected/infectious for an average of d_I days (equation 2). Infected individuals can recover (state R) with probability 1 − ω (equation 3) or become terminally ill (state T) with probability ω (equation 4). Finally, terminally-ill individuals eventually die (state D) after an average of d_T days (equation 5). The model can also be represented as a directed acyclic graph (DAG), shown in Figure 1.

The SIRTD model has several assumptions. First, it assumes that the population N is constant. Second, every state is populated by homogeneous individuals, i.e., no differences in demographics, social characteristics or health-related variables.
Third, the model assumes random mixing of the population: contact between susceptible and infectious individuals is governed by chance alone. Fourth, infected individuals become infectious (they can spread the disease) and either recover or become terminally ill. Fifth, infected individuals can infect susceptible individuals for the entire time that they remain infected, i.e., no self-quarantine or isolation measures are taken. Finally, recovered individuals are forever immune.

One of our main contributions is to propose and execute simulation-driven modeling. We agree with Ioannidis et al. [2020] that "careful modeling of predictive distributions ... and continuously reappraising models based on their validated performance" is essential in epidemiological modeling. There is a lack of attention to simulation in the recent epidemiology literature, especially related to COVID-19. Searching in Scopus for epidemiology models for COVID-19 in the top peer-reviewed journals, we find that most do not analyze their models with regard to how well they perform in a controlled DGP simulation. For instance, Stringhini et al. [2020], Zhang et al. [2020], Roda et al. [2020], for the REMAP-CAP Investigators [2020], Kontis et al. [2020], Niehus et al. [2020] all used a Bayesian model and performed inference only with the likelihood conditional on data. Despite that, we found some evidence of simulation and care for how the proposed model performs in a controlled setting [Brauner et al., 2021, Roques et al., 2020].

COVID-19 modeling appears amenable to heterogeneous information sources informing modeling, and many ideas have been explored. This work was inspired by CoDatMo's Liverpool Model [Moore and Phillips, 2021], which combined 111 calls reporting symptoms to health authorities with weekly death data in a sophisticated SEEIIRTTD model.
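To make the SIRTD dynamics and the assumptions listed above concrete, the following is a minimal numerical sketch, assuming the standard mass-action form implied by the compartment definitions (new infections per day equal to β·S·I/N). It integrates the system with a fixed-step classical fourth-order Runge-Kutta scheme, chosen here only for brevity and not necessarily the solver used in the experiments. The function names (`sirtd_rhs`, `rk4_step`, `simulate_sirtd`) and the parameter values are illustrative assumptions, not taken from the authors' code.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]  # (S, I, R, T, D)


def sirtd_rhs(y: Sequence[float], beta: float, omega: float,
              d_i: float, d_t: float) -> State:
    """Right-hand side of the SIRTD system (assumed mass-action form)."""
    s, i, r, t, d = y
    n = s + i + r + t + d              # constant total population N
    new_infections = beta * s * i / n  # random mixing of S and I
    leaving_i = i / d_i                # infectious for d_I days on average
    dying = t / d_t                    # terminally ill for d_T days on average
    return (
        -new_infections,               # dS/dt
        new_infections - leaving_i,    # dI/dt
        (1.0 - omega) * leaving_i,     # dR/dt
        omega * leaving_i - dying,     # dT/dt
        dying,                         # dD/dt
    )


def rk4_step(y: State, h: float, *params: float) -> State:
    """One classical fourth-order Runge-Kutta step of size h (in days)."""
    def shifted(k: State, scale: float) -> State:
        return tuple(yi + scale * ki for yi, ki in zip(y, k))

    k1 = sirtd_rhs(y, *params)
    k2 = sirtd_rhs(shifted(k1, h / 2), *params)
    k3 = sirtd_rhs(shifted(k2, h / 2), *params)
    k4 = sirtd_rhs(shifted(k3, h), *params)
    return tuple(
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    )


def simulate_sirtd(y0: Sequence[float], days: int, beta: float, omega: float,
                   d_i: float, d_t: float,
                   steps_per_day: int = 10) -> List[State]:
    """Return the state at day 0, 1, ..., days."""
    h = 1.0 / steps_per_day
    y: State = tuple(float(v) for v in y0)
    trajectory = [y]
    for _ in range(days):
        for _ in range(steps_per_day):
            y = rk4_step(y, h, beta, omega, d_i, d_t)
        trajectory.append(y)
    return trajectory


# Illustrative values only: 10,000 people, 10 initially infected, 70 days.
trajectory = simulate_sirtd(
    y0=(9_990, 10, 0, 0, 0), days=70,
    beta=0.3, omega=0.1, d_i=7.0, d_t=10.0,
)
```

Because the five derivatives sum to zero, the total population stays constant along the trajectory, which matches the first assumption of the model and gives a cheap sanity check on any implementation.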
Liverpool also kindly provided Twitter data in Portuguese filtered for symptoms, so we credit them with setting the form of our model and information sources. However, the richness and quality of data in emerging countries can be quite different, which raises issues around how complex a modeling solution is possible for Brazil. One goal of this paper, currently unachieved, is an assessment of the benefits of model complexity as we compare the performance of SIR, SIRD, SIRTD and other models with DGPs that are themselves of varying complexity.

Despite several attempts at real-time pandemic monitoring and forecasting, we found no literature that incorporates social media data into epidemiological models. It is quite common to use epidemiological models for real-time monitoring and forecasting of COVID-19 dynamics, but without any social media data [Birrell et al., 2020, Jersakova et al., 2021, Altmejd et al., 2020, Schneble et al., 2020, Hawryluk et al., 2021, Loro et al., 2020, Stoner et al., 2020]. The studies we found that did use social media data were preoccupied with network analyses [Mattei et al., 2021, Esquirol et al., 2020, Chire-Saire, 2020, Cruickshank and Carley, 2020], semantic meaning [Chopra et al., 2021, Wicke and Bolognesi, 2021, Kruspe et al., 2020], depression and suicide [Cortes et al., 2020], fake news [Yang et al., 2021, Shahi et al., 2020, Singh et al., 2020], companies' challenges [Patuelli et al., 2021], drug mentions [Tekumalla and Banda, 2020], and privacy issues [Dev, 2020].

Furthermore, some studies tried to extract information regarding COVID-19 dynamics from social media but without incorporating this information into epidemiological models. Zong et al. [2020] presented an annotated corpus of 7,500 tweets for COVID-19 events, demonstrating the possibility of accurately identifying COVID-19 events on Twitter but with no extensions to COVID-19 dynamics or modeling efforts.
In the same line, Kaushal and Vaidhya [2020] trained a natural language processing (NLP) deep learning model to detect COVID-19 related events from Twitter, such as individuals who recently contracted the virus, people with symptoms who were denied testing, and believed remedies against the infection. A similar approach was taken by Santosh et al. [2020] to detect symptoms on Twitter. There are also efforts to combine official COVID-19 data from national and international authorities with social media data [Pu et al., 2020]. One interesting breakthrough came from Gencoglu and Gruber [2020], who used causal modeling to discover and quantify causal relationships between pandemic characteristics and Twitter activity as well as public sentiment, and showed that Twitter data can successfully capture the epidemiological domain knowledge.

Both social media and mobility data could be used in an epidemiological model. Mobility data can be easily obtained; for example, Avelar et al. [2021] used Google's mobility data and a Bayesian epidemiological model to predict deaths. Combining social media data with mobility data also presents some issues. One major obstacle is that only a small fraction of tweets are geotagged and some of them have inaccurate location data [Huang et al., 2020, Porcher and Renault, 2021].

To address those gaps, we devised a SIRTD model that uses symptom mentions in social media to better infer and predict the number of infected individuals (state I). Our intent is to demonstrate how social media data, especially symptom mentions, could enhance simple epidemiological models. In the next section we demonstrate our experiments using both simulated and real data from Brazil.

We conducted two experiments. The first experiment was with simulated data, where the configuration parameters were drawn randomly from reasonable ranges and then used to generate data for model fitting. For this preliminary work we ran a single simulation with our SIRTD model.
The second experiment was with real data from Brazil in 2020, where we again ran our SIRTD model. We followed the Bayesian workflow for disease transmission modeling by Grinsztajn et al. [2021], in which we build a model, fit the model, criticize, and repeat. This cycle is also similar to the Bayesian workflow proposed by Gelman et al. [2020], which includes three steps of model building, inference, and model checking/improvement, along with the comparison of different models.

For all experiments we used Stan [Carpenter et al., 2017], a Bayesian probabilistic programming language for specifying complex statistical models and performing inference using Markov Chain Monte Carlo (MCMC). All the data, source code and Stan models can be found in a GitHub repository. The ODE system described in equations 1 to 5 was implemented and solved by a 4th/5th order Runge-Kutta method [Iserles, 2008] using the Dormand-Prince algorithm [Dormand and Prince, 1980] with relative tolerance and absolute tolerance of 1e-6 and maximum number of steps h = 1e4.

The model is specified as follows. First, the prior distributions. The constant rate of infection β is sampled from a normal distribution constrained to positive values (equation 6) with mean µ_β and standard deviation σ_β. The constant death rate of infected individuals ω is sampled from a beta distribution (equation 7) with parameters α_ω, representing the number of people that unfortunately will become terminally ill and deceased, and β_ω, representing the number of people that will recover from the disease. The mean times for which individuals are infectious or terminally ill, d_I and d_T, are both sampled from a normal distribution constrained to positive values (equations 8 and 9) with means µ_{d_I}, µ_{d_T} and standard deviations σ_{d_I}, σ_{d_T}, respectively.
The proportion of infected people who will tweet daily about their symptoms while being in state I, Proportion Tweets, is sampled from a flat prior distribution for proportions, expressed as a beta distribution (equation 10).

The model has the following likelihood specifications. Both daily counts of tweets regarding symptoms and cumulative death counts are distributed as negative binomial. For cumulative death counts (equation 13), the location parameter is the number of individuals in state D (solved by Stan's ODE solver) and the precision parameter is φ, which follows an exponential distribution with rate parameter λ_φ (equation 11). For daily counts of tweets regarding symptoms (equation 14), the location parameter is the number of individuals in state I (also solved by Stan's ODE solver) multiplied by Proportion Tweets, the proportion of infected people who will tweet daily about their symptoms while being in state I; the precision parameter is φ_tweets, which follows an exponential distribution with rate parameter λ_{φ_tweets} (equation 12).

\begin{align}
\text{Deceased} &\sim \text{Negative Binomial}\left(\text{state } D, \frac{1}{\phi}\right) \tag{13} \\
\text{Tweets} &\sim \text{Negative Binomial}\left(\text{state } I \cdot \text{Proportion Tweets}, \frac{1}{\phi_{\text{tweets}}}\right) \tag{14}
\end{align}

In all of our experiments, we set priors similar to those used in some COVID-19 epidemiological models [Moore and Phillips, 2021]: β ∼ Normal⁺(2, 1); ω ∼ Normal⁺(0.4, 0.5); λ ∼ Beta⁺(1, 2); d_I ∼ Normal⁺(7, 2); d_T ∼ Normal⁺(10, 2); φ ∼ Exponential(5); and φ_tweets ∼ Exponential(5).

For all of our sampling, we mostly used Stan's default settings. This translates to MCMC sampling using Hamiltonian Monte Carlo (HMC) [Neal, 2011] and the No-U-Turn Sampler (NUTS) [Hoffman and Gelman, 2011] with 4 separate chains, each having 2,000 iterations, with the first 1,000 (half of the total iterations) discarded as warm-up and the last 1,000 used as samples from the underlying Markov chain.
We took care to set specific random number generator seeds to make our results reproducible. We also used the default parameters for the NUTS HMC sampler, which means the target Metropolis acceptance rate is 80% (adapt_delta = 0.8) and the cap on the depth of the trees that it evaluates during each iteration is 10 (max_treedepth = 10), i.e., trees of size up to 2^10. Our computing environment uses R version 4.1.0 [R Core Team, 2021], Stan version 2.27.0 [Carpenter et al., 2017], and CmdStanR version 0.4.0 [Gabry and Češnovar, 2021].

Our simulation closely mirrors the structure of the SIRTD model described above, in part to help debug and better understand the dynamics of the models being fit. As a result, the simulated data is most likely too easy for the model to recover, but we anticipate complicating the simulation in later versions of this work to break the near isomorphism between the model and the simulated data generating process (DGP).

Algorithm 1 is the pseudo-code representation of our agent-based simulation. At the start of every day we reset the Twitter count and then simulate each individual independently, depending on which compartment the individual is in on the current day of the epidemic simulation. If an individual is in the infected I compartment, the individual will tweet about their symptoms with probability λ and will have C daily contacts with other individuals from a population N. If one of those contacts is an individual in the susceptible S compartment, then the susceptible individual will become infected with probability β. Every day an infected individual has a chance to leave the infected I compartment with probability 1/d_I. The infected individual can then leave either to the terminally-ill T compartment with probability ω or to the recovered R compartment with probability 1 − ω.
Finally, if the individual is in the terminally-ill T compartment, the individual will leave to the deceased D compartment with probability 1/d_T.

The chief benefit of simulations is that they force one to confront the details of the model from a generation perspective, independent of the model being created to characterize the phenomenon of study, in this case COVID-19. It is our opinion that just running a Bayesian model generatively, as done with a prior predictive check, neither satisfies the intent nor yields the benefit of a fully specified DGP. Exercising the likelihood in this way yields little additional knowledge other than the fact that the priors can be recovered with success.

Our agent-based model is nearly isomorphic in parameterization and execution to the SIRTD model we fit it with, but even this level of simulation provided insights. In an agent-based framework more thought has to be given to how one agent infects another. For example, a person in I must come into contact with people in S, and those interactions have to produce β infections on average from one day to the next. There are also no fractional people in our simulated world, so our solution was to posit some number of interactions C per day between I people and S people, each with a β/C chance of infection. C presumably will not stay constant over a person's time in I, but we ignore that, as we do the possibility that β is greater than one.

For our simulated data, the actual parameter values were set as follows: N = 10,000; t = 70; C = 10; β = 0.3; ω = 0.1; λ = 0.2; d_I = 7; d_T = 10; and I_0 = 10. In Figure 2 (see also Table 1) we display the simulated truth for all the compartments in the SIRTD simulated data as dots and the mean generated prediction values as lines, which shows visually how the model can closely replicate trends after inferring the parameter values.
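The agent-based procedure described for Algorithm 1 can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the function name `simulate_agents`, the synchronous update (transitions decided from the compartments at the start of the day and applied at the end), and the per-contact infection probability of β/C (our reading of the discussion of C above) are all assumptions. The default arguments follow the parameter values reported for the simulated data.

```python
import random
from collections import Counter
from typing import Dict, List


def simulate_agents(n: int = 10_000, days: int = 70, contacts: int = 10,
                    beta: float = 0.3, omega: float = 0.1,
                    tweet_prob: float = 0.2, d_i: float = 7.0,
                    d_t: float = 10.0, i0: int = 10,
                    seed: int = 1) -> List[Dict[str, int]]:
    """Simulate individuals moving through S, I, R, T, D day by day."""
    rng = random.Random(seed)              # fixed seed for reproducibility
    state = ["I"] * i0 + ["S"] * (n - i0)
    history = []
    for day in range(1, days + 1):
        tweets = 0                         # reset the daily tweet count
        nxt = list(state)                  # compartments at the end of the day
        for person, comp in enumerate(state):
            if comp == "I":
                # Tweet about symptoms with probability lambda.
                if rng.random() < tweet_prob:
                    tweets += 1
                # C random contacts; a susceptible contact is infected
                # with probability beta / C (assumption, see lead-in).
                for _ in range(contacts):
                    other = rng.randrange(n)
                    if nxt[other] == "S" and rng.random() < beta / contacts:
                        nxt[other] = "I"
                # Leave I with probability 1/d_I: to T with probability
                # omega, otherwise to R.
                if rng.random() < 1.0 / d_i:
                    nxt[person] = "T" if rng.random() < omega else "R"
            elif comp == "T":
                # Terminally ill individuals die with probability 1/d_T.
                if rng.random() < 1.0 / d_t:
                    nxt[person] = "D"
        state = nxt
        counts = Counter(state)
        record = {"day": day, "tweets": tweets}
        record.update({c: counts[c] for c in "SIRTD"})
        history.append(record)
    return history


if __name__ == "__main__":
    final_day = simulate_agents()[-1]
    print(final_day)
```

The daily `tweets` and `D` series returned here play the role of the observed data in the model fit, while `S`, `I`, `R` and `T` are the internal states that can be compared with the fitted model's compartments.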
In Brazil, events such as Flu Syndrome (FS) and Severe Acute Respiratory Illness (SARI) have been notified countrywide since the beginning of the SARS-CoV-2 pandemic. Flu Syndrome cases that seek the health system for COVID-19 testing are registered in the e-SUS Notifica information system, and those who are hospitalized or die due to SARI are notified in the Epidemiological Surveillance Information System of Sivep-Gripe. The SARI surveillance, implemented in 2009 with the advent of the influenza A (H1N1) pdm09 virus pandemic, is carried out in all public or private hospitals in the country that have the capacity to provide assistance to cases of SARI [Ministério da Saúde do Brasil (a), 2021]. We used data from the e-SUS Notifica but restricted our analyses to the year 2020. In Figure 3 we show the daily confirmed cases by PCR-positive results in red and total cumulative daily deaths in blue.

For the social media data, we devised a list of 56 keywords including signs and symptoms compatible with COVID-19, such as flu-like symptoms, body pain, fever, cough, runny nose, anosmia, respiratory distress and other related terms, which were used to web-scrape 2,042,775 tweets from June 10th, 2020 to December 31st, 2020 (see Figure 4). We annotated 9,600 tweets with binary labels indicating 0 for noise and 1 for signal regarding the mention of symptoms either by the user or by someone that the user knows. Those labeled tweets were used to train a term frequency-inverse document frequency (TF-IDF) [Salton and Buckley, 1988] Random Forest classifier in scikit-learn [Pedregosa et al., 2011]. We achieved 90% accuracy in the test set (80/20 split) (see Table 2). The trained classifier was then used to label the remaining unlabeled tweets as either noise (0) or signal (1). We used the aggregated daily counts to generate the Twitter symptom mention time series used in Brazil's COVID-19 model inferences and predictions (see Figure 5).
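To illustrate the TF-IDF representation that feeds the classifier, here is a minimal pure-Python sketch of the textbook weighting (term frequency times the logarithm of inverse document frequency). It is not the scikit-learn pipeline used in the study: scikit-learn's vectorizer applies smoothing and normalization by default, so its numbers would differ, and no Random Forest is trained here. The function names and the toy English tweets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def tfidf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # tf = share of the document's tokens; idf = log(N / df).
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy tweets, for illustration only.
tweets = [
    "fever and cough today",
    "concert tickets today",
    "no smell and fever today",
]
weights = tfidf(tweets)
```

A term that occurs in every tweet (here "today") gets weight zero, while a term specific to one tweet (here "cough") gets the largest weight, which is why this representation helps a classifier separate symptom mentions from noise.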
Some studies used data from Brazil's social media in epidemiological models of dengue. Albinati et al. [2017] used Twitter data to improve epidemiological models for predicting dengue incidence in real time in Brazil. Souza et al. [2019] detected spatial clusters of dengue risk using Twitter data in two Brazilian cities with more than 1 million inhabitants and the highest dengue incidence rates in 2015. Souza et al. [2015] developed a latent shared-component generative model to predict dengue outbreaks in Brazilian urban areas, also using data collected from Twitter.

We also ran our SIRTD model on the real Brazilian data. Since we only have tweets from June 10th, 2020 onwards, we used official death and confirmed-case data from this date onwards. In a future version of this preprint we will use Twitter data since the beginning of the COVID-19 pandemic in Brazil (February 25th, 2020). For the Brazilian data, we used population values from the last available official government data (year 2019). For the initial individual counts in the SIRTD compartments, we set I_0 to be the number of PCR-positive COVID cases on June 10th, 2020, R_0 as the cumulative total of PCR-positive COVID cases on June 10th, 2020, D_0 as the cumulative deaths on June 10th, 2020, and T_0 = 0. We subtracted the initial values I_0, R_0, T_0 and D_0 from the population to get the initial susceptible number S_0.

Since the real infection rate β and the real death rate are unknown because of under-reporting in Brazilian data, we cannot compare our model's estimated parameters with the ground truth. In Figure 6 (see also Table 3) we show the ground truth as dots and the mean generated prediction values as lines, which shows visually how the model could generate accurate predictions after inferring the parameter values.

One of our contributions is to demonstrate how epidemiological models, specifically in the case of the COVID-19 pandemic, can be better comprehended and in turn become more robust.
For example, in June 2021, Brazil reached the mark of 500,000 deaths and 18 million accumulated cases of COVID-19 [Ministério da Saúde do Brasil (b), 2021]. This has impacted the availability of inpatient beds and other assistance resources in a large part of the country. Thus, the development of reliable predictive epidemiological models could help federated entities better plan assistance for critical cases.

Some limitations should be mentioned: (1) the data was not analyzed considering sociodemographic differences (gender, age group, place of residence, opportunity to access health care, etc.), and we did not account for heterogeneous transmission and mortality by age group like Hauser et al. [2020]; (2) in most real-world situations, the infection rate β varies over time, but we did not account for this and modeled β as a constant parameter over time; (3) different control measures were adopted by distinct municipal, state and federal authorities, at different times, in an attempt to contain the disease, and there are also political biases, but we have not incorporated heterogeneous measures by different government authorities in our model; (4) it was not possible to disaggregate the model to state and municipal levels, due to the lack of reliable geotagged social media data; and (5) real data from infected (and infectious) individuals were not used to verify our models that used real data from Brazil, due to under-reporting of mild cases, given the low capacity for population testing in Brazil. We encourage future studies to address those limitations.

Seroprevalence of anti-sars-cov-2 igg antibodies in geneva, switzerland (serocov-pop): a population-based study
Evolving epidemiology and transmission dynamics of coronavirus disease 2019 outside hubei province, china: a descriptive and modelling study. The Lancet Infectious Diseases
Why is it difficult to accurately predict the covid-19 epidemic?
Effect of Hydrocortisone on Mortality and Organ Support in Patients With Severe COVID-19: The REMAP-CAP COVID-19 Corticosteroid Domain Randomized Clinical Trial
Magnitude, demographics and dynamics of the effect of the first wave of the covid-19 pandemic on all-cause mortality in 21 industrialized countries
Using observational data to quantify bias of traveller-derived covid-19 prevalence estimates in wuhan, china. The Lancet Infectious Diseases
Forecasting for covid-19 has failed
Chaotic dynamics in the seasonally forced SIR epidemic model
Containing papers of a mathematical and physical character
The mathematical theory of infectious diseases and its applications. Charles Griffin & Company Ltd, 5a Crendon Street, High Wycombe, Bucks HP13 6LE
Mathematical model of COVID-19 intervention scenarios for São Paulo-Brazil
Inferring the effectiveness of government interventions against covid-19
Using early data to estimate the actual infection fatality ratio from covid-19 in france
Liverpool covid model
Nick Gent, and Daniela De Angelis. Real-time Nowcasting and Forecasting of COVID-19 Dynamics in England: The first wave? medRxiv
Bayesian imputation of covid-19 positive test counts for nowcasting under reporting lag
Nowcasting covid-19 statistics reported with delay: a case-study of sweden
Nowcasting fatal COVID-19 infections on a regional level in germany
Gaussian process nowcasting: Application to covid-19 mortality reporting
Nowcasting covid-19 incidence indicators during the italian first outbreak
Spatiotemporal dynamics, nowcasting and forecasting of covid-19 in the united states
A powerful modelling framework for nowcasting and forecasting covid-19 and other diseases
Italian twitter semantic network during the covid-19 epidemic
Characterizing twitter users behaviour during the spanish covid-19 first wave
Characterizing twitter interaction during covid-19 pandemic using complex networks and text mining
Characterizing communities of hashtag usage on twitter during the 2020 covid-19 pandemic by multi-view clustering
Mining trends of covid-19 vaccine beliefs on twitter with lexical embeddings
Covid-19 discourse on twitter: How the topics, sentiments, subjectivity, and figurative frames changed over time
Cross-language sentiment analysis of european twitter messages during the covid-19 pandemic
Covid-19 emotion monitoring as a tool to increase preparedness for disease outbreaks in developing regions
The covid-19 infodemic: Twitter versus facebook
An exploratory study of covid-19 misinformation on twitter
A first look at covid-19 information and misinformation sharing on twitter
Firms' challenges and social responsibilities during covid-19: a twitter analysis
Characterizing drug mentions in covid-19 twitter chatter
Extracting covid-19 events from twitter
Winners at w-nut 2020 shared task-3: Leveraging event specific and chunk span information for extracting covid entities from tweets
Detecting emerging symptoms of covid-19 using context-based twitter embeddings
Challenges and opportunities in rapid epidemic information propagation with live knowledge aggregation from social media
Causal modeling of twitter activity during covid-19. Computation
Weekly Bayesian modelling strategy to predict deaths by COVID-19: A model and case study for the state of Santa Catarina
Twitter, human mobility, and covid-19, 2020.
Simon Porcher and Thomas Renault. Social distancing beliefs and human mobility: Evidence from twitter
Bayesian workflow for disease transmission modeling in Stan
Stan: A Probabilistic Programming Language
Arieh Iserles. A First Course in the Numerical Analysis of Differential Equations
A family of embedded Runge-Kutta formulae
Odeint - Solving Ordinary Differential Equations in C++
MCMC using Hamiltonian dynamics
The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
cmdstanr: R Interface to 'CmdStan'
Ministério da Saúde do Brasil (a) Emergência de saúde pública de importância nacional pela doença pelo coronavírus 2019 - covid-19
Term-weighting approaches in automatic text retrieval. Information processing & management
Scikit-learn: Machine learning in python
Enhancement of epidemiological models for dengue fever based on twitter data
Where did i get dengue? detecting spatial clusters of infection risk with social network data
A latent shared-component generative model for real-time disease surveillance using twitter data
Estimation of SARS-CoV-2 mortality during the early stages of an epidemic: A modeling study in Hubei, China, and six regions in Europe