key: cord-0636202-8x0duuh6
authors: Rashid, Md Tahmid; Wang, Dong
title: CovidSens: A Vision on Reliable Social Sensing based Risk Alerting Systems for COVID-19 Spread
date: 2020-04-09
journal: nan
DOI: nan
sha: 7b72de65011af1999b1beea472cca8d95b25abb8
doc_id: 636202
cord_uid: 8x0duuh6

With the spiraling pandemic of the Coronavirus Disease 2019 (COVID-19), it has become critically important to disseminate accurate and timely information about the disease. Due to the ubiquity of Internet connectivity and smart devices, social sensing is emerging as a dynamic sensing paradigm to collect real-time observations from online users. In this vision paper we propose CovidSens, the concept of social-sensing-based risk alerting systems to notify the general public about the COVID-19 spread. The CovidSens concept is motivated by two recent observations: 1) people have been actively sharing their state of health and experience of COVID-19 via online social media, and 2) official warning channels and news agencies are slower than people who report their observations and experiences of COVID-19 on social media. We anticipate an unprecedented opportunity to leverage the posts generated by social media users to build a real-time analytic system for gathering and circulating vital information about the COVID-19 propagation. Specifically, the vision of CovidSens attempts to answer the following questions: How can the spread of COVID-19 be tracked? How can reliable information about the disease be distilled amid the prevailing rumors and misinformation on social media? How can the general public be informed about the latest state of the spread in a timely and effective manner and alerted to remain prepared? In this vision paper, we discuss the roles of CovidSens and identify the potential challenges in implementing reliable social-sensing-based risk alerting systems. We envision that approaches originating from multiple disciplines (e.g., estimation theory, machine learning, constrained optimization) can be effective in addressing these challenges. Finally, we outline a few research directions for future work in CovidSens.

Due to the pervasiveness of Internet connectivity and smart devices, social sensing is gradually emerging as a new sensing paradigm that utilizes observations made by humans, or by devices on their behalf, to obtain information about the physical world [1] . In this vision paper we present CovidSens, the notion of a real-time risk alerting system based on social sensing to support situational awareness and guide interventions against the spread of Coronavirus Disease 2019 (COVID-19). According to the most recent statistics, there are more than 435,000 confirmed cases of COVID-19 and over 14,800 deaths across all 50 states in the US [2] , [3] . Most of the above cases occurred within a single week (i.e., between March 26, 2020 and April 01, 2020), and the trend appears to be still increasing [3] . As the COVID-19 outbreak progresses, circulating information about the spread in an accurate and timely manner has become increasingly important. However, amid heightening uncertainty and commotion among the general public, communicating timely and accurate information to the intended recipients is a challenging task. While official warning channels and news agencies have played an active role in informing the public about the spread, they often fall short in terms of pace. Official warning channels and news media typically take a while to confirm and disseminate information regarding the outbreak of a new disease [4] . By contrast, information propagates across social media and crowdsensing platforms much faster than through traditional news media [4] . For example, during the 2013 Boston Marathon Bombing, news about the first bomb explosion and the arrest of the suspect was posted on Twitter several minutes before news agencies made announcements [5] , [6] .
After the onset of the cholera outbreak in Haiti in 2010, knowledge of the outbreak was first obtained through social media, weeks before officials confirmed the outbreak [7] . Such cases exemplify the importance of social media during emergency scenarios such as the current COVID-19 outbreak.

The CovidSens concept is thus motivated by two observations during this global COVID-19 crisis. Firstly, since the onset of COVID-19, people have actively conveyed their state of health and experience of the virus via online social media. For instance, on a single day, 6.7M people talked about coronavirus on social media [8] . Secondly, people report their observations on social media faster than official warning channels and news agencies make formal announcements. As such, knowledge contribution and discovery through social sensing may offer more effective news transmission [4] . Given this premise, we perceive an unprecedented opportunity to leverage the posts generated by social media users to build a complete analytics framework for gathering and circulating vital information about the COVID-19 propagation.

Let us consider a few tweets posted during the course of the COVID-19 spread across the US, shown in Figure 1 . These tweets express the experiences and observations of individuals about COVID-19. If such tweets could be utilized to identify the regions affected by COVID-19 and the rate of spread of the virus, they might expedite the alleviation of the adverse effects of the virus. In addition, by analyzing location and movement data from smartphones and social media posts to detect crowds or mass gatherings while respecting user privacy, government agencies and the general public could be informed about the more risk-prone areas of a city during the COVID-19 outbreak [9] . This could help divert people away from crowded locations and hence reduce the spread of the disease. While the CovidSens vision promises opportunities for robust information distillation as well as a risk alert service for the COVID-19 spread, several technical challenges stand in the way of building a system that automatically gathers and distributes real-time developments of the disease to the general public. In contrast to traditional disaster response systems (e.g., for floods or forest fires), one unique goal of CovidSens is to obtain knowledge of the dynamics of the disease spread (e.g., inferring the stages of the disease among people). The first challenge is, therefore, to build a social sensing data collection platform that can automatically obtain the relevant social signals about symptoms, cases, and fatalities of COVID-19 from online social media users. The second challenge lies in developing reliable data analysis models that can extract credible information about the disease spread from the noisy, sparse, and unstructured social data contributed by unvetted human sources, such as the three tweets presented in Figure 1 .
The third challenge lies in handling the huge volumes of social data about the COVID-19 outbreak, which vary widely in type (e.g., text, image, video, and audio data). The fourth challenge is to circulate the extracted information about the disease spread to the general public in a timely and efficient manner so that people can plan their actions accordingly. The fifth challenge lies in designing an effective alert system that considers the human aspect of the problem (i.e., handling people's reactions to alerts, such as fear, concern, or ignorance). The sixth challenge is combating the spread of misinformation on social media, where people tend to report rumors or falsified facts about the COVID-19 spread.

CovidSens aims to overcome the above limitations by providing a more reliable and timely COVID-19 monitoring and alerting system for the general population based on social sensing. We envision an information retrieval and dispatching system for the general public based on data derived from multiple sources (e.g., social media, crowdsourced platforms, Unmanned Aerial Vehicles (UAVs)) to quickly and effectively monitor the spread of COVID-19 using combinations of smartphone applications, UAVs, message boards, or other modes of information dispersal. We expect this service to be important and useful for people who live in or travel to the affected areas, allowing them to take special precautions and be well prepared. The successful development of such systems can potentially help both authorities and the general public respond more quickly and efficiently to COVID-19 and eventually help save lives.

We recognize the potential to employ interdisciplinary techniques from estimation theory, online social media analysis, machine learning, AI, and mobile phone applications to develop effective CovidSens systems. Research in the realm of CovidSens is important because COVID-19 is spreading rapidly in many countries worldwide, and a timely alerting system that exploits the rich real-time information streaming on social media is yet to be developed. The results of this research can pave the way for studying and tackling COVID-19 around the world.

The rest of the paper is organized as follows. In Section II, we discuss a few state-of-the-art works in the direction of CovidSens. In Section III, we explore potential real-world applications of CovidSens. We identify a few likely challenges in implementing a successful CovidSens system in Section IV. Afterwards, in Section V, we highlight a set of research directions for future work aligned with CovidSens to contain the COVID-19 spread. Finally, we conclude our vision of CovidSens in Section VI.

Social sensing is rapidly progressing as a pervasive sensing paradigm where humans act as sensors to attain situational awareness about the physical world [4] . Examples of social sensing applications include predicting poverty in developing countries [10] , studying human mobility in urban areas [11] , identifying traffic abnormalities [12] , tracking social unrest [13] and disasters [14] , classifying urban land usage [15] , detecting wildfires [16] , identifying points of interest in cities [17] , and performing post-disaster damage assessments [18] . A comprehensive survey of social sensing schemes is provided in [19] . Zhang et al. developed a scalable approach to obtain data veracity in social sensing [20] . Xu et al. developed a framework for semantic and spatial analysis of urban emergency events using social media data [21] . Zhang et al. presented a constraint-aware truth discovery model to detect dynamically evolving truth in social sensing [22] . More recently, social-media-driven drone sensing (SDS) approaches have emerged that address the data reliability issue of social sensing by integrating social signals with physical UAVs [23] - [25] . While existing social sensing approaches aim to provide pervasive sensing, they are not tailored specifically to monitoring the COVID-19 outbreak. Compared to traditional social sensing applications, CovidSens requires inferring not only the data veracity but also how the COVID-19 outbreak may progress across regions based on indications from social media posts (e.g., posts about crowded subways could indicate a high risk of COVID-19 spread). Thus, it remains a critical task to develop a reliable social sensing model that can accurately monitor the COVID-19 spread.

In recent times, disease tracking based on epidemiological data has been an important avenue of research. Several studies have independently explored the feasibility of using social media and crowdsensing for the detection, tracking, and analysis of contagious disease outbreaks [26] , [27] . For example, Google launched a real-time influenza surveillance system, Google Flu Trends [28] , to monitor influenza spread by analyzing search terms related to illness symptoms. Kalogiros et al. developed Allergymap, a crowdsensing-based disease identification system for allergen season onsets and allergy patient stratification [29] . Krieck et al. studied the possibility of analyzing Twitter data for infectious disease surveillance [30] . Chester et al. [31] carried out a bacterial outbreak investigation based on web forum posts about sick participants in a bike race. Despite these advances in disease monitoring techniques, current schemes have not been designed to handle the exponential progression of the COVID-19 pandemic or to provide reliable risk alerts in the context of CovidSens. Therefore, a more rapid information distillation and processing system that can track the COVID-19 spread in real time is needed.

While traditional health systems play an important role in alerting the general public about infectious diseases, their slow information propagation has necessitated the adoption of automated warning and alert systems [26] . Brownstein et al. contributed a few early works in this domain by developing i) a series of interactive websites, HealthMap and Flu Near You [26] , [32] , and ii) a smartphone application called Outbreaks Near Me [33] to present vital information about outbreaks of various illnesses around the world. Toda et al. explored the effectiveness of a text-messaging system for notification of disease outbreaks in Kenya [34] . Yu et al. developed ProMED-mail, an early warning system for emerging diseases [35] . Carter studied the possibility of a tweet-based information dispersal system to facilitate the containment of Ebola in Nigeria [36] . The above approaches are known to provide disease warnings with reasonable effectiveness. However, it is an even more challenging task to develop a real-time COVID-19 spread indicator for CovidSens that uses both social media and crowdsourced data and also transmits news of the spread to the general public in real time.

With the emergence of the COVID-19 outbreak, several streams of research have introduced methods to monitor the COVID-19 propagation. Sun et al. [37] proposed the first study that harnesses crowdsourced data from several social media sources to monitor the COVID-19 spread. Schiffmann [38] developed an informative web portal that aggregates news from a myriad of news sources to present the latest information on the COVID-19 spread. The Johns Hopkins Center for Systems Science and Engineering (JHU CSSE) developed an interactive online dashboard to track and present worldwide reported cases of COVID-19 in real time [39] . An online community of international students and professionals, called 1point3acres, developed a web-based real-time COVID-19 news aggregator to track the state of the spread in the US and Canada [40] . The Singaporean government has developed a mobile app that leverages crowdsourced information to locate community transmission of COVID-19 [41] . A key drawback of the above tools is that they are only partially autonomous, requiring some degree of manual effort to validate information about the COVID-19 spread before presenting it online [38] , [40] . During this evolving COVID-19 outbreak, such delays are undesirable. Therefore, existing approaches remain limited in their ability to automatically track the COVID-19 propagation and disseminate the information to end users.

In this section, we highlight a few probable applications in real-world scenarios aligning with the CovidSens vision.

In a social-media-driven disease spread indicator (SDSI), social media posts related to COVID-19 are analyzed to infer the state of the spread [37] . An example of an SDSI architecture is illustrated in Figure 2 . First, a real-time Twitter data crawler engine collects tweets indicating public opinions about the disease. The tweets are subsequently filtered and labeled into discrete categories based on their topics of discussion.
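To make the filter-and-label step more concrete, the following Python sketch drops tweets that never mention the disease and files the rest under keyword-matched topics. The lexicons, topic names, and function names are hypothetical illustrations, not the SDSI design of [37]; a deployed crawler would more plausibly use learned classifiers, since plain keyword matching cannot tell a sarcastic "sick" from a genuine report.

```python
import re
from collections import defaultdict

# Hypothetical lexicons for illustration only; the paper does not specify them.
RELEVANCE_TERMS = {"covid", "covid19", "coronavirus"}
TOPIC_KEYWORDS = {
    "symptoms": {"fever", "cough", "breathing", "symptom", "symptoms"},
    "testing": {"test", "tested", "testing", "positive", "negative"},
    "recovery": {"recovered", "recovering", "discharged"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def label_tweets(tweets):
    """Drop off-topic tweets, then file each remaining tweet under every matching topic."""
    labeled = defaultdict(list)
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        if not tokens & RELEVANCE_TERMS:
            continue  # no COVID-19 term: filtered out as off-topic
        topics = [topic for topic, words in TOPIC_KEYWORDS.items() if tokens & words]
        for topic in topics or ["other"]:
            labeled[topic].append(tweet)
    return dict(labeled)


tweets = [
    "Got tested today for COVID-19, still waiting on results",
    "Day 3 of fever and cough, pretty sure it's coronavirus",
    "Great game last night!",
]
print(label_tweets(tweets))
```

A tweet can land under several topics, and relevant tweets that match no topic fall into "other" so that they are not silently discarded.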

A few examples of these topics are: i) which regions are frequently reported to be infected; ii) the time between people first talking about COVID-19 symptoms and deciding to get tested (i.e., how long the virus takes to show effect in people) [37] ; iii) which age groups are reporting symptoms the most; iv) how rapidly authorities are responding to the stimuli; and v) whether people are talking about others they know who have recovered [37] , [42] . Afterwards, the labeled Twitter data are passed to a tweet analytics and training engine on a backend server. Specifically, the backend server constructs a clean and timely event summary of the COVID-19 spread by distilling relevant and reliable information from the massive amount of noisy, unstructured, and unvetted data feeds. The analytics engine jointly analyzes the data veracity, source reliability, observation bias (e.g., underestimation vs. overestimation), and the likelihood of large-scale havoc launched by malicious users on social media, using novel estimation-theoretic, machine learning, and AI techniques. Lastly, a website or smartphone app interacts with end users to provide them with warnings or alerts about the disease spread in their vicinity based on their queries.

Crowdsensing-based disease tracking (CDT) involves sensor networks and groups of people with mobile devices capable of sensing and collectively sharing disease-related information (e.g., early symptoms, nearby infected persons, decisions to self-quarantine) [37] , [43] . CDT is fueled by the observation that individuals tend to proactively volunteer data about the COVID-19 spread using their smartphones, wearables, or other devices with sensors and connectivity [37] . In contrast to SDSI, CDT is relatively less pervasive and requires the active participation of people and physical sensors. In return, however, the data are less noisy and hence more reliable.
Figure 3 shows an example of a representative CDT system. A CDT system typically incorporates three main components. The first component is a data collection platform consisting of a network of users with a custom smartphone application to log data and a set of Internet-of-Things (IoT) devices (e.g., smart heart-rate monitors, activity trackers, thermal scanners). The smartphone application interacts with users and allows them to actively contribute their reports on COVID-19 if they are willing to. If users choose to input data, the app lets them configure the granularity (e.g., state, county, street, or N/A) at which they are comfortable sharing their location information. The second component is an analytics framework that applies relevant statistical analysis and machine learning techniques to the obtained data to infer probable regions of infection and safe zones [33] , [43] . The third component is a smartphone application on the end users' mobile phones that visually represents the analyzed geospatial distribution of the inferred regions [33] . The app can obtain the needed information from the backend server based on the users' queries (e.g., checking the risk level of a particular area of interest). In most cases, data collection and representation are carried out in the same smartphone application [33] . Sun et al. proposed one of the earliest crowdsourcing-based COVID-19 outbreak detection systems [37] . The Singaporean and South Korean governments have launched mobile apps that utilize crowdsourced data to trace community transmission of COVID-19 [41] .

The urgency of the COVID-19 outbreak has opened new dimensions for UAV-based health surveillance and alerting (UHSA) systems [44] . With the help of onboard sensors (e.g., cameras, microphones), UAVs can gather intelligence remotely during a pandemic, where human patrol teams and ground units cannot operate due to the risk of infection.
For instance, UAVs can assist in detecting unwanted crowds of people in locked-down areas of a city [44] . Figure 4 demonstrates a representative UHSA model for mitigating the COVID-19 spread. The UHSA system responds to emergency requests made by individuals through social media posts about unnecessary mass gatherings. The data are then gathered on a backend server and processed using social sensing approaches based on statistics and machine learning to analyze their truthfulness. The information is then disseminated to nearby regions by raising verbal alerts through speakers installed on the UAVs. UAVs are also dispatched to different areas of a city to autonomously scan and obtain situational awareness about each region. Using the onboard sensors, UHSA detects whether people are breaking the rules during the lockdown (e.g., by roaming outside or gathering in crowds). Based on the social media posts, the framework may also locate critical supplies (e.g., open pharmacies, grocery stores) using the UAVs' cameras. People breaking the rules are alerted through the onboard speakers to return home. One real-world example of UHSA during the COVID-19 ordeal is in California, USA, where law enforcement officials have resorted to using drones for patrolling during the ongoing lockdown [45] . During the COVID-19 crisis in China, UAVs have served multiple roles, including post-epidemic aerial evaluation, alerting, and relief distribution to affected regions [46] .
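The verification step in the UHSA loop, deciding which social media reports of mass gatherings justify sending a UAV, could begin with a simple corroboration rule such as the Python sketch below. It assumes a hypothetical report schema of (user ID, grid cell, timestamp) and hypothetical thresholds (a 30-minute window and at least three distinct reporters); it is a heuristic stand-in for the statistical and machine learning truthfulness analysis described above, not an implementation of it.

```python
from collections import defaultdict


def dispatch_candidates(reports, now, window_s=1800, min_reporters=3):
    """Return (cell, reporter_count) pairs corroborated by enough distinct recent users.

    reports: iterable of (user_id, cell_id, timestamp_s) tuples (hypothetical schema).
    Only reports from the last `window_s` seconds are considered.
    """
    reporters = defaultdict(set)
    for user_id, cell_id, ts in reports:
        if 0 <= now - ts <= window_s:
            reporters[cell_id].add(user_id)  # a set, so repeat posts count once
    ranked = [(cell, len(users)) for cell, users in reporters.items()
              if len(users) >= min_reporters]
    # Most-corroborated cells first, so a limited UAV fleet visits them first.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


reports = [
    ("u1", "cell_A", 1000), ("u2", "cell_A", 1100), ("u3", "cell_A", 1200),
    ("u1", "cell_B", 1150), ("u1", "cell_B", 1160), ("u1", "cell_B", 1170),
    ("u4", "cell_C", 100),  # too old for the window
]
print(dispatch_candidates(reports, now=2000))
```

Counting distinct accounts rather than posts means a single user posting repeatedly cannot trigger a dispatch alone, although a coordinated group of accounts still could, which is why a stronger reliability model would be needed in practice.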

In this section, we present a set of prevalent research challenges and opportunities in the development of an effective CovidSens framework. 

During the onset of rampant disease outbreaks like COVID-19, the primary objective of a CovidSens system is to collect information from the general public. However, several difficulties arise in locating and obtaining the relevant posts related to the COVID-19 spread. For instance, when conducting simple keyword-based searches on social media data, the desired keywords may also indicate various unrelated things (e.g., while the term "sick" is generally used to indicate people who are not doing well, some people may also use it sarcastically). Several recent studies have focused on mitigating this data discovery issue by replacing simple keyword-based searches with singular value decomposition (SVD) driven K-means clustering [47] and a recurrent neural network (RNN) based textual labeling process [48] . However, such methods still lag behind human perception in accurately identifying relevant input data. Thus, obtaining a collection of social media data that points to the right set of information remains an arduous task. Moreover, a large portion of social media data may turn out to be redundant (e.g., retweets) or simply rephrased from a single original post [49] . On top of that, a considerable amount of social media data is transient and perishable. For example, people may delete their previous posts, and the platforms hosting the posts (e.g., Twitter and Facebook servers) may take them down for undisclosed reasons. In addition, social media APIs such as Twitter's often impose rate limits, which can heavily impede data collection during disease outbreaks [50] . The data collection process for COVID-19 therefore necessitates a tool that can locate, obtain, and store relevant information from users in real time across social media channels.
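For the redundancy problem (retweets and light rephrasings of a single original post), one common baseline is near-duplicate removal by token-set similarity. The Python sketch below strips retweet prefixes, URLs, and mentions, then keeps a post only if its Jaccard similarity to every previously kept post is below a threshold. The 0.8 threshold and the function names are illustrative assumptions, not part of any cited system.

```python
import re


def normalize(text):
    """Strip a leading retweet prefix, URLs, and mentions, then return the token set."""
    text = re.sub(r"^RT @\w+:\s*", "", text)
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(posts, threshold=0.8):
    """Keep the first post of each near-duplicate group (greedy, O(n^2) comparisons)."""
    kept, kept_tokens = [], []
    for post in posts:
        tokens = normalize(post)
        if all(jaccard(tokens, seen) < threshold for seen in kept_tokens):
            kept.append(post)
            kept_tokens.append(tokens)
    return kept


posts = [
    "Testing site on Main St closed today https://t.co/abc",
    "RT @alice: Testing site on Main St closed today https://t.co/abc",
    "testing site on main st closed today!",
    "Pharmacy on 5th Ave is out of masks",
]
print(deduplicate(posts))
```

The pairwise comparison is quadratic in the number of posts, so a streaming system at Twitter scale would more likely use an approximate technique such as MinHash with locality-sensitive hashing; the sketch only shows the idea.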

The concept of CovidSens is centered around the noisy and unreliable data generated by unknown human sources on social media [19] , [51] . One important task in harnessing social media for CovidSens is to extract trustworthy information from human sources of unknown reliability [1] , [25] . We define this as the data reliability challenge in social sensing. Several truth discovery solutions have been developed to mitigate the data reliability problem. For instance, Wang et al. presented a framework to jointly estimate the reliability of data sources and the correctness of the reported measurements in social media posts using approaches from estimation theory [1] , [52] . Zhang et al. built upon this framework to address scalability and physical constraint challenges and applied the improved schemes to real-world social sensing applications [20] , [22] , [53] . Yin et al. developed TruthFinder, a probabilistic algorithm that uses iterative weight updates to improve data quality in social sensing [54] . While great efforts have been made to develop reliable social sensing solutions, certain limitations hinder these solutions from being applied in CovidSens to track COVID-19. One drawback of traditional social sensing schemes is that they rely solely on the noisy social media data, with no external means of validating the credibility of the input data during the COVID-19 epidemic [22] . Existing methods are also not tailored toward disease outbreak detection, which may lead to the prediction of false COVID-19 cases. For example, a person simply posting about difficulty breathing may not necessarily suffer from COVID-19; it may be necessary to analyze other traits of the person based on earlier posts. Hence, it remains an unresolved challenge in CovidSens to develop reliable social sensing models that can explore the uncertainty in the input data and extract reliable signals.
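The core idea shared by these truth discovery methods, that reliable sources make correct claims and correct claims are backed by reliable sources, can be illustrated with a short iterative weighted-voting loop. The Python sketch below is a simplified heuristic written for this illustration; it is not the estimation-theoretic framework of Wang et al. [1] , [52] or TruthFinder [54] , and the report schema and smoothing choice are assumptions.

```python
from collections import defaultdict


def truth_discovery(reports, iterations=10):
    """Jointly estimate claim truth and source reliability by iterative weighted voting.

    reports: list of (source_id, claim_id, asserts_true) tuples (hypothetical schema).
    Returns (belief, reliability):
      belief[claim]       reliability-weighted share of reports saying the claim is true
      reliability[source] smoothed share of the source's reports that agree with the
                          current estimate (belief >= 0.5 counts as "true")
    """
    sources = {source for source, _, _ in reports}
    reliability = {source: 1.0 for source in sources}  # trust everyone equally at first
    belief = {}
    for _ in range(iterations):
        # Step 1: weighted vote for each claim, weighting each report by its source.
        yes, total = defaultdict(float), defaultdict(float)
        for source, claim, asserts_true in reports:
            total[claim] += reliability[source]
            if asserts_true:
                yes[claim] += reliability[source]
        belief = {claim: yes[claim] / total[claim] for claim in total}
        # Step 2: re-score each source by how often it agrees with the current estimates.
        agree, count = defaultdict(int), defaultdict(int)
        for source, claim, asserts_true in reports:
            count[source] += 1
            if asserts_true == (belief[claim] >= 0.5):
                agree[source] += 1
        # Laplace smoothing keeps every reliability strictly between 0 and 1.
        reliability = {s: (agree[s] + 1) / (count[s] + 2) for s in sources}
    return belief, reliability


reports = [
    ("A", "c1", True), ("B", "c1", True), ("D", "c1", False),
    ("A", "c2", False), ("B", "c2", False), ("D", "c2", True),
    ("A", "c3", True), ("C", "c3", True), ("D", "c3", False),
]
belief, reliability = truth_discovery(reports)
print(belief, reliability)
```

In this toy input, source D contradicts the other sources on every claim, so its reliability drops to 0.2 after the first iteration and its votes barely move the final beliefs. Like any voting scheme, it assumes sources err independently; many accounts repeating the same rumor would be rewarded rather than penalized, which is one reason such methods struggle with misinformation.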

While data collection is an intrinsic challenge in using social sensing to track the COVID-19 spread, a greater difficulty lies in processing the rapidly generated incoming signals, which consist of multitudes of features or dimensions [55] . This challenge is identified as data modality in social sensing, where large amounts of unfiltered and unstructured data with multiple modalities need to be processed [56] . Specifically, data modality refers to the different types of data prevalent on social media, such as text, image, location, audio, and video [57] . Moreover, each type can further encompass different dimensions, which makes the data modality challenge even harder. Examples of dimensions in CovidSens include reports of: i) proximity to infected locations, ii) number of suspected cases, iii) number and types of symptoms, iv) intensity of symptoms (i.e., mild, moderate, or severe), v) recovery rate, vi) death rate, and vii) number of self-quarantined cases. Recent social sensing tools primarily focus on analyzing the text data in social media [58] . This trend is explained by the fact that image data processing involves heavy computational requirements [59] . Consequently, existing methods do not focus on fusing multiple types of data, which could potentially yield richer detection of COVID-19 propagation. For example, a person may tweet about having COVID-19, but an image posted with the tweet may reveal that the person's symptoms actually resulted from an allergic reaction [60] . Fusing text with other data such as image and location data may yield more accurate prediction of the COVID-19 spread. Therefore, given the sheer volume of multi-modal data generated by social media users about the COVID-19 outbreak, solutions need to be developed to efficiently utilize the different modalities of data.
Moreover, since multi-modal data processing intrinsically demands greater compute power, care must be taken to strike an efficient trade-off between detection accuracy and computational complexity. A set of unsolved questions springing from the data modality challenge in CovidSens are: i) how to efficiently fuse the different types of social media data related to COVID-19 into one unified data stream? ii) how to design algorithms that process a wide variety of social data in real time for accurate prediction of the COVID-19 spread? iii) how to speed up the analysis of multi-modal data for faster COVID-19 spread detection by distributing the computation across multiple devices?
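For question i), one simple baseline (an assumption here, not a method proposed in the paper) is late fusion: each modality is scored by its own classifier, and the scores are combined afterwards. The Python sketch below averages the per-modality log-odds with trust weights and skips modalities a post does not have. The classifier outputs and weights are hypothetical placeholders.

```python
import math


def logit(p, eps=1e-6):
    """Convert a probability to log-odds, clipping to avoid infinities at 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def late_fusion(scores, weights):
    """Fuse per-modality probabilities that a post reports a genuine COVID-19 case.

    scores:  dict modality -> probability from that modality's own (hypothetical)
             classifier, or None if the post lacks the modality (e.g. no image).
    weights: dict modality -> non-negative trust weight for that modality.
    Returns the fused probability from a weighted average of log-odds.
    """
    present = {m: p for m, p in scores.items()
               if p is not None and weights.get(m, 0) > 0}
    if not present:
        raise ValueError("no usable modality for this post")
    total_weight = sum(weights[m] for m in present)
    z = sum(weights[m] * logit(p) for m, p in present.items()) / total_weight
    return 1 / (1 + math.exp(-z))


weights = {"text": 1.0, "image": 1.0}
print(late_fusion({"text": 0.9, "image": 0.2}, weights))
print(late_fusion({"text": 0.9, "image": None}, weights))
```

With equal weights, a text score of 0.9 and an image score of 0.2 (the allergic-reaction scenario described above) fuse to 0.6; doubling the image weight pushes the result below 0.5. Because each modality is scored independently, the expensive image model can run on separate hardware or be skipped for text-only posts, which touches on questions ii) and iii).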

One recurring issue in social sensing is user privacy, whereby the personal information of online users remains at risk of falling into the wrong hands [61] . Geo-location data shared by users can also be used to expose other private information (e.g., ethnicity, race, financial status) that social media users do not typically consent to share and that CovidSens applications do not require. It has been observed that, out of concern about having their location and private information exposed, many social media users tend not to share their location information when reporting their observations on social media [12] . For example, an independent study of disaster-related tweets found that fewer than 10% of the tweets were geo-tagged (i.e., contained the geographical location of the users). As such, CovidSens applications that rely heavily on the location metadata of social media posts to infer the COVID-19 spread may underperform when geo-tagged posts are scarce. Recent literature has explored methods to work around this issue by exploiting spatiotemporal social constraints for location inference from social media posts [62] . However, such uni-dimensional approaches that rely solely on the content of the social media posts may result in high estimation errors for the inferred locations. In order to precisely track the progress of the COVID-19 propagation, it is imperative to obtain the exact locations of the surges. Consequently, a challenge in CovidSens applications is to design a solution that mitigates this data scarcity issue, which may eventually yield better sensing results for tracking the COVID-19 spread.
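To illustrate the kind of content-only fallback discussed above, the Python sketch below uses a post's geotag when one exists and otherwise looks for a known place name in the text. The two-entry gazetteer and the post schema are hypothetical. The sketch also shows why such inference is error-prone: a post saying "flying to Chicago tomorrow" would be placed in Chicago even though its author is elsewhere.

```python
import re

# Hypothetical gazetteer mapping lowercase place names to (state, county).
GAZETTEER = {
    "south bend": ("Indiana", "St. Joseph"),
    "chicago": ("Illinois", "Cook"),
}


def infer_location(post):
    """Return (location, source) for a post, where source is 'geotag', 'text', or None.

    post: dict with a 'text' entry and an optional 'geo' entry (hypothetical schema).
    """
    if post.get("geo"):
        return post["geo"], "geotag"  # trust an explicit geotag first
    text = post["text"].lower()
    # Try longer names first so multi-word places win over shorter substrings.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            return GAZETTEER[name], "text"
    return None, None


print(infer_location({"text": "Two new cases reported in South Bend today"}))
print(infer_location({"text": "Feeling sick"}))
```

Returning where the location came from lets downstream components weight text-inferred locations less heavily than geotags.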

With the rapidly evolving circumstances of the COVID-19 outbreak, it is critical to present information on the disease spread to end users in a timely manner. This necessitates an information presentation system that can both process and present data on the disease propagation in real time and keep people alerted. In the recent past, several methods have been implemented to present disease outbreak updates to the public through interactive websites [26] , [32] . However, such methods of information distribution and collection rely solely on aggregating knowledge from different news portals and information websites, which can delay alerting people about the most recent situation [4] . Due to the structured nature of their information crawling and collation, existing web-based techniques cannot be directly applied to social sensing, which encompasses unstructured and noisy social data [55] . In addition, websites and smartphone applications rely on the constant availability of both the Internet and a smart device, either of which may be unavailable in some circumstances. Thus, vital information may not reach all sectors of the population, especially the elderly and less tech-savvy individuals without access to computers and smart devices. On these grounds, it remains an open question in CovidSens how to develop a reliable yet efficient mechanism that can rapidly deliver important messages and information regarding the COVID-19 spread to all segments of the population.

One important aspect to consider while dealing with social signals in CovidSens is the human component. Given the intensifying concern and panic among the general public during COVID-19, we acknowledge that people can be overly emotional, sensational, or biased in expressing their opinions on social media or crowdsensing applications [63] . Such behavior can lead to misrepresented or misinterpreted observations and thus yield erroneous disease tracking results. Based on these concerns, one critical challenge stemming from the human aspect of social sensing is deciding how to handle the mood of the population while keeping public concern at desirable levels. Moreover, it is imperative to study the human component closely and model how people react to the information presented to them through the warning and alert systems in CovidSens. Some individuals may be excessively sensitive, and thus care must be taken not to create grounds for unnecessary panic or civil unrest. For example, during the Ebola epidemic in Liberia in 2014, riots broke out among residents when officials raised alarms about the outbreak [64] . At the other extreme, we also acknowledge that a certain proportion of the population tends to be oblivious to the circumstances, neglect warnings, and remain excessively calm during this outbreak. The challenge for CovidSens is to strike a balance between raising attention and providing assurance: on one hand, we need to calm people down while informing them of the situation; on the other, we need to send out the message to remain well prepared.

With the heightening concern about the COVID-19 spread, social media has served not only as a platform for obtaining information but also as a venue for sprouting misinformation. With the increased adoption of social media as a news source, misinformation spread on social media has become an inevitable issue [54]. This has led companies such as Facebook and Google to conduct worldwide campaigns against the propagation of fake news [65]. Figure 5 illustrates a collection of tweets containing misinformation during the COVID-19 outbreak. The World Health Organization (WHO) has been forced to reallocate considerable resources to combat swathes of such misinformation, which may hinder COVID-19 monitoring efforts [66]; WHO has termed this phenomenon an 'infodemic' [66]. Social sensing tools such as truth discovery algorithms are known to underperform in the presence of widespread misinformation, which is common during disease outbreaks. One obvious measure to address this issue is to acquire ground truth for validating source reliability and event correctness. However, obtaining such ground truth is slow because it requires significant manual effort and, most importantly, it is impractical during virus outbreaks, when people should restrict their movement and contact with others. Therefore, it remains a critical challenge in CovidSens to construct an effective mechanism that can identify and isolate misinformation in order to generate trustworthy social signals indicating the COVID-19 spread.

In this section, we discuss a few potential directions for future work in the realm of CovidSens.

We note that CovidSens relies on noisy and uncertain social-sensing data generated by unvetted data sources to monitor the COVID-19 spread. Thus, one domain for future work is to mitigate the data reliability challenge in CovidSens applications. Existing social-sensing tools, or truth discovery algorithms, mainly focus on estimating data veracity or source reliability from social media data. However, in a social-media-driven COVID-19 spread indicator application, the confidence in the estimated veracity of a reported event is also crucial [67]. Consequently, it is important to determine the confidence level with which the COVID-19 propagation is predicted. For example, an inferred age demography with low estimation confidence can easily lead to an erroneous conclusion about which age groups are most likely to be affected by COVID-19. In particular, further research can focus on rigorously quantifying the uncertainty of output results to evaluate and enhance the performance of truth discovery algorithms. While uncertainty quantification is well studied in statistics and estimation theory, it is mostly overlooked in existing social sensing solutions, since the performance of truth discovery algorithms is hard to inspect and humans tend to make claims with different degrees of uncertainty (e.g., affirmative assertions versus pure guesses) [68], [69]. One promising research direction is therefore to develop methods that determine the confidence level of detections by quantifying the uncertainty of the results in CovidSens applications.
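To make the age-demography example concrete, the following is a minimal sketch that attaches a confidence interval to an inferred proportion, assuming the quantity of interest is the share of reported cases attributed to one age group and that each labelled post is an independent Bernoulli observation. The function name `proportion_with_ci` and the counts are hypothetical. The point estimate is the maximum likelihood estimate k/n, and the interval half-width uses the variance p(1 - p)/n, which is the Cramér-Rao lower bound for unbiased estimators of a Bernoulli parameter, evaluated at the estimate (a Wald interval).

```python
import math


def proportion_with_ci(k, n, z=1.96):
    """Return (p_hat, lower, upper) for a proportion estimated from n labelled posts.

    p_hat = k / n is the maximum likelihood estimate. The half-width uses the
    variance p(1 - p) / n, the Cramer-Rao lower bound for unbiased estimators
    of a Bernoulli parameter, evaluated at p_hat (a Wald interval).
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p_hat = k / n
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    lower = max(0.0, p_hat - z * std_err)
    upper = min(1.0, p_hat + z * std_err)
    return p_hat, lower, upper


# Hypothetical counts of posts attributing a reported case to the over-60 group.
small = proportion_with_ci(12, 20)      # few labelled posts
large = proportion_with_ci(600, 1000)   # many labelled posts
print(small)
print(large)
```

Both calls give the same point estimate, 0.6, but the interval from 20 posts spans roughly 0.39 to 0.81 while the one from 1,000 posts spans roughly 0.57 to 0.63, which is the difference between a tentative and a usable conclusion. For small samples or proportions near 0 or 1, a Wilson score interval is usually preferred over the Wald interval used here.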

The statistics literature offers principled approaches based on estimation theory. Examples of techniques for quantifying the uncertainty of truth discovery results are maximum likelihood estimation (MLE) and the Cramér-Rao lower bound (CRLB) [53], [67]. Although these methods have been shown to provide the desired uncertainty quantification, it remains a critical challenge to formulate the truth discovery problems in CovidSens in a mathematically tractable way to which these uncertainty estimation tools can be applied. We envision that theories from multiple disciplines can be leveraged to address the uncertainty quantification problem in CovidSens applications.
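The MLE formulation of truth discovery can be illustrated with a compact expectation-maximization (EM) sketch in the spirit of the estimation-theoretic work cited above; it is a simplified illustration, not the authors' implementation. It assumes a binary source-claim matrix and a model in which each source i has two unknown parameters, a_i (the probability of reporting a claim that is true) and b_i (the probability of reporting a claim that is false), plus a prior d that a claim is true. The function names, initial values, and clamping constant are illustrative choices. The resulting likelihood is also the kind of tractable formulation from which a CRLB could be derived, although this sketch does not compute one.

```python
import math

EPS = 1e-6  # keeps probabilities away from 0 and 1 so the logs stay finite


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def em_truth_discovery(sc, n_iter=50):
    """EM on a binary source-claim matrix; sc[i][j] == 1 if source i reported claim j.

    Returns (z, a, b, d): z[j] = P(claim j true | reports),
    a[i] = P(source i reports a claim | claim true),
    b[i] = P(source i reports a claim | claim false),
    d = prior probability that a claim is true.
    """
    m, n = len(sc), len(sc[0])
    a, b, d = [0.6] * m, [0.4] * m, 0.5  # a_i > b_i: reporting is evidence of truth
    z = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability of each claim under current parameters.
        for j in range(n):
            log_odds = math.log(d) - math.log(1.0 - d)
            for i in range(m):
                if sc[i][j]:
                    log_odds += math.log(a[i]) - math.log(b[i])
                else:
                    log_odds += math.log(1.0 - a[i]) - math.log(1.0 - b[i])
            z[j] = _sigmoid(log_odds)
        # M-step: closed-form maximum likelihood updates given the posteriors.
        z_true = max(sum(z), EPS)
        z_false = max(n - sum(z), EPS)
        for i in range(m):
            reported = [j for j in range(n) if sc[i][j]]
            a[i] = _clamp(sum(z[j] for j in reported) / z_true)
            b[i] = _clamp(sum(1.0 - z[j] for j in reported) / z_false)
        d = _clamp(sum(z) / n)
    return z, a, b, d


# Hypothetical reports: 4 sources (rows) x 6 claims (columns).
sc = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
]
z, a, b, d = em_truth_discovery(sc)
print([round(p, 3) for p in z])
```

The posteriors z are themselves a crude confidence signal, but they are not calibrated error bounds; that gap is what the CRLB-based analysis mentioned above is meant to close.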

One direction for future work in CovidSens is to combat misinformation propagation. Rumor suppression and fake news detection are indispensable for containing the spread of COVID-19 related misinformation. We acknowledge that rumors and misinformation on social media originate from the behaviour of individuals sharing what others share [70]. Thus, machine intelligence alone cannot contain the spread of rumors and misinformation entirely. Based on these premises, potential research questions include: i) How to develop techniques that combine human intelligence with machine intelligence to more accurately distinguish rumors from true information about the COVID-19 spread? ii) How to investigate and identify the origin of misinformation sharing from social media posts? iii) How do different demographic groups (e.g., age groups, genders) react to misinformation about the COVID-19 spread, and how can this knowledge be used to combat misinformation propagation?

Several existing studies have proposed fact-finding techniques for analyzing and detecting falsified claims and rumors on social media using: i) Bayesian heuristic algorithms [54], ii) textual evidence together with associated images [71], and iii) physical constraints and temporal dependencies of the evolving truth [72]. A newer line of research unifies the collective strengths of human intelligence (HI) and artificial intelligence (AI) to screen out misinformation on social media [73]. Such approaches combine HI-based crowdsourcing platforms such as Amazon Mechanical Turk (MTurk) with deep neural networks (DNNs) and other machine learning techniques, and could be used to classify social media posts about COVID-19 as veracious or falsified [73].
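One simple way such an HI-AI hybrid could divide the work is confidence-based routing: the machine classifier's confident predictions are accepted, and uncertain posts are sent to several crowd workers whose answers are combined by majority vote. The sketch below is hypothetical and is not the method of [73]. It assumes a classifier that returns a label with a confidence score; `hybrid_label`, the threshold, and the toy stand-ins are illustrative, and no real MTurk or DNN API is called.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def hybrid_label(
    posts: Dict[str, str],
    classifier: Callable[[str], Tuple[str, float]],
    ask_crowd: Callable[[str], str],
    threshold: float = 0.8,
    n_workers: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Label posts as 'veracious' or 'falsified'.

    Confident machine predictions are kept; uncertain ones are escalated to
    n_workers crowd workers and resolved by majority vote.
    Returns the labels and the ids of escalated posts.
    """
    labels: Dict[str, str] = {}
    escalated: List[str] = []
    for post_id, text in posts.items():
        label, confidence = classifier(text)
        if confidence >= threshold:
            labels[post_id] = label
        else:
            escalated.append(post_id)
            votes = [ask_crowd(text) for _ in range(n_workers)]
            labels[post_id] = Counter(votes).most_common(1)[0][0]
    return labels, escalated


# Hypothetical stand-ins for a trained DNN and a crowdsourcing task.
def toy_classifier(text: str) -> Tuple[str, float]:
    if "5G" in text:
        return "falsified", 0.95
    return "veracious", 0.55


def toy_crowd(text: str) -> str:
    return "falsified" if "garlic" in text else "veracious"


posts = {
    "p1": "5G towers spread the virus",
    "p2": "Eating garlic cures COVID-19",
    "p3": "Local clinic reports first confirmed case",
}
labels, escalated = hybrid_label(posts, toy_classifier, toy_crowd)
print(labels, escalated)
```

The threshold trades crowd cost against accuracy: raising it sends more posts to humans. Weighting each worker's vote by an estimated reliability, rather than using a plain majority, is a natural refinement.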

A stream of potential research can focus on mitigating the data collection and timely presentation challenges in CovidSens applications. To obtain information, traditional news media (e.g., CNN, BBC) rely on dedicated news reporters, while social news aggregators (e.g., Digg, Reddit) rely on the active voluntary participation of committed individuals. A key drawback of such news collection approaches is that they entrust a central authority (i.e., a news agency or web administrator) with analyzing and verifying disease outbreaks like COVID-19, which may delay reporting of the COVID-19 propagation [4]. In contrast, a decentralized social-sensing-based news aggregation and subscription service can potentially accelerate both news collection and information distribution during the COVID-19 pandemic [74]. A survey shows that 37% of Internet users spread news content through social media posts on Facebook and Twitter [74]. With the proliferation of smart devices and people's tendency to post about nearby cases of people showing symptoms of COVID-19 or feeling ill [75]-[77], information about probable COVID-19 cases can propagate very quickly through social media. However, as identified earlier, a key hurdle is to develop a system that can automatically locate, obtain, and store such data from social media platforms. Furthermore, once the COVID-19 related information is assembled, a system is needed to convey the processed information to the general public. Important research questions include: i) How to efficiently filter and organize information contributed by diverse and unreliable sources? ii) How to compile the gathered information into a form that subscribers are comfortable reading and trusting? iii) How to present the information to less tech-savvy individuals with limited knowledge of computers and smartphones? iv) How to sustain news aggregation and circulation during an Internet downtime?
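As a toy illustration of question i), a first filtering stage could keep only posts that mention outbreak-related keywords and drop exact duplicates, such as retweets that repeat a post verbatim. This is a minimal sketch under stated assumptions: the keyword list, the normalization rules, and the function names are hypothetical, and a real pipeline would also need near-duplicate detection and the reliability estimation discussed earlier.

```python
import re

KEYWORDS = ("covid", "coronavirus", "fever", "cough")  # illustrative keyword list


def normalize(text):
    """Canonical form used for keyword matching and duplicate detection."""
    text = text.lower()
    text = re.sub(r"^rt @\w+:\s*", "", text)   # strip a leading retweet marker
    text = re.sub(r"https?://\S+", "", text)    # strip links
    text = re.sub(r"[^a-z0-9#@ ]+", " ", text)  # replace punctuation with spaces
    return " ".join(text.split())


def filter_and_dedupe(posts):
    """Keep posts that mention a keyword, dropping exact duplicates after normalization."""
    seen = set()
    kept = []
    for post in posts:
        key = normalize(post)
        if not any(word in key for word in KEYWORDS) or key in seen:
            continue
        seen.add(key)
        kept.append(post)
    return kept


raw_posts = [
    "Fever and cough since Monday, worried it's COVID http://example.com/a",
    "RT @alice: Fever and cough since Monday, worried it's COVID",
    "Great weather today!",
    "Clinic on 5th street is testing for coronavirus",
]
print(filter_and_dedupe(raw_posts))
```

Here the retweet normalizes to the same key as the original post and is dropped, and the off-topic post is filtered out, leaving two posts.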

One possible approach to information collection is to develop a real-time social media data collection and storage engine, such as Apollo [78]. Another potentially effective technique for information aggregation is to develop a dedicated crowdsensing-based smartphone application that allows users to readily report COVID-19 related observations [33]. Subsequently, a decentralized, mesh-network-based news subscription service that operates autonomously without a central authority can be constructed from the data collected in the mobile app. The service can leverage the rich set of real-time COVID-19 observations contained in the social data to harness the collective wisdom of ordinary individuals without relying on dedicated news reporters. The entire service may be implemented within the same mobile app, which can both collect information on the COVID-19 spread from online users and present the prepared news to others [33]. This design can virtually eliminate the need for a central authority, hence reducing delays in information gathering and distribution in a CovidSens application.
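The decentralized distribution step can be pictured as flooding over a mesh of user devices: each device relays a report it has not seen before to its neighbours, and a seen-set stops the report from looping. The minimal sketch below assumes a static, known neighbour graph and simulates flooding with a breadth-first search, which gives the hop count at which each device first receives the report under a unit per-hop delay. `flood` and the example mesh are hypothetical; practical mesh protocols add hop limits, rate limits, and authentication.

```python
from collections import deque
from typing import Dict, List


def flood(mesh: Dict[str, List[str]], origin: str) -> Dict[str, int]:
    """Simulate flooding one report through a peer-to-peer mesh.

    Each device forwards a report it has not seen before to all neighbours.
    Returns the hop count at which each reachable device first receives it.
    """
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbour in mesh.get(node, []):
            if neighbour not in hops:  # devices ignore reports they already have
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


# Hypothetical mesh of phones in short-range contact with each other.
mesh = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": [],  # isolated device: never reached without another link
}
print(flood(mesh, "A"))
```

The isolated device F illustrates question iv) above: without any link, no decentralized scheme can reach it, so coverage depends on how densely devices are connected.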

CovidSens applications are inherently driven by location data; hence, a potential domain of research in CovidSens is to address the scarcity of location data in social media. Specifically, studies can focus on determining where COVID-19 related reports originate in the absence of geo-location metadata in the posts. We emphasize that, when inferring event report locations from social media data, care must be taken to respect individual privacy at the system level; otherwise, serious privacy breaches may result. For example, while a user's location may be deduced from the text of their social media posts, it may also be used to infer other sensitive information such as job, ethnicity, race, or financial status. Leakage of such information may place users at risk and lead to a loss of confidence in the system [61]. Therefore, one important area of research in CovidSens is how to develop privacy-aware location inference tools, based on contextual analysis of social media data, that protect the identity and privacy of users.
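One simple safeguard, offered here as an assumption rather than a technique from the cited work, is to publish inferred locations only at a coarse spatial granularity and to suppress any area containing too few reports, in the spirit of k-anonymity. The sketch snaps points to a grid and releases only cells with at least k reports; `to_cell`, `publishable_counts`, the 0.1-degree cell size, and k = 3 are illustrative. Coarsening reduces, but does not eliminate, the risk of inferring sensitive attributes.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def to_cell(lat: float, lon: float, cell_deg: float = 0.1) -> Cell:
    """Snap a point to a coarse grid cell (0.1 degree of latitude is about 11 km)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def publishable_counts(points: List[Tuple[float, float]], k: int = 3,
                       cell_deg: float = 0.1) -> Dict[Cell, int]:
    """Count reports per grid cell and suppress cells with fewer than k reports."""
    counts = Counter(to_cell(lat, lon, cell_deg) for lat, lon in points)
    return {cell: c for cell, c in counts.items() if c >= k}


# Hypothetical inferred report locations (latitude, longitude).
reports = [(41.73, -86.24), (41.74, -86.23), (41.76, -86.26), (40.05, -86.55)]
print(publishable_counts(reports))
```

The three nearby reports fall in one cell and are published as a count, while the lone report elsewhere is withheld because publishing it would effectively reveal an individual's location.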

Once user privacy is ensured, considerable opportunity exists in designing techniques that leverage the contextual information, such as place names, embedded in the text of a social media post (toponym resolution). Moreover, images attached to posts can also help estimate where a social media report originated [79]. For example, an individual tweeting about COVID-19 symptoms who claims to be in a particular location can be given greater credibility if they also post an image of the place. Another way to obtain the geo-location of social media data is image-based geocoding, in which subjects in the background of a posted image are cross-referenced with known landmarks or popular sites to find where the image was taken [80].
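The text-based step can be illustrated with a dictionary (gazetteer) lookup over word n-grams, trying longer names first so that a multi-word name is not split into shorter matches. This is a minimal sketch: the three-entry gazetteer with approximate coordinates and `resolve_toponyms` are hypothetical, and the sketch ignores ambiguous names (for example, several towns sharing one name), which real toponym resolution must disambiguate from context.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "south bend": (41.68, -86.25),
    "chicago": (41.88, -87.63),
    "new york": (40.71, -74.01),
}


def resolve_toponyms(text: str, max_ngram: int = 3) -> List[Tuple[str, Tuple[float, float]]]:
    """Return gazetteer entries whose names appear as word n-grams in text.

    Longer n-grams are matched first, and their tokens are then marked as
    used so that shorter entries cannot match inside them.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    used = [False] * len(tokens)
    found = []
    for size in range(max_ngram, 0, -1):
        for start in range(len(tokens) - size + 1):
            if any(used[start:start + size]):
                continue
            name = " ".join(tokens[start:start + size])
            if name in GAZETTEER:
                found.append((name, GAZETTEER[name]))
                for i in range(start, start + size):
                    used[i] = True
    return found


matches = resolve_toponyms("Fever and cough, got tested in South Bend after a trip to New York")
print(matches)
```

Note that the post mentions two places, only one of which is where the report originates; deciding between them is exactly the kind of contextual analysis the paragraph above calls for.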

People who post about disease symptoms on social media and "follow" other users with similar symptoms may be co-located [81]. Intuitively, if one user's location can be determined, the locations of related users may be discovered as well. However, connected individuals may also live very far from one another; for instance, two friends showing COVID-19 related symptoms may be located in two different cities. Thus, additional features of the social media data may be analyzed to find further evidence of co-location. Rich privacy-aware location inference schemes can be developed that fuse friend-follower networks with contextual information embedded in tweet text to determine the whereabouts of the COVID-19 spread [62], [81]. An ensemble of solutions employing natural language processing (NLP) [82], deep neural networks (DNNs), and social network analysis can be built to accurately infer location information from social media data [73], [79].
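A common baseline for the network-based idea is a majority vote over the known locations of the accounts a user follows. The minimal sketch below assumes the follow graph and some users' home cities are already known; `infer_location` and the `min_share` threshold are illustrative. To reflect the caveat that connected users may live in different cities, it abstains unless one city accounts for a strict majority of the located contacts. It uses no text features, which the fusion schemes mentioned above would add.

```python
from collections import Counter


def infer_location(user, follows, known, min_share=0.5):
    """Guess a user's city from the known cities of the accounts they follow.

    Returns None if no followed account has a known city, or if the most
    common city does not account for more than min_share of them (a tie or
    a scattered network is treated as too weak to indicate co-location).
    """
    cities = [known[f] for f in follows.get(user, []) if f in known]
    if not cities:
        return None
    city, count = Counter(cities).most_common(1)[0]
    return city if count / len(cities) > min_share else None


# Hypothetical follow graph and partially known home cities.
follows = {"u1": ["u2", "u3", "u4", "u5"], "u6": ["u2", "u7"]}
known = {"u2": "Chicago", "u3": "Chicago", "u4": "Chicago", "u5": "Boston", "u7": "Denver"}
print(infer_location("u1", follows, known))
print(infer_location("u6", follows, known))
```

User u1 is placed in Chicago because three of four located contacts live there, while u6, whose two contacts live in different cities, is left unlabelled rather than guessed.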

As identified earlier, one key goal in developing effective CovidSens applications is to address the data reliability challenge stemming from unreliable social media users. Besides uncertainty quantification, a strand of research to combat this challenge is to integrate social sensing with physical sensing paradigms (e.g., unmanned aerial vehicles (UAVs) and vehicular sensor networks (VSNs)) to verify reports related to COVID-19. Compared to UAVs and VSNs, social sensing has a broader reach but suffers from inconsistent reliability. On the other hand, UAVs and VSNs are fitted with arrays of sensors (e.g., temperature, humidity, and air quality sensors, cameras, microphones) [83] that allow them to sense COVID-19 related events with substantial fidelity [25]. However, they are limited in sensing scope and have only partial autonomy [24]. Leveraging the collective strengths of UAVs and VSNs together with social sensing can potentially accelerate the discovery of COVID-19 related events: the reliable, high-quality measurements provided by physical sensors naturally complement the uncertain estimates and broader sensing scope of social sensing. Guided by social signals, UAVs and VSNs can use their mobility and agility to quickly reach COVID-19 prone areas or hot zones, collect real-time evidence (e.g., people loitering on streets or gathering in large groups), and ascertain whether the reported cases actually exist before medical teams or law enforcement are sent [83].
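One way to picture how a physical sensor "verifies" a social report is a Bayesian update: the social-sensing estimate serves as a prior that an event (e.g., an unwanted crowd) is real, and a UAV or vehicle observation is treated as a noisy test with known detection rates. This is a minimal sketch under stated assumptions: the detection rates 0.9 and 0.1, the prior of 0.6, and `fuse_uav_observation` are illustrative and do not come from [83], and the sensor is assumed to be independent of the social reports given the true state.

```python
def fuse_uav_observation(prior: float, detected: bool,
                         tpr: float = 0.9, fpr: float = 0.1) -> float:
    """Bayes update of P(event) after one UAV or vehicle sensor observation.

    prior: belief that the reported event is real, taken from social sensing.
    tpr:   P(sensor detects | event real); fpr: P(sensor detects | no event).
    """
    if detected:
        num = prior * tpr
        den = num + (1.0 - prior) * fpr
    else:
        num = prior * (1.0 - tpr)
        den = num + (1.0 - prior) * (1.0 - fpr)
    return num / den


# Hypothetical: social posts give a 0.6 belief that a crowd has gathered.
print(round(fuse_uav_observation(0.6, detected=True), 3))
print(round(fuse_uav_observation(0.6, detected=False), 3))
```

With these numbers a detection raises the belief from 0.6 to about 0.93, and a miss lowers it to about 0.14, which is the sense in which a high-fidelity measurement complements the broad but uncertain social signal.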

A few possible courses of work can focus on integrating social sensing either with UAVs, known as social drone [23], or with VSNs, known as social car [84], to sense the neighbourhoods of COVID-19 affected areas for unwanted crowds, open pharmacies, emergency supply stores, and so on. Open research questions in these applications include: i) How to leverage noisy social signals to quickly guide drones and cars to locations of interest? ii) How to accommodate the various constraints imposed by the physical world (e.g., deadlines of urgent cases such as dying patients, the limited availability of drones, and their limited flight times)? iii) How to leverage the observations collected by the drones (e.g., unwanted crowds) to improve the social sensing process? Solutions that holistically address these challenges in CovidSens systems are yet to be developed.
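Question ii) can be made concrete with a simple greedy heuristic: serve requests in order of deadline, and give each one the nearest free drone that can reach it before the deadline and return within its remaining flight time. The minimal sketch below assumes straight-line travel at constant speed, one task per drone, and consistent time units; `assign_drones` and the example data are hypothetical. Greedy earliest-deadline-first assignment is only a heuristic and can miss feasible assignments that an optimization formulation would find.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def assign_drones(drones: Dict[str, Tuple[Point, float]],
                  tasks: List[Tuple[str, Point, float]],
                  speed: float = 1.0) -> Dict[str, str]:
    """Greedy earliest-deadline-first assignment of drones to locations.

    drones: id -> (position, remaining flight time).
    tasks:  (task id, location, deadline).
    A drone is feasible if it arrives by the deadline and can fly there and
    back on its remaining flight time. Infeasible tasks stay unassigned.
    """
    free = dict(drones)
    plan: Dict[str, str] = {}
    for task_id, loc, deadline in sorted(tasks, key=lambda t: t[2]):
        best, best_time = None, math.inf
        for drone_id, (pos, flight_left) in free.items():
            travel = math.dist(pos, loc) / speed
            if travel <= deadline and 2 * travel <= flight_left and travel < best_time:
                best, best_time = drone_id, travel
        if best is not None:
            plan[task_id] = best
            del free[best]
    return plan


# Hypothetical drones and requests derived from social signals.
drones = {"d1": ((0.0, 0.0), 10.0), "d2": ((5.0, 0.0), 4.0)}
tasks = [
    ("crowd_park", (4.0, 0.0), 2.0),
    ("pharmacy", (0.0, 3.0), 5.0),
    ("far_site", (20.0, 0.0), 30.0),
]
print(assign_drones(drones, tasks))
```

Here the urgent crowd report goes to d2, the only drone that can arrive in time, d1 checks the pharmacy, and the distant site stays unassigned because no drone is left, which illustrates the limited-availability constraint.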

In this paper, we introduce CovidSens, a new vision of reliable social-sensing-based information distillation and risk alerting systems to monitor the COVID-19 spread and study the transmission dynamics of the disease. We highlight a few key challenges in CovidSens applications, including data collection, reliability, modality, presentation, and misinformation spread. By harnessing interdisciplinary techniques, CovidSens can combine the collective strengths of social sensing with both machine intelligence and human intelligence to perform real-time analyses on the obtained epidemiological data. CovidSens can yield more timely and accurate predictions of the COVID-19 spread, which may subsequently be presented to end users through a collection of rich mobile apps and UAVs. We hope this paper will establish CovidSens as an important avenue for research to tackle the current COVID-19 pandemic around the world.

On truth discovery in social sensing: A maximum likelihood estimation approach. ACM/IEEE 11th International Conference on Information Processing in Sensor Networks (IPSN)

Coronavirus disease 2019 (covid-19) in the u.s

Coronavirus disease 2019 (covid-19) in the u.s

The age of social sensing

Social media and the boston marathon bombings: A case study

Disaster communications in a changing media world

Social and news media enable estimation of epidemiological patterns early in the 2010 haitian cholera outbreak

6.7 million people just mentioned the coronavirus on social media

Ubiquitous sensing for mapping poverty in developing countries

A tale of many cities: universal patterns in human urban mobility

Risksens: A multiview learning approach to identifying risky traffic locations in intelligent transportation systems using social and remote sensing

Crowdsensing with polarized sources

Mood-sensitive truth discovery for reliable recommendation systems in social sensing

Transland: An adversarial transfer learning approach for migratable urban land usage classification using remote sensing

Using social media to detect and locate wildfires

Large-scale point-of-interest category prediction using natural language processing models

Crowdlearn: A crowd-ai hybrid system for deep learning-based damage assessment applications

Social sensing: building reliable systems on unreliable data

On scalable and robust truth discovery in big data social media sensing applications

Participatory sensing-based semantic and spatial analysis of urban emergency events using mobile social media

Constraint-aware dynamic truth discovery in big data social media sensing

An integrated social media and drone sensing system for reliable disaster response

Sead: Towards a social-media-driven energy-aware drone sensing framework

Collabdrone: A collaborative spatiotemporal-aware drone sensing system driven by social sensing signals

Trending now: using social media to predict and track disease outbreaks

Using social media for actionable disease surveillance and outbreak management: a systematic literature review

Interpreting "google flu trends" data for pandemic h1n1 influenza: the new zealand experience

Allergymap: a hybrid mhealth mobile crowdsensing system for allergic diseases epidemiology: a multidisciplinary case study

A new age of public health: Identifying disease outbreaks by analyzing tweets

Use of a web forum and an online questionnaire in the detection and investigation of an outbreak

Surveillance sans frontieres: Internet-based emerging infectious disease intelligence and the healthmap project

Healthmap: global infectious disease monitoring through automated classification and visualization of internet media reports

Effectiveness of a mobile short-message-service-based disease outbreak alert system in kenya

Promed-mail: an early warning system for emerging diseases

How twitter may have helped nigeria contain ebola

Early epidemiological analysis of the coronavirus disease 2019 outbreak based on crowdsourced data: a population-level observational study

Coronavirus dashboard

An interactive web-based dashboard to track covid-19 in real time

What we can learn from south korea and singapore's efforts to stop coronavirus (besides wearing face masks)

Features, evaluation and treatment coronavirus (covid-19)

Situation awareness in crowdsensing for disease surveillance in crisis situations

Vision-based target detection and localization via a team of cooperative uav and ugvs

Cops will start using drones fitted with night-vision cameras

The uses of drones in case of massive epidemics contagious diseases relief humanitarian aid: Wuhan-covid-19 crisis

Combination of singular value decomposition and k-means clustering methods for topic detection on twitter

Structured prediction models for rnn based sequence labeling in clinical text

Linguistic redundancy in twitter

Up and running: Learn how to build applications with the Twitter API

Maximum likelihood analysis of conflicting observations in social sensing

Using humans as sensors: an estimation-theoretic perspective

On robust truth discovery in sparse social media sensing

Truth discovery with multiple conflicting information providers on the web

Big data and information distillation in social sensing

Data cleaning: Overview and emerging challenges

(big) data in a virtualized world: volume, velocity, and variety in cloud datacenters

On opinion characterization in social sensing: A multi-view subspace learning approach

Image parallel processing based on gpu

Privacy-aware edge computing in social sensing applications using ring signatures

Where are you from: Home location profiling of crowd sensors from noisy and sparse crowdsourcing data

Garbage in, garbage out: data collection, quality assessment and reporting standards for social media data use in health research, infodemiology and digital disease detection

Early epidemic dynamics of the west african 2014 ebola outbreak: estimates derived with a simple twoparameter model

Google and facebook take aim at fake news sites

Misinformation will undermine coronavirus responses

Social edge intelligence: Integrating human and artificial intelligence at the edge

On credibility estimation tradeoffs in assured social sensing

Confidence-aware truth estimation in social sensing applications

Detecting misinformation in online social networks using cognitive psychology

Fauxbuster: A content-free fauxtography detector using social media comments

Exploitation of physical constraints for reliable social sensing

Crowdlearn: A crowd-ai hybrid system for deep learning-based damage assessment applications

Online news on twitter: Newspapers' social media adoption and their online readership

Corona virus (covid-19) tweets dataset

What is the people posting about symptoms related to coronavirus in bogota, colombia

Covid-19: The first public coronavirus twitter dataset

Towards fact-finding for social (human-centric) sensing

Geo-location inference from image content and user tags

Joint people, event, and location recognition in personal photo collections using cross-domain context

Fusing text and friendships for location inference in online social networks

Location identification for crime & disaster events by geoparsing twitter

Help from the sky: Leveraging uavs for disaster management

Socialcar: A task allocation framework for social media driven vehicular network sensing systems