key: cord-0766469-kbhovcox authors: Sousa, José; Barata, João; Woerden, Hugo C van; Kee, Frank title: COVID-19 Symptoms app analysis to foresee healthcare impacts: Evidence from Northern Ireland date: 2021-12-20 journal: Appl Soft Comput DOI: 10.1016/j.asoc.2021.108324 sha: e92deab29b5c33be5f5045fc636593c63382e8e0 doc_id: 766469 cord_uid: kbhovcox

Mobile health (mHealth) technologies, such as symptom tracking apps, are crucial for coping with the global pandemic crisis by providing near real-time, in situ information for the medical and governmental response. However, in such a dynamic and diverse environment, methods are still needed to support public health decision-making. This paper uses the lens of strong structuration theory to investigate networks of COVID-19 symptoms in the Belfast metropolitan area. A self-supervised machine learning method measuring information entropy was applied to data from the Northern Ireland COVIDCare app. The findings reveal: (1) relevant stratifications of disease symptoms, (2) particularities in health-wealth networks, and (3) the predictive potential of artificial intelligence to extract entangled knowledge from data in COVID-related apps. The proposed method proved effective for near real-time in situ analysis of COVID-19 progression and for focusing and complementing public health decisions. Our contribution is relevant to an understanding of SARS-CoV-2 symptom entanglements in localized environments. It can assist decision-makers in designing both reactive and proactive health measures that should be personalised to the heterogeneous needs of different populations. Moreover, near real-time assessment of pandemic symptoms using digital technologies will be critical to creating early warning systems for emerging SARS-CoV-2 strains and to predicting the need for healthcare resources.

implications at the meso level (specific population groups) of a metropolitan area.
Strong Structuration Theory [10, 11] provided the theoretical basis for the analysis. Semantic networks were created using self-supervised machine learning, with node significance measured as betweenness, interdependence as weighted connectivity, and the structural change of the network as entropy based on betweenness [12, 13]. The remainder of this paper is organised as follows. The next section presents the background theory, namely, mobile health adoption in COVID-19 pandemic management and the need to address local contexts of disease progression for effective public measures. Subsequently, the method is explained, followed by the results obtained by modelling COVID-19 data for Belfast and three evaluation episodes. A discussion follows, including the implications for theory and practice. Finally, the conclusions, limitations, and opportunities for future research are presented. This structure follows the publication schema suggested by [14] for the design science research (DSR) paradigm [15].

Mobile technologies adopted during the COVID-19 pandemic have followed two trends: one focuses on contact tracing and the other on the remote monitoring of, and assistance to, patients. Such apps have emerged in all corners of the world [5, 6], collecting large amounts of data that can now be used both by the general population (e.g., information or real-time warnings about contact with infected individuals) and by researchers worldwide [3, 4]. Nevertheless, a digital health "implementation process is likely to be challenging and resource-intensive" [16], and public health authorities are not yet utilising the full potential of near real-time data to support decision-making, for example, by modelling mobile data when lab tests are unavailable or scarce. Disease symptoms are essential to understanding the severity of COVID-19 [17], but COVID-19 is not a socially neutral disease [18].
For example, "[o]lder age, male sex, comorbidities, [and specific health related symptoms] predicted critical care admission and mortality. Non-white ethnicity predicted critical care admission but not death" [19]. Moreover, "people with complex needs, vulnerable populations, and marginalised groups are at increased risk from covid-19 and the health effects of containment strategies" [20]. As multiple outbreaks and waves reveal, this "syndemic pandemic" [18] is highly dynamic and challenging to contain. New methods that integrate diverse data [21] are necessary to allow near real-time monitoring of COVID-19 at both the individual level and the community or social group level.

The recent advances in epidemiology using mobile data and artificial intelligence are significant. For example, Menni et al. [4] suggest "that loss of sense of smell and taste could be included as part of routine screening for COVID-19 and should be added to the symptom list currently developed by the World Health Organisation". Going beyond the relevance of a specific symptom, Menni et al. [4] further state that a "combination of symptoms, including anosmia, fatigue, persistent cough and loss of appetite, (…) together might identify individuals with COVID-19", which is consistent with [3], who found that "individuals with complex or multiple (3 or more) symptomatic presentation perhaps should be prioritized for testing". However, these authors also conclude that additional research is necessary to combine symptoms and predict COVID-19 incidence and progression. mHealth solutions are supporting major epidemiological studies. For example, in the UK, one study using mobile app data [22] found six main clusters of COVID-19 symptoms, each predictive of a different probability of needing intensive care.
According to the authors, the need for respiratory support ranged from 1.5% in the least severe cluster of symptoms to 19.8% in cluster 6, the most dangerous condition. The first two clusters are similar to flu and carry little risk of requiring healthcare support, while cluster 3 adds a new combination of symptoms: loss of smell, headache, loss of appetite, diarrhoea, chest pain, and sore throat. This cluster has a five-fold higher probability of a hospital visit, with a consequent impact on the public service response [22]. This work inspired new studies exploring the role of near real-time symptom monitoring and the efficient management of the healthcare response. However, existing results are not conclusive, and few studies have combined self-reported symptoms with the characteristics of the population at the (meso) city level, which could allow a more fine-grained perspective on the disease's social determinants (e.g., localities with similar health-wealth indicators).

COVID-19 is being extensively studied with AI at the micro level (e.g., individual diagnosis [23, 24]), and many studies provide a macro view of symptoms, country-level performance, and dynamics [3, 25, 26]. However, examining COVID-19 mobile data at a meso level, which combines the social characteristics of a specific region and its complex interrelations or entanglements (e.g., hot spots, healthcare capacity constraints, quarantine efficacy), requires near real-time data and knowledge visualisations able to assist professionals (e.g., public health staff). Mobile apps are emerging as a quasi-testing tool.

Structuration theory [27] suggests that structures and agents are inseparable and that both are necessary to understand a social phenomenon. According to this social theory, although the micro (e.g., an individual) and macro (e.g., country or continent) levels of analysis are essential, the meso level of analysis is equally important [11].
According to strong structuration theory (SST), four elements must be considered: (1) the external structures (the context where the action takes place), (2) the internal structures, represented by conjunctural networks of agents (humans and technology), (3) the actions, and (4) the outcomes of the action [11]. This theory has helped researchers understand "conjunctures" and their application in healthcare, particularly regarding technology adoption in practice [11]. Therefore, we considered it a suitable lens to understand the influence of local networks, linking position and practice concerning COVID-19.

Modelling complex systems in uncertain environments requires a capacity for data-driven learning within the system [28], as well as the modelling and visualisation of significance and interdependence. Network concepts and tools are a vital part of addressing such problems [29]. A complex network is a structure of connected (linked) elements (nodes) that allows the development of knowledge representations of the behaviour of techno-social systems [30]. Therefore, creating complex networks to represent meso-level relationships of COVID-19 offers a promising framework for advancing our knowledge of symptom prevalence and for creating new tools to support public health interventions.

Our work follows the design science research approach, including the activities of building the artifacts (models), evaluating the results according to different metrics, and producing relevant and justified theoretical knowledge [31, 32]. Figure 1 outlines the work according to the DSR grid proposed by [33].

[Figure 1: DSR grid. Recoverable cell text: "Tests are essential to assist public health decisions but do not evaluate severity"; "Healthcare facilities need tools to predict patient admissions"; "Understand the possible relationship between regional deprivation and COVID-19 severity"; "Complex networks"; "Structuration theory"; "Modelling approach to foresee COVID-19 severity at a regional scale" [34].]

The data modelling process uses semantic networks to visualise the interdependence of complex sociotechnical structures [29]. This research has adopted a self-supervised statistical machine learning methodology to develop data abstractions modelled as a network of significance and interdependence, generating semantic knowledge. The approach identifies relevant nodes of a complex network and their relationships, using information entropy based on betweenness [12, 13]. The steps to create the network models are:

1. Selecting the environment to be modelled.
2. Creating data abstractions (data reduction). Each variable considered in the network is reduced to a composition of the feature name and its qualitative value or, relative to its most frequent value (mode), classified as up (U) when above it and down (D) when below it.
3. Producing statistical learning models from the data reduction of step 2. The resulting structure is visualised as a network of significance and interdependence using betweenness, communities, and connection weights.
4. Measuring the entropy of the network and filtering the most relevant variables by their betweenness value [13, 35].

The methodology was developed using R and Python in the Zeppelin framework and integrated with Gephi [36] to provide the structural measurements and the visual representation of the network.
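The data-abstraction step (step 2) and the construction of a weighted edge list can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the authors' implementation: it assumes that two tokens are linked whenever they occur in the same app record and that an edge's weight counts those co-occurrences, since the text does not state how edges or weights are derived. The function names, the toy record fields, and the `MODE` label for values equal to the mode are illustrative assumptions, and the letter prefixes seen in the paper's node labels (e.g., DBREATHING_NO) are not modelled.

```python
from collections import Counter
from itertools import combinations
from statistics import mode


def feature_modes(records, numeric_features):
    """Most frequent value (mode) of each numeric feature across all records."""
    return {f: mode(r[f] for r in records) for f in numeric_features}


def abstract_record(record, modes):
    """Reduce one record to sorted 'FEATURE_LABEL' tokens (the data abstraction)."""
    tokens = []
    for feature, value in record.items():
        if feature in modes:
            # Numeric feature: up (U) above the mode, down (D) below it.
            # 'MODE' for ties is a hypothetical label; the text defines only U and D.
            if value > modes[feature]:
                label = "U"
            elif value < modes[feature]:
                label = "D"
            else:
                label = "MODE"
        else:
            # Qualitative feature: keep the value itself.
            label = str(value).upper()
        tokens.append(f"{feature.upper()}_{label}")
    return sorted(tokens)


def edge_list(records, numeric_features):
    """Weighted undirected edges between tokens co-occurring in a record (assumed rule)."""
    modes = feature_modes(records, numeric_features)
    weights = Counter()
    for record in records:
        # Tokens are sorted, so each pair is stored once as (a, b) with a < b.
        for a, b in combinations(abstract_record(record, modes), 2):
            weights[(a, b)] += 1
    return sorted((a, b, w) for (a, b), w in weights.items())


if __name__ == "__main__":
    # Hypothetical toy records; field names are illustrative only.
    records = [
        {"region": "West", "breathing": "no", "age": 30},
        {"region": "West", "breathing": "no", "age": 45},
        {"region": "East", "breathing": "yes", "age": 45},
    ]
    for edge in edge_list(records, {"age"}):
        print(edge)
```

In this toy example, `("BREATHING_NO", "REGION_WEST", 2)` is the heaviest edge because both West records report no breathing difficulty; a thicker line in the visualisation would correspond to such a higher weight.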
The context is described by producing an edge list, created by concatenating each variable's name with its value, as described by the following algorithm:

The result is a network of nodes, with colours representing communities and lines (thicker lines represent a more relevant interaction between nodes), that can be interpreted by non-experts in mathematics or statistics (unlike more complex representations) and that supports near real-time visualisation and an attention structure [37] for COVID-19 analysis. This methodology also facilitates machine-learning-enabled measurement of change. While the concept of entropy has been used and studied in different network contexts, our use is significantly different, as we consider a data-driven emergent network describing symptoms as a complex adaptive system [38]. We compute network entropy from the formula given in [12, 13], integrating normalized betweenness values (Equations 1 and 2). Information entropy in complex adaptive systems [35, 38] can measure system complexity and is an important measure of the structural change of a complex network. It has been used in different contexts [13, 39].

The initial dataset covers the period between 22nd March and 15th April 2020, aligning with the early adoption of the app (more daily inputs), with a total of 1,702 updates.

[Fig. 2]

The health-wealth analysis is presented in the following sub-sections. West Belfast was the only region that did not report breathing difficulties (node DBREATHING_NO has the highest significance in the symptoms network). As seen in clusters 4-6 presented in [22] and in the evaluation phase using the data from [17], breathing difficulties are among the most dangerous symptoms. Moreover, this model also reveals that most users did not report fever or cough.
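The entropy measure referred to above integrates normalized betweenness values (Equations 1 and 2), whose exact form is given in the cited works. A minimal sketch, assuming the common Shannon form H = -Σ p_i ln p_i with p_i = b_i / Σ_j b_j, where b_i is the unweighted betweenness of node i computed with Brandes' algorithm. The natural logarithm, the decision to ignore edge weights, and the helper names (`adjacency`, `betweenness`, `betweenness_entropy`, `most_relevant`) are assumptions for illustration, not the authors' code or Gephi's implementation.

```python
import math
from collections import deque


def adjacency(edges):
    """Undirected adjacency sets from (node_a, node_b, weight) triples.

    Weights are ignored: this sketch uses unweighted betweenness.
    """
    adj = {}
    for a, b, _weight in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Brandes' algorithm: unnormalized betweenness of an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[source] = 1
        dist = dict.fromkeys(adj, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to the source.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted once from each side.
    return {v: b / 2.0 for v, b in bc.items()}


def betweenness_entropy(bc):
    """Shannon entropy (natural log) of the betweenness shares p_i = b_i / sum(b)."""
    total = sum(bc.values())
    if total == 0:
        return 0.0
    shares = [b / total for b in bc.values() if b > 0]
    return -sum(p * math.log(p) for p in shares)


def most_relevant(bc, k):
    """The k nodes with the highest betweenness (ties broken alphabetically)."""
    return sorted(bc, key=lambda v: (-bc[v], v))[:k]


if __name__ == "__main__":
    path = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1)]
    scores = betweenness(adjacency(path))
    print(scores)                       # B and C each lie on 2 shortest paths
    print(betweenness_entropy(scores))  # ln 2, since B and C share equally
    print(most_relevant(scores, 2))
```

Dividing by the total turns the betweenness values into a probability distribution, so any rescaling of betweenness (for example, the usual (n-1)(n-2)/2 normalization) leaves H unchanged. Under this assumed form, H is 0 when a single node carries every shortest path and ln m when m nodes share them equally, which is one way to read structural change between two networks.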
The most significant group of confirmed patients (node EILLNESS_YES) appears to be asymptomatic (reporting no breathing difficulties and not having stopped most of their activities). South Belfast presents a more complex scenario than the West and North regions, but a less concerning one than East Belfast. The most severe symptoms are weakly connected (red and purple communities). The analysis of symptoms revealed an interdependence between two particular symptoms, loss of taste/smell and muscle/joint pain, a pattern that was particularly evident in three of the regions. We could not find a uniform relationship between symptoms and social factors across the regions, which suggests that other factors, such as age or comorbidities, may be more closely related to the severity of the disease. This type of visualization may be helpful to both public health authorities and general practitioners, as it provides greater detail on localities that may pose a risk to service capacity.

The results were compared with the Northern Ireland Statistics and Research Agency report [40], which presents the characteristics of the population in each Northern Ireland region. West Belfast has the lowest percentage of people over 65 years old, followed by Belfast South (below 15%). By contrast, Belfast North has the highest percentage of its population over 65 years, and Belfast East the highest percentage aged between 40 and 65 years. Deprivation measures for the four regions are presented in Table 1. Although previous research, such as [18, 20], points to a relation between past and current pandemic periods and deprivation measures in specific localities (e.g., where there is a higher prevalence of comorbidities or where social distancing may be more difficult in more deprived communities), other studies are inconclusive in that regard.
For example, another study [19] states that

Three evaluation strategies were used, following the FEDS framework suggested by [42] for studies that aim to design new artifacts (e.g., models) and support decision-making in sociomaterial contexts [43]. First, we conducted unstructured interviews with four public health experts in NI to discuss the possible impact of predicting COVID-19 infections using AI models, compared with lab testing. Our purpose was to evaluate: (1) the model's comprehensibility, (2) whether public health teams were already using similar techniques, and (3) the approach's potential. According to the feedback received, this model could be used to probe the evolution of the pandemic in specific locations lacking sufficient testing data, allowing more efficient use of testing and strengthening the evidence on disease progression and the emergence of new variants.

Second, we evaluated technical risk and efficacy [42] using a different approach, to confirm that our results using complex networks would be equivalent to those of other techniques. To this end, the symptoms in each disease stage were modelled with the proposed self-supervised approach using the data published in [17]. The results are illustrated in Fig. 4 and confirm the

[Fig. 4, panel (c): Deceased]

The modelling process revealed a strong interdependence between fever and cough, and between fever and fatigue, in the group of those who had not recovered. The interdependence between age and fever increases up to the age of sixty years, compared with the group of recovered patients. The complexity of symptoms increases markedly in this model, including lack of appetite, difficulty walking, and muscle pain.

Finally, we evaluated the evolution of test results and the analysis of app data (Figure 5). Although insufficient to conclusively determine the accuracy of "digital testing" supported by mobile app data, there are interesting insights in the selected period.
Mobile technologies in a health ecosystem are valuable tools that can play a central role in managing emerging threats, such as the current COVID-19 pandemic. First, they provide easy access to near real-time assessment of specific population segments. Second, they provide sources of data that are crucial to location analytics. Third, the use of symptom tracking apps could help encourage protective health behaviours. Although we did not find a relationship between disease severity and social deprivation (at an area level), it is important to acknowledge that factors such as the level of knowledge of COVID-19 symptoms and adherence to preventative behaviours are related to the risk of spread of infection [44]. Recent studies, for example, [3], have shown how mobile technologies can assist in high-level analysis of epidemiological patterns in Wales and Scotland. Our work adds to this research by presenting a data-driven machine learning approach that produces knowledge about symptom clusters and stratification, as well as their context-specific significance and interdependence. The mobile data revealed less complex patterns of self-reported COVID-19 symptoms in regions with younger populations, greater health deprivation, and higher unemployment rates. Over time, the analysis of complex symptom networks may provide insights into trends, particularly when tests are unavailable, and help highlight communities that should be prioritised for testing.

This paper presents a semantic network approach to model COVID-19 entanglement, using mobile data to extract explicit (e.g., self-reported symptoms) and implicit knowledge (e.g., location, social factors, trends). Our proposal extends past research by including a health-wealth layer and an AI self-supervised learning capacity to profile symptoms in specific contexts.
Modelling social and health-related data at the meso level is essential to understanding the dynamics of the virus in the community, complementing, or even substituting for, test results (e.g., Lateral Flow or PCR tests) when these are unavailable. The adoption of artificial intelligence techniques to address the challenges of COVID-19 with mobile data is still evolving. Our work reveals that intelligent analysis of self-reported COVID-19 symptom data can assist in mounting an appropriate public health response and complements existing official data and lab test results. Until now, self-reported mobile data has mostly been helpful after the fact (e.g., tracking contacts or warning the user about symptoms that deserve attention). Our Ireland population. We have only evaluated the most noticeable relationships between the network nodes in this dataset. Moreover, deprivation has been attributed at a regional level, based on the user's location, rather than on the individual characteristics of each user (e.g., income and education). A natural limitation of our approach is the existence of asymptomatic cases, which can be identified by laboratory tests but not by symptom reports. Population adherence to app use is another crucial aspect to consider. Therefore, using AI techniques on mobile data is complementary and most valuable for evaluating the progression of symptom severity (which is difficult to do with Lateral Flow or PCR tests) and for projecting the utilisation of healthcare facilities. The number of records is considered sufficient to (1) evaluate the accuracy of the visualization approach and (2) reveal its capacity to represent COVID-19 entanglement based on mobile data. However, the predictive value of our model needs additional research.
Despite the alignment of our results with other models [17, 22] and the apparent value of revealing COVID-19 trends when lab tests are unavailable (or of informing testing strategies when testing capacity is limited), AI and machine learning methods should be used in parallel with traditional testing and epidemiological techniques to support public health decision-making. There is scope to add more elements to the semantic networks that we have developed besides demographic and deprivation measures, for example, a day-by-day modelling comparison with reported cases and/or mobility. Comparing symptom patterns across the most relevant economic sectors in the region (e.g., construction, retail) could also provide interesting results. Inspired by the three replicability questions proposed specifically for DSR by [47], the following suggestions are put forward.

Predicting COVID-19 in China Using Hybrid AI Model
Forecasting COVID-19 daily cases using phone call data
Rapid implementation of mobile technology for real-time epidemiology of COVID-19
Real-time tracking of self-reported symptoms to predict potential COVID-19
Information Technology-Based Management of Clinically Healthy COVID-19 Patients: Lessons From a Living and Treatment Support Center Operated by Seoul National University Hospital
A review on the mobile applications developed for COVID-19: An exploratory analysis
A review of modern technologies for tackling COVID-19 pandemic
COVID-19 and artificial intelligence: protecting health-care workers and curbing the spread
Going viral - Covid-19 impact assessment: A perspective beyond clinical practice
Structuration theory
Theorising big IT programmes in healthcare: Strong structuration theory meets actor-network theory
On efficient use of entropy centrality for social network analysis and community detection
Approximate entropy as a measure of system complexity
Positioning and presenting design science research for maximum impact
Design Science in Information Systems Research
The use of digital health in the detection and management of COVID-19
Estimates of the severity of coronavirus disease 2019: a model-based analysis
The COVID-19 pandemic and health inequalities
A clinical risk score to identify patients with COVID-19 at high risk of critical care admission or death: An observational cohort study
Using socioeconomics to counter health disparities arising from the covid-19 pandemic
Self-Building Artificial Intelligence and Machine Learning to Empower Big Data Analytics in Smart Cities
Symptom clusters in Covid19: A potential clinical prediction tool from the COVID Symptom study app
Adaptive Feature Selection Guided Deep Forest for Classification with Chest CT
A Novel Medical Diagnosis model for COVID-19 infection detection based on Deep Features and Bayesian Optimization
α-Satellite: An AI-driven System and Benchmark Datasets for Dynamic COVID-19 Risk Assessment in the United States
Assessing countries' performances against COVID-19 via WSIDEA and machine learning algorithms
The constitution of society: Outline of the theory of structuration
Modelling and forecasting of COVID-19 spread using wavelet-coupled random vector functional link networks
Complex systems: Network thinking, 1194-1212
Predicting the behavior of techno-social systems
Research perspectives: The anatomy of a design principle
Design and natural science research on information technology
The DSR grid: six core dimensions for effectively planning and communicating design science research projects
Influential nodes identification in complex networks via information entropy
Complex Network Analysis in Python: Recognize - Construct - Visualize - Analyze - Interpret
Applying global workspace theory to the frame problem
Studying complex adaptive systems
Entropy methods in guided self-organisation
Ethnic and socioeconomic differences in SARS-CoV2 infection in the UK Biobank cohort study
FEDS: A Framework for Evaluation in Design Science Research
Design science research contributions: Finding a balance between artifact and theory
COVID-19 risk perception, knowledge and behaviour in South Africa
COVID-19: the gendered impacts of the outbreak
Pries-Heje, Projectability in Design Science Research
Toward replication study types for design science research