key: cord-0615052-atd4cxg4 authors: Marcílio-Jr, Wilson E.; Eler, Danilo M.; Garcia, Rogério E.; Correia, Ronaldo C. M.; Rodrigues, Rafael M. B. title: Visual analytics of COVID-19 dissemination in São Paulo state, Brazil date: 2020-06-25 journal: nan DOI: nan sha: 344486921dbc108cb05d1d395f7cfabd8dbb4c44 doc_id: 615052 cord_uid: atd4cxg4
Visualization techniques have proven to be useful tools to support decision making and to cope with the increasing amount of data. To help analyze the progression of the number of COVID-19 cases, we propose a new visual analytics tool. We use the $k$-nearest neighbors of cities to mimic regions, which allows COVID-19 dissemination to be analyzed by comparing a city under consideration with its neighborhood. Moreover, the analysis is performed over periods of time, which facilitates the assessment of isolation policies. We validate our tool by analyzing the progression of COVID-19 in regions of São Paulo state, Brazil.
The novel coronavirus (SARS-CoV-2), which causes COVID-19, had already infected more than 6.2 million people around the world and caused over 350,000 deaths by June 1st, 2020. While understanding the biological aspects of the virus is essential [1, 2, 3], it is also necessary to monitor the evolution of the number of cases in cities and regions, so that decision makers can devise isolation policies according to the risk of dissemination. Moreover, cities must be aware of how isolation policies are affecting COVID-19 contamination, as well as of any growth in the number of cases after isolation is relaxed. To monitor COVID-19 dissemination, we propose a visual analytics tool that helps analyze the growth in the number of confirmed cases in the cities of São Paulo state, Brazil. Our tool is designed to perform local and regional analyses of the cities over a period of analysis defined in days.
arXiv:2007.04299v1 [cs.HC] 25 Jun 2020
We argue that analyzing a period makes it easier to follow the evolution of the number of confirmed cases, since one can observe the cases accumulated within that period rather than all occurrences since the beginning of the outbreak. Besides defining a period of time, our regional perspective could be fundamental for decision making because of disease dissemination patterns, which normally follow a hierarchy, spreading from larger cities to their neighborhoods. Therefore, besides presenting the number of cases in a city, we also present the number of cases in its region. Comparing a city with its region allows one to verify whether the region has a single main focus of COVID-19 dissemination or several points of dissemination. For example, if the city under consideration has more cases than its region, it stands out as an influence on the region. Conversely, if the region stands out over the city under consideration, the region may contain one or more cities that could influence cities with fewer confirmed cases. Finally, if the number of cases of the analyzed city is very close to that of its region and this number is considered high, the region could be interpreted as one with high dissemination of COVID-19. To validate our methodology, we provide several case studies analyzing cities in São Paulo state, Brazil. Besides highlighting the cities on the map, our tool also summarizes the risk of dissemination using a radial visualization. The risk of dissemination is derived from the slope of the curve of the number of cases in the time window. Note that we focus on the dissemination risk rather than on the number of cases itself. In the radial visualization, the analyzed city is mapped to a circle, while a donut chart is used to map the neighboring cities. We use color saturation to indicate the risk of dissemination: darker colors represent cities with a higher risk of dissemination.
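The three situations described above (city stands out, region stands out, city and region both close and high) can be made concrete as a small rule of thumb. This is only an illustrative sketch: the paper compares the curves visually. The threshold `high`, the relative `tolerance` used for "very close", and the assumption that the region count aggregates only the neighboring cities are all hypothetical choices.

```python
from enum import Enum


class Pattern(Enum):
    CITY_STANDS_OUT = "the city stands out as an influence on its region"
    REGION_STANDS_OUT = "the region stands out over the city"
    HIGH_DISSEMINATION = "city and region are close and both high"
    UNCLEAR = "no clear pattern"


def classify(city_cases: int, region_cases: int, high: int, tolerance: float = 0.1) -> Pattern:
    """Hypothetical rule of thumb comparing a city with its region.

    region_cases is assumed to aggregate the neighboring cities only.
    `high` and `tolerance` are illustrative thresholds, not values from the paper.
    """
    larger = max(city_cases, region_cases)
    if larger == 0:
        return Pattern.UNCLEAR
    # "Very close": the difference is within `tolerance` of the larger count.
    if abs(city_cases - region_cases) <= tolerance * larger:
        if min(city_cases, region_cases) >= high:
            return Pattern.HIGH_DISSEMINATION
        return Pattern.UNCLEAR
    if city_cases > region_cases:
        return Pattern.CITY_STANDS_OUT
    return Pattern.REGION_STANDS_OUT
```

With toy inputs, `classify(90, 20, high=50)` returns `Pattern.CITY_STANDS_OUT` and `classify(100, 95, high=50)` returns `Pattern.HIGH_DISSEMINATION`.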
Our system is freely available at: https://covid19.fct.unesp.br/analise-regional/
This paper is organized as follows: Section 2 briefly outlines related work; Section 3 presents the hierarchical spreading of COVID-19, on which our methodology is based; Section 4 describes the proposed visual analytics tool; analyses using the tool are presented in Section 5; Section 6 discusses some aspects of the technique; and Section 7 concludes our work.
Using data to detect and quantify health events is a useful strategy to understand disease outbreaks. The strategies applied to this end usually use data mining or visualization techniques to monitor events related to a disease of interest. Visualization-based strategies for monitoring disease dissemination rely on the fact that graphical representations can enhance the ability to identify patterns and tendencies in data. In this case, growth tendencies and many other patterns are easier to identify by looking at visual variables, such as position, color, or area, than by reading tables or reports. The literature presents many examples of systems that use visualization techniques to enhance the analysis of disease dissemination. One example is the work of Hafen et al. [4], who delineate a strategy to detect outbreaks by monitoring pre-diagnostic data from emergency department chief complaints. Although they used only simple line charts, the visualization components helped identify patterns in the data. HealthMap [5], on the other hand, uses the geolocation of media reports to integrate textual outbreak data into a single resource. The system is designed to help extract useful information and summarize unstructured data from disease reports, facilitating analysis by decision makers. Another interesting approach is to use what-if strategies and visualize the outcomes of the decision alternatives applied when dealing with disease outbreaks [6].
Other visualization-based strategies use heatmaps to analyze patterns of hand-foot-mouth disease [7], employ intelligent graph visualizations and reordered matrices to understand influenza dissemination paths [8], or visualize the effect of decision measures implemented during a simulated pandemic influenza scenario [9]. From the data mining perspective, it is usually interesting to contrast disease-related social network posts with officially reported cases. These approaches rely on the strength of the relationship between officially reported cases and web searches or social media posts containing words related to the diseases [10, 11, 12, 13]. For instance, most of these works use web and social media data to detect events of influenza-like illness [10, 14, 15]. A good example of using data mining techniques to detect disease outbreaks is the Dutch system Coosto [16], which uses Google Trends and social media data to detect outbreaks based on a cut-off criterion. Other works have also shown that Twitter data is highly correlated with disease activity [17, 18]. Examples include predicting dengue cases [19] and automatically monitoring avian influenza outbreaks, where one-third of outbreak notifications were reported on Twitter earlier than in official reports [20].
In this work, we provide a visual analytics approach to monitor the evolution of COVID-19 dissemination and to facilitate its analysis during and after isolation. We use visualization techniques and time-window-based analysis to help analysts monitor how the situation of neighboring cities can affect the dissemination of COVID-19 to an analyzed city.
In this section, we explain the hierarchical spreading behavior of COVID-19, which we use to define the region of each city. The basic idea of hierarchical spreading is that cities with confirmed cases disseminate the infection to their neighboring cities.
In this case, the neighboring cities are defined as the region of influence of the city with confirmed COVID-19 cases. Note that regional cities (larger cities) are usually more likely to disseminate COVID-19 to their regions. This is because of their larger populations, more job opportunities, greater access to culture, and other social aspects that attract people from neighboring cities. Fig. 1 illustrates the hierarchical spreading, where a city with a confirmed case is represented by an orange circle. Its neighboring cities, retrieved using a k-nearest-neighbor search, represent its region of influence. In this work, to enlarge the region of influence, we consider the reverse k nearest cities in addition to the k nearest cities. That is, to generate the region of an analyzed city A, we merge the cities that are influenced by A with the cities that could influence A.
Our visual analytics approach has the main objective of helping decision makers analyze the situation of a city based on disease dissemination. Therefore, besides information about the city of interest, our tool provides information about the situation of its region, i.e., its neighboring cities. To monitor the dissemination curves and support analysis based on the number of COVID-19 infections, we define the following requirements for our visual analytics approach:
R1: facilitate comparison of the situation of a city with that of its region (neighboring cities);
R2: visualize the evolution of the number of cases as users change the time window, contrasting it with the number of cases accumulated since the notification of the first confirmed case;
R3: visualize the dissemination curve to check whether it is increasing or flattening;
R4: quickly understand the situation of the region of an analyzed city.
First, it is necessary to define which cities are part of the neighborhood of a city under consideration.
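The region construction described above can be sketched in a few lines: the k nearest cities of A are merged with the cities that have A among their own k nearest cities. This is a minimal brute-force illustration, not the tool's implementation. It assumes cities are given as (latitude, longitude) pairs and uses great-circle (haversine) distance, and the city names in the usage note are toy labels.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees; assumed input format


def haversine_km(p: Coord, q: Coord) -> float:
    """Great-circle distance in kilometers (mean Earth radius of 6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def k_nearest(cities: Dict[str, Coord], k: int) -> Dict[str, List[str]]:
    """k nearest other cities of every city (brute force, O(n^2 log n))."""
    return {
        a: sorted((b for b in cities if b != a),
                  key=lambda b: haversine_km(cities[a], cities[b]))[:k]
        for a in cities
    }


def region(cities: Dict[str, Coord], target: str, k: int) -> Set[str]:
    """Union of target's k nearest cities and its reverse k nearest cities."""
    nn = k_nearest(cities, k)
    forward = set(nn[target])  # cities that target may influence
    reverse = {c for c, neigh in nn.items() if target in neigh}  # cities that may influence target
    return forward | reverse
```

Take toy cities on the equator, A (0, 0), B (0, 1), C (0, 2.5) and D (0, 10), with k = 1. C's nearest city is B and D's nearest is C, so `region(cities, "C", 1)` is `{"B", "D"}`. D joins C's region only through the reverse relation, which is exactly what the reverse k nearest cities add.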
Our strategy to define the neighborhood follows the hierarchical spreading scheme of COVID-19 explained in Section 3. According to this scheme, a city with confirmed cases influences (i.e., can disseminate the disease to) its neighboring cities. These neighboring cities are retrieved using the k-nearest-neighbors algorithm. In our case, the neighborhood of a city A is the union of the k nearest cities to A and the cities that have A in their own set of k nearest cities. This allows us to analyze a city based on its own dissemination as well as that of its region. In the following, we present how we accomplish each requirement. Fig. 2 shows the tool used to monitor the evolution of COVID-19 in São Paulo state, Brazil, as well as the dissemination risk based on city neighborhoods. The tool is divided into two main components. The first component (A.) monitors the dissemination risk of an analyzed city through several visualizations. First, the evolution of the number of cases over the whole period starting on February 27th is shown at the top left. Second, a visual summary of the dissemination risk of the analyzed city's region is shown at the top center of the component. The dissemination risk is depicted by color saturation, with darker colors representing cities in more critical situations. The saturation is mapped from the angle formed by the slope of the curve of COVID-19 cases. That is, the color saturation encodes the angle below the line segment joining the points $(a, n_a)$ and $(b, n_b)$, where $n_a$ is the number of cases on the first day of the time window ($a$) and $n_b$ is the number of cases accumulated over the period (from $a$ to $b$). Equivalently, the angle is $\theta = \arctan\left(\frac{n_b - n_a}{b - a}\right)$, measured in the chart's coordinates. Third, the curves of the number of cases are shown in the central area of the component.
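The sketch below makes the mapping from slope to saturation concrete. The text only states that saturation encodes the angle below the segment from $(a, n_a)$ to $(b, n_b)$, and an angle depends on how the axes are scaled. We therefore assume a normalization of our own: the window is rescaled to unit width, and all cities of a region share a vertical scale set by the largest in-window count. Under that assumption the angle lies between 0 and 45 degrees and is divided by 45 degrees to give a saturation in [0, 1].

```python
import math
from typing import Dict, List


def window_curve(cumulative: List[int], a: int, b: int) -> List[int]:
    """Cases accumulated inside the window, for day indices a..b (inclusive)."""
    if b < a:
        raise ValueError("window end must not precede its start")
    base = cumulative[a - 1] if a > 0 else 0
    return [cumulative[t] - base for t in range(a, b + 1)]


def risk_saturation(curves: Dict[str, List[int]], a: int, b: int) -> Dict[str, float]:
    """Map each city's window slope to a saturation in [0, 1].

    Assumed normalization: unit-width window and a vertical scale shared by
    all cities (the largest in-window count), so angles lie in [0, 45] degrees.
    """
    windows = {city: window_curve(c, a, b) for city, c in curves.items()}
    y_max = max((max(w) for w in windows.values()), default=0)
    saturation = {}
    for city, w in windows.items():
        n_a, n_b = w[0], w[-1]  # first-day cases and cases accumulated in the window
        rise = max(n_b - n_a, 0) / y_max if y_max > 0 else 0.0
        angle = math.atan2(min(rise, 1.0), 1.0)  # run = 1 after rescaling the window
        saturation[city] = angle / (math.pi / 4)
    return saturation
```

Sharing one vertical scale across a region keeps the colors of neighboring cities comparable. Normalizing each city independently would make a small town with 2 new cases look as critical as a regional hub.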
The first row of curves shows the number of cases for the whole period since the first notification of confirmed cases. The second row shows the number of cases only inside the time window, which can be useful to assess how isolation policies are affecting dissemination in a period of time. Note that the second graph is generated from the number of cases on the first day of the period ($a$) and the number of cases accumulated over the period (from $a$ to $b$). Finally, the time window (period of analysis) is shown at the bottom of the component. The second component (B.) shows how the analyzed region is arranged on the geographical map, where color also indicates the dissemination risk of each city.
To facilitate comparison between a city and its region (R1), users can inspect the infection curves of the city itself and of its region. With this visualization, it is possible to understand whether the city is being influenced by its region (when the number of confirmed cases is greater for the region) or whether the city is influencing its region (when the number of confirmed cases is greater for the city). Fig. 3 illustrates a city influencing its region, both in the time window and in the total number of cases: the number of cases in Presidente Prudente is far greater than in its region.
Figure 3: Comparison of the dissemination curves between a city and its region.
R2: Visualize the evolution in a time window
Using a time window, i.e., focusing the analysis on a specified number of days, helps monitor the evolution of dissemination in chunks of time and facilitates comparison among cities. This makes it possible to answer questions such as which cities respond better to isolation policies and how the isolation policy in a certain period affected dissemination in a later period.
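Requirement R2 amounts to re-aggregating daily notifications whenever the window moves. The sketch below assumes daily new-case counts keyed by date. It follows the case-study convention in which a 20-day window runs from, e.g., April 19th to May 9th (end = start + 20 days). Treating both endpoints as inclusive, and the `step_days` parameter, are our assumptions. The function also returns the total accumulated since the first notification, so both curves of R2 can be contrasted.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def sliding_windows(daily_new: Dict[date, int], length_days: int,
                    step_days: int) -> List[Tuple[date, date, int, int]]:
    """(start, end, cases inside [start, end], cases accumulated up to end) per window."""
    if not daily_new or step_days <= 0:
        return []
    first, last = min(daily_new), max(daily_new)
    windows = []
    start = first
    while start + timedelta(days=length_days) <= last:
        end = start + timedelta(days=length_days)
        in_window = sum(n for d, n in daily_new.items() if start <= d <= end)
        accumulated = sum(n for d, n in daily_new.items() if d <= end)
        windows.append((start, end, in_window, accumulated))
        start += timedelta(days=step_days)
    return windows
```

Moving the window changes only the `start` date. This is why, in the case studies, consecutive periods such as April 19th to May 9th and April 21st to May 11th overlap heavily yet can still reveal a change in slope.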
Fig. 4 illustrates how analyzing only the confirmed cases inside the time window helps us visualize the flattening of the curve for the city of Birigui.
R3: Visualize the increase and flattening of the curves
The curves with the total number of notifications are sufficient to communicate how a city or region has performed over a period of time. However, the curves consisting only of the cases reported in the period of analysis help users focus on whether the curve is increasing or flattening, as shown for requirement R2 and in the example of Fig. 4. This approach is also useful when contrasting the curve slope with isolation indices in previous periods of time.
R4: Quickly understand the situation of a region
To promptly visualize the situation of a region, we designed a glyph that encodes the slope of the dissemination curves in a period of time. In our visualization, the neighboring cities are represented by a donut chart and the analyzed city by a concentric circle, as shown in Fig. 5. We use color to represent the angle formed by the slope of the dissemination curves: the greater the increase in the number of cases, the darker the color. Note that the colors encode the increase, not the number of cases.
In this section, we inspect the dissemination evolution of various cities in São Paulo state, Brazil. We used a time window of 20 days to assess the following periods: from April 19th to May 9th, from April 21st to May 11th, and from April 26th to May 16th.
Presidente Prudente
Fig. 6 shows the evolution of the confirmed cases of Presidente Prudente. Up to May 16th, Presidente Prudente had 89 confirmed cases, as shown on the left of the figure. However, it is interesting to note how new confirmed cases in the city accelerated over the periods from April 19th to May 9th, April 21st to May 11th, and April 26th to May 16th.
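The paper relies on visual inspection to judge whether a windowed curve is increasing or flattening (R3). One simple numeric companion compares the mean daily increase in the second half of the window with that in the first half. This is our own hypothetical heuristic, not part of the tool. A ratio below 1 hints at flattening, and a ratio above 1 hints at acceleration.

```python
import math
from typing import List


def flattening_ratio(window_cumulative: List[int]) -> float:
    """Mean daily increase in the second half of a window over that in the first half.

    Hypothetical heuristic: < 1 suggests flattening, > 1 suggests acceleration.
    """
    increments = [b - a for a, b in zip(window_cumulative, window_cumulative[1:])]
    if len(increments) < 2:
        raise ValueError("need at least three days in the window")
    half = len(increments) // 2
    first = sum(increments[:half]) / half
    second = sum(increments[half:]) / (len(increments) - half)
    if first == 0:
        return math.inf if second > 0 else 1.0
    return second / first
```

For example, `flattening_ratio([0, 2, 4, 6, 7, 8, 9])` returns `0.5`: the daily increase drops from 2 to 1 halfway through the window. This ratio could complement, not replace, the visual reading of plateaus such as the one described for Birigui.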
This city in particular has shown the lowest isolation indices in São Paulo state; we highlight in red the mean of the isolation indices in the period. It is worth noticing that, besides presenting a critical situation in the city itself, Presidente Prudente, as a regional city, strongly influences its neighborhood. Fig. 7 shows how the number of cases in the city grows as the time window moves (observe the circle in the center of the donut chart), as well as the growing number of cities presenting confirmed cases. The increasing number of cases in the region can also be seen in the curves for each period. Finally, note how Caiabu keeps its risk of dissemination lower than that of the other cities as time passes.
Martinópolis
After inspecting Presidente Prudente, we noticed that this regional city has a great influence on the dissemination risk of its neighborhood. Given the low isolation index of Presidente Prudente, it is useful to understand how neighboring cities respond to such risk. Taking the city of Martinópolis as an example, the radial representation in Fig. 7 shows that social distancing policies helped the city contain its number of cases by decreasing interaction with Presidente Prudente. In the first period (from April 19th to May 9th), the city ranked second in risk of dissemination, while in the last period of analysis (from April 26th to May 16th) it was overtaken by many others. Fig. 8 illustrates the evolution of the number of confirmed cases for Martinópolis. Over the three periods, the aggregated number of cases for the region grows, mainly due to the increase in cases in Presidente Prudente. Meanwhile, the number of cases in Martinópolis increases by only one confirmed case in the period from April 26th to May 16th, reaching only four confirmed cases.
Figure 8: Assessment of the city of Martinópolis and its region. Comparing the curves of the number of cases of Martinópolis and its region, the city seems to cope well with the increase in cases in the region.
Alfredo Marcondes
Unlike most cities in the region of Presidente Prudente, Alfredo Marcondes had not presented any confirmed cases by the time of analysis. It is interesting to note that, although Presidente Prudente has a strong influence on this city (as it has on the others), Alfredo Marcondes managed to remain free of confirmed cases. However, the city lies in such a critical area and is influenced by Presidente Prudente, mainly through commuting patterns. The city government could therefore use this information to make the population aware of the risk around the city.
Figure 9: Dissemination curves for the city of Alfredo Marcondes in the period from April 26th to May 16th. In the region, Presidente Prudente seems to influence the others by presenting a high increase in the number of cases.
In this section, we analyze the number of cases and the risk of dissemination by comparing two neighboring cities, Araçatuba and Birigui. Fig. 10 shows the curves for the city of Birigui. By contrasting the curves at the bottom with the isolation index, we can see how isolation affects the dissemination of COVID-19. From the second to the third period, the curve of cases starts to reach a plateau. As for Birigui, Fig. 11 shows the dissemination curves and the isolation indices for Araçatuba. In this case, the effect of isolation is even more noticeable. Araçatuba has a more critical dissemination curve, which can be explained, among other reasons, by its low isolation index of 44% ± 0.05 from April 17th to May 7th. Finally, after the increase to 48% ± 0.055 from April 19th to May 9th, a slight flattening of the curve can be seen from April 21st to May 11th.
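Figures such as "44% ± 0.05" summarize the daily isolation index over a window. The discussion also compares a window's curve with the isolation of a slightly earlier window. The sketch below assumes daily isolation indices stored as fractions keyed by date, and it reads "±" as the population standard deviation; the text does not say which dispersion measure is used. The `lag_days` parameter, which shifts the isolation window earlier than the case window, is likewise hypothetical.

```python
import statistics
from datetime import date, timedelta
from typing import Dict, Tuple


def isolation_summary(iso: Dict[date, float], start: date, end: date,
                      lag_days: int = 0) -> Tuple[float, float]:
    """Mean and population standard deviation of the isolation index over
    [start - lag_days, end - lag_days] (both endpoints inclusive)."""
    lo = start - timedelta(days=lag_days)
    hi = end - timedelta(days=lag_days)
    values = [v for d, v in iso.items() if lo <= d <= hi]
    if not values:
        raise ValueError("no isolation data in the requested window")
    return statistics.mean(values), statistics.pstdev(values)
```

In the Araçatuba discussion, the isolation window from April 19th to May 9th precedes the case window from April 21st to May 11th by two days, which corresponds to `lag_days=2` here.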
In this section, we analyze a city located in a critical neighborhood, that is, a neighborhood presenting a high dissemination risk due to the rapidly increasing number of cases in its neighboring cities. Here, we analyze three periods: from March 23rd to April 12th, from April 13th to May 3rd, and from May 1st to May 21st. Fig. 12 shows the situation in the period from March 23rd to April 12th. Among all neighboring cities, only five presented confirmed cases, totaling 16 confirmed cases. On the right, it is possible to see how these cities are arranged on the map. The situation in the region changes drastically in the period from April 13th to May 3rd, as seen in Fig. 13. In this period, many cities present confirmed cases, led by Americana and Limeira. Americana, for example, ranked only second to last in risk of dissemination from March 23rd to April 12th (see Fig. 12). In fact, the period from April 13th to May 3rd accounts for 122 of the 140 cases confirmed by May 3rd. To understand why Americana and Limeira show such an increase in confirmed cases, Fig. 14 shows their curves for the period of analysis together with the isolation index of an earlier period, i.e., from March 23rd to April 12th. Although the isolation index of Americana was greater than Limeira's, Americana presented more cases in the period, even with a smaller population: Americana has approximately 230 thousand inhabitants, while Limeira has approximately 300 thousand. The answer can be found by analyzing the neighborhoods of both Limeira and Americana. Returning to the region of Santa Gertrudes, we finish by analyzing the last period, from May 1st to May 21st. Fig. 16 shows the situation of the region up to May 21st.
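The comparison of Americana with Limeira hinges on population: more cases in a smaller city means a higher incidence. A short helper that expresses counts per 100,000 inhabitants makes this explicit. The populations below are the approximate values quoted above. The case count is a hypothetical placeholder, since the text does not report the per-period totals.

```python
def per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


# Approximate populations quoted in the text; the case count is hypothetical.
americana_pop, limeira_pop = 230_000, 300_000
hypothetical_cases = 100
print(round(per_100k(hypothetical_cases, americana_pop), 1))  # 43.5
print(round(per_100k(hypothetical_cases, limeira_pop), 1))    # 33.3
```

Even with identical counts, the smaller city shows an incidence about 30% higher. This makes Americana's larger absolute count all the more notable.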
The first thing to notice is the slope of the curve of cases in the period, which shows the exponential pattern of COVID-19 dissemination. Two other cities (Araras and Cordenópolis) notified almost all of their cases in this period. This pattern can be explained by the influence of the regional risk on these cities.
Figure 15: Comparing the neighborhoods of Americana and Limeira. For earlier periods, Americana showed higher isolation indices than Limeira; however, it is located in a more critical region.
Finally, Americana's higher isolation index in both earlier periods of analysis allowed it to present a lower number of infections in this critical period, as shown in Fig. 17.
In this section, we analyze cities that are close to the capital of São Paulo state (São Paulo) and to other regional cities. For earlier periods, the curves will seem flat; this is due to the scale used to convey the number of cases, which is driven by the large number of cases reported in the regions analyzed in this section. In such cases, analysts can rely on the donut charts to visually investigate the evolution of the number of cases.
Santos
By March 29th, the city of Santos had not presented any confirmed cases. Its neighborhood was not in a critical situation either, judging by the number of confirmed cases in Fig. 18. Although the situation was clearly not critical at that moment, the isolation indices could indicate difficult periods ahead. The low isolation indices are even more worrying given the location of these cities, very close to the capital, São Paulo. Moving the time window seven days forward, i.e., analyzing the period from March 16th to April 5th, we observe a marked change in the reported number of cases.
First, 191 of the 194 cases were reported in this period, and the city of Santos reported 72 cases in only six days, as indicated in Fig. 19, in contrast with a longer period for Santo André. Fig. 20 shows the situation of Santos' neighborhood over time. The donut chart and the color code reveal that the number of cases reported in the periods of analysis (20 days) follows an increasing pattern, indicating that the region has not yet reached the peak of its curve. This information could guide decision makers on isolation policies, since it suggests that the isolation policy applied so far has not been effective enough.
Fig. 21 shows the case curves and the donut chart encoding the increase of the curves in the region of Ribeirão Preto for the period from March 23rd to April 12th. This is an example where the analyzed city is the one that influences its neighborhood: while Ribeirão Preto presented 43 cases in the period, each of the other cities presented only two confirmed cases. […] number of infections than the cities in its region combined. Additionally, the number of cases seems to increase at a slow pace until May 3rd, which cannot be said of later periods, as we will see in the following. Finally, Fig. 24 shows that COVID-19 dissemination in Ribeirão Preto has not decreased. Instead, the number of cases seems to be increasing rapidly. This can be dangerous given the low isolation indices of the cities with more critical curves (see the isolation indices for Ribeirão Preto and Sertãozinho in Fig. 24).
Fig. 25 shows that the overall curves of the number of cases of Ribeirão Preto (discussed in the previous section) and São José do Rio Preto are similar. In both cases, we can see a sudden increase in the number of cases, indicated by a red line segment.
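Several statements are shares of a cumulative total that fall inside a window. Examples include "191 of the 194 cases were reported in this period" and Araras and Cordenópolis notifying almost all of their cases in one window. A minimal sketch, assuming a cumulative series indexed by day; the function and variable names are ours.

```python
from typing import List


def share_in_window(cumulative: List[int], a: int, b: int) -> float:
    """Fraction of the cases accumulated up to day b that were reported in days a..b."""
    total = cumulative[b]
    if total == 0:
        return 0.0
    before = cumulative[a - 1] if a > 0 else 0
    return (total - before) / total
```

With 3 cases before the window and 194 at its end, the share is 191/194, roughly 0.985. A share close to 1 means that almost the whole local outbreak happened inside the period of analysis.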
As with Ribeirão Preto, São José do Rio Preto is the most influential city in its neighborhood, as shown in Fig. 26 for the period from April 16th to May 5th. The figure shows that the cases in the period for São José do Rio Preto are three times greater than the accumulated cases of all the neighboring cities presenting confirmed cases. Advancing the time window, Fig. 27 shows the situation from May 5th to May 22nd. Here, besides the rapid increase in the number of cases in São José do Rio Preto in the days following May 5th, we can note a rapid increase in the number of cases in the neighborhood as well. This pattern can be explained by the interaction between the cities and the consequent dissemination of COVID-19 from São José do Rio Preto to its neighborhood; note how the increase in the neighborhood occurs later than in São José do Rio Preto. Another explanation could be a backlog of delayed COVID-19 tests that were reported in this period. Fig. 27 also suggests that the neighborhood would keep contamination low; however, this is not what happens in the following days. The curves show an increase in the number of cases for both the city and its neighborhood, and no plateau can be seen in the aggregated number of cases of the neighborhood. However, it is important to emphasize how the city of Jaci went from presenting the most critical curve in the neighborhood (see the donut chart in Fig. 27) to only four cases in this period. This pattern may be the result of isolation policies, but unfortunately we do not have data to confirm it.
São Paulo
Finally, we analyze the evolution of the number of cases in São Paulo (the capital) and its neighborhood. By June 10th, São Paulo had already reported 80,457 cases of COVID-19, the most critical situation in the whole state (and even in the whole country).
Here, we summarize the evolution of the number of cases for São Paulo and its neighborhood over various periods. Fig. 29 shows the situation for São Paulo and its neighborhood from February 26th to March 16th. Note that only five cities in the neighborhood presented confirmed cases, with one case notified in each. In the period from March 14th to April 2nd, a few more cities in the neighborhood start to present confirmed cases, with a rapid increase, as shown in the donut chart. From this period until June 10th, the situation in São Paulo and its neighborhood remains unchanged: the number of reported cases keeps increasing. Figs. 30a and 30b show how the dissemination of COVID-19 continues at a rapid pace in the region of São Paulo. The cities in the donut chart of Fig. 30a are very populous and interact heavily with each other. On top of that, their isolation indices are not good, as seen in Fig. 31, which could aggravate the situation even further.
Throughout the results section, we demonstrated the usefulness of our visual analytics tool for understanding the dissemination of COVID-19 in cities of interest by analyzing the influence of cities on their regions and vice versa. It is important to emphasize that our tool helps analyze the evolution of dissemination rather than focusing mainly on the number of cases a city or region presents. In this sense, our tool acts more as a mechanism to draw attention to cities or regions with an increasing number of confirmed cases. Therefore, we believe it could be employed even after COVID-19 infection is under control, as well as to monitor the dissemination of other diseases. While we defined the region of a city as the cities spatially close to it, it is important to stress that this may not reflect reality in some cases.
For example, for a regional city, the region may be defined as the set of cities that influence it or are influenced by it according to other aspects, such as citizens who commute from smaller cities to larger ones to work.
Visualization techniques can help discover patterns that would sometimes be difficult to perceive by looking only at raw data. In this work, we employed visualization metaphors to analyze the evolution of the number of COVID-19 cases in São Paulo state, Brazil. Our methodology consisted of visualizing the dissemination based on time windows and contrasting the evolution of the number of cases in the periods of analysis with the isolation indices of the cities. Through a few analyses, we showed how our visualization design can help analyze the situation of a city according to the number of cases in a time window and in relation to the situation of its region. We showed that our methodology was able to highlight how higher isolation indices benefit cities regarding dissemination, even when these cities are located in critical regions in terms of the number of cases. We hope that decision makers can use our methodology to monitor the evolution of the number of cases in cities and regions and to respond quickly to dissemination risks.
This work was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) grants 18/17881-3 and 18/25755-8.
References
[1] Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet.
[2] Single cell RNA sequencing of 13 human tissues identify cell types and receptors of human coronaviruses.
[3] High expression of ACE2 receptor of 2019-nCoV on the epithelial cells of oral mucosa.
[4] Syndromic surveillance: STL for modeling, visualizing, and monitoring disease counts.
[5] HealthMap: Global infectious disease monitoring through automated classification and visualization of Internet media reports.
[6] Development of a quick look pandemic influenza modeling and visualization tool.
[7] Visualized exploratory spatiotemporal analysis of hand-foot-mouth disease in southern China.
[8] Visual analytics of spatial interaction patterns for pandemic decision support.
[9] A pandemic influenza modeling and visualization tool.
[10] Twitter improves influenza forecasting.
[11] Towards detecting influenza epidemics by analyzing Twitter messages.
[12] Predicting flu trends using Twitter data.
[13] National and local influenza surveillance through Twitter: An analysis of the 2012-2013 influenza epidemic.
[14] Combining search, social media, and traditional data sources to improve influenza surveillance.
[15] Detecting and predicting emerging disease in poultry with the implementation of new technologies and big data: A focus on avian influenza virus.
[16] Social media posts and online search behaviour as early-warning system for MRSA outbreaks.
[17] Dengue surveillance based on a computational model of spatio-temporal locality of Twitter.
[18] Sensitivity of the dengue surveillance system in Brazil for detecting hospitalized cases.
[19] Dengue prediction by the web: Tweets are a useful tool for estimating and forecasting dengue at country and city level.
[20] The assessment of Twitter's potential for outbreak detection: Avian influenza case study.