key: cord-0666628-vwd6tho6
authors: Maurya, Chandresh Kumar; Jain, Seemandhar; Thakre, Vishal
title: Digital Contact Tracing for Covid 19
date: 2021-05-22
journal: nan
DOI: nan
sha: 342700e6dd6f6b60b1a81ea54369379dae5cc856
doc_id: 666628
cord_uid: vwd6tho6

The COVID19 pandemic created a worldwide emergency as it is estimated that such a large number of infections are due to human-to-human transmission of the COVID19. As a necessity, there is a need to track users who came in contact with users having travel history, asymptomatic and not yet symptomatic, but they can be in the future. To solve this problem, the present work proposes a solution for contact tracing based on assisted GPS and cloud computing technologies. An application is developed to collect each user's assisted GPS coordinates once all the users install this application. This application periodically sends assisted GPS data to the cloud. To determine which devices are within the permissible limit of 5m, we perform clustering over assisted GPS coordinates and track the clusters for about t mins to allow the measure of spread. We assume that it takes around 3 or 5 mins to get the virus from an infected object. For clustering, the proposed M way like tree data structure stores the assisted GPS coordinates in degree, minute, and second format. Thus, every user is mapped to a leaf node of the tree. We split the"seconds"part of the assisted GPS location into m equal parts, which amount to d meter in latitude(longitude). Hence, two users who are within d meter range will map to the same leaf node. Thus, by mapping assisted GPS locations every t mins, we can find out how many users came in contact with a particular user for at least t mins. Our work's salient feature is that it runs in linear time O(n) for n users in the static case, i.e., when users are not moving. We also propose a variant of our solution to handle the dynamic case, that is, when users are moving. Besides, the proposed solution offers potential hotspot detection and safe-route recommendation as an additional feature, and proof of concept is presented through experiments on simulated data of 10M users.

Contact tracing is the problem of identifying users who have come in contact with a user within a certain distance for a specific amount of time. This problem came into the limelight during the COVID-19 pandemic, which is thought to spread from humans to humans. The main problem is that users are asymptomatic and can pass the virus to other users unknowingly for weeks or months. In such a case, it is challenging to identify users who came in contact with a particular user, given the users dynamic and complex movement patterns. Another prevalent problem during the pandemic situation is that users are interested in knowing the hotspot areas to avoid them while visiting. Therefore, there is a growing need for a recommendation of a safe route for travel.

Current solutions for contact tracing problems can be categorized into (a) human-based approach, (b) Bluetooth (BT)-based approach, and (c) GPS-based approach. In the first approach, police personnel are deployed to follow the user's movement trail and find out users who were present in close proximity of the particular user for a specific duration. This approach is costly, cumbersome, and time-consuming [13] . In the second approach, each user is asked to install an app such as India's Aarogya Setu app or Google-Apple privacy-preserving app. Such apps use BT signals for exchanging data between devices. There are certain limitations of these apps. For example, the limitations of Aarogya Setu are: (a) it does not tell if we were in contact with a user a week or month ago who recently got diagnosed with COVID-19, (b) it also does not tell which areas are hotspots and should be avoided, (c) it is based on self-assessment only, and hence reliability is a concern.

The solutions for hotspot detection are currently based on the government's rules and regulations. For example, sites like COVID tracker 1 mark geographical areas as hotspot-based on the directions received from the government. They do not provide real-time hotspot information, such as alerting the user that they are moving through hotspot/containment zones specially in villages where monitoring is difficult. Further, the existing solutions lack the feature of recommending safe routes for travel, such as Google map.

To overcome the limitations of the existing solutions, we propose an assisted GPS location-based solution that receives the assisted GPS coordinates of the user in the backend and without forcing the user to keep on Bluetooth time and again. This solution does not require any user interventions and sends the assisted GPS coordinates to the cloud periodically. Received coordinates are mapped to the proposed M-way like tree data structure in degree, minute, and second (DMS) format. Once the mapping is over, the proposed clustering algorithm executes to track the history of users who spent at least minutes with other users. Our solution is scalable and can find users who happen to make a contact event in 17 mins among a population of 10M users.

Major differences between our solution based on assisted GPS and BT-based solution for contact tracing are: (i) our approach can provide location information where the user got potentially infected whereas BT-based approaches fail to do so since they do not provide location information. Why this information may be crucial is that let us suppose that the user visits a dairy milk shop every day besides tons of other places. On one day (s)he got COVID +ve. In this case, (s)he might be interested in knowing where (s)he got the virus so that they can inform their family members to be vigilant. In this scenario, the BT-based approach has no clue of the location, (ii) assisted GPS-based solution is centralized where BT-based one is decentralized, (iii) our solution is more robust in the sense that BT-based solution communicates with many heterogeneous devices. In contrast, ours does not, (iv) BT always needs to be turned on, which usually user forget or do not turn on for power saving purposes. As for assisted GPS is concerned, once the app is installed, it will keep using the location sharing in the background and does not ask the user to turn it on time and again. This option is not available in BT-based solutions currently, (v) the scalability of BT-based apps is also a concern since these apps can communicate to only eight other BT devices at a time. Further, note that we are using assisted GPS which is more reliable than GPS, where location information is calculated using satellite signals, mobile towers, and wifi beacons. Therefore, our solution will work in indoor settings as well. From now on, whenever GPS is mentioned will mean assisted GPS. To emphasize to our readers that our solution offers an alternative to BT-based solutions and does not replace them.

In short, we make the following contributions:

• Propose a solution for contact tracing that runs in the time linear to the numbers of users i.e. ( ). the space complexity of the solution is also linear in the number of user identifiers, i.e., ( ). • Our solution can handle static as well as dynamic case. That means if users stay at the same location (static case) or the users are moving together (dynamic case). • The proposed solution relies only on location sharing information, which is the property of most android applications such as Google news, maps, etc. • Our solution can additionally provide hotspot information (areas of large gatherings) and recommend a safe route for travel.

Contact tracing and regular monitoring of infectious diseases (COVID-19) are essential for public health. Utilizing emerging technologies such as remote sensing, internet-based surveillance, telecommunications, infectious disease modeling, Global Positioning System (GPS), IoT devices, and mobile phones, such contagious diseases can be monitored, prevented (by generating alarms after real-time predictions), and controlled [2] . New approaches to deal with such epidemics are recently categorized with the newly coined term such as digital epidemiology [16] . Some works have appraised the characterization of the human mobility patterns by using Call Detail Records (CDRs) for understanding and modeling epidemics [9] . The authors discovered the opportunity to utilize individual mobility proxies to describe commuting flows and predict influenza-like illness diffusion. Though depending on the human mobility data source, their predictive accuracy about epidemic invasion timing and propagation of patterns differed. Mastrandrea et al. [14] utilizes wearable sensors to collect contacts among students and compare the results with the personal diaries' contacts. The authors also compare how such epidemic diseases spread using two different contact networks (from diaries and sensors), which displays a notable alteration in dynamics. Contact tracing is one of the very useful measures which primarily focuses on potential next-generation cases. It has been proved a highly successful strategy if the numbers of infectious epidemic cases are low, or at early stages of the outbreak, mainly, the disease is asymptomatic (still infectious), because it offers a way by which such individuals are easily recognized. Broadly, there can be two approaches for modeling contact tracing [10] : (1) Populationbased modeling is the top-down approach used to depict the disease dynamics on the system level, which is typically used to analyze research related matters from a macroscopic perspective. (2) agentbased modeling, which is a bottom-up technique and deals with every individual agent, each with their infection states and own movements. It is usually used to evaluate adaptive and heterogeneous behaviors. The latter approach is more realistic. However, it can be computationally expensive. Few research works [1, 15] have introduced the stochastic model used to reduce a deterministic approach for attaining the fundamental dynamics of the epidemics and other associated measures. Mostly, previous models deal only with generic networks. To measure the precision of the trace contact models, there is a need to account for the network of contacts. For example, Huerta and Tsimring [8] present a stochastic model to estimate the effect of random screening and contact tracing as part of the epidemic control strategy in complex networks. This work indicates that a significant outbreak can significantly be reduced via tracing contacts at a low additional cost. A similar approach is also used by Farrahi et al. [5] .

Ferretti et al. [6] state that isolation and contact tracing are being practiced but not preventing the COVID-19, which is why the same high number of asymptomatic infected individuals remain undetected, and cases continue to spread. Therefore, the authors propose mobile apps to trace previous mobile contacts and show mathematically that epidemics can remain even when not all population use the application. Hellewell et al. [7] also propose similar inference through the simulated model. In most scenarios, highly effective case isolation and contact tracing are sufficient to control a new outbreak of COVID-19 within three months; however, only 79% of the contacts are traced. Nevertheless, such conditions create smartphone-based tracing, which is far from a realistic solution. One of the first efforts through mobile phones to trace the contacts was the FluPhone application developed by Cambridge University [21] which uses Bluetooth-based wireless signals to estimate the physical contact. It also asks the users to report on flu-like symptoms to appraise the risk of infections. Next, the Singapore Government also developed a mobile app called Trace-Together for COVID-19 2 . This also uses Bluetooth technology, and it had already been utilized to control the disease spread. Few similar works also focused on privacy as the Pan-European Privacy-Preserving Proximity Tracing (PEPP-PT) [18] . Finally, Google and Apple have teamed up to design, develop and integrate a similar approach into the Android operating Systems and iOS. The proposed solution is more efficient and ubiquitous to users. Aarogya Setu app 3 from the Indian government works on location sharing and Bluetooth. Secondly, it tells us that we are in contact with a COVID +ve person based on Bluetooth proximity. There are certain limitations of Aarogya Setu. For example, it does not tell if we were in contact with a person a week or month ago who recently got diagnosed with COVID-19. It also does not reveal which areas are hotspots and should be avoided. It is based on self-assessment only, and hence reliability is a concern. Recently, Mahapatra et al. [12] present a digital contact tracing solution based on a dynamic graph streaming algorithm. Concretely, they use an index-based adjacency list to store graphs whose nodes are users and edges are close contacts. However, they do not mention if they use Bluetooth or GPS to find direct and indirect close contacts. The proposed solution runs in ( | |) where , , | | denote the number of contacts, level of tracing, and the number of infected users, respectively. All Bluetooth (even GPS) based apps for contact tracing suffer from a severe drawback: they may be subjected to high "false alarm". For example, when two users are standing opposite side of a wall, Bluetooth-based contact tracing apps will signal that they are within the infection-proximity [19] . There have been reports of delayed notification (1+ days) to individuals regarding contacts with COVID +ve patients. Such delays can create depression in users later on. Further, many of these apps do not show hotspot areas nor recommend a safe route. To ameliorate/minimize some of the above problems and provide additional features, our solution relies on location sharing via GPS/assisted GPS such as magnetic fingerprints and the BLE beacon RSSI (Received Signal Strength Identifier) nearby, for localization [17] in an indoor setting.

As discussed in the introduction section, our solution for contact tracing relies on the location sharing data, i.e., GPS location of the users collected through our app. The naive solution will be to compute the Euclidean distance (or Haversine distance as shown in (1), where and for ∈ {1, 2} are lat/long of two users, if one wants to be more exact) every mins for finding the contact event.

=2 arcsin

A contact event is defined as when two users are within a distance of meter for at least mins. The naive solution for contact tracing requires computing pairwise distances and has the complexity of ( 2 ) for users and hence not feasible for large . Therefore, we discuss a data structure which is M-way like tree data structure for storing GPS locations (latitude/longitude) as shown in fig. 1 . In other words, we map latitude/longitudes to the proposed tree as discussed 2 https://www.tracetogether.gov.sg/ 3 https://www.mygov.in/aarogya-Setu-app/ later in details. Further, latitude and longitudes are mapped to two separate trees for parallel computation of the contact distances. How do we arrive to the direct distance between two users from latitude and longitude distances is presented in detail in section 3.2.

Note that the M-way like tree is not the same as k-d tree which is binary tree and requires comparison for insertion. Whereas M-way like tree involves look-up operation and no comparison. In this spirit, it is a kind of multi-level hashing. A GPS coordinate which consists of latitude and longitude is usually expressed in the degree, minute, and second format (DMS), e.g., (28°50 ′ 30.12462"). For latitude and longitude, degree ranges in [0, 1, . . . , 89] and [0, 1, . . . , 179] respectively while minutes and seconds vary in [0, 1, . . . , 59] for both which are shown in the top 3 gray levels in fig. 1 . For notational brevity, we do not show degree (°), minute (') and seconds (") symbols in the fig. The last gray level is the partition of the seconds into equal parts. As the most interesting property of M-way like tree data structure is the partition at the leaf, i.e., partition every second of GPS (which is equal to approx. 30 meters) to equal parts so that = 30/ . Here, is our contact distance which is a tunable parameter ( = 5 in our case). The benefit of the partition is that all users within a distance of meters on latitude will fall into the same leaf (bucket) and form natural clusters. For example, consider = 5 and hence = 6, then the intervals for the fractional part of the seconds (which varies in [0,1]) at the leaf node will be [0, 1/6), [1/6, 2/6), . . . , [5/6, 1]. Users 1 and 2 whose latitudes are (28°50 ′ 30.12462") and (28°50 ′ 30.05462") respectively will map to the same leaf node because their fractional part of the second (.12462 and .05462) will map to the same leaf and thus are in the same cluster. The approach presented above misses users who are within contact event distance but fall in the neighboring clusters. For example, users 1 and 3 whose latitudes are (28°50 ′ 30.12462") and (28°50 ′ 30.17462") will map to different clusters but are within the contact event distance = 5 . An example of fig. 2 with four users. Out of these four users, two fall in one cluster, and the other two fall in a different neighboring cluster. To capture missed cases during the mapping of lat/long to the tree, we perform pairwise distance computation. For example, we need 4 distances computed (

in the worst case (worst case occurs when half of the users fall in one cluster and the other half fall in the neighboring cluster). Had we not mapped users to the tree, 2 computations are required with quadratic complexity ( 2 ) and not preferred for large . To see this in fig. 2 , instead of 6 pairwise distance computations in our running example, we needed only 4 pairwise distance computations in the worst case and hence saving two distance computations. Such a saving becomes paramount for large .

We collect the GPS locations using our in-house app and send it to the cloud every /2 mins for tracking users at least for mins ( = 5 mins in our experiments). That means we map GPS locations to M-way like tree structure every /2 mins. We have two trees for every interval of /2 mins: one at time step /2 and the other at . At this step, we perform the intersection of leaves, and the ID of the users found in the intersection is stored. This method guarantees that two users in the same cluster have been in contact for at least /2 mins. It also ensures that it will capture all users who have spent at least mins together within meter distance. One limitation is that it might miss users who have been in contact for 3 or 4 mins. Since our goal is to find all users who came in contact for at least 5 mins or more, we can ignore such corner cases.

It is not necessary to send GPS coordinates every /2 mins. However, to guarantee that two users make contact for at least mins for the infection to occur, it is necessary to send GPS data every /2 mins as shown in fig. 3 for = 5 mins. We can see in the fig. 3 that if two users met at time 1 = 1 and remain in contact until 2 = 6 (Δ = 2 − 1 ) then we can see that we have sent their data two times (at = 2.5 and = 5 mins). As a result, we can capture the contact event. On the other hand, if we are sending data every 5 mins, it is not possible to capture the aforementioned contact event. This approach guarantees that all users whose contact event duration is at least 5 mins will be definitely captured. However, it may miss contact events of duration 3 or 4 mins.

We want to compute the distance = as shown in the fig. 4 . That is, all users at a distance from user . For small distances ( < 10 or 15 m in our case), we can safely assume that the points and lie on Euclidean coordinate system. Therefore, we can draw a simple circle at whose radius is (otherwise, it is great circle). Main challenge: computing is easy (we can take Euclidean distance or Haversine to be more precise). But, Euclidean computing distance for a large number of users (n) is computationally challenging. It takes ( 2 ) for pairwise distances. When = 1M, it takes 10 12 distances to compute and storing them requires ( 2 ) space. Thus, pairwise Euclidean distance computation is impractical since we want a scalable solution where can be as large as 1.3B. Therefore, our main idea is to avoid computing directly and instead compute and distances, which are lat/long distances between and .

Pythagorean theorem says that 2 = 2 + 2 . Thus, if we can compute lat/long distances, we can find . For = 5m. We can find either (lat, long)=(3m,4m) or (4m, 3m). That means we find all users who are 3m away from user in question on latitude and 4m on longitude or vice-versa. Choosing (lat, long)=(3m,4m) gives a more accurate solution since computing distance along longitude depends on the accuracy of the distance along latitude [20] . Further, notice that we map latitude/longitude of a user to two separate M-way like trees. That means we create a tree for mapping latitude and a tree for longitude. Such trees are created every /2 mins, and users making contact events are found by the intersection of leaf nodes. More details of the procedure is described in the next section. Approximating diagonal distance via distances along latitude/longitude does not always hold. In the previous example, when ( , ) = (3 , 4 ), we are finding users who are within 3m (4m) along latitude (longitude) and clustering them through mapping their latitude (longitude) on the M-way like tree data structure. However, the catch here is that there is a possibility that users within the contact distance along latitude but far away along longitude (or vice versa) are mapped to the same leaf node. As shown in fig. 5 , users at locations B and C will fall in the same leaf node when we map longitudes to the tree; however, locations Intersection of leaf nodes Figure 6 : Finding users in close proximity through intersection of corresponding leaves of two M-way like tree in the static case B and C can be miles away along latitude (parallel lines passing through B and C but not shown in the figure) . This can degenerate into a case when many users are mapped to the same leaf node of either tree. Such a situation can be avoided by running our proposed methodology over each city instead of the whole country for contact tracing. What will happen is that the probability that many users align to the same latitude (longitude) simultaneously for a short duration will be very low due to population dynamics. Therefore, we assume that a leaf node will not have many users mapped to it in the worst case. In the next section, we present solutions when the users stay at the same location for at least mins (called the static case) and moving (called dynamics case). Though dynamic case subsumes the static case, we bifurcate the mapping into two cases for speed-up purposes.

A case is static when users who participate in a contact event are not moving at least for mins. A static case can be easily handled by mapping lat/long to the M-way like tree every /2 mins. The intuition is that two users who stay within a distance of m for at least mins will map to the same leaf node of the M-way like tree. We then find the intersection of the two M-way like trees so obtained (intersection of two corresponding leaves to be more precise since users are static). Users in the intersection will be our output candidates (see fig. 6 where leaves from two trees are shown facing each other). Illustration of the static case: In the fig  6, lat/longs are mapped to two trees as discussed before. The user 1 , 2 and 3 were at a location at time (top leaf node). After + Δ (Δ = 2.5 mins), users 1 , 3 and 3 were found at the same location resulting in the leaf node shown in the bottom. When the intersection is performed, 1 , 2 and 3 are found to be present at the same location. That means, user 1 , 2 and 3 participated in a contact-event. The entire process of the static case is described step by step in Algorithm 1.

The dynamic case is when users move together in close proximity (within a radius of m). Due to the movement, their lat/log are continuously changing, resulting in their mapping to two different (non-corresponding) leaf nodes in the M-way like tree. To find out where users in close contact map to the leaf require performing the intersection of each leaf node against all nodes in the latter tree. This is computationally intractable, requiring ( 2 ) intersection. Therefore, we solve this issue via an approximation. First, we discuss when users are walking on foot (movement in a car can be handled similarly). On avg, a pedestrian speed is 1.4 m/s. That means, in 5 mins, they walk around 420 meters which maps to 14" in lat/long. Therefore, we perform the intersection of each leaf against all leaves in the latter tree which are 14" left and right of the corresponding leaf as shown in the fig. 7 (left/right is considered since users can move left side or right side of the current latitude. Similarly, up/down movement might happen for longitude). In either case (static/dynamic), once we find users who made contact along the latitude and longitude, we need to compute the distance AB (the actual contact direction) as discussed in section 3.2. To build the final contact list, we proceed as follows. Firstly, we create a set of pairwise contacts from the contact list found by the intersection operation as mentioned previously. This step is repeated for the longitude contact list as well. Secondly, the two sets are intersected to generate the final contact list, i.e., the list of users who actually made a contact event along the direction AB. This operation cost ( ) (due to sets being a hashmap) where the hidden constant is the length of the longest contact list, which is bounded above by a constant since a user can not meet more than a certain number of users in (=5) mins. From the contact list, we can easily create a contact vector of each user and store in an adjacency matrix as discussed in the next section. The entire process of the dynamic case is described step by step in Algorithm 2.

Our methodology can be used to detect potential hotspot areas in a city. Note that there is no unanimous definition of what constitutes a hotspot in epidemiology. As per [11] , hotspot can be defined in three ways: (a) as areas of disease elevated occurrence or risk, (b) as areas of frequent disease emergence or reemergence, and (c) an area of elevated transmission efficiency. We take the third definition that is based on the assumption that large gatherings can stimulate the rate of transmission. Therefore, we identify areas in a city where large gatherings are happening along with COVID +ve cases and predict those areas as potential hotspots. The potential hotspot information can help safeguard users and discourage them from freely moving in those areas. Note that our method does declare gatherings as potential hotspot if there are no +ve cases in that gathering (cluster). Our assumption is that either COVID +ve user is surrounded by normal users who are not quarantined, and in some cases where COVID +ve user might also be moving, say coming back from a testing center. So, hotspot may contain COVID +ve users who are moving. In order to locate areas of a potential hotspot, we proceed as follows. Contact list of each user obtained from the contact tracing methodology presented in section 3 from the last 14 days is stored in the form of an adjacency matrix as shown in (2).

(In actual implementation, it can be implemented using a hashmap where keys are user ids and values are queue/list which stores the ids of users in the order the contact happens). We call each row as a contact vector for that user. We can easily maintain the contact vector by adding /deleting from the queue/list the new/old contacts. Note that the adjacency matrix stores users who came in contact with other users. It does not store the COVID +ve users. In other words, we rely on official data from government authority to declare a user as a COVID +ve based on the diagnostic report. Once we receive such information, we can mark that user as a COVID +ve by setting a bit along that row or by maintaining another bit vector which is indexed by the user id as shown in fig. 8 . To detect a potential hotspot area, we provide a reference GPS location. This can be a city location as searched by users interested in knowing hotspot areas. The reference GPS is mapped to the lat/long tree created in that period. Assuming a 10 km radius from the reference point though, this can be dynamically adjusted according to the city area. Firstly, all the users are identified who are within the 10 km radius from the reference point. The 10 km of distance maps to around 5.39 minutes in lat/long. Therefore, all the users in the leaves, which are 5.39' left and right of the reference point, are identified. The user ids of such users are mapped to the bit vector ( fig. 8 ) to find out users who are COVID +ve within our search area (10 km, for example). Using the adjacency matrix, we also identify users who came in contact with users who are found COVID +ve in the search area. Such users might form susceptible users. Further, user ids of the susceptible users are cross-tallied in the search area (since users who came in contact with the COVID +ve user in the search area days ago may have moved to a different city). Finally, we have built a set of COVID +ve users and suspected users. We can mark COVID +ve and susceptible users on the city map, and if the number of COVID +ve users exceeds a threshold, it can be marked as a potential hotspot city.

Based on the potential hotspot areas, we can recommend a safe route for travel. Users currently use google map/Apple Maps as a primary navigation medium for traveling from one place to another. However, these maps have limited information about whether a route is safe for travel. For example, unless someone marks specifically that a particular route is blocked or traffic is slow, current navigation systems have limited information to tell users in advance that route is not safe. This situation was observed during the recent exodus of people from red zones such as Mumbai/New Delhi. People were following google maps to travel to their hometowns. However, the map took them to other hotspot cities on the way. As a result, they have to take a long detour once they reached the hotspot city on the way because entry into the city was not allowed. In such situations and many others, such as avoiding potential gatherings that may be a potential hotspot, our approach can recommend a safe route, thereby avoiding hotspot areas in advance so that users can plan their travel accordingly.

Suppose a user wants to travel from source location to target location as shown in fig. 9 . If the user follows Google map, it might show the green route since it is the shortest path. However, it may take the user to the hotspot city since it does not have potential hotspots information in real-time. Following our previous approach, we have the information of potential hotspots based on the previous section's discussion. Therefore, we remove the nodes representing the hotspot (red node in the fig. 9 ) as well as edges incident to it and run the Dijkstra's algorithm [19] [4] on the remaining subgraph. The algorithm produces the shortest path in the subgraph (marked in blue color). This path does not fall in red zones/hotspot areas and thus safe for travel. We verify our approach on a toy dataset and show its efficacy in the experiment section. The safe route recommendation algorithm is described in Algorithm 3. In the above algorithm, is the input graph, and are the sources and destination nodes (cities), and is the list of nodes marked as a hotspot thus not safe for travel.

In this section, time and space complexity analysis of various operations is presented.

Time complexity of the mapping a GPS coordinate to a tree leaf is (ℎ) where ℎ is the height of the tree. In our case, ℎ = 4, so mapping GPS coordinates to tree leaf is a constant time operation (1) . Further, append operation of the users into the list attached to the leaf node will have a time complexity of (1). After mapping, we need to find a new cluster of users missed in the mapping process as discussed in section 3.2. Such users can be put into new clusters by sliding a window of size 2, moving over adjacent pairs of clusters, and computing Euclidean/Haversine distance for each pair of users from each cluster. Time taken by this operation is ( ℓ), where and ℓ are the sizes of two clusters and (= 90 * 60 * 60 * 6) is a constant. As discussed previously, the size of the clusters can not grow unbounded in the worst case due to population dynamics. Usually, and ℓ are in the range of 1 to 100 since within a circle of 5m radius; we assume no more than 100 users can stay. Further, the operation of finding missed users can be computed efficiently using parallelization (presented in the empirical section).

Time complexity of the intersection of leaf node (clusters) in the static case is more straightforward. We need to perform the intersection of ( ) corresponding leaf nodes in two trees obtained at time intervals of /2 mins. Here is as defined previously. Intersection operation in the dynamic case is involved and incurs extra overhead due to one leaf being intersected with 168 (=28"x6) leaf nodes in the other tree for the pedestrian case. Consequently, time cost goes up by a factor of 168 in the dynamic case compared to the static case. Further, note that the intersection operation in both the cases (static and dynamic) can be efficiently parallelized since one leaf node can be intersected with other nodes in parallel. Such parallelization tricks are used in the empirical section to reduce the runtime cost of the intersection operation.

Space Complexity: Total number of internal nodes for mapping latitudes is 90+90 * 60+90 * 60 * 60+90 * 60 * 60 * = 329490+32400 . Where is the number of partitions of a second. Take = 6 for = 5,we have total internal nodes as 2273490. Assume each integer takes 8 bytes. The space complexity of an empty tree is approx we maintain a total of 16 trees before we can flush the trees and reuse the space. Hence our approach takes roughly 4GB of space for generating contact vectors compared to 55GB in [12] .) In our implementation, we take latitude ≃ (7°, 37°) and longitude ≃ (68°, 97°) for India for which empty tree takes ≈39 MB when representing internal nodes as four-level dictionary keys and leaves as the empty list in the python programming language. We store app IDs or some other useful global identifier such as IMEI number into leaf nodes to track each user. Thus space complexity for storing user id into leaf will be ( ) where is the number of user ids. Therefore, the dominating part will be used for storing user ids, and it scales linearly with the users.

One crucial point to consider is that the leaf node is implemented as a list or array which has the cost of (1) for append operation. However, a growing list beyond specified size incurs extra overhead. However, we must mention that any list does not hold more than a constant number of users since each leaf maps GPS range of 5 meters and in a circle of 5-meter radius, we assume that not more than 100 users can stand.

In this section, we show the working of the proposed methodology for contact tracing, hotspot detection, and safe route recommendation on simulated data.

To show the scalability of the contact tracing approach, uniformly at random GPS coordinates in the range latitude ≃ (7°, 37°) and longitude ≃ (68°, 97°) are generated for 2/4/6/8/10M users. We will map these GPS coordinates to the M-way tree like data structure as discussed and compute the contact vectors. Our goal is to estimate the time it takes to generate the contact lists. The serial and parallel versions for static and dynamic cases are implemented in C++ with openMP [3] (for multi-threading). The experiments are conducted on a personal computer with Intel(R) Core(TM) i5-7500U CPU and 8.0 memory (RAM) running Ubuntu operating system. The results are shown in the table 1. We have several observation from the results in the table. Firstly, mapping GPS to tree takes at most 80 secs for 10M users using serial implementation, whereas at most 86 secs in the parallel implementation. A little increase in time could be due to thread scheduling overhead in the latter case. In other words, mapping GPS to trees via parallel implementation is not recommended, and serial implementation suffices to practical cases. Secondly, compared to mapping GPS coordinates to trees, performing intersection of leaves is time-consuming. Thirdly, intersection time using parallel implementation achieves dramatic speedup. For example, parallel implementation in the dynamic case for 10M users gains 4x speedup. Finally, generating the final contact vectors for 10M users takes around 16.5 mins (990.87+85.78 secs). In other words, if we want to track contact events every 5 mins, we need to maintain four sets of lat/long trees before we can flush the data in the trees and reuse them again, thereby reducing the memory footprint.

7.1.1 Baselines. One of the baselines used to compare the runtime of the contact tracing approach is by performing pairwise comparison (the naive solution). The result from the serial and parallel implementation of the baseline is shown in Table 2 . It is obvious that our approach beats the naive solution by a significant order of magnitude and favors practical implementation.

To verify the approach discussed in section 4 for hotspot detection, we first test it on 20 users and mark them on google map as shown in fig. 10 . All the users, the COVID +ve patients, and the suspected users (users who came in contact with COVID +ve users but still having no symptoms) are plotted on the city map. The figure also shows that at time 0 , users ids 1, 3, 9, and 14 are COVID +ve. The fig. 10 (b) shows that user 6 who came in contact with user 9 (a COVID +ve), has been suspected and marked with the yellow color. Similarly, user 16 becomes susceptible at time 2 and so on. Indeed, the figure shows how the infected/suspected users may be moving in the city to provide real-time information of danger zones, benefiting normal users to plan accordingly. If the number of infected/suspected users in the map relative to the reference point exceeds a certain threshold, then the area around the reference point can be marked as a potential hotspot area. Alternatively, any area comprising a suspected COVID-19 positive patient can be declared as a potential hotspot.

One concern related to privacy is that we are marking users at the individual level. To avoid such a thing, we can draw a circle of larger radius such as 10 or 20 meters to conceal the COVID +ve users' identity. Alternatively, we can group nearby users and form large clusters to make the identity indistinguishable at the individual level. We can see that Google map recommends a route from Pune to New Delhi, which passes through Indore city, a hotspot area and not allowed for travel. On the other hand, our approach recommends a route that avoids the hotspot city.

To show the scalability of the proposed approach, the plot of runtime with respect to the number of users considered in a hotspot region is shown in fig. 10(f) . The graph clearly shows that the runtime is following a near-linear trend. Therefore, our approach to detect hotspots in real-time may be useful.

The evaluation of the safe route recommendation approach is done in the following way: 1) The proposed method is compared with an existing route Recommendation system, that is, the Google Map, in terms of the total cost of the path. The total distance from the source to destination constitutes the total cost and whether it is a Hotspot Zone free path or not. The efficacy of the proposed approach is analyzed on a real World dataset, which is extracted from Google Map. For the experiment, we have considered major cities in India (Indore, Bhopal, Chennai, Mumbai, Delhi, Bangalore, Lucknow, Hyderabad, Pune, Kolkata).

We have compared our proposed approach with the current Google map recommendation, considering Indore as a Hotspot City, Source as Pune, and Destination as New Delhi. Google maps recommend the path which passes through the hotspot city(Indore) as shown in Fig. 11 (a) with a total cost of 1455 Km whereas, the path displayed by the proposed algorithm is an alternate route where the intermediate city is Bhopal(non-hotspot city) instead of Indore with a total cost of 1540 Km which is displayed in Fig. 11(b) . Indeed, the path recommended by our approach is not optimal. Still, since our objective is to avoid a hotspot city, our approach is suitable for choosing a safe route rather than an optimal path.

The proposed approach for contact tracing seems plausible based on the initial experiments on simulated data. Safe route recommendation and potential hotspot information further add new features to our method. We are planning to release the app in the future after verification on the real-user study.

On the Construction of Some Deterministic and Stochastic

New technologies in predicting, preventing and controlling emerging infectious diseases

OpenMP: an industry standard API for shared-memory programming

A note on two problems in connexion with graphs

Epidemic contact tracing via communication traces

Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing

Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts

Contact tracing and epidemics control in social networks

Challenges and Potential Opportunities of Mobile Phone Call Detail Records in Health Research

Epidemic models of contact tracing: Systematic review of transmission studies of severe acute respiratory syndrome and middle east respiratory syndrome

What is a hotspot anyway? The American journal of tropical medicine and hygiene

Ranjan Chattaraj, and Soumya Banerjee. 2020. Dynamic Graph Streaming Algorithm for Digital Contact Tracing

Georgios Kambourakis, and Igor Nai Fovino. 2020. Demystifying COVID-19 digital contact tracing: A survey on frameworks and mobile apps. CoRR abs

Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys

Contact tracing in stochastic and deterministic epidemic models

Digital epidemiology

Indore Navigation Mobile Application using Indore Positioning System (IPS)

The problems with contact-tracing apps

Precision of coordinates -OpenStreetMap Wiki

Fluphone study: Virtual disease spread using haggle