Title: COVID-19: Strategies for Allocation of Test Kits
Authors: Arpita Biswas, Shruthi Bannur, Prateek Jain, Srujana Merugu
Date: 2020-04-03

Abstract: With the increasing spread of COVID-19, it is important to systematically test more and more people. The current strategy for test-kit allocation is mostly rule-based, focusing on individuals having (a) symptoms of COVID-19, (b) travel history, or (c) contact history with confirmed COVID-19 patients. Such a testing strategy may miss asymptomatic individuals who were infected via community spread. Thus, it is important to allocate a separate budget of test-kits per day targeted towards preventing community spread and detecting new cases early on. In this report, we consider the problem of allocating test-kits and discuss some solution approaches. We believe that these approaches will be useful to contain community spread and detect new cases early on. Additionally, these approaches would help in collecting unbiased data, which can then be used to improve the accuracy of machine learning models trained to predict COVID-19 infections.

1. The known cases are mostly self-reported. Thus, testing happens only among the symptomatic cases and misses asymptomatic cases that do not report their contact with confirmed cases.
2. Asymptomatic patients who get infected via community spread are missed.
3. The data collected using the current testing strategy is heavily biased towards patients with symptoms of infection. This makes the data inappropriate for directly training an infection-prediction model.

In particular, the asymptomatic cases are the most important factors for community spread. Since these cases are hard to detect, the current testing policies may not be adequate to contain the spread of the virus [14].
Thus, it is important to allocate a separate budget of K test-kits per day, targeted towards (a) detecting new cases early on and (b) collecting unbiased data. We formally define the test-allocation problem in Section 3. Next, in Section 4, we propose some strategies for allocating a fixed number of test-kits across the population each day. We assume that, on each day i, a budget K_i is declared for containing community spread, where K_i represents the number of test-kits available on the next day, i.e., the (i+1)-th day. We consider the problem of selecting a set of K_i individuals on that day and recommending that they take a test (to detect whether they are infected). We assume that there is a symptom-checker app, where individuals can fill in their personal information (such as age and gender), along with medical history (diabetes, cardiovascular diseases) and travel information (countries visited within the last 14 days). Individuals are assumed to use the symptom-checker app at any time of the day, and the symptom checker must decide whether to recommend testing to an individual. Note that this data could be filled in by the individuals on their own, by field-level health workers (ASHAs [1]), or even obtained from legacy health-survey datasets. The symptom checker coordinates with the local testing facilities to estimate the number of test-kits available (for additional recommendations) on the next day (see Footnote 1). There can be two modes of recommendation:

Footnote 1: The coordination between the symptom-checker app and local testing centers would allow the symptom checker to allocate a dedicated time-slot to each individual for taking the test. This would prevent large gatherings at the test centers. In this paper, we do not tackle the problem of allocating fixed time-slots to individuals; however, it remains an interesting direction for future research.
Offline mode: Recommendations can be delayed till the end of the day, after all the users have registered on the symptom checker on a particular day i.

Online mode: A recommendation is made as soon as the individual enters his/her information in the app.

The selected individuals are then expected to undergo a test (see Footnote 2). It typically takes one more day to observe the test result (infected or not).

Goal: Given a budget of K_i test-kits, design a strategy to select at most K_i individuals each day such that the observations (infected or not) for those individuals help improve the predictive power of an infection-prediction model. This goal is aligned with the need to observe enough individuals with distinct feature vectors. Thus, such a strategy is implicitly targeted towards collecting unbiased data as well as detecting new cases early on (even among asymptomatic cases), which in turn would help contain community spread.

Let X_all represent the entire population of the country (irrespective of whether they have filled in their details in the app). Let X denote the set of all individuals who fill in their details in the symptom-checker app, possibly on different days. Clearly, X_all is a superset of X. Let each individual x ∈ X be denoted by a vector of d features, x = (x_1, x_2, . . . , x_d). These features capture demographic and health-related information for each individual. If an individual x takes the test, then the label y(x) denotes the test result: y(x) = 1 denotes that x is infected and y(x) = 0 that x is not infected. Thus, X can be represented as a matrix with rows representing individuals and columns representing the features as well as the label y. Let X_i be the set of individuals who fill in their details in the symptom-checker app on day i, and let S_i ⊆ X_i be the set of K_i individuals who are selected. We assume that K_i ≪ |X_i| in most practical scenarios.
Note that the labels y(x) for all x ∈ S_i will eventually be available. Let S be the set of all individuals who took the test; for all x ∈ S, the corresponding labels y(x) are observed (see Footnote 3). On the other hand, the labels y(x) will be unavailable for all x ∉ S, which is the case for most individuals in X and X_all. In this work, we focus on the offline mode and propose some approaches in Section 4. We briefly discuss the online mode and other important considerations in Section 5.

In the offline mode, the symptom checker is allowed to delay the recommendation of testing until the end of the day. This mode of operation requires users to register before a specific time of the day (say, 9 PM) to be eligible for testing on the next day. Let the set of all eligible users on day i be denoted X_i. Within 10-15 minutes of this deadline (i.e., by 9:15 PM), the selected K_i individuals (denoted S_i ⊆ X_i) would be notified about getting the test done on the next day, i.e., the (i+1)-th day.

In this section, we propose four approaches: (a) randomization across four disjoint buckets, (b) stratification based on features, (c) budgeted contextual bandits with delayed feedback, and (d) a utility-based active-learning solution. These solutions aim to select K_i individuals from the population X_i who fill in their details in the symptom checker on day i. The first two approaches (a and b) are easy to implement and can be quickly adopted for allocating test-kits, while the next two (c and d) are more sophisticated and yield better results in the long term.

Footnote 2: Ensuring that the individuals take the test, with the right incentives, mandates, and ethical safeguards, is yet another direction for further study.

Footnote 3: In certain cases, an individual might be asked to take a test on multiple days for accuracy. That is still counted as a single test, and the associated label is assumed to be positive if any of the tests are positive.
A naive approach is to assign each individual in the population X_i to one of the following four mutually disjoint groups:

X(1): symptomatic cases with risky contact or travel history (i.e., contact with confirmed cases or international travel);
X(2): asymptomatic cases with risky contact or travel history;
X(3): symptomatic cases without any risky travel or contact history; and
X(4): asymptomatic cases without any risky travel or contact history.

Split the K_i test-kits among the four buckets: allocate a budget K_X(j) to each group j ∈ {1, 2, 3, 4} such that Σ_j K_X(j) = K_i, where each K_X(j) depends on how much importance should be given to group X(j) while allocating test-kits. Then, randomly sample from each group j based on the budget allocated to that group. Note that testing should remain mandatory for all those in the critical group (symptomatic cases with either recent travel history or contact history). Since there is an overlap with the mandatory cases, the budget K_i can be adjusted accordingly.

In this approach, we wish to select both the under-represented and over-represented groups in the population X_i to the same extent as in the larger population X_all, especially when the features are not distributed evenly across the set X_i (as observed by the symptom checker). Let us assume that each person in the population has a feature vector x that includes age as one of the features. A naive random test-allocation strategy samples uniformly from the population who have filled out the symptom checker (i.e., X_i). Under this strategy, the distribution of the sampled features would mimic the distribution across X_i. For example, if 90% of the rows of X_i correspond to individuals less than 60 years old, then random allocation is likely to select individuals under 60 and might miss the individuals above 60 years.
We wish to avoid this scenario and ensure that there is adequate representation for the 60+ population in the chosen sample. To sample across all features according to the distribution of features in X_all, we adopt a stratification approach. In this approach, we divide the population space (both X and X_all) into groups (i.e., disjoint subsets) based on the values of some selected features (represented as F). If we know the joint distribution of the population over the set of features in X_all and X_i, we can use them to adjust the weight of each individual according to the group he/she falls in. This means that, if the overall population X_all is uniformly distributed across all groups and an individual x ∈ X falls in a group with fewer samples in X_i, the weight, or probability of selecting that individual, will be higher than for a group with many samples in X_i.

Formally, we define the overall population as X_all, where each individual is represented by a vector of d features. Let F ⊆ {1, . . . , d} be a selected set of features based on which the population is grouped. Let the joint distribution of the features in F over the population X_all be denoted P_Xall(x_f = a_f; ∀f ∈ F), where a_f is the value that a feature f ∈ F can take. Similarly, let P_Xi(x_f = a_f; ∀f ∈ F) be the joint distribution for the population X_i (the set of individuals who have reached out to the symptom checker on day i). We can use these values to assign a weight [15] to an individual, namely the ratio of these two densities. Moreover, we also increase the weights of individuals who are at a higher risk. To capture this, let us assume that there is an external function U(x) that outputs the utility associated with testing an individual x; see Section 5.3 for more discussion on U(·). The weight on each individual x can then be computed as

    w(x) = U(x) · P_Xall(x_f = a_f; ∀f ∈ F) / P_Xi(x_f = a_f; ∀f ∈ F).

This weight would then be used to perform a weighted random sampling of K_i individuals from the pool X_i.
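As a concrete sketch, the weighting and sampling step above can be implemented with standard weighted sampling without replacement. The strata, densities, and ids below are illustrative placeholders, not values from the paper, and the utility U(x) is left as a constant.

```python
import random

# Hypothetical joint densities over (age-band, gender) strata:
# P_ALL plays the role of P_Xall (census-level), P_DAY of P_Xi (day-i app users).
P_ALL = {("<60", "F"): 0.28, ("<60", "M"): 0.32,
         ("60+", "F"): 0.19, ("60+", "M"): 0.21}
P_DAY = {("<60", "F"): 0.42, ("<60", "M"): 0.48,
         ("60+", "F"): 0.04, ("60+", "M"): 0.06}

def weight(x, utility=lambda x: 1.0):
    # w(x) = U(x) * P_Xall(group(x)) / P_Xi(group(x))
    return utility(x) * P_ALL[x["group"]] / P_DAY[x["group"]]

def weighted_sample(pool, k, seed=0):
    """Draw k individuals without replacement, with probability roughly
    proportional to their weights (Efraimidis-Spirakis exponent keys)."""
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / weight(x)), x) for x in pool]
    keyed.sort(key=lambda t: t[0], reverse=True)
    return [x for _, x in keyed[:k]]

# A day-i pool dominated by under-60 men; the lone 60+ woman gets boosted.
pool = [{"id": n, "group": g} for n, g in enumerate(
    [("<60", "M")] * 9 + [("60+", "F")])]
selected = weighted_sample(pool, k=3)
```

With these illustrative densities, the 60+ woman's weight is 0.19/0.04 = 4.75 versus 0.32/0.48 ≈ 0.67 for each under-60 man, so the under-represented stratum is far more likely to be sampled than under uniform selection.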
Example: let us consider two features, namely age and gender, for each individual in the population. We can denote the joint distribution of age and gender over the entire population by P_Xall(Age, Gender), which can be obtained from the information provided in the National Health Profile (see Table 1 for the state-wise distribution of age and gender, or Table 2 for the overall distribution) and the Census. We wish to select samples from X_i such that the selected set is representative of the population X_all (either for a given state or for the overall country). Now, we divide the population X_i into groups, for example P_Xi(Age < 20, Gender: Female), P_Xi(Age < 20, Gender: Male), P_Xi(20 < Age < 40, Gender: Female), P_Xi(20 < Age < 40, Gender: Male), etc. These values represent the density of the population that falls within each bucket. When selecting among the population, we weight the samples by the inverse of these values, and similarly use P_Xall(·) for the corresponding densities in the overall population. For implementation, the Python function numpy.random.choice generates a weighted random sample of a fixed size from a given array [3].

In this approach, we combine classifier learning with "informative" data-point selection. The algorithm's goal is to iteratively refine the classifier so that the cost of missed detections or false positives is comparable to the best achievable in hindsight. In a contextual-bandits setting, the decision-maker observes a context, makes a decision by choosing one action from a number of alternatives, and later observes a reward associated with that decision. The goal is to maximize the average reward. In our case, the context is information about the user: their age, geolocation, prior health conditions, symptoms, travel history, contact history, etc. The actions are: to test a person or not. The reward or cost is: (a) the cost of missed detection (i.e.,
not testing a person when he/she is actually infected) and (b) the cost of a false alarm (i.e., testing a person when he/she is not infected). In general, this class of algorithms constructs a confidence ball around every point and its prediction (i.e., whether the person should be tested or not). The final decision per point is sampled from a probability distribution over both outcomes (positive, negative). A tutorial and an implementation are available at https://vowpalwabbit.org/tutorials/contextual_bandits.html. The above-mentioned Vowpal Wabbit [17] is an open-source library that implements online and offline training algorithms for contextual bandits; its offline training and evaluation algorithms are described in the paper "Doubly Robust Policy Evaluation and Learning" by Dudik et al. [4]. For implementation, we can bootstrap the algorithm using the stratification approach on the dataset obtained up to day i − 1. After this, we model the problem as a contextual-bandits problem from day i onward. The main components of the problem are:

• Context: information about an individual, such as age, gender, geolocation, prior health conditions, present symptoms, travel history, and contact history, represented as a d-dimensional vector x.
• Actions: to test a person (arm 1) or not (arm 0). This is a two-armed bandit setting (A = {0, 1}).
• Feedback: whether a person tests positive (Y = {0, 1}).
• Reward/Cost: the cost of testing a person who is positive, C(x; a = 1, y = 1), and of testing a person who is not, C(x; a = 1, y = 0). We fix the cost of not testing to zero, i.e., C(x; a = 0, y) = 0 for both y = 0 and y = 1.
• Policy: takes a context x and returns an action a with probability P(a | x). We denote by P_i(·) the policy used on day i.
• Budget: the number of available test-kits, K_i.
• Decision to optimize: learn a policy that selects K_i individuals so as to maximize the expected reward.

Now, let X_i represent the set of new data points collected on day i, from which we need to decide the subset S_i that should go for testing. Recall that the size of S_i is bounded by the budget K_i, i.e., |S_i| ≤ K_i. Also, let Z_{i−1} be the set of points for which we had requested testing, some of whom were tested and whose results are available (we assume that the feedback y, i.e., whether a selected individual is positive, is delayed by a day). That is, Z_{i−1} = {(x_e, y_e) ∀e}, where y_e is the test result of the individual with feature vector (context) x_e. The algorithm works as follows:

1. Update: We use Z_{i−1} and the cost function to train (or re-train) a classifier and update the scoring function f(·) using standard contextual-bandit update methods. Here f(x) gives a score for testing individual x; a higher score implies one of two scenarios: (a) the individual is more likely to be positive, or (b) the algorithm is uncertain and is trying to build confidence around x.

2. Select: We apply the updated function f to each individual x ∈ X_i to get their scores f(x). These scores and the associated costs act as input to a contextual-bandit algorithm [17], which outputs the probability of recommending testing, P_i(a = 1 | x). We then select a total of K_i individuals on day i via weighted random sampling with the recommendation probabilities P_i(a = 1 | x).

In this approach, the goal is to find the set of most "informative" data points (people) so that getting them tested (i.e., labeling them) leads to a significant improvement in the risk-assessment solution, i.e., the classifier that predicts whether a person is likely to be positive and hence should be tested.
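The daily update/select loop can be sketched schematically as below. This is not the paper's bandit learner: the scorer is a plain logistic model standing in for f(·), and epsilon-greedy randomization stands in for the bandit's recommendation probabilities P_i(a = 1 | x); all names and numbers are illustrative.

```python
import math
import random

class DailyBanditAllocator:
    """Schematic budgeted selection: update a scorer on yesterday's
    labelled tests (Z_{i-1}), then pick today's K_i individuals."""

    def __init__(self, dim, lr=0.1, eps=0.1, seed=0):
        self.w = [0.0] * dim          # weights of a simple logistic scorer f(x)
        self.lr, self.eps = lr, eps
        self.rng = random.Random(seed)

    def score(self, x):
        # f(x): estimated probability that individual x is positive
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feedback):
        # feedback: list of (x, y) pairs for tests recommended on day i-1
        for x, y in feedback:
            err = self.score(x) - y   # log-loss gradient factor
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]

    def select(self, pool, budget):
        # Epsilon-greedy stand-in for P_i(a=1|x): mostly exploit high
        # scores, occasionally explore uniformly, then keep the top K_i.
        scored = [(self.score(x) if self.rng.random() > self.eps
                   else self.rng.random(), x) for x in pool]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [x for _, x in scored[:budget]]

alloc = DailyBanditAllocator(dim=3)
alloc.update([([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 0.0], 0)])   # Z_{i-1}
today = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
chosen = alloc.select(today, budget=2)
```

A production implementation would replace both the scorer and the exploration rule with a library such as Vowpal Wabbit, as discussed above.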
There are two types of solutions to this problem:

• Uncertainty-based exploration: Determine which user's infection status is most uncertain according to the current classifier's scoring function and ask for their labels (see, for example, the paper by Tong and Koller, 2001 [16]). However, this method will suffer in our case from the high bias in the initial classifier.
• Disagreement-based exploration: If any two classifiers in the hypothesis class disagree on whether a person is likely to be infected, then that person should be tested. This algorithm is somewhat challenging to implement; one version is implemented in [9].

The challenge in the online mode is that individuals arrive one by one, and one needs to make an immediate decision on whether to recommend them for testing. To do this, we can design a rate-limiting mechanism [7]. At a high level, a rate-limiter bounds the number of actions that can be performed within a particular time window. To apply such a mechanism, let us divide a day into six time-slots of 4 hours each and limit recommendations to an α_t fraction of K_i in a particular time-slot t. The value of α_t can be estimated as the proportion of individuals who registered in that slot on the previous day, i.e., α_t = |X_{i−1,t}| / |X_{i−1}|, where X_{i−1,t} denotes the set of individuals who arrived in slot t of day i − 1. For an individual x who arrives during time-slot t on day i, we compute the score (probability or utility of recommendation) using one of the four approaches defined for the offline mode. If fewer than α_t · K_i individuals arrived on the previous day, then recommend testing to x. If not, then compare x's score with that of the (α_t · K_i)-th highest scorer of the previous day among those who arrived in slot t; if x scores higher, then recommend testing. Apart from rate-limiting mechanisms, there exist other sophisticated techniques that can be applied to facilitate the online mode of operation.
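The slot-based rate-limiting rule above can be sketched as follows; the function names, slot counts, and example scores are hypothetical.

```python
def slot_fractions(prev_day_counts):
    """alpha_t = fraction of yesterday's registrations that arrived in slot t."""
    total = sum(prev_day_counts)
    return [c / total for c in prev_day_counts]

def recommend_online(score, slot, budget, prev_day_counts, prev_day_scores):
    """Immediately decide whether to recommend testing to one arrival.

    score:           the individual's score from one of the offline approaches
    slot:            index of the current 4-hour slot (0..5)
    budget:          K_i, today's test-kit budget
    prev_day_counts: number of arrivals per slot on day i-1
    prev_day_scores: per-slot lists of scores observed on day i-1
    """
    alpha = slot_fractions(prev_day_counts)[slot]
    cap = int(alpha * budget)                  # this slot's share of K_i
    yesterday = sorted(prev_day_scores[slot], reverse=True)
    if len(yesterday) < cap:                   # slot was under-subscribed
        return True
    threshold = yesterday[cap - 1] if cap > 0 else float("inf")
    return score > threshold                   # beat the cap-th highest scorer

# Slot 2 held 30% of yesterday's 100 arrivals, so its cap is 0.3 * 10 = 3;
# the 3rd-highest score yesterday in that slot was 0.5.
counts = [10, 20, 30, 20, 10, 10]
scores = [[], [], [0.9, 0.5, 0.8, 0.2], [], [], []]
ok = recommend_online(0.7, slot=2, budget=10,
                      prev_day_counts=counts, prev_day_scores=scores)
```

Here an arrival scoring 0.7 beats the threshold of 0.5 and is recommended, while one scoring 0.4 would not be.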
The test-allocation problem that we consider assumes that the symptom-checker app coordinates with local testing centers to estimate the number of available test-kits on the next day. Such coordination is also important for allowing the symptom checker to allocate a dedicated time-slot to each individual for taking the test, which would prevent large gatherings at the test centers. In this paper, we do not tackle the problem of allocating fixed time-slots to individuals; however, it remains an interesting direction for future research.

In Section 4.2, we assume an external function U(x) that outputs the utility associated with an individual x. This utility function has to be chosen based on the health-policy directives [8]. Note that this utility value can be computed using all the available information (e.g., location, symptoms, occupation) associated with an individual, including features that might not have been used for stratification. When the budget K_i itself forms a big fraction of the overall test-kits, it is important to address the care and containment aspects along with the data-collection requirements. Let M denote the current risk-assessment model. Then setting U(x) to the probability of infection p_M(y = 1 | x), or a monotonic function thereof, is a good choice, since it ensures that the high-risk cases are assigned more test-kits while allowing a non-zero allocation for the low-risk ones. When the high-risk patients are already guaranteed tests and the goal of the K_i budget is to improve the infection-assessment model for a binary decision, it is appropriate to choose U(x) to be the entropy of the conditional distribution, H(p_M(y | x)), or a similar uncertainty measure. If the intent is to estimate a well-calibrated infection distribution, then assuming a uniform U(x) = 1 for all x would be suitable.
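The three choices of U(x) discussed above can be written down directly, given the model's infection probability p_M(y = 1 | x). The function below is a minimal sketch; the mode names are ours, and the probability input is assumed to come from some risk-assessment model M.

```python
import math

def utility(p_infected, mode="risk"):
    """Three choices of U(x) given p = p_M(y=1|x) from a risk model M.

    "risk":    U(x) = p, prioritising likely-positive individuals
               (care and containment when K_i is a big budget fraction)
    "entropy": U(x) = H(p), prioritising uncertain cases
               (improving the binary infection-assessment model)
    "uniform": U(x) = 1, equal weight
               (estimating a well-calibrated infection distribution)
    """
    p = p_infected
    if mode == "risk":
        return p
    if mode == "entropy":
        if p in (0.0, 1.0):
            return 0.0   # no uncertainty at the extremes
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return 1.0           # "uniform"
```

For example, entropy peaks at p = 0.5 (the model's most uncertain cases) and vanishes as p approaches 0 or 1, which is exactly the behavior wanted when the budget targets model improvement rather than containment.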
In a country with a very large population (such as India, with 1,376 million people), it might be extremely difficult to manufacture extra test-kits in bulk. Thus, it is important to study how to optimize the use of these few extra test-kits. A recent study [5] shows that it is possible to use fewer test-kits to test more people in a situation where most of the test results turn out to be negative (the number of positive cases in South Korea is only around 2.4% of the total tests performed). The article [5] mentions that researchers at the German Red Cross Blood Donor Service developed a procedure called mini-pool, in which they mix 5 trial samples in a pool and test the pool. If the test for the pool is negative, then all the individuals are declared negative; if the pool tests positive, then each sample is tested individually. More sophisticated techniques can be built on this binary-search method. Though this utilization problem is orthogonal to the problem we consider in this work, such solutions may help in choosing the budget K_i, which can then be much larger than the number of available test-kits and thus allow testing a larger set of individuals.

References
- Coronavirus: South Korea seeing a 'stabilising trend'
- CS 4740: Introduction to Natural Language Processing
- Doubly robust policy evaluation and learning
- Pool testing of SARS-CoV-2 samples increases worldwide test capacities many times over
- Novel Coronavirus COVID-19 (2019-nCoV) Data Repository
- Rate-limiting strategies and techniques
- Revised Strategy of COVID-19 testing in India
- Active learning for cost-sensitive classification
- Introduction of community transmission of Covid-19 likely in India: ICMR study
- 'It's Just Everywhere Already': How Delays in Testing Set Back the U.S. Coronavirus Response
- The New York Times: The Coronavirus Outbreak
- The New York Times: They Were Infected With the Coronavirus. They Never Showed Signs
- Selection Bias: Origins and Mitigation
- Support vector machine active learning with applications to text classification
- Contextual Bandits
- Reinforcement Learning

Acknowledgments: The authors thank Mohit Kumar and Arun Iyer for extremely useful discussions.