title: Genetic Programming visitation scheduling solution can deliver a less austere COVID-19 pandemic population lockdown
author: Howard, Daniel
date: 2020-06-17

A computational methodology is introduced to minimize infection opportunities for people suffering some degree of lockdown in response to a pandemic, such as the 2020 COVID-19 pandemic. Persons use their mobile phone or computational device to request trips to places of their need or interest, indicating a rough time of day: 'morning', 'afternoon', 'night' or 'any time', together with the desired place to visit. An artificial intelligence methodology, a variant of Genetic Programming, studies all requests and responds with specific time allocations for the visits that minimize the overall risk of infection, hospitalization and death. A number of alternatives for this computation are presented, together with results of numerical experiments involving 282 people of various ages and background health levels in over 1700 visits that take place over three consecutive days. A novel partial infection model is introduced to discuss these proof of concept solutions, which are compared to round robin uninformed time scheduling of visits to places. The computations indicate vast improvements, with far fewer dead and hospitalized. These augur well for a more realistic study using accurate infection models, with a view to test deployment in the real world. The input that drives the infection model is the degree of infection by taxonomic class, such as the information that may arise from population testing for COVID-19 or, alternatively, from any contamination model. The taxonomy class assumed in the computations is the likely level of infection by age group.

The quaranta giorni, or forty-day isolation practised by the Venetians, was, as the name implies, a measure applied to incoming ships [12] which evolved into containment practices to handle recurrent epidemics. At the time of writing, owing to ubiquitous world travel, COVID-19 'quarantines' or lockdowns keep millions of people around the world confined mostly to the home for months before an 'easing of measures' gradually re-opens society. The 2020 lockdowns in response to the COVID-19 pandemic are enforced in different ways. Argentine and Spanish lockdowns are strictly policed, with citizens required to make written application for outings. In contrast, in north-western Europe and the United States, lockdowns are not as strict, with Denmark and the United Kingdom entrusting their citizens not to infringe lockdown rules, and liberal Sweden choosing not to lock down in an official sense but instead practising a small number of restrictive measures. The manner of lockdown, and how to exit a lockdown, are problems that are short of informed solutions. It is useful to frame the generic lockdown problem as requiring an innovation capable of overcoming the following trade-off: generally, the longer and more extensive a lockdown, the more effectively society can prepare hospital facilities and control the spread of the disease, but the greater the negative factors: loss of personal freedoms; damage to the economy; poorer personal psychology; social unrest; abusive relations; undetected crime; higher incidence of other disease because people are too scared to visit the emergency room or doctor; and negative effects on the care of both the vulnerable and the elderly.
The idea is to enable lockdown whilst minimizing many of its negative consequences, e.g., to personal psychology, economics and health. Indeed, it would be nice if life could continue as normal while in lockdown, a seemingly contradictory statement. Inspiration comes from a crude approach taken by governments to ensure that fewer people venture outdoors at any one time. Panama [9] used the last digit of an identity document to assign two-hour time slots for leaving the home for essentials and, as Spain eased measures, it allowed certain age groups to go out at different times of the day. The open literature also explores changing a general lockdown into a number of partial ones [11].

The solution presented here is for citizens in lockdown to enter into a smart-phone, handset, tablet or computer a schedule of the places that they wish to visit on that day, or on future dates, together with a rough idea of the part of the day they would prefer for such outings to take place. A method of optimization, in this proof of concept a Genetic Programming [7] method, takes these requests and simulates the outings by means of an infection model, to discover a nearly optimal allocation of precise time slots for visits that reduces the likely hospitalization and death numbers. It then communicates the time allocations to citizens on their devices, such that they will carry out the journeys more safely than at times of their own choosing. The solution is proactive.

Specifically, it is useful to discuss the problem within the implementation of this proof of concept. A data file captures all requests as shown on the left side of Figure 1. The visit requests on each of three consecutive days can be many, and each is denoted by a three-symbol key. Requests are separated by the colon symbol. The first letter is the broad time of day requested for the visit:

M: wishes the visit to take place during 'morning' hours;
P: wishes the visit to take place during 'afternoon' (post meridiem) hours;
N: wishes the visit to take place during 'night' hours;
A: anytime, does not mind at what hour on this day.

The second letter is the desired place of visit, limited here to six types of establishment:

F: may represent a 'supermarket' selling food;
C: may represent a sports 'club';
P: could represent a 'park';
D: may stand for 'doctor', a doctor's surgery centre;
R: may stand for 'restaurant';
S: could represent a 'social' establishment.

The third symbol can be 1 or 2, because there are only two of each type of establishment, giving a total of 12 establishments available for visits. For example, one such request is from Person ID 20, who wants to go to Doctor's Surgery 2 at any time of day. The day is divided into eight two-hour slots. After analysis of all requests, by working with an infection model, the computer-generated optimization minimizes the total risk of COVID-19 infection, hospitalization and death. It allocates the time slots to the requests and the solution is communicated back to citizens. In this case Person ID 20 is allocated a time slot within the constraint that they imposed, as shown on the right side of Figure 1. There are other data fields, such as age and health, on the left side of the figure that are discussed further in this text. The proof of concept simulations cover three consecutive days, denoted Monday, Tuesday and Wednesday, involving 1704 visits requested by 282 people. A day consists of eight two-hour visitation slot periods available to schedule the visits.
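To make the request encoding concrete, the following is a minimal C++ sketch of how one three-symbol key such as AD2 might be represented and parsed; the struct and function names are illustrative and are not part of the implementation described in this paper.

#include <stdexcept>
#include <string>

// Hypothetical representation of one visit request such as "AD2": broad time of
// day, establishment type, and establishment number (1 or 2).
struct VisitRequest {
    char timeOfDay;      // 'M' morning, 'P' afternoon, 'N' night, 'A' any time
    char establishment;  // 'F', 'C', 'P', 'D', 'R' or 'S'
    int  number;         // 1 or 2
};

VisitRequest parseRequest(const std::string& key) {
    if (key.size() != 3)
        throw std::invalid_argument("a request key has exactly three symbols");
    return VisitRequest{ key[0], key[1], key[2] - '0' };
}

// Example: parseRequest("AD2") gives { 'A', 'D', 2 }, Person 20's request to
// visit Doctor's Surgery 2 at any time of day.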
Details of the data, for the purpose of approximate reproduction of results, are as in Table 1. The data is an entire fabrication, but common sense governed its choices: older people carry out relatively fewer outings than the young, and are in a poorer state of health. The degree of health of a person is a number that ranges from 1 to 10. In any future real-world application of this research, the general health of a person, a measure of their immune-system response to the pandemic that is assumed to abate the probability of hospitalization or death, could be gathered from patient records. As seen in later sections, this level of health combines with infection and plays a pivotal role in determining which solutions are better than others. The idea is to reduce hospitalizations and fatalities.

The solution must simulate the infection process cumulatively and longitudinally in time, as people go from place to place, in order to optimize the allocation of time slots. This requires, as a component, a model of COVID-19 infection. The proof of concept develops a model of person-to-person transmission based on simple probabilities of meetings between people in a confined location; it is necessarily a very basic model but one that illustrates the potential of the solution. Two models are presented: a partial infection probability model developed for this work, and a simple standard probability model. Both are simple and would need to be improved by epidemiologists for real-world application. The idea of a 'partially infected' individual is developed here because it will only be possible, in some average sense, to know who might be infected (if we knew precisely who was infected, they could be isolated and the solution becomes unnecessary). The idea is to assign a 'partial infection level' to all members of a taxonomy class of interest, and then simulate how they will infect others and further infect each other.

Table 1:  age group         20  30  40  50  60  70  80
          number of people  35  65  49  43  27  43  20

If an infected individual I is in close proximity to a susceptible individual S then the possibility exists of transmission of the disease from I to S. Without loss of generality, the assumption is made that every such encounter will result in infection transmission. An infection probability based on a count of such encounters between S and one or more I can be expressed as p_n, where n is the number of infected who may come into contact with the susceptible in a fixed interval of time that denotes the duration of a visit, e.g., an hour or two. More sophisticated relations should consider this time dimension and may model it with Poisson distributions, but such complications are ignored for the purpose of this presentation. If the location of the encounters is, for example, a store of a certain physical size, and s is the number of sub-locations available to visitors that together comprise the walkable area of that store (sub-units of area small enough that people may position themselves and come into close proximity of each other), then a simple probability count for such encounters between S and one or more I leads to: p_n = 1 - ((s-1)/s)^n. A property of this relation is convergence to one as n grows large: as n → ∞, p_∞ → 1.0, meaning that a non-infected person will surely come into contact with one or more infected at that store. Moreover, p_2 < 2 p_1, so that doubling the number of infected increases the probability of infection for S but never doubles it.
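The relation p_n = 1 - ((s-1)/s)^n can be evaluated directly; the short C++ sketch below simply computes it (the function name is chosen here for illustration) and reproduces the limiting behaviour just described for s = 4.

#include <cmath>
#include <cstdio>

// p_n = 1 - ((s-1)/s)^n : probability that a susceptible shares one of the s
// sub-locations with at least one of n infected visitors.
double encounterProbability(int s, int n) {
    return 1.0 - std::pow(double(s - 1) / double(s), n);
}

int main() {
    // For s = 4: p_1 = 0.2500, p_2 = 0.4375, and p_n approaches 1 as n grows.
    for (int n : {1, 2, 20})
        std::printf("s = 4, n = %2d, p_n = %.4f\n", n, encounterProbability(4, n));
    return 0;
}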
In driving a contact-based infection model, simulations must assume that a certain number of persons are infected a priori at the start of the simulation. This presents a challenge because it requires assumptions about who is infected and why, and also a huge number of stochastic computations and an averaging of results. As each person is different and not a clear member of a taxonomy class (the idea will not be unfamiliar to mathematical biologists and epidemiologists [6]), for example, people may be of the same age but of different health levels granting them less resistance to the infection, the computations would necessitate carefully balanced assumptions about who should be assumed to be infected. The partial infection model presented here, while not itself a perfect solution, does not have this onerous requirement and simplifies the computational effort for this proof of concept study, because it can take an idea of how infected the members of the population are and assign a probability of infection to all members of a taxonomy grouping in order to drive the infection simulation.

The concept of a partially infected individual is a modelling tool. Each person P_j is represented by a vector of size two, P_j = (S_j, I_j) with S_j + I_j = 1. For example, a person with ID label j = 1 who is forty percent infected is represented by S_1 = 0.6 and I_1 = 0.4, and another with ID label j = 102 who is one percent infected by S_102 = 0.99 and I_102 = 0.01. Consider the meeting at the store of n_p persons, of which n have some degree of partial infection. The resulting infection pressure pI_n, that is, the resultant partial infection probability of the encounter brought about by the infection contributions of the n partially or fully infected persons, can be obtained for two cases. S_max is the maximum S_j out of all n persons. With the exception of the owner of S_max, each I_j of the other partially or fully infected persons is multiplied by S_max and by a multiplicative numerical constant g_j, soon to be discussed. These product terms are summed to obtain the probability of infection for the encounter. When n_p = n, the number of such product terms is therefore n - 1. However, when n_p > n, S_max is considered to be from one of the fully susceptible persons in n_p and thus S_max = 1; the number of product terms in the sum is then n. A formula to compute pI_n given n_p = n, with every participant infected or partially infected, is

    pI_n = Σ_j g_j S_max I_j,  the sum taken over the n - 1 partially or fully infected persons other than the owner of S_max,

and for the case n_p > n, with S_max = 1,

    pI_n = Σ_{j=1..n} g_j I_j.

The constants g_j emanate from simple overlap counts in probability trees, as in Appendix B.2 and this author's earlier technical communication [4]. The products are deliberately ordered so that the largest g_j corresponds to the largest I_j. This represents the worst infection case, which subsumes all others. Once the probability of infection pI_n is calculated, the partial infection of all participants in n_p is updated for encounter v, in readiness for the next visit v + 1, as follows:

    I_j^{v+1} = I_j^v + pI_n S_j^v,    S_j^{v+1} = 1 - I_j^{v+1}.

As an example, assume s = 4 and n_p = n.
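A minimal sketch of how a single encounter might be processed under the partial infection model is given below. It assumes the constants g_j are supplied by the caller in descending order, since their exact values (from the probability-tree overlap counts of Appendix B.2) are not reproduced in this section, and all type and function names are illustrative.

#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

struct Person { double S; double I; };   // partial susceptibility and infection, S + I = 1

// Process one encounter: compute the infection pressure pI_n and update every
// participant. The constants g (assumed supplied in descending order) are paired
// with the largest infection levels first, the worst case described in the text.
void encounterUpdate(std::vector<Person>& participants,
                     const std::vector<double>& g,
                     bool hasFullySusceptible)   // true when n_p > n
{
    double Smax = 1.0;                            // S_max = 1 when a fully susceptible person is present
    std::size_t owner = participants.size();      // sentinel: nobody excluded from the sum
    if (!hasFullySusceptible) {
        owner = 0;
        for (std::size_t j = 1; j < participants.size(); ++j)
            if (participants[j].S > participants[owner].S) owner = j;
        Smax = participants[owner].S;             // largest S among the partially infected
    }

    std::vector<double> levels;                   // infection levels contributing product terms
    for (std::size_t j = 0; j < participants.size(); ++j)
        if (j != owner) levels.push_back(participants[j].I);
    std::sort(levels.begin(), levels.end(), std::greater<double>());

    double pI = 0.0;                              // pI_n = sum of g_j * S_max * I_j
    for (std::size_t k = 0; k < levels.size() && k < g.size(); ++k)
        pI += g[k] * Smax * levels[k];

    for (auto& p : participants) {                // I^{v+1} = I^v + pI_n * S^v, keeping S + I = 1
        p.I += pI * p.S;
        p.S = 1.0 - p.I;
    }
}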
The full infection model follows a similar simple 'probability count' philosophy to that of the previous section, but the probabilities are obtained with a Monte Carlo process. Each evaluation involves q trials for one fully susceptible person S and twenty fully infected persons I, and it counts the occasions on which the susceptible is co-located with any infected. With q = 40, for example, once the forty trials are computed for the one susceptible and the twenty infected, with results as in Figure 2, the procedure checks whether the susceptible and one or more infected shared an encounter in any of those forty samplings. It is, however, fashioned in a more elaborate way, to try to account for differing times spent in different sub-areas. The random sampling is made to fit into 7200 seconds of time (two hours), but the random number indicating the location is chosen to weigh certain locations more. This crudely captures the fact that some areas of the store or establishment are more popular than others, for example, browsing a magazine rather than simply walking through some area of the store. Random numbers are clustered to represent three types of area: one where the person spends 2 seconds, another where the person spends 30 seconds, and yet another where the person spends 102 seconds in the selected area. The procedure is outlined in Appendix A. The constants in the figure are calculated prior to the run and give the probability of the susceptible getting infected depending on how many people in the store are infected. All visits are of the same duration, so no attempt is made to model the probability of infection through time with a Poisson process. This model of infection transmission is once again a very simple one. When n > 20 the probability of infection is assumed to be equal to one. Note that a lower or higher rate of infection can be obtained by changing the sampling from 40 to a smaller or larger number.

At any meeting place and time, three types of person can participate in an encounter: I, infected; S, susceptible; and R, immune (recovered). No action is taken if fewer than two people participate (n_p < 2), if all are immune, if all are infected, if all are susceptible, if there are no infected, or if there are no susceptible. Otherwise, the aforementioned infection probability is selected for the number of infected, that real number is multiplied by the number of susceptible, and the result is truncated to obtain the integer number of people who will become infected. In no particular order, that many susceptible are labelled as infected.
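The following C++ sketch illustrates the flavour of such a Monte Carlo estimate: q samplings per trial over weighted sub-locations, with an encounter recorded when the susceptible shares a sub-location with any infected. The popularity weights and all names are illustrative assumptions; this is not the exact routine of Appendix A and it will not reproduce the constants of Figure 2.

#include <cstdio>
#include <random>
#include <vector>

// Illustrative Monte Carlo estimate of the probability that one susceptible
// shares a sub-location with at least one of n infected during a single visit.
// q is the number of samplings taken inside the visit; the weights make some
// sub-locations "stickier" (longer dwell) than others.
double infectionProbability(int s, int n, int q, int trials = 100000) {
    std::mt19937 rng(12345);
    std::vector<double> weights(s, 1.0);
    if (s > 1) weights[0] = 15.0;                    // e.g. a popular browsing area
    std::discrete_distribution<int> location(weights.begin(), weights.end());

    int hits = 0;
    for (int t = 0; t < trials; ++t) {
        bool met = false;
        for (int k = 0; k < q && !met; ++k) {        // q samplings inside the two hours
            int susceptibleAt = location(rng);
            for (int j = 0; j < n && !met; ++j)
                met = (location(rng) == susceptibleAt);
        }
        if (met) ++hits;
    }
    return double(hits) / trials;
}

int main() {
    for (int n = 1; n <= 5; ++n)
        std::printf("n = %d  estimated p = %.3f\n", n, infectionProbability(4, n, 10));
    return 0;
}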
The method adopted to optimize the allocation of requested visits is a Genetic Programming (GP) scheme developed by this author to discover a set of precise numerical constants that serve as coefficients of a polynomial representing the direct solution of the one-dimensional homogeneous convection-diffusion equation [5]. This and other publications [3] [1] demonstrate that standard Genetic Programming trees are capable of computing very precise real numbers when and if needed. When GP trees are evaluated they deliver a vector of real numbers of self-determined variable length. Table 2 lists the functions and terminals that comprise the GP tree. They manipulate two pointers, P_R and P_Z, into the output variable-length vector of numbers. They also make use of two working memories, the real numbers m_1 and m_2. Terminals are small and large numbers that become arithmetically manipulated. Certain functions write to the variable-length result vector, shrinking or expanding it as the GP tree evaluates. The result of evaluating a GP tree is a variable-length vector of real numbers. These numbers can be of any size and can be positive or negative. A subroutine then operates on these numbers to bound them as positive real numbers between 0.0 and 1.0 [5]. For example, if a number in the vector is -21.27625 it becomes 0.27625, and if a number is 29.00000 it becomes 0.0001.

How is this vector of real numbers used? Consider the 1704 visitation requests of Table 1. The real-number vector is probably much smaller. It is consulted from left to right, as when a child reads a word letter by letter. Consider an outing request AD2, e.g. Person 20 on Monday in Figure 1, wishing to go to Doctor's Surgery 2 at any time of the day. The day is divided into eight two-hour slots of time. As each number ranges from 0.0 to 1.0, suppose the current number is 0.2763. This number prescribes the third slot of time, since 2/8 < 0.2763 < 3/8. The allocation for that visit complete, the next real number in the variable-length vector is consulted to deal with the next visit request. As the variable-length vector of numbers is usually shorter than the total number of visits requested, when the last element of the vector is reached the procedure cycles back to the first number of the vector and continues until allocations for all 1704 visitations are dealt with, giving allocations of time as in Figure 1.

Table 2: Constituent functions or building blocks of GP trees in [5] are of four varieties: number, arithmetic, record and memory, with the number of arguments shown in brackets: for (2), one argument is L or 'left' input and the other R or 'right' input (corresponding to the left and right branches of the subtrees below the node), and (1) denotes a tree terminal or leaf. For example, a memory read returns m_2, while SetMem2(2) = R and m_2 = (m_2 + L)/2. The resultant variable-length vector of constants r has elements r_j that must never exceed j = P_max. Two pointers are manipulated, P_R and P_Z, the current position and the vector length respectively. There are two working memory locations, m_1 and m_2. 'Increment' means to increase by one and 'decrement' to decrease by one; the symbol '=' denotes what the function returns to its parent node in the tree.

Good solutions require only small to moderate sized vectors. Practical use of the method may require a small number of additional strategies. On some problems, for a small number of initial generations, the Darwinian fitness is set to the vector size. This stimulates production of large output vectors. When a desired variable-length vector output size is reached by all members of the population, the fitness is reset; from then onwards the fitness is the problem's Darwinian fitness, typically a measure of solution error. The procedure is often not necessary but can become useful when the required solution complexity or dimension is very high. Once seeded in this way, the solution vector will grow or shrink as crossover and mutation operations on the GP tree create new and improved GP trees. As the method makes use of GP trees and standard GP, all that we know about standard Genetic Programming, including modularization options such as ADFs [8] and subtree encapsulation [10], is applicable. As will be revealed in the numerical experiments, the approach works, discovers good solutions, and is quick on a portable computer, the compiled C++ Visual Studio 2019 executable delivering solutions in seconds for the proof of concept experiments.
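To make the decoding of the GP output vector concrete, the sketch below applies the bounding rule and the mapping to the eight two-hour slots, cycling through the vector; the constraint to the requested part of day (M, P or N) is omitted for brevity and the function names are illustrative.

#include <cmath>
#include <vector>

// Bound an arbitrary real number into (0,1): keep the fractional part of its
// absolute value, e.g. -21.27625 -> 0.27625; an exact integer such as 29.0
// becomes the small value 0.0001, as in the example in the text.
double boundToUnit(double x) {
    double a = std::fabs(x);
    double f = a - std::floor(a);
    return (f == 0.0) ? 0.0001 : f;
}

// Map the GP output vector onto one of the eight two-hour slots per visit
// request, cycling through the vector when it is shorter than the request list.
// The constraint to the requested part of day (M, P or N) is not applied here.
std::vector<int> allocateSlots(const std::vector<double>& gpVector, int numRequests) {
    std::vector<int> slots(numRequests);
    for (int v = 0; v < numRequests; ++v) {
        double u = boundToUnit(gpVector[v % gpVector.size()]);
        int slot = static_cast<int>(u * 8.0);   // slot k (0-based) when k/8 <= u < (k+1)/8
        if (slot > 7) slot = 7;
        slots[v] = slot + 1;                    // reported as 1..8; u = 0.2763 gives slot 3
    }
    return slots;
}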
Regardless of the infection model used, the procedure is similar. There will exist a taxonomy of persons by some parameter(s), for example, age group. Information about the likely degree of infection for the different taxonomy classes is input to the computations. As time progresses and visits take place, people get infected. At the end of the three days, account is taken of who is infected and of their prior health level. This determines how many people in the proof of concept study will require hospital treatment a few weeks hence and how many may die. There are many assumptions but, if correctly taken, they drive Genetic Programming to discover allocations that reduce hospitalizations and fatalities. Central to Darwinian fitness are such calculations of who, and how many, will perish or fall very ill and use the Intensive Care Unit (ICU). In this implementation the taxonomy is assumed to depend on age group. COVID-19 testing is unlikely to become universal and frequent for all citizens, but limited testing and other measurements are sufficient to serve as indications of the degree of infection likelihood for sectors, or partitions, of the population. The proof of concept uses age as the taxonomy. Figure 3 reveals how this works. If testing reveals that more people of age groups A or B have COVID-19 than of age groups C or D, then percentages are entered at the start of the numerical experiments. For example, 20=0.03; 30=0.01 means that the 20-year-olds have a suspected level of COVID-19 infection of three percent, while the 30-year-olds are suspected to have one percent.

Each simulation proceeds from Monday to Wednesday and, for each day, from morning to night (8 time slots). At each time slot all twelve establishments are considered, and calculations of partial infection update the partial infection level of all participants. At the end of the day, a calculation determines who should go into self-isolation and participate no longer in the simulation. The rules to identify those persons are presented in the left side of Table 3. At the very end of the simulation, another procedure calculates the total number hospitalized in the Intensive Care Unit (ICU), denoted by the symbol N_H. These are those in hospital who eventually recover but who nevertheless put pressure on the health service. The procedure also calculates those who unfortunately pass away, denoted by the symbol N_D. The rules to determine these two numbers are presented in the right side of Table 3. Note the assumption made here that the state of health and the age of the subject correlate similarly but are treated as separate causal factors.

The GP Darwinian fitness function F_F is the measure of solution goodness. It combines N_H and N_D weighted by a desirability constant W_C. F_F is by choice a negative quantity, because the evolution is set to maximize this quantity, with a perfect score of zero implying N_H = N_D = 0:

    F_F = -( W_C N_D + (1 - W_C) N_H ).

All of the numerical results in this proof of concept use W_C = 0.65, reflecting the desire to reduce fatalities. A further weighting by age group, with W_3 > W_2 > W_1 > 0, reflects that the mortality of the young from COVID-19 is low and that of the elderly is very high. However, it suffices to show by the small worked-out example of Figure 4 (and as seen in the figures) that a lower average partial infection does not necessarily translate into smaller values of N_H and N_D.
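As a sketch, the fitness combination might be coded as below, assuming the weighted combination of N_D and N_H takes the convex form given above; the function name and default argument are illustrative.

// Darwinian fitness: negative by construction so that a perfect schedule
// (no deaths and no ICU cases) scores zero, the maximum attainable value.
double darwinianFitness(int numDead, int numHospitalized, double Wc = 0.65) {
    return -(Wc * numDead + (1.0 - Wc) * numHospitalized);
}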
In this type of model persons can only become fully infected, I. Hence, the rules to determine N_H and N_D are different and apply only to those infected. Additionally, the decision to self-isolate depends on the number of days that the person has been infected and on their health level. Consider that some of the people in the input data file are already fully infected. Table 4 shows the decision schema for self-isolating persons as well as for deciding on N_H and N_D after the simulation is complete. Note that the assumption is again made here that the state of health and the age of the subject correlate similarly but are treated as separate causal factors. The fitness measure for these computations is also equation 4.

The numerical experiments compare the solution to three round robin uninformed allocations. The first of these, comp1, sends all those (a) requesting 'morning' visits or 'any time' visits to the first morning time slot, the 8:00 slot. There is also a third uninformed allocation by round robin for comparison, comp3. It is a round robin of three. However, as the morning has only two time slots, it sends two thirds of the requests for case (a) to the first slot and the remaining third to the second slot. For cases (b) and (c) it uses all three time slots, distributing the visitation requests evenly. Comparison could be made to other allocations, but these are broadly representative of what would transpire in the real world without the benefit of the solution, under conditions of partial lockdown or no lockdown. It turns out, as expected, that comp1 results in the highest number of hospitalizations N_H and deaths N_D for the model problem and comp3 in the lowest, as revealed in the comparative experiments presented in this section.

Each experiment follows common GP practice, executing a large number of parallel independent runs (PIRs). Each PIR differs in its initial random seed (taken from the PC clock timer), seeding the population randomly and differently for each PIR. PIRs can have different population sizes of GP trees. All runs use 80 percent crossover and 20 percent mutation to generate new GP trees. This is a steady-state GP with a tournament selection of four individuals to select a strong mate for crossover, whereby two GP trees swap branches, or the winner of the tournament simply mutates, and with a kill tournament of two to choose the weaker GP tree to replace. The maximum possible tree size for GP is set at 2000 and, if this is exceeded, the shorter side of the crossover swap is taken. The maximum variable-length vector size is set at 10,000 but is never remotely approached: excellent solutions have vector sizes of between 10 and 100. It is an 'elitist' GP, for it does not destroy the fittest solution, i.e., solutions do not have a lifespan inside a PIR. All PIRs implement standard 'vanilla' GP with no parsimony pressure or any other non-standard approach. This proof of concept study does not employ explicit reuse: ADF [8] or subtree encapsulation [10]. When a better solution emerges during the execution of a PIR, it is separately stored.

Table 4: self-isolation conditions and outcome rules by age, days infected and health level; the rules apply only to persons who become infected.

A PIR typically produces up to around ten solutions, of which two are highly fit and interesting. As subsequent PIRs produce more solutions, all become ranked by fitness in a global list of solutions, from which the user can select one to inspect. For each solution, full details can be inspected: the participants in all visitations, infection levels, numbers self-isolating, in the ICU and sadly passed away, as well as the variable-length vector of real numbers that constitutes the solution, the visitation schedule, and the identity, age and level of infection of all participants recovered, in the ICU or deceased. For the partial infection experiments the system also computes partial infections by group, young, middle-aged and elderly, every four hours.
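Before turning to the results, the steady-state loop just described can be sketched as follows; the Candidate type and the crossover, mutation and evaluation routines are stubs standing in for the GP tree operations and the visit simulation, and the elitist protection of the single fittest individual is not shown.

#include <cstddef>
#include <random>
#include <vector>

struct Candidate { double fitness; /* GP tree omitted */ };

// Stubs standing in for the real operators and for the scheduling simulation.
Candidate crossover(const Candidate& a, const Candidate&) { return a; }
Candidate mutate(const Candidate& a) { return a; }
double evaluate(const Candidate& c) { return c.fitness; }   // would run the visit simulation

// One steady-state step: a tournament of four picks a strong parent, a kill
// tournament of two picks the weaker individual to replace; 80% crossover,
// 20% mutation. Fitness is maximized (it is negative, with zero as perfection).
void steadyStateStep(std::vector<Candidate>& pop, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, pop.size() - 1);
    auto tournament = [&](int k, bool wantBest) {
        std::size_t chosen = pick(rng);
        for (int i = 1; i < k; ++i) {
            std::size_t c = pick(rng);
            bool better = pop[c].fitness > pop[chosen].fitness;
            if (better == wantBest) chosen = c;
        }
        return chosen;
    };
    std::size_t parent = tournament(4, true);
    std::size_t victim = tournament(2, false);
    Candidate child = std::bernoulli_distribution(0.8)(rng)
                          ? crossover(pop[parent], pop[tournament(4, true)])
                          : mutate(pop[parent]);
    child.fitness = evaluate(child);
    pop[victim] = child;
}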
The comprehensive result panes in the figures in this section merit closer inspection. As PIRs accumulate, a Pareto front picture can emerge of non-dominated solutions involving the two criteria N_D and N_H. Table 5 shows the vast superiority of the solution over the uninformed round robin allocation schemes. The superiority is so marked that there is no need to check against other possible random allocation schemes. It makes common sense that the Genetic Programming solution is always superior. Note that with s = 6 the problem becomes easier than with s = 4, because the chances of meetings between people are lower. This can be seen to be true for both the round robin allocations and the Genetic Programming solutions. It also appears that the superiority of Genetic Programming solutions over round robin allocations is correlated with higher s. This is probably because the Genetic Programming has more degrees of freedom with which to discover a better solution. Therefore, even if the intuitive idea is that with a smaller density of people there should be less need for the solution, this is not always the case. Note that in the cases where s = 6, Genetic Programming managed to discover an allocation with N_D = 0, that is, no deaths. Perhaps the counter-intuitive conclusion can be drawn that the solution is needed more when the problem seems simpler.

Table 5: Summary of results attained by the numerical experiments that made use of the partial infection model. With s = 4 the solution is three times better than a round robin allocation and with s = 6 it is ten times better.

Close inspection of the figures referred to in Table 5 reveals that the system of PIRs delivers a number of Pareto non-dominated solutions to choose from. Also, there are solutions that achieve an identical result in terms of N_D and N_H but which are quite different. One is then also able to inspect the age and health of the fatalities; this may be a tertiary factor that could come into play when selecting among the many equivalent solutions. Finally, it can also be discerned that the average partial infection level, and its final levels as shown in the figures, do not always correlate with lower error and higher Darwinian fitness.
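Ranking stored solutions into a Pareto front over the two criteria can be done with a simple dominance filter; the sketch below assumes each stored solution carries its N_D and N_H counts and is otherwise illustrative.

#include <vector>

struct Outcome { int Nd; int Nh; };   // deaths and ICU hospitalizations of one solution

// a dominates b when it is no worse on both criteria and strictly better on one.
bool dominates(const Outcome& a, const Outcome& b) {
    return a.Nd <= b.Nd && a.Nh <= b.Nh && (a.Nd < b.Nd || a.Nh < b.Nh);
}

// Keep only the non-dominated solutions: the Pareto front over N_D and N_H.
std::vector<Outcome> paretoFront(const std::vector<Outcome>& all) {
    std::vector<Outcome> front;
    for (const Outcome& candidate : all) {
        bool dominated = false;
        for (const Outcome& other : all)
            if (dominates(other, candidate)) { dominated = true; break; }
        if (!dominated) front.push_back(candidate);
    }
    return front;
}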
A different set of computations, addressing a different version of the real-world problem, is included here for completeness. It pertains to knowing who is infected a priori and trying to optimize what could have happened had their visits been scheduled differently. It could be useful for a retrospective study or, alternatively, such computations could address the same real-world problem, but only by carrying out a plethora of experiments with different seeds of infected individuals, assumed infected according to some taxonomy knowledge of the level of infection in the population, and then averaging the results in some way to prescribe safer allocations of visits, as an alternative to the experiments with the partial infection model presented in the previous section. A set of 15 persons, as shown in Figure 6, out of the 282 in the proof of concept data described in section 3, come already fully infected a priori to drive the computations. This is about 5.3% of the people, and they undertake 105 visits, or 6.2% of the 1704 total visits that appear in the proof of concept dataset. Individuals with good health levels are chosen as the already infected. For completeness, the figure also shows six people who have immunity and therefore cannot become infected in the computations; they are included in the computations although they play no active role. It is envisaged that the optimization problem for GP is challenging. Hence, the infection model is implemented at different values of q (see section 4.2 and Appendix A), both to allow the solution room to make gains and to understand the dynamics under various levels of contagion (perhaps representing adherence to social distancing and use of masks).

There are eight combinations of the events that can befall a person:

1. the person comes to the simulation already infected and does not use the Intensive Care Unit (ICU);
2. the person comes to the simulation already infected and does make use of the ICU (adds to N_H);
3. the person comes to the simulation already infected, passes through the ICU or not, and dies (adds to N_D);
4. the person comes to the simulation immune (has had the disease before and has recovered) and stays that way;
5. the person comes to the simulation susceptible and does not get infected;
6. the person comes to the simulation susceptible, gets infected but does not use the ICU;
7. the person comes to the simulation susceptible, gets infected and does make use of the ICU (adds to N_H);
8. the person comes to the simulation susceptible, gets infected, goes or not to the ICU, and dies (adds to N_D).

Figure 7 illustrates how to interpret that part of the result figures pointed to in Table 6. As there are generally no cases of the already-infected ending in the ICU or dying, only six result summaries show, as in the figure. The results of all numerical experiments that use the full infection model are given in Table 6. A difficulty with this infection model is that it multiplies the probability of infection by the number of susceptible and then simply truncates the result to the lower integer. Moreover, it simply goes down the list of susceptible, infecting the first ones it encounters up to that integer number. This gives the solution opportunities to play games: for example, if q = 30 then p(9) = 0.947, so if there is only one susceptible and nine infected then 0.947 multiplied by one truncates to zero and the susceptible will not become infected. Notwithstanding the weaknesses of such computations and their erroneous assumptions, the following can be said about the Table 6 figures. As q is increased the GP search becomes much harder; the best solutions come from bigger GP populations run for longer (more generations). Also, the advantage of the solution over the round robin assignments is less valuable than at lower q values. This is to be expected because at high q the disease is far more contagious and there is not much room for optimization. Note that with q = 30 all round robin solutions have the same N_H and N_D, possibly indicating little room for schedule optimization by the solution.
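The integer truncation just described takes only a couple of lines; the sketch below reproduces the arithmetic, with the Monte Carlo constant p(n) supplied as an argument.

#include <cstdio>

// Newly infected at an encounter: the infection probability for n infected is
// multiplied by the number of susceptible and truncated to the lower integer.
int newlyInfected(double p_n, int numSusceptible) {
    return static_cast<int>(p_n * numSusceptible);
}

int main() {
    std::printf("%d\n", newlyInfected(0.947, 1));    // one susceptible, nine infected: prints 0
    std::printf("%d\n", newlyInfected(0.947, 10));   // ten susceptible: prints 9
    return 0;
}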
Table 6: Summary of results attained by the numerical experiments that made use of the full infection model. With q = 4 (see Appendix A and the text), the solution is between two and three times better than simple round robin allocations.

Large improvements in both lower mortality and lower use of the ICU are available with the solution for the proof of concept data described in section 3, in computations with both infection models. The improvements compared to round robin allocations in Table 5 and Table 6 clearly show this. The parameters s and q in the two numerical experiments have a similar but inverse influence: the former gives the number of possible sub-locations that can be occupied, while the latter is the number of checks performed inside the Monte Carlo calculation that detects co-presence. However, even at low s and high q, the worst possibilities, the solution keeps outperforming round robin by a significant margin. At high s and low q the solution really shines and outperforms round robin by a very considerable margin. This augurs well because, with good cleaning at locations, washing of hands and even moderate social distancing, the rate of infection is expected to be low (corresponding, in a sense, to higher s and lower q). Of course, a more serious real-world model needs to be considered in further research. Such a model would need to consider:

1. is it correct that a difference in infection rates exists between taxa? if so, what is this taxonomy and how should it be constructed? is it possible to discover this taxonomy through COVID-19 testing and other data collection?
2. is it possible to develop a reasonably accurate infection model for certain stores and places that people wish to, or need to, frequent?
3. the infection model must also include a dynamic related to object contamination and transmission through contact with surfaces and objects;
4. travel by public transport to such locations also needs to be accounted for by the model;
8. is the round robin a fair reflection of how people wish to go out in the unrestricted normal case? are there invariant principles, gathered from mobile phone roaming data, that could inform how and when people go out?
9. what is the effect of non-compliance on the solution? could the solution account for it and still be gainful?
10. the solutions arrange into Pareto non-dominated sets involving N_D and N_H: what is the difference between these, and also between equivalent solutions with the same score of F_F? is there a difference between such solutions of further interest?
11. the solution is designed to work even with very little idea or precision about the rates of infection and contamination, as it aims for an improvement rather than for precise values. How valid are these assumptions?
12. the solution outperforms round robin, but how does it fare in terms of infection rate against strict lockdowns, or against exiting lockdown with the phased lockdown strategies suggested in [11]?
13. can economic, psychological and other benefits of the solution be quantified, to understand the cost-benefit of adopting it versus strict lockdown policies?
14. will people be willing to undergo numerous lockdowns waiting for the availability of a vaccine if the solution is adopted?

Genetic Programming is known to scale well with problem size. However, even if millions of people were considered, there would be a degree of clustering that could be treated differently by different discovered solutions. The partial infection model is developed here for the first time; if there is something similar in the literature then this author does not know of it. It must be tested and developed further. It is possible that the pessimistic approach of multiplying the terms of the worst case is not entirely reasonable. Lastly, how well infections can be avoided will depend on the number of visitations, the number of establishments visited, the number of people and the frequency of visits.
In this research Genetic Programming was handed a tough challenge, as the number of visitations was of the order of 2000, the time slots per day were very few, and the number of establishments was small. In general, it is said that washing one's hands is far more effective than social distancing, and it is probably true that contamination is more important than person-to-person transmission. Contamination can easily be incorporated into the model. Although the model is incipient, few if any appear to have considered something like it. Most research is involved in exploiting data sources to predict infection levels or to explain the disease. This contribution is different in nature because its deployment could generate data, and it would also need only data on the tendency of the disease. It is markedly different from contact tracing approaches, which are reactive. The work described here and its implementation would be proactive, but it could also inform, and be informed by, contact tracing.

Sophisticated relations can be determined as a function of the number of encounters between a susceptible and one or more infected, relations that also incorporate the time of exposure modelled as a Poisson process [2]. An important characteristic of such relations is that as the number of infected individuals n grows, p_n increases but not linearly, so that, for example, one can expect p_2 < 2 p_1. This proof of concept assumes a trivially simple infection function based on counting co-locations between S and I. These increase with the number of infected, but the ratio of these to the total possible encounters never exceeds one, that is, as n → ∞, p_∞ → 1.0. The essence of this behaviour can be crudely emulated with Monte Carlo or, as discussed here, by counting on the simple probability trees of Figure 8. Dependency on the time of exposure is ignored by assuming that all visits to places are of similar duration. The Genetic Programming approach that uses such infection functions is, however, general and able to incorporate any candidate infection function.

From Figure 8, first consider the case of two individuals only, P1 and P2. Individual P1 is susceptible while P2 is infected. The number of encounters between these two, the red boxes at the P2 level in the figure, where both individuals share the same location, is four, and the total number of possible outcomes is sixteen. Hence, the opportunity for an encounter taking place, and therefore for infection, assuming each individual has an equal presence at each of the four locations and spends the same amount of time at any location visited, is p_1 = 0.25. Next consider the case where three individuals participate, with P1 susceptible and P2 and P3 both infected. Now there are opportunities for all three individuals to exist at the same location, and also opportunities for P1 to share a location with either infected P2 or P3. Counting such opportunities we arrive at p_2 = 28/64 = 0.4375, and we verify that p_2 < 2 p_1. For the particular scenario of Figure 8, this coarse emulation leads to a simple relation for the probability of encounter for any n: p_n = 1 - (3/4)^n. For rather large n it tends to one: p_20 = 1 - (3/4)^20 = 0.9968.
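The counts behind p_1 = 4/16 and p_2 = 28/64 can be checked by brute-force enumeration over the four locations; the short sketch below performs that check (names are illustrative).

#include <cstdio>

// Enumerate all placements of one susceptible (P1) and n infected over the four
// locations and count the outcomes in which P1 shares a location with at least
// one infected person.
double encounterFraction(int n) {
    int total = 1;
    for (int i = 0; i < n + 1; ++i) total *= 4;      // 4^(n+1) equally likely outcomes
    int hits = 0;
    for (int outcome = 0; outcome < total; ++outcome) {
        int code = outcome;
        int p1 = code % 4; code /= 4;                // P1's location
        bool met = false;
        for (int j = 0; j < n; ++j) {                // locations of the n infected
            if (code % 4 == p1) met = true;
            code /= 4;
        }
        if (met) ++hits;
    }
    return double(hits) / double(total);
}

int main() {
    std::printf("p_1 = %.4f (4/16)\n", encounterFraction(1));    // 0.2500
    std::printf("p_2 = %.4f (28/64)\n", encounterFraction(2));   // 0.4375
    return 0;
}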
We now introduce what may be an unusual idea. The real-world objective of this initiative is to leverage COVID-19 testing. Imagine that upon testing for COVID-19 the incidence is found to be higher in age groups 20 and 50. It is unlikely that COVID-19 testing will be undertaken by all people all of the time. Moreover, tests are still unreliable. Hence, we need to work with partial knowledge. An option to explore might be to work directly with the probable levels of infection informed by the testing of different groups of people, organized in some taxonomy: for example, consider the probability of infection to be 5 percent and 8 percent in those two age groups respectively, and zero in the other age groups. This motivates the use of a 'partial infection', but why? If we knew who was or was not infected, we would let them out or keep them in isolation. However, as we cannot test every single person, it is useful to assign to each of them an uncertainty, a probability level. In such a case, we must consider that any of the people in Figure 8, namely P1, P2 or P3, may be partially infected (and partially susceptible).

Figure 8 shows the encounters between persons, i.e., two people sharing one location (1, 2, 3, 4) at the same moment in time, thus coming into contact with each other. We are especially interested in two encounters, P1-P2 and P1-P3, as we can assume that one person, P1, is susceptible while the other two are infected. Note that, under permutations assigning I and S, there is no need to count encounters P2-P3, as these would represent two infected, and neither are encounters between two susceptible of interest. The thirty-two outcomes that determine the probability of infection (sixteen in which P1 shares a location with P2 and sixteen in which P1 shares a location with P3) overlap in the four outcomes where all three share a location; their union gives the twenty-eight encounters in the figure.

Note that, in spite of P3 already carrying a small probability of infection, its new infection level is 0.4247 < 0.4375. This is because the others were not one hundred percent infected, and the result is close to, but less than, 0.4375. In a third example, consider the interactions between three people with partial infection: P1 and P3 end with an infection level 0.2568 > 0.2500, because the people were already one percent infected. An example involving four people is illustrated with the help of Figure 9: consider the interactions between four individuals with partial infection. Note from the previous examples that if we simply identify the participant with the highest component of S, then that participant indicates the highest contribution. In this case that is P1, and so SIII should be highest, and it is, at 0.5708. This number is very close to p_3 = 1 - (3/4)^3 = 0.5781; it is a high value because all participants contribute a significant level of infection. Now we compute the new partial infections for our four participants; the factors can be seen to be in descending order.

If n_p people participate in an encounter but the number of partially infected people n is less than n_p, then we assume that one fully susceptible individual with I = 0 and S = 1 meets the n partially infected individuals, and thus S = 1 is used in the formula. The calculation follows a ten-step procedure (pseudo-code) whose key steps are: identify S_max and the participant with the highest value of S^v; prepare the products (S_max I_j^v g_j) and sum them to obtain the infection level; and, in the final step,
use it to update the infection levels of all n_p participants: I^{v+1} = p S^v + I^v.

References

Predicting the structure of covert networks using genetic programming, cognitive work analysis and social network analysis.
Genetic programming of the stochastic interpolation framework: convection-diffusion equation.
Genetic Programming visitation scheduling in lockdown with partial infection model that leverages information from COVID-19 testing.
Genetic programming solution of the convection-diffusion equation.
Differential susceptibility epidemic models.
Genetic Programming: On the Programming of Computers by Means of Natural Selection.
Genetic Programming II: Automatic Discovery of Reusable Programs.
Organización Internacional para las Migraciones Panamá: Cuarentena total en suelo panameño (total quarantine on Panamanian soil).
Evolving modules in genetic programming by subtree encapsulation.
A phased lift of control: a practical strategy to achieve herd immunity against COVID-19 at the country level. bioRxiv.
The origin of quarantine.

Appendix A: Pseudo-code for the Monte Carlo routine

The following pseudo-code with q = 40 produces the numbers of Figure 2:

do m, 20 times (iterate over 20 infected)
    nInfection[m] = 0 (counts the encounters for this many infected)
end iteration index m
do 100,000 times (use a big number for good accuracy)
    do k, q times (typically q is between 10