key: cord-0490632-1jz0ewxf authors: Greferath, Marcus; Roessing, Cornelia title: Group testing via residuation and partial geometries date: 2022-02-09 journal: nan DOI: nan sha: 65a19bb54e031c169af960e553d66e87983ab1a3 doc_id: 490632 cord_uid: 1jz0ewxf The motivation for this paper comes from the ongoing SARS-CoV-2 Pandemic. Its goal is to present a previously neglected approach to non-adaptive group testing and describes it in terms of residuated pairs on partially ordered sets. Our investigation has the advantage, as it naturally yields an efficient decision scheme (decoder) for any given testing scheme. This decoder allows to detect a large amount of infection patterns. Apart from this, we devise a construction of good group testing schemes that are based on incidence matrices of finite partial linear spaces. The key idea is to exploit the structure of these matrices and make them available as test matrices for group testing. These matrices may generally be tailored for different estimated disease prevalence levels. As an example, we discuss the group testing schemes based on generalized quadrangles. In the context at hand, we state our results only for the error-free case so far. An extension to a noisy scenario is desirable and will be treated in a subsequent account on the topic. During the initial low-prevalence phase of a pandemic like the current SARS-CoV-2 pandemic, where a new pathogen has started spreading, it is helpful to use a technique called group testing in order to identify infected individuals, particularly when testing is complicated or otherwise expensive. The technique goes back to initial ideas by Dorfman [2] in 1943, whose challenge was to test WWII troops for syphilis based on an insufficient number of tests available. As the rate of infected soldiers was rather small (far below 10%), it turned out to be possible to significantly reduce the number of tests exploiting the following idea: Test a group of samples in a way that each test is used for a pool of specimen from a variety of samples. Dorfman correctly observed, that it was possible to identify a small number of infected participants out of a larger batch without having to medically check each individual for the infection in question. His method was further refined by Katona [3] who spread the specimen over several test. This method may be considered as a first approach to non-adaptive group testing.s To organize group tests we follow Katona [3] and divise a table (a binary matrix) for tests and participants showing which participant's specimen is contained in which test. For the test, an (unknown) vector of samples is multiplied by the test matrix using the arithmetic of the Boolean semi-field B 2 . The resuling test vector (syndrome) contains information about the infection pattern of the initial vector, and the goal is to reconstruct this pattern from the syndrome. In this paper, we show how this approach to group testing is closely related to the mathematical discipline of Residuation Theory. Its advantage is, that it comes with a natural choice of decision scheme (syndrome decoder) that generically works for a large class of infection patterns. Adopting and refining conditions and techniques discussed in [5] , we suggest to optimize the quality of the testing task by using incidence matrices of a class of finite geometries. These bear advantages due to certain intrinsic regularity, sparsity, and symmetry properties. We will state our results merely for the case where the tests are error-free, where an adjustment to noisy scenarios is devoted to a subsequent article. Our approach is clearly not tied to a particular pathogen, nor does the pathogen itself have to be viral. Its novelty lies in the (re-)establishment of an ordertheoretic approach in conjunction with the application of finite geometries to design non-adaptive pool testing strategies. In contrast with a set-based combinatorial approach to be found in [5] , the Boolean semifield B 2 will play a main role in this paper. On the 2-element set {0, 1}, it is the only nontrivial (yet meaningful) alternative to the binary field F 2 , its operations are described in Figure 1 . An interesting aspect of B 2 , that appears trivial on a first glance: it comes with an order relation ≤, naturally defined by 0 ≤ 1, and with a negation mapping B 2 −→ B 2 , x → x where 0 = 1 and 1 = 0. This involution has the properties x + y = x y and xy = x + y which are well-known as de Morgan's laws. The commutative idempotent monoid B n 2 (equipped with componentwise addition + and identity 0) may be considered as a B 2 -vectorspace with 1 x = x and 0 x = 0 for all x ∈ B n 2 . We have the distributive laws λ (x + y) = λ x + λ y (λ + µ) x = λ x + µ x satisfied for all λ, µ ∈ B 2 and x, y ∈ B n 2 . We will naturally extend order ≤ and negation to B n 2 and omit references to n where-ever confusion is impossible. The reader may readily recognize that what we are talking about is known from Boolean Algebra. The partial order on B n 2 is given via x ≤ y if and only if y = x + y. In fact, defining multiplication · componentwise with identity 1 on B n 2 , we obtain the mentioned Boolean Algebra as the 6-tuple (B n 2 , +, ·, 0, 1, B 2 ). We say, the elements b 1 , . . . , b ∈ B n 2 be linearly indepen- The Hamming distance is defined as the function Following general conventions, we extend this function additively to B n 2 and obtain that In a similar fashion we define the Hamming weight w : As common in coding theory, we observe w(x) = δ(x, 0), and, at least as important as the previous, The Hamming disk of radius d centered in x ∈ B n 2 will be denoted by Particularly the disks B δ (0, d) and B δ (1, d) will play a prominent role in our discussion. The following concepts are slightly deviating from what is standard in the literature. For more information see [5] and references given there. Definition 2.1: Let H be an n × k-matrix over the semifield B 2 . For a natural number d consider the following two properties: d-Rev: If x ∈ B δ (0, d) and y ∈ B n 2 , then xH = yH will imply x = y. Property d-Rev is a logically stronger version of dseparability introduced in [5] , which says, that the mapping [5] , and one of its immediate consequences is that any set of d + 1 rows of H is linearly independent in the sense defined above. We will show that conditions d-Rev and d-Dis are equivalent. Proof: Assume, the n × k-matrix H satisfies d-Dis, and let x ∈ B δ (0, d) and y ∈ B n 2 be given, such that xH = yH. If x = y, then w.l.o.g. we may assume that there exists i ∈ supp(y) such that i ∈ supp(x). If not, then y ∈ B δ (0, d) as well, and we only need to interchange the roles of x and y. We conclude that row h i of H satisfies h i ≤ xH which comes from assumption d-Dis. In contrast, h i ≤ yH = xH, a contradiction. This proves d-Rev for H. where e is the vector with only non-zero entry 1 exactly in the th position. which is what condition d-Dis asserts. In this section we present the main conceptual novelty of this paper. This will ease the discussion of group testing in that an efficient decision scheme is presented under quite general assumptions. , for all x ∈ A and y ∈ B. We will briefly collect basic facts about residuated pairs. For a source, the reader is referred to [1] . Fact 1: f : A −→ B may be called a residuated mapping, if there is g : B −→ A, such that (f, g) is a residuated pair. The mapping g is then uniquely determined by f . Dually, f is uniquely determined by g which is called the residual of f . Fact 2: f and g are monotone mappings, and there holds g f ≥ id A and f g ≤ id B . Conversely, if two monotone mappings f and g satisfy g f ≥ id A and f g ≤ id B , then they will form a residuated pair. Fact 3: f g f = f and g f g = g. For this reason, the mappings h := g f and k := f g form closure and kernel operators, respectively. We observe that h(x) = x for all x ∈ im(g) and k(y) = y for all y ∈ im(f ). Theorem 2.5: Let f : B n 2 −→ B k 2 be a residuated mapping represented by the n×k-matrix H. Then the residual mapping g : B k 2 −→ B n 2 is given by the assignment y → yH T . Proof: In the following we will need to carefully observe the effects of the negation-operator: they will turn sums into products and products into sums. According to our list of facts, we only have to check if g is monotone, and if f g ≤ id B k 2 and g f ≥ id B n 2 . To begin, we compute which already implies the monotone property of g. For y = f (x) we now have y j = ≤n x H j , and this leads to for all 1 ≤ i ≤ n, and thus g f (x) ≥ x for all x ∈ B n 2 . Likewise, in a strictly dual fashion, we compute for all 1 ≤ j ≤ k, and hence f g(y) ≤ y for all y ∈ B k 2 , which completes the proof. We will shed further light on condition d-Rev. A proof of the following theorem will be given in a more detailed account. Theorem 2.6: Let f : B n 2 −→ B k 2 be a residuated mapping represented by the n×k-matrix H, and let g denote the residual mapping of f . Then the following are equivalent: (a) H satisfies d-Rev. (b) g f (x) = x for all x ∈ B δ (0, d). (c) B δ (0, d) ⊆ im(g). We enter the central topic of the paper. We will follow the matrix model of non-adaptive group testing as first presented in [3] and reformulate it to suit our language of Residuation. Definition 3.1: Let n and k be natural numbers. An (n, k)group testing scheme is a residuated mapping f : B n 2 −→ B k 2 . The matrix H representing f is referred to as testing matrix. As described in the introduction, typically an (unknown) vector x ∈ B n 2 represents a list of samples. These samples are distributed to k different tests, and H describes which sample is participating in which test. The testing process is mathematically represented by the multiplication of x by H, so that y = xH ∈ B k 2 becomes the vector of (known) test results. It would now be desirable to have a mapping g : that assigns each test result y ∈ B k 2 a sample vector x ∈ B n 2 (hopefully) showing a valid pattern of infective samples. Such a mapping g would be referred to as a decision scheme, or a decoder for the group testing scheme. Definition 3.2: An (n, k, d)-group testing scheme is an (n, k)-group testing scheme f together with a decoder g : If such a decoder g exists, we will say, f allows the identification of up to d infected samples, and it is obvious, that maximizing d and minimizing k are conflicting goals. For given d ≤ n, an (n, k, d) group testing scheme will be called optimal, if for every (n, k , d) group testing scheme there holds k ≥ k. There is a natural interest in decision schemes, because for group testing these are what syndrome decoders are for linear codes. In what follows, an incidence structure will be a pair (P, B), where P is a set of points, and where B ⊆ 2 P is called the set of blocks. If a point p ∈ P is contained in the block c ∈ B, then we say that p is incident with c. All of the incidence structures in this paper will be finite, meaning P (and hence B) are finite sets. Incidence matrices may thus be considered as indicator functions of their underlying incidence relation. Definition 4.2: For natural number s and t, a finite incidence structure (P, L) consisting of points and lines is called a partial linear space of order (s, t) if the following axioms hold: • Every line is incident with s + 1 points, and every point is incident with t + 1 lines. • Two different points are connected by at most one line. It is obvious, that in a partial linear space, two different lines meet in at most one point, and hence this class of incidence structures is self-dual in the sense, that interchanging the terms "line" and "point" will transform a partial linear space of order (s, t) into a partial linear space of order (t, s). A particularly well understood class of partial linear spaces is that of the generalized quadrangles. These spaces were first introduced by J. Tits [7] and enjoy cross links with many other structures in geometry and algebra in general. As indicated earlier, the pair (s, t) is referred to as the order of the quadrangle. Remark 4.4: A generalized quadrangle of order (s, t) has (s + 1)(st + 1) points and (t + 1)(st + 1) lines. We refer to the standard reference [6] for more information on generalised quadrangles. In terms of group testing, we can state the following theorem, the proof of which is rather simple. Theorem 4.5: Let (P, L) be a partial linear space of order (s, t), and let 1 , . . . m denote a collection of m distinct lines in L. If ∈ L is a line with ⊆ 1 ∪ · · · ∪ m then = j for some 1 ≤ j ≤ m provided m ≤ s. Proof: Assume = i for all i = 1, . . . , m. Then | ∩ i | ≤ 1 which shows which contradicts m ≤ s. For the incidence matrix of a partial linear space (P, L) we may derive the following immediate conclusion. Corollary 4.6: The incidence matrix of a partial linear space of order (s, t) satisfies condition s-Rev. Corollary 4.7: The incidence matrix of a generalized quadrangle of order (s, t) yields an (n, k, d)-group testing scheme where n = (s + 1)(st + 1), k = (t + 1)(st + 1), and d = s. We have presented the well-established discipline of nonadaptive group testing in the previously neglected language of Residuation Theory on vector spaces over the Boolean semifield B 2 . In doing so, we have developed a few novel aspects: (a) Our group testing schemes are modelled by residuated mappings f , the residual g may serve as decision schemes (decoders) for the syndrome decoding problem: For given y find x such that f (x) = y. (b) These testing schemes identify every infection pattern that is contained in im(g). (c) A condition d-Rev, is introduced that exactly describes the condition under which B δ (0, d) ⊆ im(g). Exactly the group testing schemes satisfying d-Rev will identify up to d infected samples out of a given group of samples by application of the residual mapping g as decoder. (d) If the matrix H represents f , then d-Rev is given for H if and only if B δ (1, d) ⊆ colspace(H). (e) Condition d-Dis as discussed in [5] is equivalent to d-Rev in this paper. (f) The incidence matrices of partial linear spaces of order (s, t) satisfy s-Dis and hence are suitable as testing matrices. (g) We have not made any assumptions on the relationship between k and n. In traditional noise-free group testing, k will be desired to be significantly smaller than n. In a noisy environment, and if error-correction is required as tests are inexpensive but unreliable, it appears possible to still strive for k ≤ n, but in general we might need to allow for k ≥ n. Residuation Theory The Detection of Defective Members of Large Populations A survey of combinatorial theory Tapestry: A Single-Round Smart Pooling Technique for COVID-19 Testing Essential Coding Theory Finite generalized quadrangles Ovoides et groupes de Suzuki We thank Ruth Greferath (Ph.D.) and Paul Wright (Ph.D.) for useful advice and discussions.