Circuit bases for randomisation

Elena Pesce, Fabio Rapallo, Eva Riccomagno, Henry P. Wynn

May 7, 2021

Abstract. After a rich history in medicine, randomised controlled trials, both simple and complex, are in increasing use in other areas, such as web-based AB testing and planning and design decisions. A main objective is to be able to measure parameters, and contrasts in particular, while guarding against biases from hidden confounders. After careful definitions of classical entities such as contrasts, an algebraic method based on circuits is introduced which gives a wide choice of randomisation schemes.

There are ways in which a regression model can be biased because of the neglect of hidden variables, sometimes called hidden confounders. To some degree these biases can be removed using randomisation. A major source of conceptual difficulties is the continuing distinction between passive observation, characterised by the term "observational study", and controlled experiment. In addition this distinction is flavoured by different intellectual traditions. In most fields a controlled experimental design is conceived as an intervention. Thus one talks about setting the level of a variable X, or applying a treatment or treatment combination. A weaker version of intervention is a form of selection. Rather than interfere too much with the state of nature, one may simply select a value of X which is already in a population, such as selecting a subject of a particular age. Stratification is in this category, as is "matching": observing (or treating) a collection of subjects who are close in terms of some multivariate metric applied to the possible confounders. "Natural experiments" exploit opportunities where Nature has unwittingly designed an experiment for us. For a very thorough compendium of experimental design methodology, both as intervention and as selection, see [5].

Traditions in agriculture and the social sciences have stressed the role of randomisation, and indeed the method has been described as one of the greatest contributions of statistics to scientific methodology; a major review is [4], which goes a long way towards updating earlier discussions such as [9]. After a long period in which factorial and optimum controlled experiments may be seen to have had a dominant role, influenced by success in product design and quality improvement, randomisation is making a comeback, if indeed it ever left the limelight. It is now used extensively outside its traditional area of clinical trials under the generic term randomised controlled trials (RCT). Notably, there is fast-growing application to experiments in social media, under the heading of AB experiments in on-line marketing, see [10], and to socio-technical experiments, such as smart metering in homes and transport, see e.g. [8]. Other important developments are in the field of "big data", where data is often collected without experimental design being used, so that biases can be a serious impediment to model building, see [7, 12, 13]. There seems to be no doubt that in nearly all fields the removal of biases in modelling is a major reason to randomise.
The question then remains as to whether the randomisation, or rather the randomisation distribution, is to be used in the analysis, e.g. probability statements are made based on the randomisation, or whether randomisation should only be used in the design, e.g. for bias reduction. The latter approach is probably more common and is adopted here. A compromise position is a minimax approach, which is closely related to the use of randomisation in finite population sampling, see [14, 15, 16, 19]. There is a subtle relationship between randomisation and combinatorial design, which is perhaps closest to the present paper, see [1]. Our purpose is to introduce a specialised but also very general technique, namely the theory of circuits, already studied in numerical analysis and algebraic statistics. This sets the paper in the sub-areas of "complex randomisation", "structured randomisation", or "randomised blocks".

Some of the disparate interpretations of randomisation can be understood from a simple AB (RCT) experiment. Using traditional terminology, we want to assess the difference between the effects of two treatments A and B with effects $\theta_1$ and $\theta_2$, respectively. That is, we want to estimate $\varphi = \theta_1 - \theta_2$. A standard model is to write, for subjects $i$ and $j$ receiving treatments A and B, respectively,
$$Y_{1i} = \theta_1 + \delta_{1i}, \quad i = 1, \ldots, n_1, \qquad Y_{2j} = \theta_2 + \delta_{2j}, \quad j = 1, \ldots, n_2,$$
where $n_1, n_2$ are the respective sample sizes and $\delta_{1i}, \delta_{2j}$ are unit effects of other influences, be they errors of measurement or other (hidden) factors. The naive estimate of the treatment difference is
$$\hat\varphi = \hat\theta_1 - \hat\theta_2.$$
Here the estimates of $\theta_1$ and $\theta_2$ are given by the respective sample means:
$$\hat\theta_1 = \bar Y_{1\cdot} = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_{1i}, \qquad \hat\theta_2 = \bar Y_{2\cdot} = \frac{1}{n_2}\sum_{j=1}^{n_2} Y_{2j},$$
where, for instance, $\bar Y_{1\cdot}$ is the usual notation for the average of measurements over group A. The standard argument, and this is probably also the common sense argument of non-experts, is that if we randomise then the difference between the mean values of the deviations due to other factors will cancel out: $\bar\delta_{1\cdot} - \bar\delta_{2\cdot}$ will be approximately zero and will not perturb $\hat\varphi$. Of course, if $\delta_{1i}, \delta_{2j}$ are random with standard assumptions then $\hat\varphi$ is both the Least Squares estimate and the best linear unbiased estimate of $\varphi$.

A critical question is: what does the model mean, both scientifically and predictively? What are $\theta_1$, $\theta_2$ and $\varphi$? More precisely, do parameter values refer to the finite population from which the sample was taken or to which the treatments were applied? Or is there some larger population of which the population of units under study is a subpopulation, such as all present and future subjects who may benefit from a vaccination decision based on the results of the experiment? Or, even more metaphysically, are they part of a larger scientific theory, maybe even a "crucial experiment" to decide between two scientific theories? These questions are important, for example, with AB experiments on people using social media. The commercial opportunities in terms of the use of huge (big) data sets come with a risk of bias arising from any number of demographic and operational factors. It is almost impossible to describe the population of social media users, but if bias can be removed in some simple way then the estimates can genuinely reflect people's choices and behaviour. A naive but rather universal conclusion is something like: after randomisation we can use the model, which is sometimes expressed in a more expert fashion as: make sure you randomise your blocks.
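For illustration, the following minimal Python sketch (ours, not part of the original development; the effect sizes and the shape of the hidden confounder are hypothetical) simulates this AB set-up. A systematic allocation confounds $\hat\varphi$ with the hidden unit effects, while a randomised allocation makes $\bar\delta_{1\cdot} - \bar\delta_{2\cdot}$ approximately zero, as the standard argument claims.

```python
import numpy as np

rng = np.random.default_rng(0)

theta_1, theta_2 = 5.0, 3.0          # true treatment effects, phi = 2
n = 1000                              # units, half per treatment
hidden = np.linspace(-2.0, 2.0, n)   # hidden confounder, e.g. an age trend

def phi_hat(allocation):
    """Naive estimate phi_hat = mean(Y_A) - mean(Y_B) for a 0/1 allocation."""
    y = np.where(allocation == 0, theta_1, theta_2) + hidden
    return y[allocation == 0].mean() - y[allocation == 1].mean()

# Systematic allocation: first half A, second half B -> confounder bias.
systematic = np.repeat([0, 1], n // 2)
# Randomised allocation: the hidden effects cancel on average.
randomised = rng.permutation(systematic)

print("systematic:", phi_hat(systematic))   # far from 2
print("randomised:", phi_hat(randomised))   # close to 2
```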
On the one hand this paper takes this simple approach, but on the other it introduces a special technique, based on circuits, to decompose an experiment into mutually exclusive blocks in each of which randomisation can be carried out separately. Some solutions comprise recognisable combinatorial objects; others derive from running the program 4ti2 [20] to obtain the circuits, see Subsection 7.1. After a rather elementary formulation of the problem in the next two sections, we formally define the quest for what we will call valid randomisation schemes in Section 5, followed by a short discussion on analysis in Section 6. Sections 7 and 8 are the main developments, with Section 7 describing a sufficient condition under which a non-negative binary circuit (to be defined) gives a valid randomisation and Section 8 some special conditions.

As we have stated in the AB case, randomisation is particularly suited to situations in which standard estimates are unaffected by a uniform shift of the observations, which is then subtracted out. Consider an experiment giving a random sample $Y_1, \ldots, Y_n$. We have the following model:
$$Y = \sum_j \theta_j f_j(x) + \epsilon,$$
for functions $\{f_j(x)\}$, $x$ a generic point in some design space $\mathcal{X}$ and $\epsilon$ a random error with the usual assumptions (zero mean and constant variance). An experimental design $D = \{x^{(i)}, i = 1, \ldots, n\}$, with sample size $|D| = n$, has design matrix $X = \{f_j(x^{(i)})\}$, and we express the standard regression set-up, under standard assumptions, by
$$E[Y] = X\theta,$$
where $E$ is the expectation and $\theta$ the parameter vector. To repeat, the basic idea is to divide the experiment into disjoint blocks, in each of which we randomise, and then combine the results.

Example 3.3 ($2^2$ experiment). We consider a simple example from linear regression, namely a $2^2$ factorial design problem, with $\pm 1$ levels and no replication (for simplicity). We take the model without an interaction:
$$Y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \epsilon.$$
If we randomise a large population and uniformly apply the four combinations of the design, $\{\pm 1, \pm 1\}$, the potential bias effect will be negligibly small because the estimators of the $\theta$-parameters are unbiased. But there is an alternative. Split the population into two groups, randomise each separately, and apply the controls $(x_1, x_2) = \{(1, 1), (-1, -1)\}$ to the first group and $\{(1, -1), (-1, 1)\}$ to the second group. Then we can estimate $\theta_1 + \theta_2$ from the first group and $\theta_1 - \theta_2$ from the second group. Combining these estimates gives the same result, except possibly for the small effect of confounders, as if we randomised over the whole $2^2$ experiment. Note that the parameters $\theta_1$ and $\theta_2$ and their estimates are already, respectively, parametric contrasts and empirical contrasts. This can be seen as splitting the $2^2$ experiment into two (randomised) AB experiments.

In the case of the orthogonal design described above the $X$-matrix takes the form $X = [j : X_1]$, where $j$ is the $n$-vector of ones, for the constant (intercept) term, and $X_1$ is orthogonal to $j$: $j^T X_1 = 0$. We describe such an $X$-matrix as being in contrast form. All empirical and parametric contrasts are derived from $X_1$.
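A minimal numerical sketch of Example 3.3 (ours, assuming the no-interaction model above; the parameter values and block shifts are hypothetical) shows that the two-block scheme recovers $\theta_1$ and $\theta_2$ exactly, even when each block carries its own arbitrary hidden shift:

```python
# Responses y = theta0 + theta1*x1 + theta2*x2, plus a hidden shift per block.
theta0, theta1, theta2 = 1.0, 0.7, -0.4
b1, b2 = 10.0, -3.0   # arbitrary hidden shifts, one per block

def y(x1, x2, shift):
    return theta0 + theta1 * x1 + theta2 * x2 + shift

# Block 1: (1,1) and (-1,-1); the within-block contrast estimates theta1+theta2.
sum_hat = (y(1, 1, b1) - y(-1, -1, b1)) / 2
# Block 2: (1,-1) and (-1,1); the within-block contrast estimates theta1-theta2.
diff_hat = (y(1, -1, b2) - y(-1, 1, b2)) / 2

theta1_hat = (sum_hat + diff_hat) / 2
theta2_hat = (sum_hat - diff_hat) / 2
print(theta1_hat, theta2_hat)   # 0.7 -0.4, unaffected by b1 and b2
```

The block shifts $b_1, b_2$ cancel inside each within-block contrast, which is exactly why the combined estimates match those from the fully randomised $2^2$ experiment.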
Notice that from any model with integer design matrix $X$ whose column space contains $j$ it is always possible to derive a reparametrisation with design matrix $\tilde X$ written in contrast form.

Lemma. A design matrix $X$ with column space containing the vector $j = (1, 1, \ldots, 1)^T$ can be transformed to contrast form $\tilde X = [j : X_1]$ with the same column space as $X$, where $j^T X_1 = 0$.

Proof. We can easily determine the reparametrisation which the transformation requires. Starting with $\tilde X \varphi = X \theta$, we solve for $\varphi$:
$$\varphi = (\tilde X^T \tilde X)^{-1} \tilde X^T X \theta.$$

The term contrast is especially prevalent in Analysis of Variance (ANOVA) models, that is, models for qualitative factors in which each level of each factor provides a parameter for an additive model. The classical notation, say, for a two-way $I \times J$ table with two factors is that the additive model has parameters $\alpha_i$ $(i = 1, \ldots, I)$ and $\beta_j$ $(j = 1, \ldots, J)$, and the model for the observations $Y_{ij}$ is
$$Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij},$$
where the $\{\epsilon_{ij}\}$ are random errors with standard assumptions. We show this with an example: the $X$-matrix of the additive model is not in contrast form, but it can be transformed to one that is, and from this the reparametrisation follows. Note that we have limited the analysis here to the decomposition of $\tilde X$ into $[j : X_1]$, since for randomisation we are interested in the decomposition of the vector $j$, but the results in this section and many results about circuit bases in the next sections could be written in general for a decomposition of $\tilde X$ into $[X_2 : X_1]$ with $X_2^T X_1 = 0$.

Using the representation of the design matrix in contrast form we can introduce and analyse randomisation systems in order to give general answers to the questions stated earlier in Section 2 in the framework of AB experiments. The separation into blocks is described by the following definitions.

Definition 5.1. For observations $Y_i$ $(i = 1, \ldots, n)$, a potential randomisation system $\mathcal{R}$ is a set partition of $N = \{1, 2, \ldots, n\}$, namely a decomposition of $N$ into disjoint exhaustive subsets $R_1, \ldots, R_k$, called blocks, of size 2 or more: $N = R_1 \cup \cdots \cup R_k$, with $R_s \cap R_t = \emptyset$ for $s \neq t$.

Definition 5.2. For a regression model and experimental design $D_n$ with sample size $n$ and a design matrix in contrast form $[j : X_1]$, a valid randomisation system is a potential randomisation system for which all the associated binary indicator vectors $j_h$ of the blocks $R_h$ satisfy $j_h^T X_1 = 0$, $h = 1, \ldots, k$.

The next two examples are familiar, in the sense that the orthogonal blocks are easily associated with additional factors or parameters in an orthogonal design. The third example may be less familiar.

5.1. Factorial fractions.

We consider a $2^3$ factorial experiment for main effects. The standard $X$-matrix is already in contrast form. In addition to a full randomisation there are two different randomisation systems, and we list the $R_j$ partitions for each. These two distinct randomisations correspond to the familiar decomposition into blocks based on abelian groups (see e.g. [2]): the first arises from a $2^{3-1}$ experiment with its defining contrast subgroup in classical notation, and the second corresponds to the $2^{3-2}$ with its subgroup. For those more familiar with the algebraic design of experiments, these solutions are the point ideals corresponding respectively to the solutions of the defining equations.

5.2. Latin squares.

Consider an $I \times I$ table with the usual additive model. A Latin square based on the table has the usual definition. If $I = 3$ there are two mutually orthogonal Latin squares; in traditional notation:

A B C        A B C
C A B        B C A
B C A        C A B

Each square gives a different valid randomisation based on the letters. Labelling the observations left-to-right and top-to-bottom, the respective blocks are (ignoring commas) {159, 267, 348} and {168, 249, 357}. We state the general result without proof and in the terminology of this example: each letter class of a Latin square meets every row and every column exactly once, so its indicator vector is orthogonal to all row and column contrasts and the letter classes form a valid randomisation.

5.3. A preference model.

An additive preference model has (without replication) the six values $Y_{ij}$ with the model
$$Y_{ij} = \alpha_i + \alpha_j + \epsilon_{ij}, \qquad i, j = 1, 2, 3, 4; \; i < j.$$
We are interested in the contrasts $\alpha_i - \alpha_j$, because their estimates would yield an estimated preference order. Numbering the six pairs $(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)$ as observations 1 to 6, this gives a choice of $X_1$ and the randomisation {16, 25, 34}: each block pairs two disjoint comparisons that together cover all four items.
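The orthogonality condition in Definition 5.2 is mechanical to verify. The following sketch (our illustration, not from the paper) checks candidate block partitions against a contrast matrix; it uses the $3 \times 3$ Latin-square example above, with observations numbered 1–9 left-to-right and top-to-bottom and $X_1$ built from effects-coded row and column contrasts (one of several possible codings):

```python
import numpy as np

def is_valid_randomisation(blocks, X1, n):
    """Check Definition 5.2: the blocks must partition {1, ..., n} and each
    block's 0/1 indicator j_h must satisfy j_h @ X1 == 0."""
    if sorted(i for b in blocks for i in b) != list(range(1, n + 1)):
        return False
    for block in blocks:
        j_h = np.zeros(n)
        j_h[[i - 1 for i in block]] = 1
        if not np.allclose(j_h @ X1, 0):
            return False
    return True

# Contrast matrix X1 for the 3x3 additive layout (effects-coded rows/columns).
r1 = [1, 1, 1, 0, 0, 0, -1, -1, -1]   # row 1 vs row 3
r2 = [0, 0, 0, 1, 1, 1, -1, -1, -1]   # row 2 vs row 3
c1 = [1, 0, -1] * 3                    # column 1 vs column 3
c2 = [0, 1, -1] * 3                    # column 2 vs column 3
X1 = np.column_stack([r1, r2, c1, c2])

print(is_valid_randomisation([{1, 5, 9}, {2, 6, 7}, {3, 4, 8}], X1, 9))  # True
print(is_valid_randomisation([{1, 2, 3}, {4, 5, 6}, {7, 8, 9}], X1, 9))  # False: rows
```

The first partition is the letter classes of the first Latin square and passes; blocking by rows fails because a row indicator is not orthogonal to the row contrasts.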
The informal approach we have taken is that, for large samples, randomisation has approximately the effect of introducing a block parameter. Our condition of orthogonality in the definition of valid randomisation, as exemplified, has so far ignored the fact that in standard terminology blocks do not have to be orthogonal. Indeed, there is a rich theory of balanced incomplete block designs (BIBD), both from combinatorial and from optimal design theory. We note here some basic facts about orthogonal versus non-orthogonal blocks.

(1) If we set up a model in which every $j_h$-vector is allocated a block parameter, then only under orthogonality is the usual LSE of the $\theta$-parameters best, with no bias in these estimates from the block effects.
(2) In the non-orthogonal case, if we use the LSE of the $\theta$-parameters assuming that the block parameters are zero, when they are not, then the block parameters introduce bias.
(3) In the non-orthogonal case the "proper" LSE of the $\theta$-parameters in the presence of the block parameters will be unbiased but will have higher variances than in case (2) above (the covariance matrix will Loewner-dominate).

Models with non-orthogonal blocks with a specified block effect require some effort to model, or at least interpret, the block effect, for example the effect of day if the experiment is conducted over days. In such cases a bias model is required. But where bias is caused by hidden, unspecified confounders, such a bias model seems somewhat artificial. The effects are too unstructured to model, but sufficiently present that we prefer orthogonality.

In this section we introduce the circuits of a matrix to analyse the problem of randomisation. We consider the randomisation as the decomposition of the vector $j = (1, \ldots, 1)^T$ into binary vectors:
$$j = j_1 + \cdots + j_k,$$
where each vector $j_h$ is a binary vector satisfying $j_h^T X_1 = 0$, $h = 1, \ldots, k$. Such binary vectors $j_h$ are called binary randomisation vectors. Next, we introduce the circuits and their main properties. Let $A$ be an integer-valued matrix with $d$ rows and $n$ columns. For our purposes, we can assume that $A = X_1^T$.

Definition 7.1. Let $u \in \mathbb{Z}^n$ be an integer-valued vector, let $u^+$ be the positive part of $u$, namely $(u^+)_i = \max(u_i, 0)$, and let $u^- = u^+ - u$, so that $u = u^+ - u^-$. The vector $u$ is a circuit of $A$ if: (1) $u \in \ker_{\mathbb{Q}}(A)$, where $\mathbb{Q}$ is the set of rational numbers, and the non-zero entries of $u$ are relatively prime; (2) $u$ has minimal support, i.e., there is no other circuit $v$ with $\mathrm{supp}(v) \subset \mathrm{supp}(u)$.

Definition 7.2. The set of all circuits of the matrix $A$ is named the circuit basis of $A$ and is denoted by $\mathcal{C}(A)$.

The circuit basis $\mathcal{C}(A)$ is always finite. The minimal support property gives rise to a number of interesting properties of $\mathcal{C}(A)$. We recap in the following proposition the special features of the circuits we will use for describing randomisation. For the proofs and further details the reader can refer to [17].

Proposition 7.3. Let $A$ be an integer-valued matrix with dimensions $d \times n$ and suppose that $\mathrm{rank}(A) = d$.
(1) The circuit basis $\mathcal{C}(A)$ is subset compatible, i.e., if we consider a matrix $A'$ obtained by selecting $n' < n$ columns, then the circuit basis of $A'$ is formed by the circuits in $\mathcal{C}(A)$ with support contained in the $n'$ selected columns.
(2) $\mathcal{C}(A)$ is finite and spans $\ker_{\mathbb{Q}}(A)$.
(3) Every non-zero $u \in \ker_{\mathbb{Q}}(A)$ can be written as a non-negative rational linear combination of circuits whose supports are contained in $\mathrm{supp}(u)$ and whose signs are consistent with those of $u$.

When a non-negative binary circuit $j_1$ gives a valid randomisation, then also $j_2 = j - j_1$ is a binary non-negative vector in $\ker(A)$, so that the decomposition $j = j_1 + j_2$ is a valid randomisation. Note that $j_2$ may be a circuit itself (in which case we call $j = j_1 + j_2$ a non-decomposable randomisation), or not. In the latter case, the vector $j_2$ can be decomposed into a sum of non-negative circuits.
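Definition 7.1 can be turned directly into a brute-force computation: a circuit is an integer kernel vector with inclusion-minimal support, and any circuit of a rank-$r$ matrix has support of size at most $r + 1$. The sketch below (our illustration; 4ti2 uses far more efficient algorithms) scans small column subsets whose submatrix has a one-dimensional kernel spanned by a vector with full support:

```python
from itertools import combinations
from math import gcd, lcm

from sympy import Matrix

def circuits(A):
    """Brute-force circuit basis of an integer matrix A: all integer kernel
    vectors with inclusion-minimal support, one representative per sign pair.
    A subset S of columns supports a circuit exactly when the submatrix A_S
    has a one-dimensional kernel spanned by a vector with no zero entry."""
    A = Matrix(A)
    _, n = A.shape
    r = A.rank()
    found = []
    for size in range(1, r + 2):          # circuit supports have <= r+1 points
        for S in combinations(range(n), size):
            ker = A[:, list(S)].nullspace()
            if len(ker) != 1 or any(x == 0 for x in ker[0]):
                continue
            v = ker[0]
            m = lcm(*[int(x.q) for x in v])   # clear denominators
            ints = [int(x * m) for x in v]
            g = gcd(*ints)                     # make the entries coprime
            u = [0] * n
            for idx, val in zip(S, ints):
                u[idx] = val // g
            if u[min(S)] < 0:                  # fix a sign convention
                u = [-x for x in u]
            found.append(tuple(u))
    return found

# A = X1^T for the 2^2 main-effects design, runs (1,1),(1,-1),(-1,1),(-1,-1):
A = [[1, 1, -1, -1],   # x1
     [1, -1, 1, -1]]   # x2
print(circuits(A))     # [(1, 0, 0, 1), (0, 1, 1, 0)]
```

On the $2^2$ main-effects contrast matrix it returns exactly the two non-negative binary circuits $(1, 0, 0, 1)$ and $(0, 1, 1, 0)$, i.e., the two blocks of Example 3.3.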
From Proposition 7.3, Item 3, we see that the circuit basis, and in particular the set of non-negative circuits, is the natural tool to find valid non-decomposable randomisations. In general, if the vector $j$ can be written as the sum of binary non-negative circuits we have a valid randomisation. The main problem posed in this paper is to provide conditions for when there is a converse, that is to say, classes of experimental designs for which every randomisation vector $j_h$ is a circuit. In the next section we will describe an important class; here we have a useful sufficient condition.

Lemma 7.5. If $j_1$ is a non-negative binary randomisation vector with two non-zero elements ($\#\mathrm{supp}(j_1) = 2$), then it is a circuit of $X_1^T$.

We can see that for every $j_1$-vector covered by Lemma 7.5 there are two rows of $X_1$ which have opposite signs. This is the case for the example in Section 5.3, which yields the following result: any valid randomisation comprising binary vectors each with two non-zero entries will be found by inspecting the list of all circuits.

7.1. Computation of circuits.

To find the randomisation systems from the circuit basis, we start from the design matrix $X$, write it in contrast form $\tilde X$, and extract the contrast matrix $X_1$ as described above. The actual computation of the circuits of the matrix $X_1$ can be done with the software package 4ti2, see [20], which provides a function called circuits that computes the circuits of an integer matrix. The algorithms used to compute circuits in 4ti2 belong to the class of combinatorial algorithms, and thus there is a limitation on the size of the matrices for which the computation is actually feasible. In our experiments, problems with up to 50 points are easily processed, but the execution time increases rapidly with the number of points. However, all the contrast matrices illustrated in this paper have been processed by 4ti2 in less than 0.1 seconds. 4ti2 is now also available within the symbolic software Macaulay2, see [21], and there are R packages which allow communication between R and Macaulay2, leading to a flexible use of symbolic computation in statistical analysis. With the aid of the circuits we are able to analyse more complex models where the number of randomisation systems is relatively large.

Example 7.9. In the case of the $2^4$ design with contrasts on the main effects, the contrast matrix $X_1$ is the $16 \times 4$ matrix whose columns are the four main-effect contrasts over the full factorial, and the situation becomes more complex. Although 0.02 seconds are enough to obtain the whole set of 456 circuits, the non-negative circuits now number 48, but among them there are also non-binary circuits with entries equal to 2. Selecting the binary circuits leaves 32: the 8 circuits with support on two points give a unique randomisation based on 2-ers, while with the remaining 24 circuits on 4 points we can construct 30 valid randomisations. Each circuit on 4 points is used in 5 possible randomisations. With such a large choice of randomisation schemes the problem arises as to which to choose; this is discussed briefly in Section 9.
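In practice the computation is delegated to 4ti2, as in Subsection 7.1. The following sketch (ours; it assumes a local 4ti2 installation and follows the file conventions of the 4ti2 manual, with the matrix written to x1.mat and the circuits read back from x1.cir — details may vary between versions) runs the circuits command on the $2^4$ main-effects contrast matrix of Example 7.9 and filters the non-negative binary circuits, i.e., the candidate block indicators:

```python
import subprocess
from itertools import product

import numpy as np

# Contrast matrix X1 of the 2^4 main-effects design (Example 7.9):
# 16 runs, one +/-1 column per factor; A = X1^T is 4 x 16.
runs = list(product([1, -1], repeat=4))
A = np.array(runs).T

# 4ti2 file conventions (per its manual): the matrix goes in PROJECT.mat,
# preceded by a "rows cols" header; `circuits PROJECT` writes PROJECT.cir.
with open("x1.mat", "w") as f:
    f.write("4 16\n")
    f.write("\n".join(" ".join(str(v) for v in row) for row in A) + "\n")
subprocess.run(["circuits", "x1"], check=True)

# Read the circuits back; 4ti2 reports one representative per +/- pair,
# so normalise signs before filtering the non-negative binary ones.
with open("x1.cir") as f:
    n_circ, _ = map(int, f.readline().split())
    C = [list(map(int, f.readline().split())) for _ in range(n_circ)]

def normalise(u):
    return [-v for v in u] if all(v <= 0 for v in u) else u

binary = [u for u in map(normalise, C) if all(v in (0, 1) for v in u)]
print(len(C), "circuits;", len(binary), "non-negative binary circuits")
```

The raw counts printed may depend on the sign conventions of the 4ti2 version used; the classification of the resulting circuits is as reported in Example 7.9.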
Although the factorial design and Latin square examples can be considered well known, because of the orthogonality properties of both, the example in Section 5.3 may be less so. So we may ask: what is the property of $X_1^T$ for which the full valid randomisation system can be found as a set of circuits? The answer is total unimodularity, via the following two results.

Lemma 8.3. If $A$ is totally unimodular, then every non-negative circuit of $A$ is binary.

Proof. This is based on some known results from the theory of Gröbner bases. For $A$ unimodular, the circuits and the Universal Gröbner basis are equal. In this statement, the circuits should be seen as represented by the so-called binomials: if $u = u^+ - u^-$ is a circuit, it corresponds to a binomial in "dummy" variables $z_i$ with exponents given by $u$:
$$z^{u^+} - z^{u^-} = \prod_i z_i^{u_i^+} - \prod_i z_i^{u_i^-}.$$
These binomials generate a toric ideal $I(A)$. This ideal is very widely studied; for example, in algebraic statistics it is the starting point for Markov Chain Monte Carlo simulation for testing hypotheses on multinomial contingency tables, see [6]. Now, if $A$ is totally unimodular then it is known that the initial ideal $\mathrm{in}(I(A))$ is generated by square-free binomials for any given term-order (required to define a Gröbner basis), see [17]. The initial ideal $\mathrm{in}(I(A))$ of the ideal $I(A)$ is the ideal generated by the leading terms of the polynomials in $I(A)$. Thus, all the binomials in the Universal Gröbner basis $\mathcal{U}(I(A))$ have square-free leading terms. Finally, the non-negative circuits are elements of $\mathcal{U}(I(A))$, viewed as binomials of the form $z^u - 1$. The leading term is always $z^u$; it is square-free and therefore $u$ is binary.

We now complete the argument with the following.

Proposition. If $A$ is totally unimodular, then every non-negative binary non-decomposable randomisation vector is a circuit of $A$.

Proof. This is by contradiction. Let $j_1$ be a (non-negative binary) non-decomposable randomisation vector and suppose it is not a circuit. Since $j_1 \in \ker(A)$, by Proposition 7.3, Item 3, $j_1$ has a representation as a non-negative linear combination of circuits $u_1 + \cdots + u_k$. Take one such circuit $u_h$. Its support is strictly contained in $\mathrm{supp}(j_1)$, and note that $\#\mathrm{supp}(j_1) - \#\mathrm{supp}(u_h) > 1$, because $j_1$ is not a circuit and there are no circuits with support on one point. Moreover, the circuit $u_h$ is binary by Lemma 8.3. So there is a refinement given by $j_1 = u_h + (j_1 - u_h)$, which contradicts $j_1$ being non-decomposable.

The most well-known example of a totally unimodular matrix is generated by a directed graph $G(V, E)$. The rows are indexed by vertices and the columns by directed edges, with the following rule for the entries: if the edge is $e = (i \to j)$ then $A_{i,e} = 1$, $A_{j,e} = -1$, and all other entries in column $e$ are zero. For $A$ to be an $X_1^T$ matrix we need it to be (row) orthogonal to $j = (1, 1, \ldots, 1)^T$; this requires that at every vertex the number of in-arrows and the number of out-arrows be the same.

The graph for our example is pictured in Figure 2. For its $X_1$ matrix there are 33 non-negative circuits from a total of 198 circuits: five 2-ers, ten 3-ers, ten 4-ers, and eight 5-ers. The valid randomisations obtained from those circuits can be classified by the corresponding integer partition, recording the cardinality of the subsets and the number $r$ of different choices.

[Figure 3: a 5+5+5 randomisation and a 5+2+2+2+2+2 randomisation sharing a 5-er.]

By the properties of the circuits we know that no proper subset is possible in the previous randomisations; so, for instance, we know that no randomisation of the form 5+5+3+2 can share two 5-ers with the randomisation 5+5+5. However, the 5+5+5 does share a 5-er with the randomisation 5+2+2+2+2+2, as shown in Figure 3.

We can ask a skeptical general question: given the wealth of combinatorial theory to find orthogonal blocks, what benefit does the circuit method have? An immediate answer is that it provides, in appropriate cases, the choice of a large, even very large, variety of valid randomisation schemes and, under special conditions, all valid randomisations.
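The incidence-matrix construction is easy to reproduce. The sketch below (our illustration; the graph is a hypothetical balanced digraph, not the graph of Figure 2) builds $A$ for a digraph with equal in- and out-degree at every vertex, checks that its rows are orthogonal to $j$, and exhibits directed cycles as non-negative binary kernel vectors whose sum decomposes $j$ — a valid randomisation of the edges:

```python
import numpy as np

def incidence_matrix(n_vertices, edges):
    """Vertex-edge incidence matrix A of a directed graph:
    A[i, e] = 1 and A[j, e] = -1 for the directed edge e = (i -> j)."""
    A = np.zeros((n_vertices, len(edges)), dtype=int)
    for e, (i, j) in enumerate(edges):
        A[i, e] = 1
        A[j, e] = -1
    return A

# Hypothetical balanced digraph on 4 vertices: the 4-cycle 0->1->2->3->0
# plus the 2-cycle 0->2->0, so in-degree = out-degree at every vertex.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (2, 0)]
A = incidence_matrix(4, edges)

# Balance <=> every row of A sums to zero <=> the rows are orthogonal to j.
print(A.sum(axis=1))               # [0 0 0 0]

# Indicators of directed cycles lie in ker(A) and are binary:
j1 = np.array([1, 1, 1, 1, 0, 0])  # the 4-cycle
j2 = np.array([0, 0, 0, 0, 1, 1])  # the 2-cycle
print(A @ j1, A @ j2)              # both zero vectors
# Together they decompose j: a valid randomisation of the six edges.
print((j1 + j2 == 1).all())        # True
```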
Weighing designs give some intuition. Historically there are two types. A chemical balance experiment has two pans and compares sets of objects. Weighing a set of objects on a one-pan weighing machine is very similar to the choice experiments. In the chemical balance the observation itself is an empirical contrast, whereas in the single-pan case we have to reparametrise, creating $X_1$, to obtain contrasts, as in the AB experiment. Informally, we could say that in some cases the contrast matrix $X_1$ represents a two-pan experiment embedded in a one-pan experiment.

Valid randomisations form a lattice under refinement, which we suggest is a natural generalisation of nested randomisation. A single non-decomposable $j$-vector is a minimal element. A non-decomposable valid randomisation corresponds to a partition of $N = \{1, 2, \ldots, n\}$. There may be more than one non-decomposable valid scheme, as we saw in the $2^3$ example and in the last example. Also relevant is randomisation cost. It may be that a cost function which is related to the structure of the randomisation and which is order-preserving with respect to the refinement lattice could lead to useful strategies in cases where, as we have seen, the choice of valid randomisations is very large. That is, we have in the background the idea that more refined randomisation is cheaper.

There is something of a computational challenge. As we arrive in the "big data" era we can expect more sources of bias and more actual bias. If randomisation is to meet this challenge then we need to extend the theory and the technology of randomisation, including fast computation.

There is a considerable literature on sequential randomisation with a model, in the AB case, in which subjects (e.g. patients) are awarded treatments A or B on the equivalent of a toss of a fair coin (there is considerable work on biased-coin designs which we do not cover). This is an example where the method in this paper should be a cheaper procedure administratively than randomising over a fixed population in order to conduct a more complex randomised block experiment. Note that in the $2^2$ experiment of Example 3.3 with two blocks of size 2, each block only supplies some of the information. The same holds for the four blocks of size 2 in the $2^3$ experiment, whereas for the two $\frac{1}{2}$-fractions of size 4 the parameters can be estimated from each block. In the 2-out-of-4 choice experiments we similarly compare attributes (1,2) v. (3,4), (1,3) v. (2,4) and (1,4) v. (2,3). The two-pan metaphor is useful. The extension to the k-out-of-2k example is straightforward, and the blocks arise from all ways of splitting 2k objects into disjoint sets of size k (these splits are enumerated in the sketch at the end of this section). It is likely, in our view, that sequential and adaptive randomisation will be increasingly important, with costs traded against effectiveness. Their impressive use in CoViD-19 vaccination trials (e.g. [18, 11]) is likely to have a lasting impact.

Finally, some mathematical remarks. The paper could have been written concentrating on the link to matroid theory, because the term circuit comes from matroid theory and the circuits presented here are the circuits of a linear matroid. One matroid property, for example, is the fact that a circuit as defined here has minimal support, in that no vector whose support is a proper subset can be a circuit; but we should recall that out of the full set of circuits we select those that have non-negative entries.
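For completeness, here is a short enumeration (ours) of the k-out-of-2k splits mentioned above; for $k = 2$ it reproduces the three comparisons $(1,2)$ v. $(3,4)$, $(1,3)$ v. $(2,4)$ and $(1,4)$ v. $(2,3)$:

```python
from itertools import combinations

def k_out_of_2k_blocks(k):
    """All splits of {1, ..., 2k} into two disjoint sets of size k; each
    split is one comparison block of the choice experiment.  Fixing item 1
    on the left avoids listing each split twice."""
    items = set(range(1, 2 * k + 1))
    return [(set(left), items - set(left))
            for left in combinations(sorted(items), k) if 1 in left]

for left, right in k_out_of_2k_blocks(2):
    print(sorted(left), "v.", sorted(right))
# [1, 2] v. [3, 4]
# [1, 3] v. [2, 4]
# [1, 4] v. [2, 3]
```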
Another mathematical feature which may be useful is that each block of a randomisation scheme defined here has an associated permutation group, and the full randomisation scheme generates a subgroup of the full permutation group $S_n$. All possible schemes for a particular example may lead to a complex lattice of subgroups under set-partition refinement. The relation between matroids and permutation groups has been studied in [3].

References

[1] Valid randomization.
[2] Statistics for experimenters.
[3] Bases for permutation groups and matroids.
[4] Randomization in the design of experiments.
[5] Handbook of design and analysis of experiments.
[6] Algebraic algorithms for sampling from conditional distributions.
[7] Principles of experimental design for big data analysis.
[8] Scoping study using randomized controlled trials to optimize small buildings' and small portfolios' (SBSP) energy efficiency programs (No. ANL/DIS-14/8).
[9] The randomization theory of experimental inference.
[10] Online controlled experiments and A/B testing. Encyclopedia of Machine Learning and Data Mining.
[11] Oxford-AstraZeneca COVID-19 vaccine efficacy.
[12] Experimental design issues in big data: the question of bias.
[13] Large datasets, bias and model oriented optimal design of experiments.
[14] Minimax designs for sample surveys.
[15] A minimax approach to randomization and estimation in survey sampling.
[16] The use of random allocation for the control of selection bias.
[17] Gröbner bases and convex polytopes.
[18] A real-time dashboard of clinical trials for COVID-19. The Lancet Digital Health.
[19] Minimax purposive survey sampling design.
[20] 4ti2 team. 4ti2 — a software package for algebraic, geometric and combinatorial problems on linear spaces.
[21] Macaulay2, a software system for research in algebraic geometry.