key: cord-0180414-ovtcdnis authors: Ilmer, Ilia; Ovchinnikov, Alexey; Pogudin, Gleb; Soto, Pedro title: More Efficient Identifiability Verification in ODE Models by Reducing Non-Identifiability date: 2022-04-04 journal: nan DOI: nan sha: af91576e380863d0d69ae8a2929c0cb813646188 doc_id: 180414 cord_uid: ovtcdnis Structural global parameter identifiability indicates whether one can determine a parameter's value from given inputs and outputs in the absence of noise. If a given model has parameters for which there may be infinitely many values, such parameters are called non-identifiable. We present a procedure for accelerating a global identifiability query by eliminating algebraically independent non-identifiable parameters. Our proposed approach significantly improves performance across different computer algebra frameworks. Structural parameter identifiability is an crucial for design of mathematical models with ordinary differential equations (ODEs). For a given model, we may ask whether a parameter (or multiple parameters) can be discovered given sufficiently strong inputs and noiseless outputs. If the answer is positive, we say that a parameter (or multiple ones) is structurally identifiable. We can further categorize structural identifiability into local and global identifiability. The former corresponds to multiple possible parameter values that can be recovered for a given model. The latter means that a parameter value can be recovered uniquely. Otherwise, we say that a parameter is non-identifiable. Existing programs for parameter identifiability analysis rely on differential algebra and algebraic geometry. One example is SIAN [14, 15] , implemented in MAPLE [15, 16] , and Julia [23] . It uses F4 algorithm [8] to compute Gröbner basis to determine global identifiability properties. This step can be computationally costly and has double exponential theoretical complexity in the worst case [20] . There have been further developments in Gröbner basis computation, see for instance [1, 7, 9] . SIAN produces a polynomial system from the input ODE model. At that time in the program, the locally identifiable and non-identifiable parameters are known. Both of these classes of parameters are present in the polynomial system, however, if we know that some are non-identifiable, we can substitute several of them with numerical values thus reducing the workload for F4 algorithm. In this work, we present a method for finding the combination of non-identifiable parameters that can be substituted in SIAN and significantly reduce the runtime of Gröbner basis algorithm. We will demonstrate the result of our algorithm on a collection of ODE models that may present a challenge for the F4 algorithm. The computation is performed in MAPLE 2021. 2 The original algorithm for finding a Gröbner basis of a polynomial ideal was presented by Buchberger in [3] . However, the solution depends on multiple decisions such as selection strategy of polynomials and can be time-consuming [20, 10] . Faugère presented F4 [8] and later F5 [9] algorithms that leverage better selection strategies for polynomials and linear algebra during computation. Recently, [1] addressed the termination and complexity properties of the F5 algorithm. For an overview of F5-based solutions, see [7] . Solutions for the identifiability problem have implementations in various programming languages. Structural Identifiability Analyzer (SIAN) [14, 15] was implemented in MAPLE and Julia [23] and is capable of global and local identifiability analysis. Algorithms for finding multi-experiment identifiable combinations [24, 25] extended SIAN and are available on the web [16] . Fast local identifiability check based on power series was presented in [29] . A new global identifiability algorithm of [6] is implemented in Julia and has been included into the Julia's Scientific Machine Learning (SciML) infrastructure. Among other widely used packages, we highlight such solutions as web-based COMBOS [21] and COMBOS 2 [17] , DAISY [27] and DAISY 2 [28] and GenSSI 2.0 [18] . For a deeper overview of existing identifiability methods, algorithms, software, and benchmarks we refer to [22, 4, 31, 30, 26 ]. We aim to accelerate global identifiability assessment of SIAN [14] . Let us present the typical input format accepted by the program where f = (f 1 , . . . , f n ) and g = (g 1 , . . . , g n ) with f i = f i (x, µ, u), g i = g i (x, µ, u) are rational functions over the field of complex numbers C. The vector x = (x 1 , . . . , x n ) represents the time-dependent state variables and x represents their derivatives. The time-dependent vector-function u = (u 1 , . . . , u s ) represents the input variables. The m-vector y = (y 1 , . . . , y n ) represents the output variables. The vector µ = (µ 1 , . . . , µ λ ) represents constant parameters and x * = (x * 1 , . . . , x * n ) defines initial conditions of the model. The output functions y are differentiated to compute truncated Taylor polynomials at time t = 0, see [15, Theorems 3.16, 4.12] for details on the truncation bound. Further, SIAN samples x * , µ to evaluate each component y i and its derivatives. Gröbner basis lets one check if the sampled quantities x * , µ are unique that result in the computed values y i . This makes SIAN a randomized Monte-Carlo algorithm, and the user can specify the probability of correctness. At the time of computing Gröbner basis in SIAN, we know the local identifiability and non-identifiability of all parameters. It is tempting to exclude the latter from further consideration by numerical substitution. However, there is a subtlety, as the choice among non-identifiable parameters may affect the identifiability properties of others. Example 1. Consider the following ODE system The Structural Identifiability Toolbox [16] reports that parameter p 6 as non-identifiable. Assume we would like to substitute p 6 ⇒ 0 everywhere in the ODE. As a result, only-locally-identifiable parameters p 4 , p 7 become globally identifiable. If we substitute non-zero numbers into all non-identifiable parameters, for instance, p 2 ⇒ 131, p 3 ⇒ 93, p 5 ⇒ 17, p 6 ⇒ 41, the outcome is the same. To increase the efficiency by maximizing our substitution choices, we perform substitutions into a maximal set of algebraically independent variables B (a transcendence basis) in the polynomial system E t produced in [15, Algorithm 1, Step 2]. Such a set is found in Algorithm 1. In the implementation, the Jacobian matrix entries are sampled randomly according to [15, Algorithm 1, Step 3] to find pivot columns. We can use the same sampling bound as in [15] to get the pivot columns with the same probability of correctness. Algorithm 1: Finding algebraically independent parameters Input: Polynomial system E t in variables X Output: Set of algebraically independent variables 1 JacobianM atrix ← ∂E t ∂X ; 2 JacobianM atrix.sample(); 3 P ivots ← JacobianM atrix.pivotColumns(); Proof. See Section A.1 The result of Algorithm 1 is not unique and can change based on the column arrangement. For a basis of size k, we considered all N = n k combinations of columns if feasible. For large values N , we create a large sample of K < N combinations. In the final code, the user can specify sample size K. The heuristic for picking the best possible transcendence basis is as follows: 1. For a given transcendence basis T of size k, before performing the substitution, collect degrees of monomials (i.e. sum of degrees of each variable in monomial) that contain elements of T avoiding double-counting of degrees (i.e., if a monomial contains more than one transcendental element, we collect its degree once); 2. From the previous step, we obtained an array A µ of integer degree values d µ [m] for each member µ ∈ T and monomial m; 5. At this step, we have a collection of entropies {H µ |µ ∈ T } for each possible transcendence basis. Sort this collection, so that larger entries end up last (this can be done by a lexicographic approach). Pick the last transcendence basis. Figure 1 shows how the CPU time changes depending on the value of the resulting entropies. We pick the right-most basis with "highest" entropy values. Upper bound trend line (in red) shows that our choice yields better-than-median improvement. Let us explain the reasoning behind the approach above. The degree weighted count entropy of a transcendental element µ is capturing three heuristics at once: 1. Does µ appear in a lot of monomials? 2. Does µ have a large total degree? 3. Does µ typically contribute to monomials of a large degree? All considerations above arise from possible causes of computational hardness for an F4 algorithm to compute Gröbner basis. Simply put, we are counting the instances of µ with a weight that captures its typical (see [5] ) degree weight. Let us illustrate this with a simple example: suppose that p 1 and p 2 appear in the following collections M p1 , M p2 In this section, we present CPU time and memory usage comparison across three different setups for MAPLE and Magma computer algebra systems. We use the variable order as in [2, eq. 8]. We present CPU time and memory usage of F4 algorithm on SIAN-produced ideals: • with no changes (positive dimension) • without transcendence basis (zero-dimensional ideals), denoted as "0-dim" • without transcendence basis and with weighted ordering as defined by the main result of [2] These results are shown in tables 1 to 4. We observe significant improvements, especially in cases where MAPLE would otherwise be unable to complete the Gröbner basis computation. Combined with the weighted ordering [2] , we observe an significant combined speedup. In Table 1 and Table 2 , "N/A" stands for the following error message returned by MAPLE: "Error, (in Groebner:-F4:-GroebnerBasis) numeric exception: division by zero". We presented an enhancement that decreases runtime and memory consumption of identifiability testing algorithm SIAN [14] . Our solution consists of finding and removing the transcendence basis from the algebraically independent non-identifiable model parameters. In addition, the method can be of increased efficiency if combined with other Gröbner basis-specific improvements, such as weighted variable orderings. We showed this by applying the weighted ordering method [2] to a polynomial system with transcendence basis removed. It is possible to optimize the substitution procedure by, for instance, sampling the random values for transcendental elements directly into the polynomials. However, we did not observe significant differences at the final Gröbner basis stage. In this section, we prove our main result about the probability guarantee for the random substitutions into the transcendence basis and present details about ODE models we considered for transcendence basis substitution. We will present the ODEs, the output functions, and the discovered bases of the models used in the analysis of this paper. By [15, Lemma 4.4] , applied to X being an irreducible component V i of the variety V defined by E t and π the projection to the affine space A |B| of the B-variables, there exists a proper subvariety Y i ⊂ A |B| such that deg Y i ≤ deg V i and, for every a ∈ A |B| \ Y i , we have: π −1 (a) ∩ V i = ∅. Suppose θ is a locally but not globally identifiable parameter. By [15, Theorem 5.5] , the projection of V onto θ-axis is finite and has the cardinality > 1 if the integer substitutions that are used to convert E t into E t are outside the zero set of the polynomials Q and P 2 , which are defined in the proof of [15, Theorem 5.5] . Thus there exist components, say, V 1 and V 2 of V with different projections onto the θ-axis. We choose b such that b ∈ Y i for every i. Then there exist a 1 ∈ V 1 ∩ π −1 (b) and a 2 ∈ V 2 ∩ π −1 (b) having different θ-coordinates. By [13, Proposition 3] , there exists a nonzero polynomial P s of degree at most i deg Y i ≤ deg V such that, for each i, P s is zero on Y i . So, outside of the zero set of P s Q P 2 , substitutions into the B-variables cannot convert non-global identifiability into global identifiability. The proof of [15, Theorem 5.5] introduces a number D 2 ≥ 6 deg E t /(1 − p) ≥ 6 deg P s /(1 − p) and such that deg Q P 2 ≤ D 2 (1 − p)/2. We finally to obtain the desired probability guarantee. The system eq. (3) originates in [11] describing time periodicity in cell behavior. This example has 4 state variables x 1,2,3,4 and 6 parameters.     ẋ The transcendence basis found by our approach here has size 2 and the optimal combination consists of the initial condition x 3 (0) and parameter γ. The following model is a COVID-19 epidemiological model coming from [19, example 37, In this model, even with transcendental elements removed, the computation does not finish without using weighted ordering via algorithm of [2] . The transcendence basis is δ, R(0). This is a biomedical model applied to HIV infection in [32] . The outputs were changed to make the system more of a computational challenge to SIAN.       ẋ = λ − dx − βxv,ẏ = βxv − ay, v = ky − uv,ẇ = czyw − cqyw − bw, z = cqyw − hz, y 1 = w, y 2 = z The transcendence basis of this model is β, c. The next SEIR model for COVID-19 was presented in [19, Example 34] .   Ṡ = Λ − rβSI/N − µS,Ė = βSI/N − ( + µ)E, I = E − (γ + µ)I,Ṙ = γI − µR, y = I + R. Transcendence basis for Equation (6) is β, N . This model comes from [12, Equation 67 ]   ẋ = −xa + zy + ay,ẇ = ez − wf + xy, z = −cz − wd + xy,ẏ = bx + by − xz, g = x In this model, there is only one transcendental parameter d. Substituting this parameter, we observe a tremendous speedup: the system finishes computation without error in MAPLE. More Efficient Identifiability Verification in ODE Models by Reducing Non-Identifiability A PREPRINT This COVID-19 model has transcendence degree 7 with basis consisting of A(0), I(0), N (0), R(0), d 2 , d 3 , d 6 . The model comes from [19, p. 26] where we added the equationṄ = 0. = b N − S (I λ + λ Q a q + λ a A + λ j J + d 1 ), I = k 1 A − (g 1 + µ 2 + d 2 ) I, R = g 1 I + g 2 J − d 3 R, A = S λ(I + a q Q + a A + j J) − (k 1 + µ 1 + d 4 ) A, Q = µ 1 A − (k 2 + d 5 ) Q, J = k 2 Q + µ 2 I − (g 2 + d 6 ) J, N = 0, y 1 = Q, y 2 = J The authors are grateful to CCiS at CUNY Queens College. This work was partially supported by the NSF under grants CCF-1563942, CCF-1564132, DMS-1760448, DMS-1853650, and DMS-1853482 On the complexity of the F5 Gröbner basis algorithm Obtaining weights for Gröbner basis computation in parameter identifiability problems A theoretical basis for the reduction of polynomials to canonical forms Structural identifiability of systems biology models: a critical comparison of methods Elements of information theory Differential elimination for dynamical models via projections with applications to structural identifiability A survey on signature-based algorithms for computing Gröbner bases A new efficient algorithm for computing Gröbner bases (F4) A new efficient algorithm for computing Gröbner bases without reduction to zero (F5) One sugar cube, please" or selection strategies in the Buchberger algorithm Oscillatory behavior in enzymatic control processes Reduction of dimension for nonlinear dynamical systems Definability and fast quantifier elimination in algebraically closed fields SIAN: software for structural identifiability analysis of ODE models Global identifiability of differential models Web-based Structural Identifiability Analyzer COMBOS2: an algorithm to the input-output equations of dynamic biosystems via Gaussian elimination GenSSI 2.0: multi-experiment structural identifiability analysis of SBML models Structural Identifiability and Observability of Compartmental Models of the COVID-19 Pandemic The complexity of the word problems for commutative semigroups and polynomial ideals On finding and using identifiable parameter combinations in nonlinear dynamic systems biology models and COMBOS: a novel web implementation On identifiability of nonlinear ODE models and applications in viral dynamics SIAN: Structural Identifiability Analyzer Computing all identifiable functions of parameters for ODE models Multi-experiment parameter identifiability of ODEs and model theory Comparison of approaches for parameter identifiability analysis of biological systems DAISY: An efficient tool to test global identifiability. Some case studies A new version of DAISY to test structural identifiability of biological models A probabilistic algorithm to test local algebraic observability in polynomial time Observability and structural identifiability of nonlinear biological systems Benchmarking optimization methods for parameter estimation in large kinetic models Specific therapy regimes could lead to long-term immunological control of HIV