A Machine Learning Based Software Pipeline to Pick the Variable Ordering for Algorithms with Polynomial Inputs

Dorian Florescu and Matthew England
Mathematical Software - ICMS 2020, 2020-06-06. DOI: 10.1007/978-3-030-52200-1_30

Abstract. We are interested in the application of Machine Learning (ML) technology to improve mathematical software. It may seem that the probabilistic nature of ML tools would invalidate the exact results prized by such software; however, the algorithms which underpin the software often come with a range of choices which are good candidates for ML application. We refer to choices which have no effect on the mathematical correctness of the software, but do impact its performance. In the past we experimented with one such choice: the variable ordering to use when building a Cylindrical Algebraic Decomposition (CAD). We used the Python library Scikit-Learn (sklearn) to experiment with different ML models, and developed new techniques for feature generation and hyper-parameter selection. These techniques could easily be adapted for making decisions other than our immediate application of CAD variable ordering. Hence in this paper we present a software pipeline that uses sklearn to pick the variable ordering for an algorithm that acts on a polynomial system. The code described is freely available online.

1 Introduction

Mathematical software, i.e. tools for effectively computing mathematical objects, is a broad discipline: the objects in question may be expressions such as polynomials or logical formulae, algebraic structures such as groups, or even mathematical theorems and their proofs. In recent years there have been examples of software that acts on such objects being improved through the use of artificial intelligence techniques. For example, [21] uses a Monte-Carlo tree search to find the representation of polynomials that is most efficient to evaluate; [22] uses a machine-learnt branching heuristic in a SAT solver for formulae in Boolean logic; [18] uses pattern matching to determine whether a pair of elements from a specified group are conjugate; and [1] uses deep neural networks for premise selection in an automated theorem prover. See the survey article [12] in the proceedings of ICMS 2018 for more examples.

Machine Learning (ML), that is, statistical techniques to give computer systems the ability to learn rules from data, may seem unsuitable for use in mathematical software, since ML tools can only offer probabilistic guidance while such software prizes exactness. However, none of the examples above risked the correctness of the end result of their software. They all used ML techniques to make non-critical choices or to guide searches: the decisions of the ML carried no risk to correctness, but did offer substantial increases in computational efficiency. All mathematical software, no matter the mathematical domain, will likely involve such choices, and our thesis is that in many cases an ML technique could make a better choice than a human user, a so-called magic constant [6], or a traditional human-designed heuristic.

In Sect. 2 we briefly survey our recent work applying ML to improve an algorithm in a computer algebra system which acts on sets of polynomials.
We describe how we proposed a more appropriate definition of model accuracy, used to improve the selection of hyper-parameters for ML models, and a new technique for identifying features of the input polynomials suitable for ML. These advances can be applied beyond our immediate application: the feature identification to any situation where the input is a set of polynomials, and the hyper-parameter selection to any situation where we seek to make a choice that minimises a computation time. Hence we saw value in packaging our techniques into a software pipeline so that they may be used more widely. Here, by pipeline we refer to a succession of computing tasks that can be run as one task. The software is freely available as a Zenodo repository: https://doi.org/10.5281/zenodo.3731703

We describe the software pipeline and its functionality in Sect. 3. Then in Sect. 4 we describe its application to a dataset we had not previously studied.

2 ML for CAD Variable Ordering

Our recent work has used ML to select the variable ordering for calculating a cylindrical algebraic decomposition relative to a set of polynomials. A Cylindrical Algebraic Decomposition (CAD) is a decomposition of ordered R^n space into cells arranged cylindrically, meaning the projections of cells all lie within cylinders over a CAD of a lower-dimensional space. All these cells are semi-algebraic, meaning each can be described with a finite sequence of polynomial constraints. A CAD is produced for either a set of polynomials or a logical formula whose atoms are polynomial constraints. It may be used to analyse these objects by finding a finite sample of points to query, and thus understand the behaviour over all of R^n. The most important application of CAD is Quantifier Elimination (QE) over the reals: given a quantified formula, a CAD may be used to find an equivalent quantifier-free formula. CAD was introduced in 1975 [10] and is still an active area of research. The collection [7] summarises the work up to the mid-90s, while the background section of [13], for example, includes a summary of progress since. QE has numerous applications in science [2], engineering [25], and even the social sciences [23].

CAD requires an ordering of the variables. QE imposes that the ordering matches the quantification of the variables, but variables in blocks of the same quantifier, and the free variables, can be swapped. The ordering can have a great effect on the time and memory use of CAD, the number of cells, and even the underlying complexity [5]. Human-designed heuristics have been developed to make the choice [3, 4, 11, 14] and are used in most implementations.

The first application of ML to the problem was in 2014, when a support vector machine was trained to choose which of these heuristics to follow [19, 20]. The machine-learned choice did significantly better than any one heuristic overall. The present authors revisited these experiments in [15], but this time used ML to predict the ordering directly: there were many problems for which none of the human-made heuristics made a good choice, and although the number of possible orderings grows factorially with the number of variables, the current scope of CAD application means this is not restrictive. We also explored a more diverse selection of the ML methods available in the Python library scikit-learn (sklearn) [24]. All the models tested outperformed the human-made heuristics.
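For intuition about the learning target: a system in n variables admits n! orderings, and the classifier in [15] predicts one of them directly. A minimal sketch of the enumeration, where the example system and the indexing convention are our own illustration rather than the pipeline's:

```python
from itertools import permutations

import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")
system = [x1 * x2**2 + x3, x1**2 - x2 * x3 + 1]

# Collect the variables appearing in the system, then enumerate every
# admissible ordering; a direct classifier picks one index in 0 .. n! - 1.
variables = sorted(set().union(*[p.free_symbols for p in system]), key=str)
for index, ordering in enumerate(permutations(variables)):
    print(index, ordering)  # six orderings for three variables
```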
The ML models learn not from the polynomials directly, but from features: properties which evaluate to a floating-point number for a specific polynomial set. In [20] and [15] only a handful of features were used (measures of the degree and of the frequency of occurrence of the variables). In [16] we developed a new feature generation procedure which uses combinations of basic functions (average, sign, maximum) evaluated on the degrees of the variables, in either one polynomial or the whole system. This allowed for substantially more features and improved the performance of all the ML models. The new features could be used for any ML application where the input is a set of polynomials.

The natural metric for judging a CAD variable ordering is the corresponding CAD runtime: in the work above, models were trained to pick the ordering which minimises this for a given input. However, this meant the training did not distinguish between the non-optimal orderings, even though the differences between these could be huge. This led us to a new definition of accuracy in [17]: picking an ordering which leads to a runtime within x% of the minimum possible. We then wrote a new version of the sklearn procedure which uses cross-validation to select model hyper-parameters so as to minimise the total CAD runtime of the model's choices, rather than maximise the number of times the minimal ordering is chosen. This also improved the performance of all the ML models in the experiments of [17]. The new definition and procedure are suitable for any situation where we seek to make a choice that minimises a computation time.

3 The Software Pipeline

The input to our pipeline is given by two distinct datasets, used for training and testing respectively. An individual entry in a dataset is a set of polynomials representing an input to a symbolic computation algorithm, in our case CAD. The output is a corresponding sequence of variable ordering suggestions, one for each set of polynomials in the testing dataset.

The pipeline is fully automated: it generates and uses the CAD runtimes for each set of polynomials under each admissible variable ordering; uses the runtimes from the training dataset to select the hyper-parameters with cross-validation and to tune the parameters of the model; and evaluates the performance of the resulting classifiers (along with some other heuristics for the problem) on the sets of polynomials in the testing dataset. We describe these key steps of the pipeline below. Each of the numbered stages can be individually marked for execution or not in a run of the pipeline (avoiding duplication of existing computation). The code for this pipeline, written entirely in Python, is freely available at: https://doi.org/10.5281/zenodo.3731703.

Step I: Training

(a) Measuring the CAD Runtimes: The CAD routine is run on each set of polynomials in the training dataset. The runtimes for all possible variable orderings are stored in a separate file for each set of polynomials. If a runtime exceeds a pre-defined timeout, the value of the timeout is stored instead.

(b) Data Formatting: The training dataset is first converted to a format that is easier to process into features. For this purpose we chose the format given by the terms() method of the Poly class in the sympy package for symbolic computation in Python. Here each monomial is defined by a tuple containing another tuple with the degrees of each variable, together with a value defining the monomial coefficient. All the data points in the training dataset are collected into a single file called terms_train.txt after being placed into this format. Subsequently, the file y_train.txt is created, storing the index of the variable ordering with the minimum computing time for each set of polynomials, using the runtimes measured in Step I(a).
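For illustration, the terms() representation just described, and one feature in the style of [16] computed from it, might look as follows (the polynomials and the particular feature are arbitrary examples of ours, not necessarily ones the pipeline selects):

```python
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")
system = [
    sp.Poly(2 * x1**2 * x2 - 3 * x3 + 5, x1, x2, x3),
    sp.Poly(x1 * x3**2 + x2, x1, x2, x3),
]

# terms() pairs a tuple of per-variable degrees with the coefficient, e.g.
# [((2, 1, 0), 2), ((0, 0, 1), -3), ((0, 0, 0), 5)] for the first polynomial.
for poly in system:
    print(poly.terms())

# One feature in the style of [16]: the average over the polynomials of the
# maximum degree of x1 within each polynomial.
feature = sum(max(m[0] for m, _ in p.terms()) for p in system) / len(system)
print(feature)  # (2 + 1) / 2 = 1.5
```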
(c) Feature Generation: Here each set of polynomials in the training dataset is processed into a fixed-length sequence of floating-point numbers, called features, which are the actual data used to train the ML models in sklearn. This is done with the following steps:

i. Raw feature generation: We systematically consider all meaningful combinations of the functions average, sign, maximum, and sum applied to polynomials with a given number of variables. This generates a large set of feature descriptions, as proposed in [16]. The format used to store the data, described above, allows for an easy evaluation of these features. An example of computing such features is given in Fig. 1. In [16] we described how the method provides 1728 possible features for polynomials in three variables, for example. This step generates the full set of feature descriptions, saved in a file called features_descriptions.txt, and the corresponding values of the features on the training dataset, saved in a file called features_train_raw.txt.

ii. Feature simplification: After computing the numerical values of the features in Step I(c)i, this step removes those features that are constant or repetitive for the dataset in question, as described in [16]. The descriptions of the remaining features are saved in a new file called features_descriptions_final.txt.

iii. Final feature evaluation: The final set of features is computed by evaluating the descriptions in features_descriptions_final.txt on the training dataset. Even though these were already evaluated in Step I(c)i, we repeat the evaluation for the final set of feature descriptions. This allows for the possibility of users entering alternative features manually and skipping steps i and ii; as noted above, any of the named steps in the pipeline can be selected or skipped for execution in a given run. The final values of the features are saved in a new file called features_train.txt.

(d) Training the ML Models:

i. Selecting the hyper-parameters: The pipeline can apply four of the most commonly used deterministic ML models (see [15] for details), using the implementations in sklearn [24]:
- the K-Nearest Neighbors (KNN) classifier;
- the Multi-Layer Perceptron (MLP) classifier;
- the Decision Tree (DT) classifier;
- the Support Vector Machine (SVM) classifier.
Of course, additional models in sklearn and its extensions could be included with relative ease. The pipeline can use two different methods for fitting the hyper-parameters via a cross-validation procedure on the training set, as described in [17]:
- standard cross-validation: maximising the prediction accuracy (i.e. the number of times the model picks the optimal variable ordering);
- time-based cross-validation: minimising the CAD runtime (i.e. the time taken to compute CADs with the model's choices).
Both methods tune the hyper-parameters with cross-validation using the routine RandomizedSearchCV from the sklearn package (the latter method uses an adapted version we wrote). The cross-validation results (i.e. the choice of hyper-parameters) are saved in a file hyperpar_D**_**_T**_**.txt, where D**_** is the date and T**_** is the time when the file was generated.

ii. Fitting the parameters: The parameters of each model are subsequently fitted using the standard sklearn algorithms for each chosen set of hyper-parameters. These are saved in a file called par_D**_**_T**_**.txt.
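To make Step I(d) concrete, here is a minimal sketch of the standard cross-validation route using sklearn's RandomizedSearchCV. The feature matrix, labels, and parameter grid below are illustrative assumptions of ours; the time-based route would replace the accuracy scorer with one built from the measured CAD runtimes:

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Stand-ins for the pipeline's real data: X_train would hold one row of
# features per polynomial set (Step I(c)) and y_train the index of the
# fastest variable ordering (Step I(b)).
rng = np.random.default_rng(0)
X_train = rng.random((100, 20))
y_train = rng.integers(0, 6, size=100)  # 3! = 6 orderings for 3 variables

search = RandomizedSearchCV(
    KNeighborsClassifier(),
    param_distributions={
        "n_neighbors": list(range(1, 20)),
        "weights": ["uniform", "distance"],
    },
    n_iter=10,
    cv=5,
    scoring="accuracy",  # standard route; the time-based route swaps this
    random_state=0,      # for a runtime-minimising scorer
)
search.fit(X_train, y_train)
print(search.best_params_)
```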
Step II: Testing

The models from Step I are then evaluated on their choices of variable orderings for the sets of polynomials in the testing dataset. Steps that are performed similarly to their Step I counterparts, but on the testing dataset, are not described again in detail.

(c) Predictions Using ML: Predictions on the testing dataset are generated using the models trained in Step I(d). Each model is run on the data from Step II(a)ii, and the predictions are stored in a file called y_D**_**_T**_**_test.txt.

(d) Competing Heuristics: In our prior papers [15][16][17] we compared the performance of the ML models with the human-designed heuristics of [4] and [11]; for details of how these are applied see [15]. Their choices are saved in two files entitled y_brown_test.txt and y_sotd_test.txt, respectively.

(e) Comparative Results: Finally, in order to compare the performance of the proposed pipeline, we must measure the actual CAD runtimes on the testing dataset. The results of the comparison are saved in a file with the name template comparative_results_D**_**_T**_**.txt.

The pipeline above was developed for choosing the variable ordering for the CAD implementation in Maple's Regular Chains Library [8, 9], but it could be used to pick the variable ordering for other procedures which take sets of polynomials as input, by changing the calls to CAD in Steps I(a) and II(e) to those of another implementation or algorithm. Step II(d) would also have to be edited to call an appropriate competing heuristic.

4 Application to a New Dataset

The pipeline described in the previous section makes it easy for us to repeat our past experiments (described in Sect. 2) on a new dataset: all that is needed is to replace the files storing the polynomials and run the pipeline. To demonstrate this, we test the proposed pipeline on a new dataset of randomly generated polynomials. We are not suggesting that it is appropriate to test classifiers on random data: we simply mean to demonstrate the ease with which the experiments in [15][16][17], which originally took many man-hours, can be repeated with a single code execution.

The randomly generated parameters are: the degrees of the three variables in each polynomial term, the coefficient of each term, the number of terms in a polynomial, and the number of polynomials in a set (a minimal sketch of such a generator is given below). The means and standard deviations of these parameters were extracted from the problems in the nlsat dataset, which was used in our previous work [15], so that the new dataset is of a comparable scale. We generated 7500 sets of random polynomials, of which 5000 were used for training and the remaining 2500 for testing.

The results of the proposed processing pipeline, including the comparison with the existing human-made heuristics, are given in Table 1. The prediction time is the time taken for the classifier or heuristic to make its predictions for the problems in the testing set. The total time adds to this the time for the actual CAD computations using the suggested orderings. We do not report the training time of the ML models as this is a cost paid only once, in advance. The virtual solvers are those which always make the best/worst choice for a problem (in zero prediction time) and are useful to show the range of possible outcomes. Further details of our experimental methodology are given in [15][16][17].
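As an illustration of the data generation just described, a minimal sketch follows. The distributions, their parameters, and the function name are our own assumptions for illustration; the actual dataset matched the means and standard deviations of the nlsat problems:

```python
import numpy as np
import sympy as sp

rng = np.random.default_rng(0)
x1, x2, x3 = sp.symbols("x1 x2 x3")

def random_polynomial_set():
    """One random set of polynomials in three variables. Every distributional
    choice below is an illustrative assumption, not the paper's."""
    polys = []
    for _ in range(max(1, int(rng.normal(3, 1)))):      # number of polynomials
        expr = sp.Integer(0)
        for _ in range(max(1, int(rng.normal(4, 2)))):  # number of terms
            degrees = rng.integers(0, 5, size=3)        # degree of each variable
            coeff = int(rng.integers(-10, 11))          # term coefficient
            expr += (coeff * x1**int(degrees[0])
                     * x2**int(degrees[1]) * x3**int(degrees[2]))
        polys.append(sp.Poly(expr, x1, x2, x3))
    return polys

print(random_polynomial_set())
```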
As with the tests on the original dataset [15, 16], the ML classifiers outperformed the human-made heuristics, but for this dataset the difference compared to the Brown heuristic was marginal. We used a lower CAD timeout, which may benefit the Brown heuristic, as past analysis shows that when it makes suboptimal choices these tend to be much worse. We also note that the relative performance of the Brown heuristic fell significantly when it was used on problems with more than three variables in [17]. The results for the sotd heuristic are poor because it had a particularly long prediction time on this random dataset; we note that there is scope to parallelise sotd, which may make it more competitive.

5 Final Thoughts

We presented our software pipeline for training and testing ML classifiers that select the variable ordering to use for CAD, and described the results of an experiment applying it to a new dataset. The purpose of the experiment in Sect. 4 was to demonstrate that the pipeline can easily train classifiers that are competitive on a new dataset with almost no additional human effort, at least for a dataset of a similar scale (we note that the code is designed to work on higher-degree polynomials, but has so far only been tested on datasets of three and four variables). The pipeline makes it possible for a user to easily tune the CAD variable ordering choice classifiers to their particular application area. Further, with only a little modification, as noted at the end of Sect. 3, the pipeline could be used to select the variable ordering for alternative algorithms that act on sets of polynomials and require a variable ordering. We thus expect the pipeline to be a useful basis for future research, and plan to experiment with its use on such alternative algorithms in the near future.

Acknowledgements. This work is funded by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifier Elimination Procedures. We thank the anonymous referees for their comments.

References

1. DeepMath: deep sequence models for premise selection
2. Identifying the parametric occurrence of multiple steady states for some biological networks
3. Optimising problem formulation for cylindrical algebraic decomposition
4. Companion to the tutorial: cylindrical algebraic decomposition
5. The complexity of quantifier elimination and cylindrical algebraic decomposition
6. Understanding expression simplification
7. Quantifier Elimination and Cylindrical Algebraic Decomposition. Texts & Monographs in Symbolic Computation
8. Computing cylindrical algebraic decomposition via triangular decomposition
9. Quantifier elimination by cylindrical algebraic decomposition based on regular chains
10. Quantifier elimination for real closed fields by cylindrical algebraic decomposition
11. Efficient projection orders for CAD
12. Machine learning for mathematical software
13. Cylindrical algebraic decomposition with equational constraints
14. Choosing a variable ordering for truth-table invariant cylindrical algebraic decomposition by incremental triangular decomposition
15. Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition
16. Algorithmically generating new algebraic features of polynomial systems for machine learning
17. Improved cross-validation for classifiers that make algorithmic choices to minimise runtime without compromising output correctness
18. Solving the conjugacy decision problem via machine learning
19. Using machine learning to improve cylindrical algebraic decomposition
20. Applying machine learning to the problem of choosing a heuristic to select the variable ordering for cylindrical algebraic decomposition
21. Code optimization in FORM
22. Learning rate based branching heuristic for SAT solvers
23. TheoryGuru: a Mathematica package to apply quantifier elimination technology to economics
24. Scikit-learn: machine learning in Python
25. New domains for applied quantifier elimination