An algorithm for discovering Lagrangians automatically from data

Daniel J.A. Hills, Adrian M. Grütter and Jonathan J. Hudson
Department of Physics, Imperial College of Science, Technology and Medicine, London, United Kingdom

Corresponding author: Jonathan J. Hudson, jony.hudson@imperial.ac.uk
Submitted 7 August 2015; accepted 20 October 2015; published 4 November 2015.
DOI 10.7717/peerj-cs.31. Copyright 2015 Hills et al. Distributed under Creative Commons CC-BY 4.0.

ABSTRACT
An activity fundamental to science is building mathematical models. These models are used both to predict the results of future experiments and to gain insight into the structure of the system under study. We present an algorithm that automates the model-building process in a scientifically principled way. The algorithm can take observed trajectories from a wide variety of mechanical systems and, without any other prior knowledge or tuning of parameters, predict the future evolution of the system. It does this by applying the principle of least action and searching for the simplest Lagrangian that describes the system's behaviour. By generating this Lagrangian in a human-interpretable form, it can also provide insight into the workings of the system.

Subjects: Artificial Intelligence, Data Mining and Machine Learning, Scientific Computing and Simulation
Keywords: Lagrangian, Physics, Least-action, Discovery

INTRODUCTION
Modern science is, in many senses, highly automated. Experiments are frequently run under computer control, with data often recorded by the computer directly. Computerised data analysis and visualisation are widely used to process the resulting large volumes of data. Indeed, the ability to collect and analyse massive data sets is opening up an entirely new measure-first-ask-questions-later approach to science: the Square Kilometre Array radio telescope is expected to collect approximately one exabyte of data per day (Newman & Tseng, n.d.); over $10^{14}$ collisions from the ATLAS detector were analysed in the search for the Higgs boson (ATLAS collaboration, 2012); and state-of-the-art whole-genome sequencers can currently sequence 600 gigabases per day (Hayden, 2014). In each of these examples the scientific questions are not fully formulated in advance of taking the data, and the question of how best to extract knowledge from the dataset is of great interest. This motivates the study of how to scale up the processes of scientific reasoning to take advantage of the wealth of available data.

Thus far, scientific reasoning has largely resisted automation. Hypothesising and refining models is still, on the whole, carried out by humans, with little direct support from computers. It has long been a desire of artificial intelligence researchers to automate this part of science, and with the growing volume of data available from experiments the motivation for this desire comes ever more sharply into focus. In this paper we present a step in this direction: an algorithm that automates finding a mathematical model for a system, in a scientifically principled way, by examining only its observed behaviour.
Early attempts to automatically model physical systems searched for simple mathematical regularities in observed quantities. Langley's (1979) BACON system was able to re-discover many simple laws—the ideal gas law, Ohm's law, Coulomb's law and others—from experimental data. Dzeroski & Todorovski (1993) went beyond simple static laws with their LAGRANGE system, which was able to search for differential equations that governed observed time series. They extended this work to the LAGRAMGE system, which additionally allowed an expert user to provide domain knowledge, improving the quality of the results (Todorovski & Dzeroski, 1997). The PRET system, developed by Bradley and collaborators (2001), brings to bear a variety of advanced AI techniques on the problem of identifying system differential equations. It has a sophisticated method for representing qualitative observations, and allows expert-user domain knowledge to be combined with automatic search very effectively. Schmidt & Lipson (2009) used a genetic programming approach to automatically evolve invariant mathematical expressions from observed data. (Note that Schmidt and Lipson originally claimed that their technique is capable of discovering Lagrangians, but it has been shown that this is false except for Lagrangians of a very particular, trivial form (Hillar & Sommer, 2012). Schmidt & Lipson (2010) do not include this claim in a subsequent paper on the same work.) In the context of engineering, there is a significant body of work on 'system identification', with techniques ranging from very general ad hoc fitting methods to fitting detailed physical models representing important classes of system (Sjöberg et al., 1995; Ljung, 2010).

In this work we take a different approach from those described above, the essence of which is that we embed a simple, general physical principle—the principle of least action—and very little else into our algorithm. While we are embedding the domain knowledge of a physicist in our algorithm, we are not embedding information about any particular physical system or class thereof. Rather, we are capturing a deep understanding that has been distilled by physicists over the past 270 years, and packaging it into an algorithm that can be applied by non-experts. We find the algorithm to be surprisingly powerful, given its simplicity, but this power comes not from the ingenuity of its construction; rather, it comes from the broad applicability of the physical principle embedded in it.

THE PRINCIPLE OF LEAST ACTION
The principle of least action is one of the most fundamental and most celebrated principles in physics. First proposed by Maupertuis (1744) and Euler (1744), it states that the problem of predicting the behaviour of many physical systems can be cast as finding the behaviour that minimises the expenditure given by some cost function. The total expenditure of the system is known as the action.
It is a remarkable fact that a very wide range of physical systems—including those studied in classical mechanics, special and general relativity, quantum field theory, and optics—can have their behaviour explained in terms of minimising a cost function. Each physical system has its own cost function, and once this function is known it is possible to predict exactly what the system will do in the future. The exercise of determining the cost function—often known as Lagrange's function, or just the Lagrangian—for a particular physical system is central to physics. Feynman described this process well (Feynman, Leighton & Sands, 1963) as "some kind of trial and error", advising students that "You just have to fiddle around with the equations that you know and see if you can get them into the form of the principle of least action." In this paper we present an algorithm that does this "fiddling" automatically, without requiring the user to have any expertise in physics.

The Lagrangian is well-suited to be the output of an automated modelling algorithm. It has the desirable property that it is a single, scalar expression that contains everything necessary to predict the system's future evolution. Consider, in contrast, finding the Hamiltonian, where it would also be necessary to find the corresponding conjugate momenta. The Lagrangian has the additional quality that it is coordinate-independent, and as a result can be written in any coordinate system. This is useful for an automated algorithm, where it is not obvious in which coordinate system the data might be presented. It should be noted, however, that not all physical systems can be described by a Lagrangian. In particular, dissipative systems cannot be modelled this way. Nevertheless, many interesting processes do admit a Lagrangian formulation, and what's more, as the algorithm is automatic, little is lost by speculatively applying it to a system's trajectory. We might imagine a future where an ensemble of algorithms such as this one try to find an appropriate model for a system, based on a variety of physical principles and insights. We present this algorithm as a step towards such a future system.

The problem of taking a Lagrangian and automatically calculating the resulting motion of the system has been widely studied and applied. To the best of our knowledge, this work is the first that solves the inverse problem of taking the observed motion and calculating the Lagrangian, for non-trivial systems.

THE ALGORITHM
To find a model for a system we search over a space of possible Lagrangians. To do this we need three elements: an objective to guide the search, which will take the form of a score function; a representation of the possible Lagrangians; and an algorithm to execute the search over the possible Lagrangians, working to improve the score. We will first describe the score function, which is the central idea of the algorithm.

Score function
The objective of our algorithm is to find a Lagrangian that, when integrated along the system's observed trajectory, yields a smaller total (action) than when integrated along any neighbouring trajectory. It would be possible to implement this definition of the least action principle directly in an algorithm, but instead we take an indirect approach that is more computationally efficient.
For a Lagrangian $L(\theta, \phi, \ldots, \dot\theta, \dot\phi, \ldots)$ and a trajectory $(\theta(t), \phi(t), \ldots, \dot\theta(t), \dot\phi(t), \ldots)$ it is possible to write down a condition, in the form of a set of differential equations, that must be satisfied if the action is to be stationary along the trajectory. These differential equations are known as the Euler–Lagrange equations,
$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0,$$
where $q \in \{\theta, \phi, \ldots\}$. It is to be understood that the partial derivatives are taken symbolically with respect to the coordinates and velocities, which are then replaced with the time-dependent functions from the trajectory before the time-derivative is taken. We can define a score function based on these conditions,
$$\mathrm{EL}(L) = \sum_{q \in \{\theta, \phi, \ldots\}} \int \left(\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q}\right)^2 dt, \qquad (1)$$
which is zero if the Euler–Lagrange equations are exactly satisfied. We note that Hillar and Sommer first proposed using a (different) score function derived from the Euler–Lagrange equations in Hillar & Sommer (2012), but did not apply it to finding Lagrangians from data.

In practice our observations of the system are not functions $(\theta(t), \phi(t), \ldots, \dot\theta(t), \dot\phi(t), \ldots)$ but discretely sampled time-series of the coordinates and generalized velocities. The algorithm operates on a dataset which is a time-series of samples
$$D = \left((\theta(1), \phi(1), \ldots, \dot\theta(1), \dot\phi(1), \ldots), \ldots\right),$$
where time runs from $t = 1 \ldots N$. The velocity samples may be either directly measured or derived from measured coordinate data. We divide this time-series into two portions: a training set, comprising samples $1 \ldots M$, and a validation set of samples $M+1 \ldots N$. The algorithm will conduct its search using only the training set, reserving the validation set for out-of-sample measurement of the prediction error. In this way we can truly test the algorithm's ability to predict the future dynamics of the system. In all of the examples in this paper the sampling times will be evenly spaced, but this is not a requirement.

We can discretize the Euler–Lagrange score function (Eq. (1)) to work with these sampled datasets, giving
$$\mathrm{EL}_D(L) = \sum_{t=1}^{M} \sum_{q \in \{\theta, \phi, \ldots\}} \left(\left[\frac{d}{dt}\frac{\partial L}{\partial \dot q}\right]_t - \left[\frac{\partial L}{\partial q}\right]_t\right)^2, \qquad (2)$$
where the subscript on the score indicates that it is taken with respect to the dataset $D$. The square-bracketed quantities in this expression are time-series, and the subscript indicates taking the element of the time-series at the given time. So, for instance, the first term in Eq. (2) is to be calculated, in principle, by: first differentiating the candidate Lagrangian $L$ symbolically with respect to the appropriate generalized velocity; evaluating this quantity at every time-step in the dataset to yield a new time-series; taking the discrete derivative of this new time-series with respect to time; and finally finding the element at time $t$ in this time-derivative time-series. In practice, as we shall see below, a more computationally efficient implementation may be used.
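To make this procedure concrete, the following is a minimal sketch of how the discretised score of Eq. (2) might be computed for a one-coordinate system. It is an illustration only, not the released implementation: the names, the use of sympy for the symbolic partial derivatives, and np.gradient for the discrete time derivative are all our choices.

```python
import numpy as np
import sympy as sp

q, qdot = sp.symbols("q qdot")

def el_score(L_expr, q_samples, qdot_samples, dt):
    """Sum of squared Euler-Lagrange residuals (Eq. (2)) for one coordinate."""
    # Symbolic partial derivatives of the candidate Lagrangian.
    dL_dqdot = sp.lambdify((q, qdot), sp.diff(L_expr, qdot), "numpy")
    dL_dq = sp.lambdify((q, qdot), sp.diff(L_expr, q), "numpy")
    # Evaluate each partial derivative at every time step; multiplying by
    # ones broadcasts a constant derivative into a full time-series.
    p_series = dL_dqdot(q_samples, qdot_samples) * np.ones_like(q_samples)
    f_series = dL_dq(q_samples, qdot_samples) * np.ones_like(q_samples)
    # Discrete time derivative of the momentum-like series, then the residual.
    residual = np.gradient(p_series, dt) - f_series
    return float(np.sum(residual**2))

# Example: a harmonic-oscillator candidate scored on its own trajectory;
# the result is close to zero, as Eq. (2) requires.
t = np.linspace(0.0, 10.0, 501)
L_candidate = 0.5 * qdot**2 - 0.5 * q**2
print(el_score(L_candidate, np.cos(t), -np.sin(t), t[1] - t[0]))
```

For a good candidate Lagrangian evaluated on a genuine trajectory of the system, the residual is small at every sample, so the score is close to zero up to discretisation error.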
The function $\mathrm{EL}_D$ is the basis of the score function, capturing the principle of least action, but it is not sufficient on its own. While it is true that the Lagrangian we seek minimises $\mathrm{EL}_D$, the converse is not true, as there are other functions which minimise $\mathrm{EL}_D$ but are not physically meaningful Lagrangians. The first class of functions that we wish to avoid are those which are numerically tiny, for instance $L = 10^{-100}\theta$. We deal with these by introducing a normalisation score for each candidate Lagrangian,
$$N_D(L) = \sum_{t=1}^{M} \sum_{q \in \{\theta, \phi, \ldots\}} \left[\frac{d}{dt}\frac{\partial L}{\partial \dot q}\right]_t^2 + \left[\frac{\partial L}{\partial q}\right]_t^2.$$
We will compose our final score from the scores $\mathrm{EL}_D$ and $N_D$ in such a way, to be detailed below, that to score well a candidate Lagrangian must simultaneously have a low score for $\mathrm{EL}_D$ and a score of around one for $N_D$. The target value of one for $N_D$ is chosen arbitrarily. We can always arrange for the normalisation score to be approximately one, as the least-action trajectory is unchanged if the Lagrangian is scaled by a constant.

There is a second, more interesting, class of unwanted expressions that minimise the Euler–Lagrange score $\mathrm{EL}_D$. Consider, for instance, the candidate Lagrangian $L = \theta^n \dot\theta$. This Lagrangian satisfies the Euler–Lagrange equations trivially, in a way that does not depend on the trajectory. Such path-independent least-action Lagrangians are interesting from a physics point of view, being closely related to gauge invariance, but here they are a nuisance. To guide the search away from these expressions we introduce a second, 'control', trajectory $C$. This trajectory is unrelated to the behaviour of the system under study and serves solely to eliminate path-independent Lagrangians. We reason that the Lagrangian we are seeking will score well with $\mathrm{EL}_D$ but should score poorly on $\mathrm{EL}_C$, the Euler–Lagrange score evaluated along the control trajectory. The exact form of the control trajectory is unimportant so long as it is not a valid trajectory of the system under study. In this work we use a control trajectory which is uniform motion in each coordinate, with velocity arbitrarily chosen to be 0.1, for all experiments.

We combine the three parts described above to give the search score function,
$$S(L) = U(N_D(L))\, U(\mathrm{EL}_C(L))\, \frac{\mathrm{EL}_D(L) + \epsilon}{\mathrm{EL}_C(L) + \epsilon}, \qquad (3)$$
where $U(x) = \ln(x + \epsilon)^2 + 1$ is a function that is minimised, with value approximately one, when its argument is one. The small constant $\epsilon$, typically set to be $10^{-10}$, ensures that the score function has the desired asymptotic behaviour for small values of the numerator and denominator, even when faced with errors from finite-precision machine numbers. The factor $U(\mathrm{EL}_C(L))$ prevents the search algorithm from driving towards Lagrangians that perform badly on the real dataset, but even worse on the control data. Overall, the score function drives the search to find Lagrangians that simultaneously minimise the action along the observed trajectory while having a non-zero action along the control trajectory, and a normalisation score close to one.

We note that the way the score function is assembled is somewhat ad hoc. Its purpose is to guide the search to the correct answer while avoiding pathologies, and there are a number of ways this could be done—indeed, many were tried during development. This score function is simply presented as a particular arrangement that we have demonstrated to work. Note that the score function $S$ does not in any way consider whether the prediction of the candidate Lagrangian agrees with the training data. It only considers whether the trajectory satisfies a least action principle for the candidate Lagrangian. The fact that this, on the face of it unrelated, objective leads to successful predictions is the insight from physics that we have embedded in our algorithm.
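As an illustration of how these pieces might fit together, here is a sketch of the combined score of Eq. (3). It reuses the symbols q, qdot and the el_score helper from the previous sketch; the function names and the dictionary layout for trajectories are our assumptions, not the released code.

```python
import numpy as np
import sympy as sp

EPS = 1e-10  # the small constant epsilon from the text

def U(x):
    """Minimised, with value of approximately one, when its argument is one."""
    return np.log(x + EPS) ** 2 + 1.0

def norm_score(L_expr, q_samples, qdot_samples, dt):
    """The normalisation score N_D: summed squared magnitudes of the two
    Euler-Lagrange terms along the trajectory."""
    p = sp.lambdify((q, qdot), sp.diff(L_expr, qdot), "numpy")
    f = sp.lambdify((q, qdot), sp.diff(L_expr, q), "numpy")
    ones = np.ones_like(q_samples)  # broadcast constant derivatives
    p_dot = np.gradient(p(q_samples, qdot_samples) * ones, dt)
    return float(np.sum(p_dot**2 + (f(q_samples, qdot_samples) * ones) ** 2))

def search_score(L_expr, traj, control, dt):
    """Eq. (3): small only when EL_D is small, N_D is near one, and the
    control score EL_C is not small."""
    el_d = el_score(L_expr, traj["q"], traj["qdot"], dt)
    el_c = el_score(L_expr, control["q"], control["qdot"], dt)
    n_d = norm_score(L_expr, traj["q"], traj["qdot"], dt)
    return U(n_d) * U(el_c) * (el_d + EPS) / (el_c + EPS)

# The control trajectory: uniform motion in each coordinate, with the
# arbitrarily chosen velocity 0.1 used in the text.
t = np.linspace(0.0, 10.0, 501)
control = {"q": 0.1 * t, "qdot": np.full_like(t, 0.1)}
```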
Representation and search
We have experimented with two representations of candidate Lagrangians. The first, a restricted polynomial representation, allows a fast search algorithm to be implemented. It is limited in the Lagrangians it can represent exactly, although through Taylor's theorem it can find approximations to any Lagrangian. This representation was used to generate the bulk of the results in this paper, and we describe it in detail in this section. The second representation lifts some of the constraints of the restricted polynomial model, at the expense of vastly increased computational cost. We describe it in 'Generalisation'.

The restricted polynomial representation assumes that the Lagrangian can be represented by a polynomial in the coordinates and velocities. The model is a sum of monomial terms, parameterised by coefficients multiplying each term. We restrict this polynomial in two ways: we limit the maximum power of any coordinate or velocity to $m$; and we limit the maximum degree of any combination of coordinates and velocities to $p$. In addition, we remove any terms from the model that can have no physical significance, that is, terms that are constant or of the form $q^n \dot q$. These terms simply "fall through" the Euler–Lagrange equation without changing the resulting equations of motion, so there is no value in including them in the search. For example, for one variable $\theta$ with $m = 3$ and $p = 4$ the resulting model would be
$$c_1\dot\theta^2 + c_2\dot\theta^3 + c_3\theta + c_4\theta\dot\theta^2 + c_5\theta\dot\theta^3 + c_6\theta^2 + c_7\theta^2\dot\theta^2 + c_8\theta^3.$$

For a given restricted polynomial model the score function is minimised by adjusting the parameters $c_i$. We conduct this optimisation using the Nelder–Mead simplex algorithm (Nelder & Mead, 1965) with the modified parameters of Gao & Han (2012), which improve the efficiency in high dimensions. The coefficients are bounded between $-1$ and $1$, enforced by a penalty function. We use a tight convergence tolerance, usually one part in $10^{10}$, to encourage the search to break out of local minima. We impose a maximum iteration limit, usually $5 \times 10^6$, on the search to ensure that it is bounded in time.

We do not know in advance what values of $m$ and $p$ are needed to accurately represent the Lagrangian of the system under study. What's more, we wish to find the simplest Lagrangian such that the trajectory satisfies the principle of least action. We approach this using a simple heuristic algorithm. We start with the smallest non-trivial model ($m = 2$, $p = 2$) and optimise the parameters with the simplex search. We then make an in-sample assessment of how well the optimised Lagrangian predicts the dynamics of the system in the training sample. This is done by generating equations of motion from the optimised Lagrangian and numerically solving them, using initial conditions derived from the first sample in the dataset. If this in-sample prediction fits better than a specified tolerance then we stop and return the Lagrangian. If it does not fit then we generate a larger model (i.e., with larger values of $m$ and/or $p$) and try again. The models are stepped through in increasing number of monomial terms. This proceeds until either a model is found that fits or a maximum bound on model complexity is reached. This heuristic algorithm only crudely captures the notion of mathematical complexity of the model, but it seems to work adequately well.
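A sketch of how this restricted term set might be enumerated is shown below; it reproduces the example model above. The function name and the encoding of the fall-through rule (a single velocity appearing linearly, multiplied only by powers of its own coordinate) are our reading of the text, not the released code.

```python
from itertools import product
import sympy as sp

def model_terms(coords, vels, m, p):
    """Monomials with per-variable power <= m and total degree <= p, with
    constant and fall-through q**n * qdot terms removed. The lists coords
    and vels are assumed aligned: vels[i] is the velocity of coords[i]."""
    variables = list(coords) + list(vels)
    terms = []
    for powers in product(range(m + 1), repeat=len(variables)):
        if not 0 < sum(powers) <= p:
            continue  # drop the constant term and over-degree terms
        coord_pow, vel_pow = powers[: len(coords)], powers[len(coords):]
        # Fall-through terms: one velocity appearing linearly, multiplied
        # only by powers of its own coordinate, e.g. theta**n * thetadot.
        if sum(vel_pow) == 1:
            i = vel_pow.index(1)
            if all(cp == 0 for j, cp in enumerate(coord_pow) if j != i):
                continue
        terms.append(sp.Mul(*[v**k for v, k in zip(variables, powers)]))
    return terms

theta, thetadot = sp.symbols("theta thetadot")
print(model_terms([theta], [thetadot], m=3, p=4))
# Reproduces the eight terms of the m = 3, p = 4 example above.
```

Because the model is linear in the coefficients $c_i$, the Euler–Lagrange residuals are linear in them too; this is what makes the partial pre-calculation described next possible, reducing each score evaluation to a quadratic form in the coefficients.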
Note that, for a given polynomial model, it is possible to partially pre-calculate the score function $\mathrm{EL}_D$ for a given dataset, yielding a function quadratic in the coefficients $c_i$. This is possible because the form of the model is fixed and it is possible to calculate its derivatives in advance. As a result, after the initial simplification of the score function, optimisation iterations are fast, and have a run-time independent of the number of data points. Code for the score function, the search algorithms and the datasets we use below can be downloaded from Hudson, Hills & Grütter (a).

RESULTS
We will consider five test systems, illustrated in Fig. 1. The first is the unforced Duffing oscillator, a textbook non-linear system. The second, a simple pendulum, is interesting because its Lagrangian cannot be represented exactly in the restricted polynomial representation. The third system, two masses on a frictionless surface joined by three springs to each other and to two immovable walls, has two coupled degrees of freedom. The fourth system is the double pendulum, a coupled, two-degree-of-freedom non-linear system capable of chaotic motion. As with the simple pendulum, the double pendulum cannot have its Lagrangian represented exactly by a finite-degree polynomial. The fifth and final system is the Penning-type ion trap, a three-degree-of-freedom system with magnetic and electrostatic forces, which is of considerable experimental relevance.

Figure 1: Sketches of the five test systems that we consider.

Figure 2 shows the result of applying the algorithm to simulated data sets for these systems. It can be seen that the algorithm is able to successfully predict the future dynamics of all of the test systems.

Figure 2: Result of running the algorithm on simulated data from the five test systems. In each graph panel, the (blue) open circles to the left of the vertical bar are the training data. The solid (red) line to the right of the vertical bar is the algorithm's prediction. The (red) filled circles to the right of the bar show the actual behaviour of the system. For clarity only every third validation point is shown. The algorithm does not have access to these validation points when it is making its prediction. It can be seen that the algorithm has accurately learned the dynamics of the system in each case.

Let us look in detail at the progress of the algorithm, and the resulting learned models, for two of the example systems. In the case of the Duffing oscillator the algorithm tried seven, increasingly complex, polynomial models to arrive at the prediction shown, which was generated by the model with $m = 4$ and $p = 4$. The final model has 10 free parameters, and required 2,160 Nelder–Mead iterations to optimise. The complete search, working through all seven models with a single-threaded implementation, executes in under five seconds on a 2012 2.0 GHz Intel Core i7-3667U powered MacBook Air.
The optimised Lagrangian, where we have removed terms with coefficients less than $10^{-5}$ and displayed the remaining coefficients to two decimal places for clarity, was
$$L = -0.30x^2 + 0.14x^4 + 0.20\dot x^2.$$
This is exactly the expression, apart perhaps from overall scaling, that would be written by a human physicist. The coefficients yielded by the search are found to match the correct coefficients to the 6th decimal place, limited by the convergence tolerance that we set. By generating a model in this form the algorithm gives insight into the system directly from the data.

The case of the simple pendulum is also interesting to consider. Here the search algorithm tried three models, where the third, with $m = 2$ and $p = 4$, converged in 480 Nelder–Mead iterations. The search in this case took around 0.6 s. The generated model, multiplied by 100 to make it more readable, was
$$L = 0.049x + 8.6x^2 - 3.0\dot x^2 - 0.0030x\dot x^2 - 0.41x^2\dot x^2. \qquad (4)$$
It can be noted that this is not a straightforward Taylor expansion of the simple pendulum's Lagrangian, and it is not obvious how to relate it to the standard form. Experimenting with removing terms and solving the resultant equations of motion indicates that the terms proportional to $x$ and $x\dot x^2$ are unimportant, but the relatively small term in $x^2\dot x^2$ is essential. Despite being in an unexpected form, this Lagrangian does make successful predictions. We shall see in 'Generalisation' that it is in fact a local approximation of a true Lagrangian around the region of configuration space that the training trajectory explored.

Real-world measurements are inevitably noisy, and so to be practically useful it is important that the algorithm is able to converge even in the presence of imperfections in the data. We took the data for our third test system—the coupled harmonic oscillators—and added normally distributed noise, with standard deviation 0.1 (about 5% of the oscillation amplitude), to the position, velocity, and acceleration. Figure 3 shows the result of running the algorithm on this noisy data set. We see that the algorithm is robust to this noise, finding a model that describes the future evolution well.

Figure 3: Application of the algorithm to data with simulated noise added. The graphs are in the same format as Fig. 2. We see that the algorithm is robust to noise, finding a model that accurately predicts the future evolution.

Table 1 summarises the size of the models found and the computational effort required to find them.

Table 1: Summary of models found for each of the test systems and the computational effort required to find them. "Total iterations" gives the number of Nelder–Mead iterations used searching through all forms of the model, including the final form. "Final iterations" gives the number of Nelder–Mead iterations used refining the final form of the model. The "Generalised simple pendulum" will be introduced in 'Generalisation'.

System                       (m,p)  Parameters  Total iterations  Final iterations  Time to converge (s)
Duffing oscillator           (4,4)  10          5,340             2,160             4.5
Simple pendulum              (2,4)  5           940               580               0.5
Coupled harmonic oscillator  (2,2)  10          1,740             1,740             1.2
Double pendulum              (2,4)  43          400,970           269,470           460
Penning trap                 (2,2)  21          5,650             5,650             6.2
Oscillator with noise        (2,2)  10          1,240             1,240             0.7
Generalised simple pendulum  (4,4)  10          3,270             8,450             7.1
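To illustrate how a learned Lagrangian is turned into the predictions shown above, here is a sketch of the in-sample prediction step: the Euler–Lagrange equation is applied symbolically to the learned Duffing model and the resulting equation of motion is integrated numerically. The symbolic manipulation and the choice of integrator here are our own, not the released implementation.

```python
import sympy as sp
from scipy.integrate import solve_ivp

x, xdot, xddot = sp.symbols("x xdot xddot")

# The learned Duffing model quoted above.
L = -0.30 * x**2 + 0.14 * x**4 + 0.20 * xdot**2

# Euler-Lagrange equation d/dt(dL/dxdot) - dL/dx = 0, with the total time
# derivative expanded by the chain rule in terms of xdot and xddot.
p = sp.diff(L, xdot)
el = sp.diff(p, x) * xdot + sp.diff(p, xdot) * xddot - sp.diff(L, x)
accel = sp.lambdify((x, xdot), sp.solve(el, xddot)[0])

def rhs(t, state):
    """First-order form of the equation of motion: state = (x, xdot)."""
    return [state[1], accel(state[0], state[1])]

# Integrate forward from initial conditions taken from the first sample
# of the dataset (a stand-in value is used here).
solution = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0], max_step=0.01)
```

The same machinery serves both the in-sample fit test during the search and the out-of-sample prediction plotted to the right of the vertical bars in Fig. 2.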
We have not conducted a detailed study of how this proof-of-principle algorithm scales as the problem size increases, but we note that our preliminary investigations show that it scales quite poorly. We find that the primary determinant of the convergence rate seems to be the number of free parameters in the model. We therefore speculate that a more sophisticated technique for varying the structure of the model might be of use in improving the performance. A particularly promising approach might be to follow Clegg et al. (2005) and use a genetic algorithm to evolve the form of the model, in combination with a continuous algorithm like Nelder–Mead to optimise the parameter values for any given model form.

GENERALISATION
We have shown that the algorithm can find models which successfully predict the future evolution of the system's behaviour. However, a good physical model does not just capture the behaviour of a particular time-series, corresponding to a particular set of initial conditions. Rather, it should be able to predict the behaviour of the system over a range of initial conditions. It is perhaps this ability to generalise that sets a true physical model apart from a mere fit or interpolation of the system's behaviour. It is interesting, therefore, to study whether the models found by our algorithm have this property.

We have seen in the case of the Duffing oscillator that the discovered model is indeed the correct model, and we would expect that this model will correctly predict the dynamics of the system for any initial conditions. We test this by simulating the behaviour of the system for a wide range of initial conditions, and comparing the results to the predictions of the model; a sketch of this test is given below. We find, as expected, that the learned model for the Duffing oscillator does make correct predictions for all initial conditions.

Applying this procedure to the other test systems, we find that the coupled harmonic oscillators and the Penning-type ion trap models also generalise well, making successful predictions for all initial conditions. This indicates that our algorithm is not merely a sophisticated curve fitting routine, but rather is finding the underlying physical truth behind the system dynamics to make its predictions.
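A sketch of the generalisation test described above follows. The helpers accel_learned and accel_true are assumed to be acceleration functions built from the learned and true Lagrangians, as in the earlier prediction sketch; the amplitude range is illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

def amplitude_sweep_error(accel_learned, accel_true, amplitude, t_end=20.0):
    """Squared error between learned and true trajectories started from
    rest at the given amplitude."""
    t_eval = np.linspace(0.0, t_end, 200)
    def run(accel):
        rhs = lambda t, s: [s[1], accel(s[0], s[1])]
        return solve_ivp(rhs, (0.0, t_end), [amplitude, 0.0],
                         t_eval=t_eval).y[0]
    return float(np.sum((run(accel_learned) - run(accel_true)) ** 2))

# Sweep a range of initial conditions to see where a model breaks down:
# errors = [amplitude_sweep_error(accel_learned, accel_true, a)
#           for a in np.linspace(0.1, 3.0, 30)]
```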
The pendulum and double pendulum models do not generalise well, as we might have anticipated from the form of the Lagrangian in Eq. (4). Figure 4 compares the prediction of the learned simple pendulum model against the true behaviour, for a variety of swing amplitudes. We see that while the prediction is accurate for the amplitude at which the model was trained, it deviates at other amplitudes. These results are perhaps to be expected, and could well be the same as would be generated by a human physicist given the same data. The algorithm has found a mathematically simple approximation that works well for the data it has available to it, but does not have enough to go on to determine the true underlying model.

Figure 4: Predictions for different initial conditions of the learned simple pendulum model (red dashed line) compared to the true behaviour (blue solid line). The amplitude of the pendulum swing varies between panels. The model was trained at the amplitude shown in (F). We see that the model makes good predictions for the initial conditions it was trained on, but breaks down for other initial conditions.

We consider two approaches to generating models that generalise better for these systems, inspired by the approaches a human physicist might take. The first method is simply to train the models with more data, corresponding to a wider range of initial conditions. The second is to introduce new mathematical constructs which allow a simpler model to be found, reasoning that this model is more likely to generalise well.

For the first approach we follow exactly the same procedure as before, except that we generate a number of trajectories, corresponding to a range of initial conditions, and use a score that is the sum of the scores for the individual trajectories (see the sketch following Fig. 5). We applied this procedure to the simple pendulum system. The resulting search takes approximately 15 times longer to converge than the single-trajectory search. We find that the algorithm is unable to converge on an $m = 2$, $p = 4$ model, as it did before, and has to continue its search until it finds an $m = 4$, $p = 4$ model whose predictions fit all of the trajectories adequately.

Figure 5 compares the predictive ability of this model with the 'single-trajectory' model of the previous section. We see that, as shown in Fig. 4, the single-trajectory model makes good predictions for the initial condition it was trained at, but makes poor predictions for other initial conditions. The 'multi-trajectory' model, though, is much improved. It makes good predictions at all of the initial conditions it was trained at, and further makes good predictions at other, unseen initial conditions as well. We have found similar results for the double-pendulum system, although the computational expense of the problem constrained the experiment to a limited region of initial-condition space.

Figure 5: Comparing a model for the simple pendulum trained on multiple trajectories with one trained on a single trajectory. The curves show the squared error between the model's prediction and the true behaviour, as a function of the pendulum's swing amplitude. The dashed (blue) curve shows the result for a model trained at a single swing amplitude, indicated by the heavy (blue) arrow. This model performs well at the amplitude it was trained at, but poorly at other amplitudes. The model corresponding to the solid (red) curve was trained with multiple trajectories, indicated by the other (red) arrows. The original trajectory was also included in the training set for this model. We see that the 'multi-trajectory' model makes better predictions across a wide range of initial conditions, including conditions that it was not trained on. The multi-trajectory model is able to make successful predictions up to surprisingly large amplitudes, well beyond those it has seen in training.
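In sketch form, the multi-trajectory score mentioned above is simply a sum over datasets; search_score is the hypothetical single-trajectory helper from the earlier sketch, not the released code.

```python
def multi_trajectory_score(L_expr, trajectories, control, dt):
    """Sum of single-trajectory scores: a candidate Lagrangian must
    satisfy the least-action condition on every trajectory at once."""
    return sum(search_score(L_expr, traj, control, dt)
               for traj in trajectories)
```

Each trajectory contributes its own Euler–Lagrange residuals, so a local approximation that only works near one amplitude can no longer score well.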
Our second approach to generalisation is to expand the representation of the Lagrangians to encompass a wider range of mathematical expressions. We reason that, with a wider palette of mathematics at its disposal, the algorithm may be able to find a model of simpler form that works well. History has shown, although this may be tautological, that systems of interest to physicists can often be described by remarkably simple mathematical models. We hope that by allowing the algorithm to generate structurally simpler models, it may be more likely to discover the underlying physical truth.

We have developed a proof-of-principle implementation of a richer representation, and a corresponding search algorithm, detailed in the Appendix. Briefly, we take a genetic programming approach (Koza, 1992) and compose mathematical expressions as trees, with leaf-nodes corresponding to the system variables, simple functions (sine, cosine, square) of these variables, and numerical constants. Branch-nodes of the tree are the arithmetic operators $+$, $-$, $\times$. This structure can represent a much wider range of mathematical forms than our polynomial representation. We search over this tree-structured representation using an algorithm (Zitzler, Laumanns & Thiele, 2002) that simultaneously tries to optimise the score and minimise the size of the trees. Thus, this search algorithm tries explicitly to find simple expressions that score well on the data.

Repeating the search on large-amplitude ($\pm 0.95\pi$) simple pendulum data using the tree-based representation highlights the relative strength of this approach. The generated model, which makes a successful prediction, is
$$L = 0.25\dot\theta^2 + 2.0\cos(\theta),$$
the same as would be written by a human physicist. Naturally, this model makes correct predictions over the full range of initial conditions.

There are two reasons that the tree-based expression search is able to converge on this model. First, it is only because the representation of possible models is richer that this model can be directly represented at all. Second, the notion of mathematical complexity in this representation more closely models that of a human physicist. This allows the search algorithm to do more work driving the result towards an expression that we recognise as canonical. It must be cautioned, though, that this is only a proof-of-principle demonstration. To reach this result we had to bias the search algorithm, as described in the Appendix, and even then the run time is significantly longer, often taking many hours with a multi-threaded implementation on the hardware described above. We were not able to get results for the double pendulum system at all with the computing resources at our disposal. Nonetheless, we present this result as the technique shows potential for learning models that are both better able to generalise, and in a format more suitable for communication to human physicists.

CONCLUSION
We have demonstrated an algorithm that can predict the future dynamics of a physical system directly from observed data. We have shown that the algorithm generates models that can be communicated to a human physicist, sometimes even finding models in textbook form. We have further shown that the models generated generalise well to unseen data, and are not merely fits or interpolations, but truly capture the physical essence of the system under study.

One might ask what the use of such an algorithm is. As a first point, we find the question of whether a computer can do science to be fascinating in itself. Investigating the limits of a computer's ability in this regard educates us as to the strengths and weaknesses of our current scientific processes, and invites us to consider a different perspective on our scientific work. But perhaps a more practical answer is that tools such as this could assist humans in their work. We see this assistance as coming in two forms.
The first is simply automating the actions of a scientist so that they can be applied to more data. Techniques that can be automatically applied to datasets, scanning for scientifically interesting features—in the case of the algorithm in this paper, for example, finding that there is a least action principle at work—may come to be a fruitful approach to generating unexpected scientific leads as we head into a data-dominated era. The second is opening up the techniques of science to non-specialists. By capturing the idea of searching for least action models in an algorithm we make it available to anyone, including those without the necessary skills to do it by hand. By way of analogy, it is interesting to consider popular online natural-language translation software. While no-one would consider these tools suitable for translating poetry, they are nonetheless exceedingly useful to many people in the common case where a 'good-enough' translation will do. While we do not imagine computers will replace expert human physicists in the near term, we envisage that the availability of tools to automate scientific reasoning will empower non-specialists to take better advantage of the discoveries and insights of physics.

ACKNOWLEDGEMENTS
We acknowledge many useful discussions with Jeremy Mabbitt, Mike Tarbutt, Jack Devlin, Iain Barr, Ben Sauer, and Ed Hinds. DJAH and AMG contributed equally to this work.

APPENDIX: TREE-BASED REPRESENTATION AND SEARCH
Here we describe in detail the tree-structured symbolic representation of mathematical expressions and the corresponding search routine. The expressions are built following a grammar designed to bias the search towards expressions that might be Lagrangians for simple physical systems. Each terminal of these expression trees is one of: a numerical constant, randomly generated; a coordinate variable or velocity; the square of a coordinate or velocity; and, for coordinates which represent angles, the sine or cosine of a coordinate or velocity. The non-terminal nodes of the trees are the operators $+$, $-$, $\times$. An example expression tree is shown in Fig. 6, representing the expression $\theta^2 - 3\cos(\dot\theta)$.

Figure 6: An expression tree representing $\theta^2 - 3\cos(\dot\theta)$.

To search through these tree-structured expressions we take a genetic programming approach (Koza, 1992), explicitly optimising both the least-action score of Eq. (3) and a complexity score, using the SPEA2 multi-objective optimisation algorithm (Zitzler, Laumanns & Thiele, 2002). This biologically-inspired evolutionary algorithm maintains a population of candidate expressions and breeds, reproduces, and mutates them to try to simultaneously optimise the least-action and complexity scores.
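As a sketch of what such a representation might look like in code, the encoding below uses nested tuples for branch nodes and strings or floats for leaves. The terminal set, names and encoding are illustrative assumptions, not the released implementation.

```python
import math

# Terminal leaves: variables, their squares, and (for angle coordinates)
# sines and cosines of the coordinate and velocity, plus numeric constants.
TERMINALS = {
    "theta": lambda v: v["theta"],
    "thetadot": lambda v: v["thetadot"],
    "theta^2": lambda v: v["theta"] ** 2,
    "thetadot^2": lambda v: v["thetadot"] ** 2,
    "sin(theta)": lambda v: math.sin(v["theta"]),
    "cos(theta)": lambda v: math.cos(v["theta"]),
    "sin(thetadot)": lambda v: math.sin(v["thetadot"]),
    "cos(thetadot)": lambda v: math.cos(v["thetadot"]),
}
# Branch nodes: the arithmetic operators.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def evaluate(node, v):
    """Evaluate a tree at a point v in configuration space."""
    if isinstance(node, tuple):      # branch node: (operator, left, right)
        op, left, right = node
        return OPS[op](evaluate(left, v), evaluate(right, v))
    if isinstance(node, float):      # numeric constant leaf
        return node
    return TERMINALS[node](v)        # variable leaf

def size(node):
    """The complexity score: the number of nodes in the tree."""
    return 1 + size(node[1]) + size(node[2]) if isinstance(node, tuple) else 1

# The tree of Fig. 6, theta^2 - 3*cos(thetadot), has five nodes under
# this encoding because squares and trigonometric terms are terminals.
tree = ("-", "theta^2", ("*", 3.0, "cos(thetadot)"))
value = evaluate(tree, {"theta": 0.5, "thetadot": 0.1})
```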
In detail, we first construct a population of randomly generated expressions, usually numbering 100. We score these expressions using the least-action score, and also assign a complexity score, which is simply the number of nodes in the expression tree. The SPEA2 algorithm takes the current population and an initially empty set of elite expressions, representing the best that have been seen so far. It has a rather complex selection mechanism (Zitzler, Laumanns & Thiele, 2002) that produces a new set of elite expressions, plus a set of expressions, the breeding pool, which are candidates for reproduction. A new generation is created from the breeding pool by mutation (10%) and pair-wise crossover (90%). Mutation is effected by replacing a randomly chosen subtree of the given expression with a randomly generated subtree. The crossover operation takes two expressions, selects a random point in each of the two trees, and swaps the sub-trees at these points to generate two new expressions. The evolutionary process is repeated starting from this new generation, and we iterate for a large number of generations, typically many thousands. To improve the convergence speed of the numeric constants in the expressions we also incorporate a small amount of hill-descent into each evolutionary iteration: a subset (20%) of the expressions have their numeric constants randomly adjusted by a small amount, and if this improves their least-action score, the modification is kept. We also impose a maximum size of expression (50 nodes), and each generation we trim expressions that exceed it, to ensure that the run-time is bounded. The final elite set is a set of expressions that represent the trade-off between least-action score and complexity. We select from this set the simplest expression that has a least-action error below a specified threshold.

In this tree-based method the structure of the candidate Lagrangians varies during the search, so it is not possible to partially pre-calculate the score function, as it was in the restricted polynomial technique. Rather, it must be calculated in full for each expression in the population. Further, the search space of possible expressions is exceedingly large, and the score is not very smooth with respect to the genetic operations. As a result, the search must run for many generations and is extremely computationally expensive.

A naïve implementation might calculate the partial-derivative time-series in Eq. (3) by symbolically differentiating the candidate expression and then calculating the value of the derivative. This, however, can be exponentially expensive in the depth of the expression, in terms of both memory and runtime. A better approach, which we adopt in this work, is to simultaneously evaluate the expression's value and its derivatives using automatic differentiation (Kalman, 2002). This method avoids calculating an expression for the derivative, and has a run-time proportional to the size of the expression.
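The following is a minimal forward-mode sketch of the idea, not Kalman's doubly recursive multivariate construction: a dual number carries a value together with a derivative, so a single evaluation of the expression tree yields both at once.

```python
import math

class Dual:
    """A value bundled with its derivative; arithmetic propagates both."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv
    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)
    def __sub__(self, other):
        return Dual(self.value - other.value, self.deriv - other.deriv)
    def __mul__(self, other):  # the product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def cos(d):
    """Cosine with the chain rule applied to the derivative part."""
    return Dual(math.cos(d.value), -math.sin(d.value) * d.deriv)

# d/dthetadot of theta^2 - 3*cos(thetadot), evaluated in a single pass:
theta = Dual(0.5, 0.0)     # derivative seed 0: not the variable of interest
thetadot = Dual(0.1, 1.0)  # derivative seed 1: differentiate w.r.t. thetadot
result = theta * theta - Dual(3.0) * cos(thetadot)
# result.value is the expression's value; result.deriv equals 3*sin(0.1).
```

The cost of evaluating value and derivative together grows only linearly with the size of the expression, which is what makes scoring large populations of trees feasible.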
ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Daniel J.A. Hills and Adrian M. Grütter performed the experiments, analyzed the data, performed the computation work, and reviewed drafts of the paper.
• Jonathan J. Hudson conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work, and reviewed drafts of the paper.

Data Availability
The following information was supplied regarding data availability:
https://github.com/JonyEpsilon/flow.

REFERENCES
ATLAS collaboration. 2012. Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Physics Letters B 716:1 DOI 10.1016/j.physletb.2012.08.020.
Bradley E, Easley M, Stolle R. 2001. Reasoning about nonlinear system identification. Artificial Intelligence 133:139 DOI 10.1016/S0004-3702(01)00143-6.
Clegg J, Dawson J, Porter S, Barley M. 2005. The use of a genetic algorithm to optimize the functional form of a multi-dimensional polynomial fit to experimental data. In: The 2005 IEEE congress on evolutionary computation, vol. 1, 928–934 DOI 10.1109/CEC.2005.1554782.
Dzeroski S, Todorovski L. 1993. Discovering dynamics. In: Proceedings of the tenth international conference on machine learning. San Mateo: Morgan Kaufmann Publishers Inc., 97–103.
Euler L. 1744. Methodus inveniendi lineas curvas maximi minimive proprietate gaudentes. Lausanne & Geneva: Bousquet.
Feynman R, Leighton R, Sands M. 1963. The Feynman lectures on physics. 2nd edition, vol. 1. Boston: Addison-Wesley.
Gao F, Han L. 2012. Implementing the Nelder–Mead simplex algorithm with adaptive parameters. Computational Optimization and Applications 51:259 DOI 10.1007/s10589-010-9329-3.
Hayden EC. 2014. Is the $1,000 genome for real? Nature DOI 10.1038/nature.2014.14530.
Hillar C, Sommer F. 2012. Comment on the article "Distilling free-form natural laws from experimental data". ArXiv preprint. arXiv:1210.7273.
Hudson JJ, Hills DJA, Grütter AM. Flow project code. Available at https://github.com/JonyEpsilon/flow.
Kalman D. 2002. Doubly recursive multivariate automatic differentiation. Mathematics Magazine 75:187 DOI 10.2307/3219241.
Koza JR. 1992. Genetic programming: on the programming of computers by means of natural selection. Cambridge: MIT Press.
Langley P. 1979. Rediscovering physics with BACON.3. In: Proceedings of the 6th international joint conference on artificial intelligence, vol. 1. San Francisco: Morgan Kaufmann Publishers Inc., 505–507.
Ljung L. 2010. Perspectives on system identification. Annual Reviews in Control 34(1):1–12 DOI 10.1016/j.arcontrol.2009.12.001.
Maupertuis PLM De. 1744. Accord de différentes lois de la nature qui avaient jusqu'ici paru incompatibles. Mém. As. Sc. Paris 417.
Nelder JA, Mead R. 1965. A simplex method for function minimization. The Computer Journal 7(4):308–313 DOI 10.1093/comjnl/7.4.308.
Newman R, Tseng J. Memo 134: cloud computing and the square kilometre array. Available at http://www.skatelescope.org/uploaded/8762_134_Memo_Newman.pdf.
Schmidt M, Lipson H. 2009. Distilling free-form natural laws from experimental data. Science 324(5923):81–85 DOI 10.1126/science.1165893.
Schmidt M, Lipson H. 2010. Symbolic regression of implicit equations. In: Riolo R, O'Reilly U-M, McConaghy T, eds. Genetic programming theory and practice VII, genetic and evolutionary computation. Springer, 73–85.
Sjöberg J, Zhang Q, Ljung L, Benveniste A, Delyon B, Glorennec P-Y, Hjalmarsson H, Juditsky A. 1995. Nonlinear black-box modeling in system identification: a unified overview. Automatica 31(12):1691–1724 DOI 10.1016/0005-1098(95)00120-8.
Todorovski L, Dzeroski S. 1997. Declarative bias in equation discovery. In: Proceedings of the fourteenth international conference on machine learning. San Mateo: Morgan Kaufmann Publishers Inc., 376–384.
Zitzler E, Laumanns M, Thiele L. 2002. SPEA2: improving the strength Pareto evolutionary algorithm. In: Giannakoglou K, et al., eds. Evolutionary methods for design, optimisation and control with application to industrial problems (EUROGEN). International Center for Numerical Methods in Engineering (CIMNE), 95–100.