Formalising Ordinal Partition Relations Using Isabelle/HOL
Mirna Džamonja, Angeliki Koutsoukou-Argyraki, Lawrence C. Paulson
2020-11-26

Abstract. This is an overview of a formalisation project in the proof assistant Isabelle/HOL of a number of research results in infinitary combinatorics and set theory (more specifically in ordinal partition relations) by Erdős-Milner, Specker, Larson and Nash-Williams, leading to Larson's proof of the unpublished result by E.C. Milner asserting that for all m ∈ ℕ, ω^ω −→ (ω^ω, m). This material has been recently formalised by Paulson and is available on the Archive of Formal Proofs; here we discuss some of the most challenging aspects of the formalisation process. This project is also a demonstration of working with Zermelo-Fraenkel set theory in higher-order logic.

Higher-order logic theorem proving was originally intended for proving the correctness of digital circuit designs. The focus at first was bits and integers. But in 1994, a bug in the Pentium floating point division unit cost Intel nearly half a billion dollars [32], and verification tools suddenly needed a theory of the real numbers. The formal analysis of more advanced numerical algorithms, e.g. for the exponential function [21], required derivatives, limits, series and other mathematical topics. Later, the desire to verify probabilistic algorithms required a treatment of probability, and therefore of Lebesgue measure and all of its prerequisites [24]. The more verification engineers wanted to deal with real-world phenomena, the more mathematics they needed to formalise. And it was the spirit of the field to reduce everything to a minimal foundational core rather than working on an axiomatic basis. So a great body of mathematics came to be formalised in higher-order logic, including advanced results such as the central limit theorem [1], the prime number theorem [22] and the Jordan curve theorem [20].

These results are impressive, but they are mostly 19th century or earlier. Clearly the introduction of proof assistants into mathematical research calls for the formalisation of more modern results. There are some such achievements, notably in Coq: the four colour theorem [16] and the Feit-Thompson theorem [17]. Here the mathematics is more recent and is certainly not analysis, but the focus on finite computations might be concerning.

The material we have formalised for this case study [12, 28, 44] comes from the second half of the 20th century and concerns a field that formalisation efforts have so far left entirely unexamined: infinitary combinatorics and, more specifically, ordinal partition relations. This field deals with generalisations of Ramsey's theorem to transfinite ordinals. It was of special interest to the legendary Paul Erdős, and it is particularly lacking in intuition, to such an extent that even he could make many errors [13]. Moreover, because our example requires ordinals such as ω^ω, it is a demonstration of working with Zermelo-Fraenkel set theory in higher-order logic.

From a mathematical point of view, this work has the ambitious aim to be more than a case study of formalisation. We hope that it is a first step in a programme of finding ordinal partition relations by new methods, using the techniques developed in the formalisation.
The reader familiar with Ramsey theory on cardinal numbers, including the encyclopaedic book by Paul Erdős et al. [11] or more recent works on the use of partition relations in topology and other celebrated applications, e.g. by Stevo Todorčević [42, 43, 44], might be doubtful about the need for new methods in discovering partition relations. But none of the powerful set-theoretic methods for studying cardinal partition relations apply to ordinal partition relations. The difficulty is that in addition to the requirement on the monochromatic set to have a given size, which we would ask of a cardinal partition relation, the ordinal case also has the requirement of preserving the order structure through having a fixed order type. In fact, the difference between an order structure and an unstructured set shows up already in the case of addition: the addition of infinite cardinals is trivial, whereas ordinal addition is nuanced. Ordinal partition relations are the first instance of structural Ramsey theory, which is a growing and complex area of combinatorics.

The fact is that everything we know about ordinal partition relations (which is short enough to be reviewed in our §2) has been proven painfully and laboriously. Ingenious constructions by several authors since the 1950s have chipped the edges off the most important problem in the subject, which is to characterise the countable ordinals α such that α −→ (α, m) for a given natural number m (for the notation see §2). The simplest nontrivial case, m = 3, is the subject of a $1000 open problem of Erdős, posed back in 1987; see §2.

We may ask why it is that modern set theory is so silent on the subject of ordinal partitions. Perhaps it is a case of the chicken and the egg. In the case of cardinal numbers, whose study has been at the heart of almost everything done in set theory since the time of Cantor, one of the first important advances was exactly the understanding of partition relations. They are the backbone of infinite combinatorics: many theorems in set theory can be formulated in terms of partitions, an attitude well supported by the work of Todorčević cited above. So to better understand the ordinals, we first have to understand their partition properties, rather than expecting that powerful general methods will be developed first and then yield an understanding of ordinal partitions. Moreover, the most interesting problems about ordinal partitions are about countable ordinals, while modern set theory really only starts at the first uncountable cardinal.

The plan of this paper is as follows: in the next section we give a short but comprehensive introduction to ordinal partition relations; in Section 3, we present the material formalised, including brief sketches of Larson's proofs of partition theorems for ω^2 and ω^ω; in Section 4, we give a more detailed exposition of the Nash-Williams partition theorem, including two different proofs; in Section 5, we present an introduction to the Isabelle/HOL theorem prover; Section 6 presents our formalisation of the Nash-Williams theorem; Section 7 discusses the formalisation of Larson's proofs; finally, Section 8 summarises what we learned from this project.

As a side lemma in his work on decidability, Frank Ramsey [38] in 1929 proved what we now call Ramsey's theorem.
It states that for any two natural numbers m and n, if we divide the unordered m-tuples of an infinite set A into n pieces, there will be an infinite subset B of A all of whose unordered m-tuples are in the same piece of the division. This theorem has since been generalised in many directions and Ramsey theory now forms an important part of combinatorics, both in the finite and in the infinite case. We shall only discuss the Ramsey theory of cardinal and ordinal numbers, although a vast theory exists extending Ramsey theory to various structures; see for example the work of Todorčević [43, 44]. Also, we shall only be interested in partitions of pairs of ordinals, as this already proves to be quite challenging. In this section we review what is known about such ordinal partitions.

It is convenient to introduce the notation coming from Erdős and his school, for example in Combinatorial Set Theory [11]. Writing

α −→ (β, γ)    (1)

means that for every partition of the set [α]^2 of unordered pairs of elements of α into two parts (called colours, say 0 and 1), there is either a subset B of α of order type β whose pairs are all coloured by 0, or a subset C of α of order type γ whose pairs are all coloured by 1. Such a B is said to be 0-monochromatic, while C is 1-monochromatic. The notation implies that the situation is trivial unless β, γ ≤ α, as we shall assume. Note also that for all ordinals, the following rules of monotonicity hold: if α −→ (β, γ) and α′ ≥ α, β′ ≤ β and γ′ ≤ γ, then α′ −→ (β′, γ′). Some authors, such as Larson [28], use the notation α −→ (β, γ)^2 to emphasise that it is pairs that are coloured, but since we shall only ever work with pairs, we omit the superscript. The negation of α −→ (β, γ) is written α ↛ (β, γ).

It turns out that finding the triples α, β, γ for which the relation (1) holds is highly non-trivial. It is even non-trivial when all of the ordinals α, β, γ are countable, which is the case to which we shall restrict our attention. On the other hand, for α ≤ ω the situation is already understood by the classical Ramsey theorem, so we shall assume α > ω. The first observation to make is that we should assume that γ is finite.

Observation 2.1 For every infinite ordinal α we have α ↛ (|α| + 1, ω).

Proof. Given such an α we shall define a colouring c of the pairs of ordinals in α in two colours 0 and 1. Let π : α → |α| be a bijection. Given a pair {i, j} with i < j, we define c({i, j}), denoted as customary c(i, j), by letting it be the truth value of the statement π(i) > π(j). Here, of course, by truth value 1 we mean 'true' and by 0 we mean 'false'. A 1-monochromatic set in α of order type ω would give an infinite decreasing sequence of ordinals in |α|, so it cannot exist. On the other hand, a 0-monochromatic set in α of order type |α| + 1 would give an increasing sequence in |α| of order type |α| + 1, so it cannot exist either. Hence c shows that α ↛ (|α| + 1, ω). ⋆ 2.1

By definition it follows that for any α we have α −→ (α, 2), so the first nontrivial case is the following question, to whose solution Erdős attached the prize of $1000 in his 1987 paper [9]:

Erdős's problem. Characterise the set of all countable ordinals α such that α −→ (α, 3).

This problem is still very much open. In fact, the more general problem of characterising countable ordinals α and natural numbers m such that the relation α −→ (α, m) holds was asked already by Erdős and Richard Rado in 1956 [10], and it was the slow progress on it that made Erdős reiterate the simplest case of this problem in his 1987 problem list [9]. The first progress towards solving Erdős's problem came from Ernst Specker [41], whose results were continued by Chen-Chung Chang [5].
Chang gave a very involved proof of ω^ω −→ (ω^ω, 3) and gained $250 from Erdős. In an unpublished manuscript, Eric Milner improved Chang's result to say that ω^ω −→ (ω^ω, m) for all natural numbers m. The main proof that we have formalised is Jean Larson's proof of Milner's result. Comparing her paper [28] with earlier proofs explains why the paper is called 'A short proof . . . ', but it is not a short proof and formalising it was a challenge.

It turns out that the behaviour of countable ordinals with respect to partition relations is influenced by their Cantor Normal Form, so we take the opportunity to remind the reader of that concept.

Theorem 2.1 (Cantor Normal Form) Every ordinal number α > 0 can be written in a unique way as an ordinal sum of the form

α = ω^{β_0} · m_0 + ω^{β_1} · m_1 + · · · + ω^{β_n} · m_n,

where n is a non-negative integer, the m_i for i ≤ n are positive integers, and β_0 > β_1 > · · · > β_n ≥ 0 are ordinals.

The state of the art regarding the known positive instances of Erdős's problem is the following theorem of Rene Schipperus from his 1999 Ph.D. thesis [39], published many years later, in 2010, as a journal paper [40]:

Theorem (Schipperus 1999) Suppose that β is a countable ordinal whose Cantor Normal Form has at most two summands. Then ω^{ω^β} −→ (ω^{ω^β}, 3).

The delay between the thesis and the paper is indicative of the difficulty of the proof and the process of checking its correctness. A sketch of Schipperus' proof, divided in seven subsections, is given on pages 188-209 of the excellent survey article [19] by András Hajnal and Larson.

The reason that Schipperus focused on ordinals of the type ω^{ω^β} is that if the ordinal α is not a power of ω then it cannot satisfy α −→ (α, 3), as shown in Observation 2.2 below. Hence, only the powers of ω are of interest. Fred Galvin showed [14] that for an ordinal of the form α = ω^β where β ≥ 2 is not itself a power of ω, we have α ↛ (α, 3). Hence Schipperus' choice of ordinals. Still open is the case of α = ω^{ω^β} where β has at least three summands in its Cantor Normal Form.

Observation 2.2 Suppose that α is an ordinal which is not a power of ω. Then α ↛ (α, 3).

Proof. It follows from the Cantor Normal Form that any α which is not a power of ω is additively decomposable, in the sense that there exist ordinals β, γ < α such that α = β + γ. Fixing such β and γ, we define c on [α]^2 by letting c(x, y) = 0 if either x, y < β or x, y ≥ β. Otherwise, we let c(x, y) = 1. Then it suffices to note that any 0-monochromatic subset of α is either contained in β or in [β, β + γ) and hence has order type at most max(β, γ), which is strictly less than α. On the other hand, if we have x, y, z < α distinct, at least two of them will be < β or at least two of them will be ≥ β, and in either case the pair they form gets mapped to 0 by c. Hence the set {x, y, z} is not 1-monochromatic. ⋆ 2.2

In this section, we have mostly concentrated on the result we formalised and Erdős's problem. Information on some additional instances of α −→ (α, m) for m > 3 can be found in the Hajnal-Larson paper [19].

The ultimate objective of the project was to formalise Larson's proof [28] of the following unpublished result by E.C. Milner:

Theorem 3.1 (Milner) For every natural number m, ω^ω −→ (ω^ω, m).

While working towards that objective, many set-theoretic prerequisites had to be formalised, notably Cantor Normal Form, indecomposable ordinals and many elementary properties of order types.
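To illustrate the Cantor Normal Form just mentioned, here is a small worked example of our own (not taken from the sources). Consider the ordinal

α = ω^ω · 2 + ω^3 · 5 + ω + 7.

Here n = 3, the exponents are β_0 = ω > β_1 = 3 > β_2 = 1 > β_3 = 0, and the coefficients are m_0 = 2, m_1 = 5, m_2 = 1, m_3 = 7. Since this α has more than one summand, it is additively decomposable, for instance as α = ω^ω · 2 + (ω^3 · 5 + ω + 7) with both parts strictly below α, so by Observation 2.2 it cannot satisfy α −→ (α, 3). Only the powers of ω, whose Cantor Normal Form consists of a single summand ω^β with coefficient 1, pass that test.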
Her paper contains, as a simpler example of the methods she employed, a proof of Specker's theorem [41]:

Theorem 3.2 (Specker) For every natural number m, ω^2 −→ (ω^2, m).

Although not strictly necessary, the proof of Theorem 3.2 was formalised as a warmup exercise, as it is structured similarly to the proof of Theorem 3.1. The project also required the formalisation of a short but difficult (and error-filled) proof by Erdős and Milner [13]:

Theorem 3.3 (Erdős-Milner) For every countable ordinal α and every natural number n, ω^{1+α·n} −→ (ω^{1+α}, 2^n).

The last significant side project necessitated by Larson's proof of Theorem 3.1 was to formalise the Nash-Williams partition theorem, as presented by Todorčević [44]. For this we need a little notation: for a set A and integer k ≥ 0, let [A]^k be the set of all the k-element subsets of A. Identifying finite sets of natural numbers with their increasing enumerations, a family of finite subsets of A is called thin if no member is a proper initial segment of another (this notion is discussed in detail in §4).

Theorem 3.4 (Nash-Williams) Let A ⊆ ω be infinite and let F be a thin family of finite subsets of A. Then for every partition of F into two pieces there is an infinite set A′ ⊆ A such that every member of F that is a subset of A′ lies in the same piece.

A more detailed analysis of the formalisation of the Nash-Williams partition theorem (Theorem 3.4) will be given in the following section. This theorem is a generalisation of Ramsey's theorem [38]:

Theorem 3.5 (Ramsey) Let M be an infinite set and p a natural number. Then for every partition of [M]^p into two pieces there is an infinite set N ⊆ M such that all elements of [N]^p lie in the same piece.

For simplicity, both theorems are presented in their 2-colour versions. We obtain Ramsey's theorem from Nash-Williams simply by noting that the set [M]^p is thin. While Ramsey's theorem (Theorem 3.5) was used in the proof of Theorem 3.2, the Nash-Williams partition theorem (Theorem 3.4) was used in a similar fashion in the proof of Theorem 3.1. We give brief sketches of the proofs of Theorems 3.2 and 3.1 below. For details, the reader may refer to Larson's paper [28] or to the formalised versions of the proofs, where every step is made explicit, in the entries by Paulson at the Archive of Formal Proofs [35, 36].

3.1 Sketch of Larson's proof [28] of Theorem 3.2

It is sufficient to prove the theorem for functions f : [U]^2 → {0, 1}, where U = {(a, b) : a < b < ω} is ordered lexicographically and is order isomorphic to ω^2. The following is Larson's Definition 2.2; it classifies pairs A = {(a, b), (c, d)} of elements of U into forms 0-3 according to the relative order of a, b, c and d, with form 3 if a = c and b = d, and to each A having one of these forms it associates a finite set of natural numbers i(A), the interaction scheme of A. It follows from Ramsey's theorem (Theorem 3.5) that we can obtain an infinite set N ⊆ ω and j_0, j_1, j_2, j_3 ∈ {0, 1} so that for any k < 4 and any pair {x, y} of form k with i({x, y}) ⊆ N we have f({x, y}) = j_k. It is then shown that for the infinite set N we may obtain a set X ⊆ U which is order isomorphic to ω^2 so that for any pair {x, y} ⊆ X there is a k < 4 so that {x, y} has form k and i({x, y}) ⊆ N. Therefore, f({x, y}) = j_0 = j_1 = j_2 = j_3 = 0 (the j_k are all 0 because we may assume that there is no 1-monochromatic set of size m). Thus we have shown that f([X]^2) = {0}.

3.2 Sketch of Larson's proof [28] of Theorem 3.1

The reader will notice that this proof follows a pattern similar to the one above. Some notational conventions:
• s and t denote increasing sequences of elements of ω.
• s < t means the greatest element of s is less than the least element of t.
• s * t denotes the concatenation of two sequences.
• |s| denotes the length of s.

For each n < ω, define W(n) = {(a_0, a_1, . . . , a_{n−1}) : a_0 < a_1 < · · · < a_{n−1} < ω}, ordered lexicographically. W(n) is thus order isomorphic to ω^n. Let W = W(0) ∪ W(1) ∪ · · · be ordered first by length of sequence and then lexicographically, so that W is order isomorphic to ω^ω. It now suffices to prove the theorem for functions f : [W]^2 → {0, 1}.

We now present Larson's Definition 3.5 [28]. A pair {x, y} of distinct elements of W is of form 2k − 1 (form 2k) if there are non-empty sequences a_1, a_2, . . ., a_k (a_{k+1}), b_1, b_2, . . ., b_k and c and d such that
(1) x = a_1 * a_2 * · · · * a_k (* a_{k+1}),
(2) y = b_1 * b_2 * · · · * b_k,
(3) c = (|a_1|, |a_1| + |a_2|, . . . , |a_1| + |a_2| + · · · + |a_k|),
(4) d = (|b_1|, |b_1| + |b_2|, . . . , |b_1| + |b_2| + · · · + |b_k|),
together with ordering conditions requiring the segments a_i and b_i to interleave in strictly increasing order. If {x, y} is of form 2k − 1 (form 2k) and a_1, a_2, . . ., a_k (a_{k+1}), b_1, b_2, . . ., b_k, c and d are as above, then we call the sequence determined by the a_i, the b_i, c and d the interaction scheme of {x, y}, denoted i({x, y}).

Let m < ω. The theorem is trivially true for m = 0 and m = 1, so we assume that m > 1. In a similar fashion as before, we assume that f : [W]^2 → {0, 1} and that no m-element subset of W is 1-monochromatic (otherwise the theorem holds immediately).
So, to prove the theorem, we will find a set X of type ω^ω for which f([X]^2) = {0}. To this end, from the Erdős-Milner theorem (Theorem 3.3) we infer that for every k, n < ω with n > 0, ω^{n·k} −→ (ω^n, k); then, considering f restricted to W(n · m), we obtain a set W′(n) ⊆ W(n · m) order isomorphic to ω^n on which f takes only the value 0 (a 1-monochromatic set of size m being excluded by assumption). Let W′ be the union of the sets W′(n) for 0 < n < ω. It is order isomorphic to ω^ω and f has value zero on pairs of sequences in W′ of the same length.

Larson [28, p. 134] remarks 'Without loss of generality, we may assume that W′ is our original set W. Thus to prove the theorem, we must find a set X ⊆ W of type ω^ω for which f has value zero on pairs of sequences of different lengths.' This identification of W and W′ is possible because there is an order isomorphism between them that preserves lengths. Strictly speaking, the bulk of the elaborate construction is done using W, then finally mapped back to W′; this can be seen in the formal version [36].

By applying the Nash-Williams partition theorem (Theorem 3.4) to f it is eventually shown that we can obtain an infinite set N and a sequence {j_k : k < ω} so that for each k < ω with k > 0, every pair {x, y} of form k whose interaction scheme is contained in N satisfies f({x, y}) = j_k. For each such k we then obtain an m-element set M_k all of whose pairs have form k and interaction scheme contained in N, so that for any {x, y} ⊆ M_k we have f({x, y}) = j_k. Thus, for any k < ω with k > 0 it follows that j_k = 0. It is then shown that we may obtain a set X ⊆ W order isomorphic to ω^ω, so that for each {x, y} ⊆ X there is an l < ω for which {x, y} has form l and if l > 0, then (n_l) < i({x, y}) ⊆ N, where n_l is the l-th term in the enumeration of N in increasing order. Thus, for pairs {x, y} ⊆ X which are not of form 0, we have f({x, y}) = j_l = 0 for some l. In the case l = 0, for any pair of form 0 we have f({x, y}) = 0 by assumption, since pairs of form 0 consist of sequences of the same length. Hence f([X]^2) = {0}, as required.

In this section we shall give a proof of a fundamental theorem due to Crispin Nash-Williams [31]. This result was discovered while studying the notion of well quasi-orders (wqo) P, notably distinguishing those that have the property that for every countable ordinal α, the set P^α of all sequences of length α from P is wqo when ordered by the embeddability relation. Such orders are called bqo or better quasi-orders [31]. Neither wqo nor bqo is relevant here, but the theorem proved by Nash-Williams is of use in the study of ordinal partition relations, as well as in many other contexts, for example reverse mathematics [27, 30]. In particular, the theorem was used by Larson in the paper we formalised [28].

Nash-Williams' theorem has seen many different proofs. While above we have steered away from presenting full proofs about ordinal partition relations as they are too long, here we present not one but two proofs of Nash-Williams' theorem, to give the reader the flavour of the way that proofs are constructed in this subject. Many other proofs of the theorem are known: Alberto Marcone gives one [30] and refers to several others, including one by Stephen G. Simpson using descriptive set theory and the notion of bad arrays, which is perhaps the most popular proof these days (it originates in the methods of Fred Galvin and Karel Prikry [15] and Richard Laver [29]).

Paulson formalised Nash-Williams' theorem as described in §6. The formalisation [35] corresponds to Todorčević's presentation [44] of the original Nash-Williams proof, which is the second of the two proofs we give. He uses a notion of combinatorial forcing due to Galvin and Prikry [15].
The proof of Nash-Williams' theorem using combinatorial forcing appeared as early as 1985 in a note by Ian Hodkinson on a Ph.D. course given by Wilfrid Hodges [23] (whose authors have specifically asked that it not be used as a primary reference), but in fact Galvin and Prikry [15] prove a stronger theorem by the same method.

The main objects in Nash-Williams' theorem are families of finite subsets of ω. We introduce some notation and definitions regarding such sets.
(1) For a set A ⊆ ω, we write [A]^{<ω} for the family of finite subsets of A and [A]^∞ for the family of infinite subsets of A.
(2) We identify sets in [ω]^{<ω} with their increasing enumerations, and hence the set [ω]^{<ω} becomes a subset of the set of finite sequences in ω. Therefore, we can consider the relation of being an initial segment on P(ω), writing s ⊑ t when s is an initial segment of t.

We shall be interested in subsets F of [ω]^{<ω} that are dense in the sense that every element of [ω]^∞ has an element of F as an initial segment. In particular, we shall consider such sets that are minimal, meaning that we cannot take away an element of F and still satisfy the density requirement. This is the same as to say that if we have an element of F then none of its proper initial segments are in F (compare Larson [28]).

(1) A subset F of [A]^{<ω} is thin if there are no two distinct elements s, t ∈ F with s ⊑ t.
(2) A front on an infinite set A ⊆ ω is a thin subset F of [A]^{<ω} which in addition satisfies the density condition: for every X ∈ [A]^∞, there exists s ∈ F such that s ⊑ X.

Notice that the singleton {∅} is a front, which we call the trivial front. Since the Nash-Williams partition theorem has been used in so many different contexts, it is not surprising that the terminology diversified as well: most concepts ended up having more than one name. Here we follow a couple of references we find particularly nice: the survey paper [4] by Raphaël Carroy and Yann Pequignot, and the book [44] by Todorčević.

The theorem basically says that if we divide a front on a set A into a finite number of pieces, then one of them will contain a front on some infinite subset A′ of A. This is the analogue of the version of the pigeonhole principle which says that if an infinite set is divided into a finite number of pieces, then one of the pieces is infinite. It is often said, including by Larson [28], that the theorem is a Ramsey-type statement. Note however that Ramsey's theorem applies to colourings of pairs of elements of ω, while the Nash-Williams theorem applies to colouring singletons in a front. The analogue of the theorem applying to pairs of elements of a front is easily seen to be false.

We shall prove the Nash-Williams theorem in the following version, which easily yields the informal statement mentioned above by applying the theorem finitely many times.

Theorem 4.1 Suppose that A ⊆ ω is infinite, F ⊆ [A]^{<ω} is thin and c : F → {0, 1} is a colouring. Then there is an infinite set A′ ⊆ A such that c is constant on the set of elements of F that are subsets of A′.

For this proof we shall recall the notion of the rank of a well-founded tree. Most readers will be familiar with the following notions, but we prefer to define them for completeness.
(1) A tree is a partial order (T, ≤) in which the set of predecessors of every element is well-ordered by ≤. A branch of T is a maximal linearly ordered subset of T.
(2) A tree (T, ≤) is said to be well-founded if it has no infinite branch, or equivalently, if the reverse order (T, ≥) is well-founded as a partial order.

Every well-founded partial order P has a rank: there is a unique smallest ordinal α such that there is a strict order-preserving function from P to α.
The formal definition of this concept in the case of a well-founded tree (T, ≤) (taking into account that it is actually the reverse order that is well-founded) is given by the recursive formula which assigns rank 0 to all the terminal nodes of the tree and proceeds downwards by letting, for s a non-terminal node,

ρ(s) = sup{ρ(t) + 1 : t ∈ T is an immediate successor of s},

the rank ρ(T) of the tree being the rank of its root.

Note that if our thin set F is not a front on A, then we can find an infinite subset A′ of A which does not have any initial segment in F, and hence A′ is as required by the conclusion of the theorem. Therefore we may assume that F is a front. The connection between the fronts and the trees is that to every front F we can associate the tree

T(F) = {s ∈ [A]^{<ω} : s ⊑ t for some t ∈ F},

ordered by ⊑. The crucial observation is quite simple:

Observation 4.4 If F is a front, then the tree T(F) is well-founded.

Proof. Suppose that η is an infinite branch of T(F) and let X = ⋃η. Then X is infinite and hence, by the fact that F is a front, there is s ∈ F such that s ⊑ X. Since η is infinite, there exists s′ on η which is strictly above s. By the definition of T(F), there is t ∈ F such that s′ ⊑ t. But then s ⊑ t and s and t are two different elements of the front F, a contradiction. ⋆ 4.4

Observation 4.4 implies that for every front F, the rank ρ(T(F)) is well defined, and this is what we call the rank of the front F and denote by ρ(F). We shall use induction on ρ(F) to prove the Nash-Williams theorem.

Let α = ρ(F). The proof is by induction on α. If α = 0, then F = {∅} is the trivial front and the conclusion trivially follows. Suppose that α > 0 and the theorem is true for any front of rank < α. Notice that since F is a non-trivial front we have that ∅ ∉ F. For every n < ω consider

T_n = {s ∈ T(F) : s ≠ ∅ and min(s) = n}.

Then clearly T(F) = {∅} ∪ ⋃_{n<ω} T_n and ρ(T(F)) = sup_{n<ω}(ρ(T_n) + 1). For each n we shall find a front F_n on an infinite subset of ω such that T(F_n) is isomorphic to T_n. Let F_n = {s ⊆ ω \ (n + 1) : {n} ∪ s ∈ F}. We claim that F_n is a front on ω \ (n + 1). First, if s, t ∈ F_n are such that s ⊑ t, then {n} ∪ s ⊑ {n} ∪ t and both {n} ∪ s and {n} ∪ t are elements of F. Therefore, s = t. As for the density, if A is an infinite subset of ω \ (n + 1), there is some t ∈ F with t ⊑ {n} ∪ A. Then n ∈ t (recall that ∅ ∉ F) and, letting s = t \ {n}, we obtain that s ⊑ A and s ∈ F_n. So, F_n is indeed a front on ω \ (n + 1).

It remains to see that T(F_n) is isomorphic to T_n. If s ∈ T_n then n ∈ s and for some t ∈ F we have s ⊑ t. Hence t \ {n} ∈ F_n and hence s \ {n} ∈ T(F_n). If u ∈ T(F_n), then for some s ∈ F_n we have u ⊑ s and therefore {n} ∪ u ⊑ {n} ∪ s ∈ F, so {n} ∪ u ∈ T_n. This shows that s ↦ s \ {n} is a bijection between T_n and T(F_n), and as it is clearly order preserving, we have an isomorphism. In particular ρ(T(F_n)) = ρ(T_n) < ρ(T(F)) and so F_n is a front with rank less than α and we can apply the induction hypothesis to it.

The colouring c induces a colouring c_n of F_n, given by c_n(s) = c({n} ∪ s). The proof now continues by defining recursively, for k < ω, sets A_k ∈ [ω]^∞ and n_k = min(A_k), as follows. To start, let A_0 = A and n_0 = min(A). Given A_k and n_k, we note that {s ∈ F_{n_k} : s ⊆ A_k \ (n_k + 1)} is a front on A_k \ (n_k + 1) of rank at most ρ(F_{n_k}) < α. Applying the induction hypothesis to it and the colouring c_{n_k}, we obtain an infinite set A_{k+1} ⊆ A_k \ (n_k + 1) and a colour i_k ∈ {0, 1} such that c_{n_k}(s) = i_k for every s ∈ F_{n_k} with s ⊆ A_{k+1}; we let n_{k+1} = min(A_{k+1}).

At the end of this recursion, there must be an infinite set Y and i* such that for all k ∈ Y we have i_k = i*. Let A′ = {n_k : k ∈ Y}. We claim that A′ is as required, as shown by the set F′ = {s ∈ F : s ⊆ A′}, which is clearly a front on A′. Since the n_k are increasing, we have that A′ is infinite. If t ∈ F′, let k be such that n_k = min(t). We need to show that c(t) = i*. We notice that A′ \ (n_k + 1) ⊆ A_{k+1}. Hence, letting s = t \ {n_k}, we have that c(t) = c_{n_k}(s) = i*. This finishes the proof.
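Before turning to the second proof, here is a small worked example of the rank computation just used (our own illustration, not taken from the sources). Take F = [ω]^2, the front of all two-element subsets of ω. Its tree is T(F) = {∅} ∪ {{n} : n < ω} ∪ [ω]^2. The pairs are the terminal nodes, so ρ({n, m}) = 0; each singleton {n} has the pairs {n, m} with m > n as its immediate successors, so ρ({n}) = sup{0 + 1} = 1; and the root satisfies ρ(∅) = sup{1 + 1} = 2, so ρ(F) = 2. In general ρ([ω]^k) = k, and the decomposition used in the proof behaves as expected: for F = [ω]^2 we get F_n = [ω \ (n + 1)]^1, a front of rank 1, matching ρ(T_n) + 1 = ρ(T(F)).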
Nash-Williams' original proof was reworked by Hodkinson and Hodges [23] and Todorčević [44] to use combinatorial forcing. Many results in infinite combinatorics can be seen as instances of the fact that one can do set-theoretic forcing over a countable family of dense sets without changing the underlying universe, as proved by Paul Cohen [7] and explained more carefully by others [8, 26].

We use capital letters close to the beginning of the Latin alphabet, B, C, . . ., to denote infinite subsets of A, and lowercase letters close to s, such as s, u, t, to denote finite sequences in A. We now come to the key definition of this particular instance of combinatorial forcing:

Definition 4.5 Comparable, accepts, rejects, decides:
(1) s and t are comparable if either s ⊑ t or t ⊑ s.
(2) B accepts s if there is t ∈ F comparable with s such that t \ s ⊆ B.
(3) B rejects s if B does not accept s; B strongly accepts s if every C ∈ [B]^∞ accepts s.
(4) B decides s if B either strongly accepts s or rejects s.
(5) If F′ is a subset of F, we make definitions similar to (1)-(4) taking F′ as a parameter, and then we add the qualification with respect to F′ to the notion of accepting, rejecting and so on.

Let us make some simple observations about the notions introduced.

Observation 4.6 Let B and s be given.
(1) If B rejects (strongly accepts) s, then so does every C ∈ [B]^∞. It follows that the analogue is true for the notion of deciding.
(2) There exists an infinite C ⊆ B which decides s and where max(s) < min(C).
(3) B accepts (rejects, strongly accepts, decides) s iff B \ max(s) accepts (rejects, strongly accepts, decides) s.

Proof. (of Observation 4.6). (1) is evident from the definitions. For (2), let C = B \ [max(s) + 1]. If C strongly accepts s, then C is as required. If C does not strongly accept s, then there is D ∈ [C]^∞ which rejects s, and then D is as required. For (3), if B accepts s then there is t ∈ F comparable with s such that t \ s ⊆ B. But then it follows that t \ s ⊆ (B \ max(s)), since if t ⊑ s this is vacuously true, and if s ⊑ t then t \ s is disjoint from max(s). The rest of the cases are proved similarly. ⋆ 4.6

Lemma 4.7 There is B ∈ [A]^∞ which decides all its finite subsets.

Proof. (of Lemma 4.7). By recursion on n < ω we shall choose pairs (s_n, A_n) so that
• A_n decides every subset of s_n,
• max(s_n) < min(A_n), and
• s_n ⊆ s_{n+1} and A_{n+1} ⊆ A_n.

We let s_0 = ∅ and we choose A_0 using Observation 4.6 (2). Given (s_n, A_n), let s_{n+1} = s_n ∪ {min(A_n)} and let A_{n+1} ∈ [A_n]^∞ be a set which decides every subset of s_{n+1}. Such a set is obtained by a finite sequence of applications of Observation 4.6 (2). By cutting off the first several elements of A_{n+1}, which we can do by applying Observation 4.6 (1), we can assume that max(s_{n+1}) < min(A_{n+1}).

At the end of this recursion, let B = ⋃_{n<ω} s_n. Since we have made sure that s_{n+1} \ s_n ≠ ∅ for every n, we can conclude that B is infinite. If s is a finite subset of B, then there is a first n such that s ⊆ s_n. We have that B \ max(s_n) ⊆ A_n and therefore B \ max(s_n) decides s. By Observation 4.6 (3), we conclude that B decides s. ⋆ 4.7

Let B be as provided by Lemma 4.7. The final lemma we need is the following:

Lemma 4.8 Suppose that B strongly accepts a finite subset s of B. Then B strongly accepts s ∪ {n} for all but finitely many n ∈ B. Consequently, there is m such that B strongly accepts s ∪ {n} for every n ∈ B with n > m.

Proof. (of Lemma 4.8). Suppose that the first claim fails, and let C be the set of those n ∈ B for which B does not strongly accept s ∪ {n}; then C is infinite, and B rejects s ∪ {n} for every n ∈ C, since B decides every s ∪ {n}. Hence C, in particular, accepts s by the assumption on B, as exemplified by some t ∈ F. If t ⊑ s then clearly t ⊑ s ∪ {n} for any n and t \ s ⊆ C, so a contradiction. Hence s is a proper initial segment of t. Let n = min(t \ s). We claim that C accepts s ∪ {n}, which will give a contradiction with the choice of C. Indeed, t \ (s ∪ {n}) ⊆ C and t ∈ F, so we are done with the first claim of the lemma.
The second claim follows by taking m large enough so that B strongly accepts s ∪ {n} for all n > m and then using the hereditary nature of strong acceptance, as per Observation 4.6(3). ⋆ 4.8

We now go back to the proof of Theorem 4.1. Clearly, both F_i are thin sets, so all the observations and lemmas we proved about F apply also to each F_i. In particular, by applying Lemma 4.7 and Lemma 4.8 to F_0 we can find A′ ∈ [A]^∞ which decides each of its finite subsets with respect to F_0 and moreover, for every s ∈ [A′]^{<ω} which A′ strongly accepts, A′ also strongly accepts s ∪ {n} for all n ∈ A′ \ [max(s) + 1]. If A′ rejects ∅, then clearly no finite subset of A′ is in F_0 and hence we have c(s) = 1 for every element of [A′]^{<ω} ∩ F. Now suppose that A′ strongly accepts ∅ with respect to F_0. It follows by the choice of A′ (and an inductive argument) that A′ strongly accepts all its finite subsets with respect to F_0. If there were to exist an element s ∈ [A′]^{<ω} with s ∈ F_1, then the strong acceptance by A′ of s would yield a t ∈ F_0 (so t ≠ s) comparable with s, which is impossible since F is thin. Therefore we have c(s) = 0 for every element of [A′]^{<ω} ∩ F. ⋆ 4.1

Isabelle is an interactive theorem prover originally developed in the 1980s with the aim of supporting multiple logical formalisms. These include first-order logic (intuitionistic as well as classical) and higher-order logic, as well as Zermelo-Fraenkel set theory. However, the 1990s saw higher-order logic take a dominant role in the field of formal verification, and while Isabelle/ZF and Isabelle/HOL share the entire Isabelle code base (basic inference procedures, a sophisticated user interface, etc.), Isabelle/HOL [33] has much additional automation: so much so that it's the best choice even for set theory.

Unlike proof assistants based on constructive type theories, Isabelle/HOL implements simple type theory. Types can take types as parameters, but not, for example, integers. The type of finite sequences (known as lists) takes the component type as a parameter, but there is no type of n-element lists. Logical predicates form the basis of a basic typed set theory, where any desired set can be expressed by comprehension over a formula. We can define the set of n-element lists where the elements are drawn from some other set. Thus, the simple framework given by types can be refined through the use of sets. There are always trade-offs between expressiveness of a formalism and ease of automation. Reliance on a simple classical formalism frees us from the many technical difficulties which seem to plague constructive type theories, such as intensional equality, difficulties with the concept of set, and performance issues in space and time.

Isabelle employs the time-honoured LCF approach [18]. This architecture ensures soundness through the use of a small proof kernel that implements the rules of inference and has the sole right to declare a statement to be a theorem. Upon this foundation, Isabelle/HOL provides many forms of powerful automation [37]:
• simplification, i.e., systematic directed rewriting using identities
• sophisticated logical reasoning, even with quantifiers
• sledgehammer: strong integration with external theorem provers
• automatic counterexample finding for many problem domains

We found that some of this automation works effectively with Larson's elaborate constructions on sequences. Isabelle's higher-order logic is closely based on Church's simple type theory [6].
It includes the following elements:
• Types and type operators, for example int (the type of integers), or α⇒β (the type of functions from α to β), or α list (the type of finite sequences whose elements have type α). Note the postfix syntax: (int list) set is the type of sets of lists of integers.
• Terms built of constants, variables, λ-abstractions, and function applications.
• Formulas: terms of the truth value type, bool (Church's o). The usual connectives and quantifiers are provided.
• The axiom of choice (AC) for all types, via Hilbert's operator εx. φ, denoting some a such that φ(a) if such exists.

The typed set theory essentially identifies sets with predicates, with type α set essentially the same as α⇒bool. On this foundation, recursive definitions of types, functions and predicates/sets are provided through programmed procedures that reduce such definitions to primitive constructions and automatically prove the essential properties. This basis is expressive enough for the formalisation of the numerous advanced results mentioned in the introduction. Since the set type operator can be iterated only finitely many times, simple type theory turns out to be weaker than Zermelo set theory (let alone ZF). For work requiring the full power of ZFC, it is convenient to assume some version of the ZF axioms within Isabelle/HOL.

The approach adopted here [34] seeks a smooth integration between ZFC and simple type theory. We introduce a type V (the type of all ZF sets) and then V set is the type of classes. Our ZF axioms characterise the small classes, those that can be embedded into elements of V. The axiom of choice comes from Hilbert's ε-operator, which is part of Isabelle/HOL. On this basis it is straightforward to define the usual elements of Cantor's paradise, including ordinals, cardinals, alephs and order types. We borrow large formal developments from the existing Isabelle/HOL framework: recursion on ∈ is just an instance of well-founded recursion, and it's easy to deduce that the type real corresponds to some element of V without redoing the construction of the real numbers. We define order types on wellorderings only (yielding ordinals). These wellorderings can be defined on any Isabelle/HOL types: we can consider orderings defined on type nat rather than on the equivalent ordinal, ω. We started with a small library of facts about order types, which grew and grew in accordance with the demands of the case study.

The point of adopting Isabelle/HOL over Isabelle/ZF is the possibility of making use of its aforementioned automation (Sledgehammer) and the counterexample-finding tools (Nitpick [2] and Quickcheck [3]). The main drawback of doing set theory in Isabelle/HOL is the impossibility of working without AC: the axiom is inherently part of the framework. That drawback has no bearing on the present project, however.

Proofs in Isabelle are written in a structured formal language. They contain nested scopes in which variables may be introduced along with assumptions about them and local definitions, from which some conclusion is proved, perhaps with the help of intermediate results. Here is a trivial lemma concerning two natural numbers, x and y: if x div 2 = y div 2 and even x ←→ even y then x = y. Intermediate results are inserted using the keyword have, while the conclusion is presented using show. The lines beginning with the keyword by contain justifications.
A justification can consist of a proof method such as simp (for simplification), or a series of proof methods, or a full proof structure enclosed within the brackets proof and qed. Thus the various proof elements (there are many others) can be nested to any depth.

lemma div2_even_ext_nat:
  fixes x y :: nat
  assumes "x div 2 = y div 2" and "even x ←→ even y"
  shows "x = y"
proof -
  from ‹even x ←→ even y› have "x mod 2 = y mod 2"
    by (simp only: even_iff_mod_2_eq_zero) auto
  with assms have "x div 2 * 2 + x mod 2 = y div 2 * 2 + y mod 2"
    by simp
  then show "x = y"
    by simp
qed

Explicit statements of the assumptions and conclusions make structured proofs more verbose than the tactic-style proofs that predominate with other proof assistants, but infinitely more legible. Our objective is not merely to formalise mathematical results but to create a formal proof document that makes the argument clear to the reader. Without having to rerun the proof on a computer or trust the software, a knowledgeable reader can note what is being claimed and decide for herself whether it follows from the given assumptions.

Although the Nash-Williams partition theorem is only a minor part of the project, it's important and its formalisation is brief enough to present in reasonable detail. In the next section, we'll turn to Larson's paper. As we saw in Sect. 4 above, the theorem is concerned with sets of natural numbers. Finite sets of natural numbers can be identified with ascending finite sequences; typical treatments of Nash-Williams use sets, while Larson uses sequences. For sets S and T, we write S < T to express that every element of S is less than every element of T (it holds vacuously if either set is empty). This is essentially the same statement as the s < t mentioned in the previous section. Our formalisation [35] follows Todorčević [44], which was the second proof presented in Sect. 4. S is an initial segment of T if T can be written in the form S ∪ S′ such that S < S′ (written less_sets S S' in Isabelle syntax):

definition init_segment :: "nat set ⇒ nat set ⇒ bool"
  where "init_segment S T ≡ ∃S'. T = S ∪ S' ∧ less_sets S S'"

A family F of sets is thin provided every element of F is finite (expressed as F ⊆ {X. finite X}) and it does not contain distinct elements S and T where one is an initial segment of the other.

definition thin_set :: "nat set set ⇒ bool"
  where "thin_set F ≡ F ⊆ {X. finite X} ∧ (∀S∈F. ∀T∈F. init_segment S T ⟶ S = T)"

These definitions provide the necessary vocabulary to state the theorem, although its proof appears at the end of the development after those of all prerequisite lemmas.

theorem Nash_Williams:
  assumes F: "thin_set F" "r > 0"
  shows "Ramsey F r"

We now formalise the concepts of rejecting, strongly accepting and deciding a set, as described in Sect. 4. We regard F as fixed and say M decides S, etc. Note that the identification of 'M accepts S' with 'M does not reject S' is formalised as an abbreviation rather than a definition, a distinction that affects proof procedures but makes no difference mathematically. A great many obvious facts about these primitives can be proved automatically. But there are some nontrivial properties not mentioned in the text that require elaborate proofs.

A key technique in this field is called diagonalisation, involving the construction of a sequence of infinite sets M_0 ⊇ M_1 ⊇ · · · from which something can be obtained. The following proposition states that an infinite set M can be refined to an infinite N ⊆ M that decides all subsets of the given finite set S. The formal proof is 49 lines long and involves an inductive construction along with multiple inductive subproofs.
proposition ex_infinite_decides_finite:

Todorčević's Lemma 1.18 states that an infinite set M can be refined to an infinite N ⊆ M that decides all of its finite subsets. He notes that it follows by some "immediate properties" of the definitions "and a simple diagonalisation procedure". The formal equivalent is 190 lines.

proposition ex_infinite_decides_subsets:
  assumes "thin_set F" "infinite M"
  obtains N where "N ⊆ M" "infinite N" "decides_subsets F N"

Todorčević's Lemma 1.19 states that M strongly accepts S ∪ {n} for all but finitely many n in M, under the given conditions. It has a four-line proof that leaves out a great many key details, at least for non-experts. The formal equivalent is 69 lines. The version given above turns out to be slightly too weak for its intended use in the sequel. The following strengthening is necessary. It yields an infinite N ⊆ M such that N strongly accepts S ∪ {n} for all n in N such that n > max(S). Todorčević emailed a six-line proof sketch (he also helped with 1.19); the formal proof is 167 lines and includes an inductive construction.

The proof of Nash-Williams itself is given for the case of r = 2 and the informal text is 11 lines long. It is interesting that the formal equivalent is only 65 lines long: it is therefore shorter than the proofs of several supposedly obvious lemmas. The straightforward generalisation of the theorem to all r > 0 is, at 73 lines, slightly longer.

theorem Nash_Williams_2:
  assumes "thin_set F"
  shows "Ramsey F 2"

The de Bruijn factor [45] is defined as the ratio of the size of the formal mathematics to the size of the corresponding mathematical exposition. It can be regarded as measuring the cost of formalisation. Unfortunately, it's highly inexact. Mathematical writing varies greatly in its level of detail. The first ever de Bruijn factor was calculated for Jutting's translation of Landau's Grundlagen der Analysis (on the construction of the complex numbers) into AUTOMATH.

  An aspect which has not been mentioned so far is the ratio between the length of pieces of AUT-QE text and the length of the corresponding German texts. Our claim at the outset was that this ratio can be kept constant. . . . As a measure of the lengths the number of stored AUT-QE expressions . . . and (rough estimates of) the number of German words. [25, p. 46]

The highest ratio, 6.4, is obtained for the translation of Landau's Chapter 4, in which the real numbers are constructed from the positive real numbers (in the form of Dedekind cuts, which had been defined in Chapter 3). But this is a book that devotes 173 pages to the development of complex number arithmetic from logic. It includes a full calculation to prove that the complex conjugate distributes over addition. The proof that a/b = c/d ⟺ ad = bc includes two references to previous theorems about complex arithmetic. One could imagine covering the same material in 10% of the space. Formal proofs can also be more or less compact, the price of compactness generally being a loss of legibility.

Wiedijk [45] has proposed to deal with some of the arbitrariness by comparing the sizes of compressed text files, but this requires retyping possibly lengthy texts into LaTeX. He and others report de Bruijn factors in the range of 3-6, but for the Nash-Williams proof above it is 20 and upwards (crudely counting lines rather than symbols). Clearly one reason is that the source text is highly concise, only sketching out the key points.
One of the lemmas (the strengthening of Todorčević's 1.19) is not even stated. But also, our proof style is more prolix than is strictly necessary.

Larson's proof [28, pp. 133-140] of Theorem 3.1 is an intricate tour de force and the formal proof development is almost 4600 lines long. Here we can only cover its main features, focusing on a few typical constructions and some particular technical issues. We also take a brief look at a few of the formal theorem statements. Our objective is simply to highlight aspects of the formalisation task and the strengths and weaknesses of today's formal verification tools.

Much of the task of formalisation is simply labour: translating the definitions and arguments of the mathematical exposition into a formal language and generating proofs using the available automated methods. Particular difficulties arise when the exposition appeals to intuition or presents a construction. In the case of a construction there are two further sources of difficulties: when properties of the construction are claimed without further argument (as if the construction itself were sufficient proof), or worse, when proofs later in the exposition depend on properties of the construction that are never even stated explicitly.

An example of appeal to intuition is when Larson constructs the set W′ and remarks that 'without loss of generality' (a chilling phrase to the formaliser) we can regard it as the same as W. This turns out to mean that the bulk of the argument will be carried on using W, which is simply the set of increasing sequences of integers. It is presumed obvious to the reader that none of the main lemmas could possibly be proved using W′, about which little can be known. W′ is introduced towards the end, and the necessary adaptations to the main proof aren't that hard to figure out. But a little hint would have been helpful.

For an example of properties claimed without argument, consider the notation i({x, y}) from Definition 3.3, suggesting that i({x, y}) is uniquely determined by x and y. And so it turns out to be, though the formal version makes explicit its dependence on the 'form' of {x, y}, namely l. That i_l({x, y}) is well-defined is perhaps obvious, since the decomposition of x and y into concatenations of sequences turns out to be unique, but the formal proof of this 'obvious' fact is an elaborate induction, around 200 lines long. The function is also injective, which is never claimed explicitly but required for the proof of Lemma 3.6, which defines a function g_k by g_k(i({x, y})) = g({x, y}). Proving this additional fact requires another substantial formal proof (100 lines).

For another example of the difficulty of formalisation, consider the following construction, typical of this problem domain. We are given a positive integer l and an infinite set N from which the elements of the construction are drawn in increasing order. It turns out that if l = 2k − 1 then the segments can be assembled directly, while a reversal is crucial to the second case, where x is the concatenation of k + 1 segments while y is the concatenation of only k: the last segment of y has the form a^i_k * a^i_{k+1}, and this meets all the requirements of Definition 3.3 above. The formalisation of such a construction amounts to writing a tiny but delicately crafted computer program. Significant effort is needed to prove fairly obvious properties of this construction, such as that the a^i_j are nonempty, or that a^{i′}_{j′} < a^i_j if j′ < j ≤ k and i′, i < m. These are immediate by construction since elements are drawn from the set N in increasing order.
But the formal proofs require elaborate inductions.

A similar situation arises with Larson's Lemma 3.8 [28, p. 139]. She defines three collections of sequences: the a_j, the b(i, j, k) and the d^j_i used below. The ordering constraints guarantee that the set {a_j * b(1, j, k_1) * b(2, j, k_2) * · · · * b(j, j, k_j) : j < k_1 < k_2 < · · · < k_j < ω} has order type ω^j, and the union over j of these has order type ω^ω. These order type claims seem plausible, but the text contains no hint of how to prove them. Relevant is that the construction ensures that, for j and k with 1 ≤ j ≤ i ≤ k, the sequence b(i, j, k) has length d^j_i − d^j_{i−1} and is therefore independent of k. For the ω^j claim, an induction on j seems to be indicated; our formal proof is nearly 300 lines long, including 200 lines of auxiliary definitions and lemmas, yet a much shorter proof may exist.

There is another challenge at the very end of Larson's lemma. We have x and y of the above shape, say x = a_j * b(1, j, k_1) * · · · * b(j, j, k_j) and y = a_r * b(1, r, p_1) * · · · * b(r, r, p_r). Here we know that a_j < a_r, hence j < r, and this guarantees disjointness of all the segments shown. Now Larson [28, p. 140] remarks, 'for some l < 2j, {x, y} has form l.' Referring to the definition of form, this again is clear: the sequences for x and y need to be interleaved in strict order, which may force adjacent sequences in the expression above to be concatenated. The form can be as small as 1, if x < y, when all the sequences get concatenated; it can be as high as 2j + 1 if no concatenations occur for x, as in this example: a_j < a_r < b(1, j, k_1) < b(1, r, p_1) < b(2, j, k_2) < · · · < b(j, r, p_j) * · · · * b(r, r, p_r). To ask for a justification of this obvious claim would be unreasonable. And yet to formalise the process of examining the interleavings of these sequences and arranging them so as to satisfy the conditions of Definition 3.3 (so that those conditions can be proved) turned out to require weeks of work.

It's finally time to look at the formalisation itself. We do not present actual proofs (they are long and not especially intelligible), nor even all of the numerous definitions and technical lemmas required for the formalisation. We begin with something simple: the set W, which is written WW and is the set of all strictly sorted (increasing) sequences of natural numbers. In Isabelle, nat list is the type of such sequences. We do not work with W′ except in the body of the main theorem.

definition WW :: "nat list set" where "WW ≡ {l. strict_sorted l}"

Next comes the notion of an interaction scheme, Def. 3.3 above. We need to build up to this. The function acc_lengths formalises the accumulation of the lengths of lists for the variables c and d there. We write length l for |l|, and x#l is the list consisting of l prefixed with x as its first element. The integer argument acc is a necessary generalisation for the sake of the recursion but will normally be zero. The built-in function concat joins a list of lists. But we also need a function to concatenate two lists of lists, interleaving corresponding elements:

fun interact :: "'a list list ⇒ 'a list list ⇒ 'a list"
  where "interact [] ys = concat ys"
  | "interact xs [] = concat xs"
  | "interact (x#xs) (y#ys) = x @ y @ interact xs ys"

We are finally ready to define interaction schemes. We split up the long list of conditions in Def. 3.3 as two inductive definitions, although there is no actual induction: this form of definition works well when there are additional variables and conditions, which would otherwise have to be expressed by a big existentially quantified conjunction.
Here xs and ys are the x and y of the definition, ka and kb are the lengths of the a-lists and b-lists, and zs is the interaction scheme. The following definition allows us to write Form l U to express that U has form l. Even and odd forms are treated differently; if l = 2k for k > 0, the numeric parameters of Form_Body are k + 1 and k. A naive treatment of the definition would force a lot of proofs to be written out twice. The two cases, while not identical, are similar enough to be treated uniformly in terms of the more general Form_Body ka kb.

inductive Form :: "[nat, nat list set] ⇒ bool"
  where "Form 0 {xs,ys}" if "length xs = length ys" "xs ≠ ys"
  | "Form (2*k-1) {xs,ys}" if "Form_Body k k xs ys zs" "k>0"
  | "Form (2*k) {xs,ys}" if "Form_Body (Suc k) k xs ys zs" "k>0"

Finally we can define the interaction scheme itself, writing inter_scheme k U for i_k(U). Hilbert's epsilon (@zs) returns some zs (actually it is unique) satisfying the stated property:

definition inter_scheme :: "nat ⇒ nat list set ⇒ nat list"
  where "inter_scheme l U ≡ @zs. ∃k xs ys. l > 0 ∧
           (l = 2*k-1 ∧ U = {xs,ys} ∧ Form_Body k k xs ys zs ∨
            l = 2*k ∧ U = {xs,ys} ∧ Form_Body (Suc k) k xs ys zs)"

It turns out to be injective in the following sense, for two sets U and U′ that have the same form l > 0. The proof is a painstaking reversal of the steps shown in Def. 3.3 and is about 50 lines long.

proposition inter_scheme_injective:
  assumes "Form l U" "Form l U'" "l > 0" "inter_scheme l U' = inter_scheme l U"
  shows "U' = U"

A considerable effort is needed to show that the interaction scheme is defined uniquely. The following lemma takes a long set of assumptions derived from the possibility of two distinct interaction schemes and shows that the a-lists and b-lists necessarily coincide. The proof is by induction on the length of as.

proposition interaction_scheme_unique_aux:
  assumes "concat as = concat as'" "concat bs = concat bs'"
    and "as ∈ lists (…" and "strict_sorted (interact as bs)"
    and "length bs ≤ length as" "length as ≤ Suc (length bs)"
    and "as' …" and "strict_sorted (interact as' bs')"
    and "length bs' ≤ length as'" "length as' ≤ Suc (length bs')"
    and "length as = length as'" "length bs = length bs'"
  shows "as = as' ∧ bs = bs'"

It is now fairly straightforward to show that the first four arguments of Form_Body determine the fifth, which is the interaction scheme. The inequality kb ≤ ka ≤ kb + 1 eliminates the need to treat the cases of even and odd forms separately.

proposition Form_Body_unique:
  assumes "Form_Body ka kb xs ys zs" "Form_Body ka kb xs ys zs'" and "kb ≤ ka" "ka ≤ Suc kb"
  shows "zs' = zs"

And so we find that the interaction scheme is uniquely defined for every valid instance of the predicate Form_Body. The full proof of this more-or-less obvious statement, for which Larson [28] gives no justification, is longer than 240 lines.

lemma Form_Body_imp_inter_scheme:
  assumes "Form_Body ka kb xs ys zs" …

The next result states that for each l > 0, the collection of interaction schemes of the pairs of form l is a thin set. This set is formalised using the image operator (`). The proof involves deriving a contradiction from the existence of U and U′ with distinct interaction schemes, one an initial segment of another. As with previous results, the proof involves breaking things down according to Def. 3.3, fairly straightforwardly (under 75 lines).

The next step in Larson's development [28, Lemma 3.6] is proved in 150 lines from the original 11-line text. It uses the Nash-Williams partition theorem to obtain an infinite set N and a sequence j_k such that if k > 0, U has form k and [n_k] < i(U) ⊆ N, then g(U) = j_k.
Larson next proves that for every infinite set N and every m, l < ω with l > 0, there is an m-element set M such that M ⊆ W (necessary but omitted in the text) and for every {x, y} ⊆ M, {x, y} has form l and i({x, y}) ⊆ N [28, Lemma 3.7]. In the proof of the main theorem, the N above is derived from the one obtained from the previous lemma. In the Isabelle version, M ∈ [WW]^m expresses that M is an m-element set and M ⊆ W. The formalisation of this one-page proof takes nearly 900 lines, including some preparatory lemmas. About 240 of those lines are devoted to establishing the basic properties of the sequences d_1, . . . , d_m and a^1_1, a^1_2, . . . , a^1_{k+1}, . . ., a^m_1, . . . , a^m_{k+1} outlined in Sect. 7.1 above. About 130 lines were devoted to the degenerate cases l = 1 and l = 2, which needed to be treated separately.

Larson's next result states that for every infinite set N, there is a set X ⊆ W of order type ω^ω such that for any {x, y} ⊆ X, there is an l such that {x, y} has form l and if l > 0 then [n_l] < i({x, y}) ⊆ N. Her proof is slightly longer than a page and the full formalisation is about 1700 lines long. Of this, approximately 400 concern the construction and properties of the sequences, with a further 400 for the order type calculation and about 600 lines to formalise the last paragraph of the proof (nine lines of text). Recall that [X]^2 denotes the set of all two-element subsets of X.

proposition lemma_3_8:
  assumes "infinite N"
  obtains X where "X ⊆ WW" "ordertype X (lenlex less_than) = ω↑ω"
    and "⋀U. …"

The lemmas described above constitute the main body of Larson's development. Building on them, the main theorem can be formally proved with just 360 lines of code. The formulation below, in terms of the lexicographic ordering on W, trivially leads to the standard formulation in terms of the ordinal ω^ω. The proof begins by assuming a partition f of the set [W]^2. The construction of the set W′ and the work necessary to effectively identify the sets W and W′ takes up about 200 lines. The result of this construction is a partition f′ of [W]^2 that assigns the colour 0 to all pairs of sequences in W of the same length: a condition necessary to make the proof go through.

theorem partition_ωω_aux:
  assumes "α ∈ elts ω"
  shows "partn_lst (lenlex less_than) WW [ω↑ω, α] 2"

The most frustrating aspect of formalisation is the need to spell out the proofs of obvious statements. We could not escape this phenomenon, and discuss a few examples below, some of them positive.

The function interact, defined above, concatenates alternating elements of two lists. A key property is that if the two lists satisfy the constraints given in the definition of a form (Def. 3.3), then the result of interact will be strictly ordered (and therefore in W). The proof is a messy induction and we would like to state the theorem in the simplest possible way. Fortunately, Isabelle/HOL provides counterexample-finding tools: Nitpick [2] and Quickcheck [3]. Their purpose is to identify invalid conjectures before any time is wasted in proof attempts. Nitpick works by abstracting the conjecture to a propositional formula and attempting to find a model with the help of a satisfiability checker, while Quickcheck simply tries to evaluate the conjecture at intelligently chosen values. Both need the conjecture to be computational, in a broad sense. As much of our work here is concerned with finite sets or sequences of integers, these tools can be effective.
We were able to use Nitpick to formulate this theorem correctly, including nuances such as Suc n < length xs, while omitting irrelevant conditions.

lemma strict_sorted_interact_I:
  assumes "length ys ≤ length xs" "length xs ≤ Suc (length ys)"
    "⋀x. x ∈ list.set xs =⇒ strict_sorted x" "⋀y. y ∈ list.set ys =⇒ strict_sorted y"
    "⋀n. n < length ys =⇒ xs!n < ys!n" "⋀n. Suc n < length xs =⇒ ys!n < xs!Suc n"
    "xs ∈ lists (…

As mentioned above, Larson's Lemma 3.8 [28, p. 139] constructs sequences satisfying the conditions of a form, for a given pair {x, y}. As part of this process, families of sequences are generated, each consisting of short sequences that yield x and y when concatenated together. We have to check that any two of these can be transformed to satisfy those conditions and obtain an upper bound for the form. The transformation involves concatenating adjacent short sequences to form longer sequences that constitute the a_1, a_2, …, a_k (and possibly a_{k+1}) and b_1, b_2, …, b_k of Def. 3.3. Carrying out this apparently routine task requires elaborate thought.

Our first attempt involved a function, coalesce, which took as arguments a sequence of sequences coupled with a set B, which in practice would contain the elements of the opposite sequence. The idea was to concatenate consecutive subsequences unless some element of B separated them. Nearly all the necessary properties could be proved easily, but there seemed to be no way to show that the resulting interaction scheme (obtained by applying coalesce to both sequences of sequences) was correctly ordered. Thanks to counterexample finding, many conjectures could be rejected without attempting any proof.

Therefore coalesce was abandoned in favour of a predicate, merge, which deals with both sequences of sequences simultaneously, considering each of them cut into two parts (the arguments as1@as2 and bs1@bs2 represent arbitrary cut points for them both). Then the two leading parts, as1 and bs1, are concatenated provided all the ordering properties are satisfied. Here concat concatenates a sequence of sequences to form one sequence: concat [[1, 2], [3], [4, 5]] = [1, 2, 3, 4, 5]. The conditions concat as1 < concat bs1 and concat bs1 < concat as2 ensure that the elements of concat bs1 lie between the two halves of the first sequence. Thus, just enough of the a-sequence is taken so that it lies before the start of the b-sequence, from which just enough elements are taken to allow the a-sequence to resume. This formulation avoids any direct expression of iteration or computation (the root of the problems with coalesce) in favour of writing the a- and b-sequences as each divided at an arbitrary point, the rule applying only subject to the ordering constraints shown.

With this approach, most of the required properties are shown easily enough. The most difficult is the following statement, which expresses that any two sequences, subject to certain conditions, can be successfully merged. Again, counterexample checking was crucial to find the simplest formulation of the necessary conditions. The proof is by induction on the sum of the lengths of as and bs.

proposition merge_exists:
  assumes "strict_sorted (concat as)" "strict_sorted (concat bs)" "as ∈ lists (…
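The ordering side conditions on a cut point can be checked concretely on a toy example. The following Python fragment is an illustrative sketch of our own (the helper before is hypothetical; concat mirrors the function just described) of the conditions concat as1 < concat bs1 and concat bs1 < concat as2.

def before(xs, ys):
    # xs < ys in the sense used above: every element of xs is below every element of ys.
    return all(x < y for x in xs for y in ys)

def concat(xss):
    return [x for xs in xss for x in xs]

# Hypothetical cut points for two block sequences as = as1 @ as2 and bs = bs1 @ bs2.
as1, as2 = [[1, 2]], [[5]]
bs1, bs2 = [[3, 4]], [[6, 7]]

assert before(concat(as1), concat(bs1))   # concat as1 < concat bs1
assert before(concat(bs1), concat(as2))   # concat bs1 < concat as2

Here [1, 2] and [3, 4] become the next a-block and b-block of the merged sequences, and the rule is then applied again to as2 and bs2.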
The formalisation of Larson's proof of ω^ω −→ (ω^ω, m) took approximately six months. This includes a month and a half spent formalising Erdős-Milner [12] and half a month proving the Nash-Williams partition theorem. Her Lemma 3.8 required two months, 11 days of which were devoted to the order type proof mentioned in Sect. 7.1 above. The remaining two months were devoted to Lemma 3.7 and the main theorem. Due to COVID-19, most of the work was undertaken at home, not the best environment for doing mathematics.

Having looked at Larson's work in excruciating detail for months, we can only be impressed by the intricacy, delicacy and fragility of her constructions and wonder how she kept so many details in mind. She deserves her reputation for being careful and clear. Although formalisation efforts regularly identify flaws in mathematical exposition, we found no serious errors in hers. Her narrative proof is seven pages long [28, pp. 133-140] and the corresponding formalisation is some 4600 lines, not counting prerequisites such as Nash-Williams and Erdős-Milner, which are proved elsewhere. Estimating 30 lines per page, this suggests a de Bruijn factor of roughly 23.

Our work shows that ordinal partition theory is clearly formalisable within Isabelle/HOL augmented with a straightforward axiomatisation of ZFC. We have found no serious errors in the original mathematical material, and although we struggled in some places it is quite hard to fault Larson's exposition [28] beyond noting that a few hints here and there could have saved us quite a bit of effort. The reader of a mathematical proof is expected to invest much thought.

As usual, a concern is the disproportionate effort needed to prove some simple observations. The inductive constructions of sequences that appear in Larson's Lemmas 3.7 and 3.8 [28] must surely be regarded as straightforward, and yet we struggled to find the right language in which to express them and derive their obvious properties. The same can be said of the order type calculation in Lemma 3.8. It is also unfortunate that the degenerate cases l = 1 and l = 2 in Lemma 3.7 required so much work. However, given the inherent complexity of the subject matter, it is reassuring to know that the entire development has been checked formally, with a proof text [36] that is available for inspection or automated analysis.

This case study also demonstrates the diversity of mathematical topics that can be formalised in Isabelle/HOL: we have formalised material that is light years away from the analysis, topology and algebra formalised by other authors (as mentioned in the introduction). Ordinal partition relations seem to be at once formalisable in Isabelle/HOL and at a point in their mathematical development where human advances are rare and not forthcoming. None of the high-powered techniques of set theory and model theory, such as large cardinals, forcing, pcf theory or elementary submodels, seems to be relevant. Therefore, we hope that some advances in this subject might be obtained through automation. However, we retain the humble conviction that doing enough preparatory work with Isabelle to be able to produce such results will require a considerable intellectual effort.
References

A formally verified proof of the central limit theorem
Nitpick: A counterexample generator for higher-order logic based on a relational model finder
The new Quickcheck for Isabelle
Well, better and in-between
A partition theorem for the complete graph on ω^ω
A formulation of the simple theory of types
Set Theory and the Continuum Hypothesis
Fast Track to Forcing
Some problems on finite and infinite graphs
A partition calculus in set theory
Combinatorial Set Theory: Partition Relations for Cardinals
A theorem in the partition calculus
A theorem in the partition calculus (corrigendum)
Pinning countable ordinals. Fundamenta Mathematicae
Borel sets and Ramsey's theorem
The four colour theorem: Engineering of a formal proof
A machine-checked proof of the odd order theorem
The use of machines to assist in rigorous proof. Philosophical Transactions of the Royal Society
Partition relations
The Jordan curve theorem, formally and informally
Floating point verification in HOL Light: the exponential function
Formalizing an analytic proof of the prime number theorem
Kruskal's theorem and Nash-Williams theory, after Wilfrid Hodges. Version 3.6, online at www
Three chapters of measure theory in Isabelle/HOL
Checking Landau's "Grundlagen" in the AUTOMATH System. Studies in Logic and the Foundations of Mathematics
Analyzing Nash-Williams' partition theorem by means of ordinal types
A short proof of a partition theorem for the ordinal ω^ω
On Fraïssé's order type conjecture
On the logical strength of Nash-Williams' theorem on transfinite sequences
On well-quasi-ordering transfinite sequences
Pentium FDIV flaw
Isabelle/HOL: A Proof Assistant for Higher-Order Logic
Zermelo Fraenkel set theory in higher-order logic. Archive of Formal Proofs
The Nash-Williams partition theorem. Archive of Formal Proofs
Ordinal partitions. Archive of Formal Proofs
From LCF to Isabelle/HOL. Formal Aspects of Computing
On a Problem of Formal Logic
Countable partition ordinals
Countable partition ordinals
Teilmengen von Mengen mit Relationen
Partitioning pairs of countable ordinals
Partition Problems in Topology
Introduction to Ramsey spaces
The De Bruijn factor

Acknowledgements. Angeliki Koutsoukou-Argyraki and Lawrence C. Paulson thank the ERC for its support through the ERC Advanced Grant ALEXANDRIA (Project GA 742178). Mirna Džamonja's research was supported by the GAČR project EXPRO 20-31529X and RVO: 67985840. All three authors thank the London Mathematical Society for support through their Grant SC7-1920-11. Thanks to Stevo Todorčević for advice.

… and let a^{i+1}_1 be the first d^{i+1}_1 elements of N greater than d^{i+1}_{k+1}. This defines d_1, d_2, …, d_m, a^1_1, a^2_1, …, a^m_1. Let the rest of the sequences be defined in the order that follows, so that for any i and j, a^i_j is the sequence of the least (d^i_j − d^i_{j−1}) elements of N, all of which are larger than the largest element of the sequence previously defined: … This is a fairly intricate construction, particularly in the reversal at the end: a^{m−1}_k, a^m_k, a^m_{k+1}, a^{m−1}_{k+1}. The point is to achieve the conclusion of Larson's Lemma 3.7, namely that if m and l are natural numbers with l > 0, …