Abstract
Computational logic is the use of computers to establish facts in a logical formalism. Originating in nineteenth century attempts to understand the nature of mathematical reasoning, the subject now comprises a wide variety of formalisms, techniques and technologies. One strand of work follows the ‘logic for computable functions (LCF) approach’ pioneered by Robin Milner, where proofs can be constructed interactively or with the help of users’ code (which does not compromise correctness). A refinement of LCF, called Isabelle, retains these advantages while providing flexibility in the choice of logical formalism and much stronger automation. The main application of these techniques has been to prove the correctness of hardware and software systems, but increasingly researchers have been applying them to mathematics itself.
1. Introduction
Computers control everything from heart pacemakers to aircraft, putting everyone at risk when they fail. Computer-related failures regularly cause disruption, damage and occasionally death. A recent example is the WannaCry ransomware attack, which hit computers around the world, notably in the National Health Service in the UK [1]. This attack was made possible by a specific defect, MS17-010, affecting several versions of the Windows operating system. In 1994, a floating-point division error in the Pentium processor forced Intel to recall millions of chips at a cost of $475 million [2]. Many such failures are caused by carelessness and could be prevented by the introduction of more disciplined development methods. The ultimate discipline is to use formal logic and mathematics to prove the correctness of designs. This is called formal verification.
Formal verification typically relates a formal specification to a model implementation. That is, we start with a complete description of all services to be delivered by the system (the specification) and prove that it is satisfied by the model implementation, where both have been expressed in some logical formalism. A proof that all services will be delivered includes a proof that no crashes or other failures will prevent this. However, any formalized implementation must leave out some details of the actual implementation: for example, finite-precision arithmetic may be assumed to be exact, encryption to be unbreakable or hardware components to be infinitely fast. (Readers may remember schoolbook physics problems involving frictionless surfaces, or, when friction is included, it is assumed to be linear.) If the model of the implementation is too unrealistic, proofs about it will overlook certain failure modes. No model can include all real-world details, but it is crucial to include the aspects of the implementation that we are most concerned about. For example, in computer security, many systems can be broken without defeating the encryption itself [3]. Then it makes sense to assume encryption to be unbreakable if that simplifies the proofs. Validating the model is complementary to the formal proofs and could involve a variety of methods, ranging from professional judgement to systematic testing. (It is a misconception to think that formal proof eliminates the need for testing.)
Formal verification requires the support of specialized computer software. Computational logic is concerned with formal logic as implemented on computers, where it is frequently applied to solve problems related to computation. However, formal verification is increasingly being applied to systems that interact with the real world and deal with phenomena modelled by differential equations, or that have to achieve goals expressed in terms of probability. Therefore, it has become necessary to formalize mathematical knowledge.
2. A brief history of formal logic
A logical formalism is a precisely defined symbolic language that includes logical primitives such as ‘and’ (∧), ‘or’ (∨), ‘not’ (¬), ‘implies’ (→), ‘for all’ (∀) and ‘there exists’ (∃). It may include the ability to define new symbols as abbreviations for other expressions. It will include rules for making simple logical deductions: for example, from ϕ∧ψ, the conclusions ϕ and ψ can both be derived; from ϕ we can derive ϕ∨ψ. From such basic primitives, vast swathes of mathematics can be derived.
Formal logic emerged from philosophy in the nineteenth century. Gottlob Frege’s Begriffsschrift (concept language) was the first recognizably modern treatment of logic. His work introduced the project of reducing the whole of mathematics to logic [4,5]. But Frege’s work ended with the discovery of Russell’s paradox:
Let R be the set of all sets that are not members of themselves. Then is R a member of itself?
If R is a member of itself (written R∈R) then, by the definition of R, it cannot be a member of itself, and therefore R∉R; on the other hand, if R∉R then R satisfies the defining property of R, and so R∈R. We get a contradiction either way. Note that, as long as we confine ourselves to English, Russell’s paradox may resemble amusing puzzles such as ‘This statement is false’, but once we use a formal language with strict rules we are in trouble. For R∉R and R∈R are then both provable, and from such a contradiction all other statements can be proved using basic logic. There are many other paradoxes [6, p. 60], e.g. Burali-Forti’s, which involves constructing the greatest ordinal number Ω, only to find that Ω+1 is an even greater one.
By analysing the paradoxes to which Cantor’s set theory had led, [Russell] freed them from all mathematical technicalities, thus bringing to light the amazing fact that our logical intuitions (i.e. intuitions concerning such notions as truth, concept, being, class) are self-contradictory. [5, p. 452]
From that moment, the development of formal logic was determined by the approach taken to solving the paradoxes. Set comprehension—the idea that every property ϕ determines a set, {x|ϕ(x)}—somehow had to be constrained to deny the existence of {x|x∉x}, the Russell set. One approach was to restrict comprehension to some already existing set, say A, yielding the smaller set {x∈A|ϕ(x)}: the elements of A satisfying the property ϕ. The universe of sets is not itself a set, and it is built up from the empty set using a few primitives such as union and powerset. This approach leads to modern axiomatic set theory [7,8].
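Incidentally, restricted comprehension survives in modern programming languages. A Python set comprehension, to give a loose analogy (Python sets are finite, of course), can only draw its elements from an already existing collection, just as {x∈A|ϕ(x)} does:

```python
# Restricted comprehension: the elements must come from an existing set A.
A = set(range(10))
evens = {x for x in A if x % 2 == 0}    # the analogue of {x in A | x is even}
assert evens == {0, 2, 4, 6, 8}

# Unrestricted comprehension -- 'the set of all x such that phi(x)' with no
# underlying collection -- simply cannot be written, so the Russell set
# {x | x not in x} is inexpressible here too.
```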
Another solution to the paradoxes involves the notion of types. Whitehead and Russell’s original conception of type [6, p. 37] was rather unclear [5], but by 1940 Church [9] had developed types to a form that survives today. Rather than postulating one universal collection of everything that exists, mathematical objects are classified into distinct types, representing smaller collections. Every construction has a fixed type. When forming a set, all the elements must have the same type, say τ, and the set itself will have a different type, which we may write as τ set. Thus the fatal proposition R∈R cannot even be written, because if R has type τ then it cannot have type τ set at the same time.
Types seem to be unpopular with mathematicians, and it is usual to assert that mathematics is founded on set theory. Whitehead & Russell [6] provided no notation or symbols for types, making them invisible, while Church [9] had one single type ι of individuals encompassing all objects that were neither sets nor functions. However, Church’s ideas entered into computer science, particularly through his collaboration with Alan Turing, and by the early 1960s the logical notion of type was starting to get conflated with programming language features (previously called ‘modes’) that distinguish between integer and floating-point arithmetic, for example. The ubiquity and utility of types in computer programming may be one reason why computer scientists tend to prefer elaborately typed formalisms.
Yet another solution to the paradoxes was the philosophy of constructive mathematics or intuitionism [10]. Briefly, this is the view that mathematical objects exist only in our imaginations, in contrast to the Platonist or realist view that mathematical objects have their own reality [11, p. 323]. Intuitionism demanded a new treatment of logic, in which the disjunction ψ∨ϕ could be true only if we knew which of the two statements, ψ or ϕ, was true. Many of the basic tautologies of Boolean logic must then be rejected, starting with ψ∨¬ψ (as we have no general way of knowing which alternative holds) and with ¬¬ϕ a weaker statement than ϕ. At the level of quantifiers, ∃x ϕ(x) could be true only if we knew some specific value a for which ϕ(a) held. For that reason, ∃x ϕ(x) was a stronger claim than ¬[∀x ¬ϕ(x)], although in classical logic the two formulae are equivalent.
Intuitionism has not caught on with mainstream mathematicians. However, there are strong links between intuitionistic logic and computation: the knowledge mentioned in the previous paragraph had to be computable. Recent decades have seen the introduction of formalisms [12,13] that identify intuitionistic formulae with elaborate types: every logical symbol has an equivalent at the type level, and to prove a proposition amounts to exhibiting an element of the corresponding type. These formalisms are highly expressive and, typically, all expressions denote computable functions. And there is a direct identification between logical propositions and types [14]. For example, proving the formula (∀x∈A)(∃y∈B) ϕ(x,y) yields a computable function f such that if x∈A (that is, x belongs to type A) then f(x) returns some y∈B paired with some sort of ‘evidence’ affirming that the property ϕ(x,y) holds.1 Then we would have shown that f had the type (Πx∈A)(Σy∈B) ϕ(x,y), which we can identify with the formula above. Here, Π and Σ are the type-level counterparts of the quantifiers ∀ and ∃. The elements of (Πx∈A) B(x) are functions f that map any given a∈A to some f(a)∈B(a), generalizing the function type A→B by allowing the result type to depend on the value of the argument; similarly, the elements of (Σx∈A) B(x) are pairs 〈a,b〉 where a∈A and b∈B(a), generalizing the Cartesian product A×B. This ‘propositions as types’ paradigm is the focus of much current research; see Wadler [15] for a detailed historical introduction.
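The correspondence can be made concrete in a modern proof assistant such as Lean, where proving a proposition literally means constructing a term of the corresponding type (a small illustrative sketch, not tied to any system discussed above):

```lean
-- Propositions as types: a proof is a term of the proposition's type.
-- Conjunction elimination: from A ∧ B we obtain A (a pair projection).
example {A B : Prop} : A ∧ B → A := fun h => h.left

-- Disjunction introduction: the proof records which disjunct holds,
-- exactly as intuitionism demands.
example {A B : Prop} : A → A ∨ B := fun a => Or.inl a

-- An existential is proved by a witness paired with evidence:
-- the Σ-type reading described above.
example : ∃ n : Nat, n + 1 = 3 := ⟨2, rfl⟩
```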
Formal verification today can, therefore, be done within a wide variety of logical formalisms, which include
(i) those based on axiomatic set theory, and with no concept of type;
(ii) the simple type theory of Church, where a few primitive types can be combined using operators such as × (Cartesian product), + (disjoint sum), → (function space) and the set-type constructor mentioned above;
(iii) dependent type theories where types can take parameters, as in B(x), and with variable binding as in (Πx∈A) B(x), thereby allowing the type of n×n matrices, for example.
The typed approaches have received the most attention in computational logic, and below we shall focus on simple type theory. The third approach is also the focus of much research.
There are many other formalisms (e.g. modal and relevance logics) and computational logic techniques [16] not covered below. In particular, model checking techniques [17] are widely used. They can automatically verify system properties expressed in temporal logics (for reasoning about time). However, they rely on state enumeration rather than proof construction, so are out of scope here.
3. Mechanized logic: the LCF tradition
If we want to prove theorems in a formal logic, computer power is necessary. Whitehead and Russell’s development of basic logic, classes and relations only reaches (a precursor to!) 1+1=2 after hundreds of pages [6, p. 360]. Mathias [18] has pointed out that the representation of the number 1 as defined by Bourbaki expands to 4 523 659 424 929 symbols. Better choices can mitigate such issues, but the fact remains that formal proofs are extremely long and detailed.
Ideally, we would like the computer to prove our theorems automatically. If that is impossible, then we would like to give only strategic advice, and let the machine work out the details. As a general rule, greater automation requires the choice of a less expressive formalism. Boolean logic (where we only have and, or, not) is decidable, so, in principle, all questions can be settled automatically. The problem is known as SAT (for Boolean SATisfiability) and although a general solution is thought to be exponential in the number of Boolean variables, highly efficient heuristic approaches exist. In a dramatic demonstration of this technology, Heule & Kullmann [19] solved the Boolean Pythagorean triples problem, an open problem in Ramsey theory, generating a 200 TB proof.
Our logic becomes more expressive if we introduce the quantifiers ‘for all’ and ‘there exists’. If we forbid quantification over functions and sets, this is called first-order logic. This theory is complete: every theorem can be proved, given unlimited resources. But there is no general way to determine how much effort might be required or whether the claim is false. Automatic theorem proving in first-order logic is a significant field in its own right; one of its main achievements, though it is now 20 years old, is the solution of the Robbins conjecture [20]. Incidentally, two early titans of computational logic—Hilary Putnam [21] and Alan Robinson [22]—were trained as philosophers, and Putnam is chiefly remembered for his essays.
Unfortunately, fully automatic theorem proving cannot meet the needs of verification. Arithmetic, functions and sets are necessary for specifying even simple systems. No automatic procedure can decide arbitrary statements involving integer arithmetic: this is Hilbert’s famous Entscheidungsproblem, which was settled by Turing [23]. Another reason is that specifications may extend to hundreds of pages, and fully automatic theorem proving does not scale well. Interactive methods are the only feasible way to prove theorems based on huge specifications.2 And the name most associated with interactive theorem proving is Robin Milner (FRS 1988).
Milner was interested in an obscure formalism called logic for computable functions (LCF), and built a simple proof checker while at Stanford University, CA, USA. It had two major drawbacks: proofs often required long sequences of repetitive, obvious steps and the proofs themselves took up too much memory. Upon taking a position at the University of Edinburgh, UK, he created an ambitious new system called Edinburgh LCF [24], incorporating several innovations. One was to provide the user with a specialized programming language, called ML (for metalanguage), so that users could extend the system with their own code. That would enable them to deal with any issue of repetitive steps by coding their own automation. Another was to eliminate the storage of proofs through a software technique known as abstract data types. By reserving the privilege of creating theorems to a small kernel of code, the need to store proofs was eliminated. User-supplied automation, however faulty, could never cause Edinburgh LCF to proclaim an invalid theorem. Edinburgh LCF also introduced an entire framework of techniques and terminology:
— theory files containing specifications and definitions of the concepts of interest: theories can build on other theories, allowing hierarchical developments of any size;
— backward proof: one begins with a statement of the desired result and successively reduces it (using proof tactics) to simpler statements until all have been proved;
— tacticals for combining proof tactics in simple ways, for example repetition or executing a series of tactics;
— built-in automation, for example to perform routine simplification;
— along with the ability to extend this automation by writing additional ML code.
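The flavour of backward proof with tactics and tacticals can be sketched in a few lines of Python, with goals represented naively as strings (the names split_conj, then_ and repeat are invented for illustration; Edinburgh LCF’s tactics were ML functions over a much richer goal type):

```python
# A tactic maps a goal to a list of subgoals; a tactical combines tactics.

def split_conj(goal):
    """Tactic: reduce a goal 'P & Q' to the two subgoals P and Q."""
    if " & " in goal:
        left, right = goal.split(" & ", 1)
        return [left, right]
    return [goal]                        # no conjunction: nothing to do

def then_(tac1, tac2):
    """Tactical: apply tac1, then apply tac2 to every resulting subgoal."""
    return lambda goal: [g for g1 in tac1(goal) for g in tac2(g1)]

def repeat(tac):
    """Tactical: keep applying tac until it makes no further progress."""
    def go(goal):
        subgoals = tac(goal)
        if subgoals == [goal]:
            return subgoals
        return [g for sub in subgoals for g in go(sub)]
    return go

assert split_conj("P & Q") == ["P", "Q"]
assert repeat(split_conj)("A & B & C") == ["A", "B", "C"]
```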
Today, almost all of the main interactive theorem provers incorporate these techniques, and, in many cases, actual code from Edinburgh LCF. ML turned out to be useful for programming, in general, and has had a huge impact on the design of modern programming languages such as Haskell, OCaml and Scala. The meanings of LCF and ML go well beyond their literal meanings as acronyms.3
Another significant figure in interactive theorem proving was Michael J.C. Gordon (FRS 1994). He created the HOL system [25], which survives to this day in several variants, including HOL4 and HOL Light [26]. All of these are descendants of Edinburgh LCF, which he also helped to create [24]. Even more significant is Gordon’s development of techniques for verifying computer hardware; when he started working on this topic in the early 1980s, essentially all verification research was focused on software. Gordon developed techniques that scaled from a single-transistor device to an entire computer. His key insight was that the right formalism for this task was simple type theory, also known as higher order logic, hence HOL.
The choice of HOL was radical at the time. First-order logic was dominant, thanks perhaps to its strong theoretical properties such as completeness. Gordon noted that such properties were irrelevant to verification, and recognized that the additional expressiveness of HOL was necessary [27]. His choice represented a return to the stronger logics of the early twentieth century. With his students, Gordon extended his techniques to cover a variety of digital systems, including floating-point hardware, probabilistic algorithms and many other applications. Typical of these is Harrison’s [28] verification of a floating-point algorithm for the computation of the exponential function. The proof required a complete formalization of the Institute of Electrical and Electronics Engineers (IEEE) floating-point standard, in all its complexity, as well as the error analysis for the algorithm itself.
While on the topic of radical choices, we must not overlook research into constructive-type theories. Inspired originally by the work of Martin-Löf [14] and by Coquand & Huet [12], researchers built a series of interactive theorem provers, all following the Edinburgh LCF paradigm. For reasons of space, I will only mention Coq [29], which has become perhaps the most popular interactive theorem prover in the world. Typical of all these approaches is that, through dependent types, the type system carries out much of the reasoning burden that would otherwise be done by explicit proof steps. Among the landmark achievements carried out using Coq are the formal verification of a C compiler [30] and the formalization of the odd order theorem [31]. A full treatment of this line of work would require a separate article.
4. A new theorem-proving architecture: Isabelle
During the 1980s, many different logical formalisms were competing for researchers’ attention. In addition to LCF, type theories were evolving rapidly, along with a wide variety of formalisms for proving software correct. One drawback of the LCF approach was its requirement that every primitive inference rule had to be implemented as program code, and then a second time as a proof tactic. An error in the latter code could prevent the user from obtaining the theorem they thought they had proved, while an error in the former code could allow the system to assert false statements as true.
To understand this drawback in more detail, consider one of the simplest logical inference rules: if ϕ and ψ are theorems, then so is their conjunction, ϕ∧ψ. It is often written as

ϕ   ψ
―――――
ϕ∧ψ

with the premises above the line and the conclusion below.
To implement this rule using the LCF architecture (which is also adopted by the HOL family and Coq), we begin by designing data structures that can represent the full syntax of our formalism. Then we collect all the inference rules that need to be implemented; taken together, they will define the abstract data type of theorems (the ML type thm in LCF). To support backward proof, each rule must additionally be implemented as a tactic; for conjunction, a tactic that reduces a goal of the form ϕ∧ψ to the two subgoals ϕ and ψ. These subgoals are not theorems but merely statements to work on next. In order to yield a theorem at the end, the tactic also needs to return a piece of code that can justify the reduction, by invoking the conjunction rule once the subgoals have been proved.
So we see that, to implement any formal calculus using the LCF architecture, every inference rule must be implemented in code twice. There could be dozens of rules, some of them complicated, and any error would be damaging. Modifying an LCF-style system to implement a new formalism requires months of work.
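The kernel discipline itself is easy to sketch. In the following toy Python version (hypothetical names; the real systems use ML’s abstract data types, which enforce the abstraction statically), theorems can be manufactured only by the kernel’s rule functions, so user-written automation may fail to prove a theorem but can never forge an invalid one:

```python
# A toy LCF-style kernel.  Only the kernel's rules hold the capability
# token, so only they can create values of type Thm.

_KERNEL_ONLY = object()      # capability token held by the kernel's rules

class Thm:
    """A proved statement.  Do not construct directly."""
    def __init__(self, token, prop):
        assert token is _KERNEL_ONLY, "theorems come only from kernel rules"
        self.prop = prop
    def __repr__(self):
        return f"|- {self.prop}"

def axiom(p):
    """Kernel rule: accept p as an axiom (for illustration only)."""
    return Thm(_KERNEL_ONLY, p)

def conj_intro(th1, th2):
    """Kernel rule: from theorems phi and psi, derive phi /\\ psi."""
    return Thm(_KERNEL_ONLY, f"({th1.prop} /\\ {th2.prop})")

print(conj_intro(axiom("A"), axiom("B")))    # |- (A /\ B)
```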
The solution to these problems is to use the LCF idea in a different way. Instead of focusing on the theorems of the given formalism, we should focus on contextual implications of the form Γ⇒ϕ. In particular, such implications can be joined end to end, creating proof trees. This approach has a number of advantages, as follows.
— Contexts are managed in the same way for every formalism being supported. Meanwhile, formalisms that manage the context explicitly can still be handled.
— The incremental build-up of a proof tree can express both forward and backward proof.
— Subgoals can even contain shared ‘placeholder’ variables, as may arise when proving something like ∃x (ϕ(x)∧ψ(x)). That is to say, we can try to prove something of the form ϕ(−), and if we manage to prove ϕ(f(1)), then the other subgoal will become ψ(f(1)).
But the main advantage is simply that inference rules can now be written as assertions of the logical framework. For example, the conjunction rule above becomes the assertion ϕ ⟹ (ψ ⟹ ϕ∧ψ), where ⟹ denotes the framework’s own implication.
Isabelle is the world’s first generic theorem prover: designed to support logical inference in general, as opposed to any single formalism. Isabelle takes some ideas from HOL to represent formal syntax and logical inference rules [35]. Notational features such as variable binding (the i in ⋃i∈I Ai, say) and principles such as mathematical induction could be expressed straightforwardly, and conformance to traditional presentations could be proved [32]. Such meta-logical systems are known as logical frameworks [36], the first of which was the AUTOMATH language created by de Bruijn [37].
Another fundamental Isabelle principle is to provide the basic components of automatic proof search at the lowest level. The resolution techniques of first-order logic had already shown promise. It seemed right to adopt some of those techniques to provide automation for other formalisms. The joining together of implications, which is Isabelle’s core inference mechanism, is itself a form of resolution. Over the years, increasingly sophisticated forms of automatic proof search were added to Isabelle [38].
Have these ideas stood the test of time? Certainly, Isabelle is the only widely used interactive theorem prover to be based on a logical framework. Meanwhile, efforts to exploit the power of resolution theorem proving finally resulted in strong integration of the two technologies: the sledgehammer tool [39] applies a battery of resolution theorem provers to a problem. It often finds a proof with the help of obscure facts from Isabelle’s extensive libraries. Some of the most impressive work done using Isabelle—from the verification of security protocols [40] to a verified operating system kernel [41] to formalized mathematical theorems [42,43]—was only possible because of Isabelle’s automation.
On the other hand, the idea of generic theorem proving is less successful. It is true that Isabelle has been used to support a great many formalisms [35], including a version of Martin-Löf’s intuitionistic-type theory, first-order logic, set theory and some modal logics. Isabelle/ZF develops a substantial amount of Zermelo–Fraenkel set theory [44]. One can argue that, even in the case of a single formalism, the use of a logical framework makes for a more flexible notion of proof construction than that found in traditional LCF-like systems. However, Isabelle’s facility to support multiple formalisms is hardly used. Its most popular instantiation by far is Isabelle/HOL, which at bottom is yet another implementation of Church’s HOL [9]. Constructive-type theories tend to be demanding syntactically, so the research community has always preferred to build its own tools.
Isabelle’s core ideas were in place by 1989 [32]. Since then, Isabelle development has increasingly moved to the Technical University of Munich, Germany, under the leadership of Prof. Tobias Nipkow. Of the innumerable contributions made there, the most important one is the Isar structured proof language [45], which although completely formal includes many mathematical idioms and often allows legible, natural-looking proofs. (Hitherto, Isabelle followed LCF in expecting users to generate proofs directly through the ML programming language.) Extensions to Isabelle’s core ideas to support HOL are also due to Nipkow and his colleagues, in particular the notion of type classes [46]. The years have seen a long series of refinements made and case studies undertaken. One notable example is counterexample detection [47], whereby Isabelle can frequently inform the user that the statement under consideration is not a theorem.
5. Formalizing mathematics
The idea of performing mathematics by machine goes back at least as far as Leibniz, who invented a calculating machine. The first mathematician to properly realize this dream was de Bruijn [37], whose AUTOMATH project started in the 1960s and culminated in the formalization of a monograph on the construction of the reals [48]. De Bruijn’s approach was based on dependent-type theories of his own devising. At around the same time, Andrzej Trybulec had the idea of formalizing mathematics using a form of typed set theory; a substantial amount of mathematics was translated into his Mizar formalism, with a particularly elegant treatment of algebraic structures [49].
However, the main impetus for formalized mathematics has arisen from the requirements of verification, which has been a priority area for research funding. Harrison’s early work [28] to verify a floating-point algorithm for the exponential function naturally led to the formalization of increasing amounts of analysis. In order to verify probabilistic algorithms (which are controlled by virtual coin tosses), Hurd [50] formalized some probability theory, including measure theory and Lebesgue integration. Recent work in this direction includes Hölzl’s comprehensive formalization [51] of Markov chains, an abstract model of probabilistic systems that has numerous applications. Such work has generally been done using versions of HOL and Isabelle/HOL, but floating-point algorithms have also been verified using Coq [52].
Over time, competition arose between the proponents of different verification systems on the basis of how many well-known theorems had been formalized in each. Harrison [26] took the lead here, formalizing numerous results in his HOL Light system. These included much of complex analysis [53] as well as a proof of the prime number theorem [54] and many other results. Later, Harrison played a major role in the effort to formally verify Hales’s 1998 proof [55] of the Kepler conjecture, a 400-year-old problem concerning the optimal packing of spheres. Referees had objected to this proof because of its reliance on extensive computer calculations. In response, Hales et al. [56] launched the Flyspeck project to formally verify his proof. Flyspeck was ultimately successful, confirming and simplifying Hales’s argument [57]. Some of the formal proofs were done using Isabelle [58].
This was not the first time a mathematical result was formalized in order to confirm a contested proof. In 2004, Gonthier formalized the four colour theorem within Coq [59]. The 1976 proof by Appel and Haken was notorious for its use of a bespoke computer program to check nearly 2000 cases. Gonthier used a similar strategy, with a crucial difference: the cases were checked by algorithms that had been verified using Coq.
Other work of this sort concerned proofs that, while enjoying the full confidence of mathematicians, had other issues. My own formalization of Gödel’s second incompleteness theorem [43] falls into this category: most published proofs are sorely lacking in detail. Fleuriot’s formalization [60] of some material from Newton’s Principia is remarkable for embracing Newton’s use of infinitesimals: Fleuriot formally justifies the original proofs through non-standard analysis, which gives a rigorous foundation to infinitesimal reasoning. Above all, we have Gonthier et al.’s [31] monumental project to verify the odd order theorem; the issue here is simply size, as the theorem was originally proved in a 255-page journal article.
Generally speaking, the point of formalized mathematics is to clear up doubts, resolve ambiguities and identify errors. Mathematicians regularly make mistakes, as examined by Lakatos [61] in his history of Euler’s polyhedron formula; he points out that even definitions can be wrong. Famous modern mathematicians can also be fallible:
When the Germans were planning to publish Hilbert’s collected papers and to present him with a set on the occasion of one of his later birthdays, they realized that they could not publish the papers in their original versions because they were full of errors, some of them quite serious. Thereupon they hired a young unemployed mathematician, Olga Taussky-Todd, to go over Hilbert’s papers and correct all mistakes. Olga laboured for three years. [62, p. 201]
Increasingly, the impetus for formalizing mathematics comes from mathematicians themselves. Fields medallist Vladimir Voevodsky wrote of his horror at finding errors in his own work:
And who would ensure that I did not forget something and did not make a mistake, if even the mistakes in much more simple arguments take years to uncover? [63, p. 8]
Voevodsky’s homotopy-type theory [64] is a new approach to the formalization of mathematics. It gives Martin-Löf’s intuitionistic-type theory a new interpretation involving topological spaces, where types denote homotopy types.5 Moreover, this interpretation suggests the new univalence axiom, which says that isomorphic constructions can be identified. These ideas have generated widespread excitement, and experiments are underway using the Coq proof assistant.
In the opposite direction, axiomatic set theory has also been used to formalize mathematics. Sets and types both denote collections, with one great difference: typically a variable can have only one type but can belong to any number of sets. So set theory may offer the greatest flexibility in formalizing mathematics. Work of my own concerns proving a number of properties to be equivalent to the axiom of choice [65] as well as formalizing Gödel’s proof of the relative consistency of the axiom of choice [44]. This work can be criticized as being only about set theory itself. However, recent work by Zhan [66] shows that set theory can also be used to formalize traditional mathematical results from group theory, point-set topology and real analysis.
6. Obstacles to formalizing mathematics
The obstacles to the formalization of mathematics are perhaps not what people imagine. Gödel’s theorem is not especially relevant to mathematics in practice. Even the Entscheidungsproblem is not that relevant: when we do have decision procedures, they can be too expensive to actually run. Anyway, our ambition is to assist mathematicians, not to replace them by computers. The obstacles we face take a variety of forms.
(a) Foundational
A mathematician’s imagination is unconstrained, but every formal calculus is a rigid framework and ultimately a prison. For example, axiomatic set theory begins by creating a world big enough for nineteenth century mathematics, then escalates it through powerful principles to create an inconceivably large universe of mathematical objects. It contains such oddities as the limit of {ℵ0,ℵℵ0,ℵℵℵ0,…}, a cardinal λ satisfying the equation λ=ℵλ. (The ℵ-sequence enumerates the infinite cardinalities, starting with ℵ0: the cardinality of the integers.) Boolos [67] frankly states that he cannot believe that such an object exists. And yet, category theory [68] begins by creating the category of sets and functions, assuming at a stroke the existence of a mathematical object incorporating the entire universe of sets, and building upon it. So one moment our universe seems far too large, and a moment later it has become too small.
(b) Conceptual
Intuition plays a huge role in mathematics. A mathematician easily sees that a certain function—perhaps defined by a complicated integral—is continuous. Many convergence properties are obvious. But, in some cases, the formal proofs are enormous [54]. There are a variety of responses to this situation. One is to provide additional automation to deal with specific tasks, such as proving asymptotic properties [69]. Another is to accept that some apparently obvious facts, such as the Jordan curve theorem, which states that every simple closed curve partitions the plane into an interior and an exterior region, really are extremely difficult to prove rigorously [70]. Another is to accept that many apparently obvious statements are actually false [61]. But, for all that, the use of any logical formalism requires immense patience.
(c) Notational
The language of mathematics has evolved over centuries. It can be inconsistent and ambiguous—compare $y^2x$ with $y^{2x}$—but it is familiar and reasonably effective. Any formalism implemented on a computer will be code, and look like it. For example, the fundamental theorem of algebra states that if $n$ is a positive integer and $a_0, a_1, \ldots, a_n$ are complex numbers with $a_n \neq 0$, then $a_n z^n + \cdots + a_1 z + a_0 = 0$ for some complex $z$. The HOL Light version relaxes the precondition: either $a_0 = 0$ or else the $a_k$ are not all zero for $k = 1, \ldots, n$:
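To make the point concrete, the statement can be rendered in Isabelle-style notation roughly as follows (an illustrative sketch of the statement only, not the actual library lemma of either system):

```isabelle
lemma fta_sketch:  (* illustrative sketch; not the library statement *)
  fixes a :: "nat ⇒ complex" and n :: nat
  assumes "a 0 = 0 ∨ ¬ (∀k ∈ {1..n}. a k = 0)"
  shows "∃z. (∑i ≤ n. a i * z ^ i) = 0"
```

Even with Greek letters and a summation symbol available, the polynomial has become a λ-style sum over indices and every variable carries an explicit type: it reads as code, not as mathematics.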
The elegance of the mathematical notation is completely lost. Isabelle does a bit better, allowing the Greek alphabet and many mathematical symbols, but all such languages are still code.
One further complication deserves a mention. Many expressions of mathematics can be undefined, e.g. some limits, derivatives, integrals or simply $x/y$ when $y = 0$. A formal calculus can deal with this problem in a number of different ways [42, §5.1]. So we may find that $x/y = x/y$ (when $x/0$ denotes some arbitrary value), or that $x/y = x/y$ if and only if $y \neq 0$ (when we regard primitive assertions about undefined values as necessarily false), or we may find that the very act of writing $x/y$ is forbidden unless we can prove $y \neq 0$. This last approach can be inconvenient [71, §4]; indeed, each approach has its good and bad points. The idea that $x/y$ is always something horrifies some mathematicians (especially if we go further, as in Isabelle, and actually stipulate that $x/0 = 0$). Elaborate treatments of definedness yield greater expressive power: paid for, as usual, by more complicated notations and harder proofs.
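Isabelle's stipulation is not merely a convention on paper: it is built into the library, so the following trivial lemmas are proved automatically by the simplifier (division and its companion `inverse` are total functions that return zero at zero):

```isabelle
lemma "x / (0::real) = 0"
  by simp

lemma "inverse (0::real) = 0"
  by simp
```

The payoff is that every occurrence of $x/y$ is a well-formed term and equational reasoning never requires a side condition of definedness; the cost is exactly the oddity noted above, that $5/0 = 0$ is a theorem.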
7. The way forward
In the course of this paper, we have narrowed our focus several times, deliberately seeking out research as opposed to applications. This is necessary because computational logic tools, interpreted broadly, are now widely used in the development of hardware and software and in the provision of online services. We are in danger, perhaps, of overlooking the main motivation for computational logic: proving the correctness of computer systems. So let us take a moment to consider a basic problem in computer science: compiler correctness.
A compiler is a piece of software that translates program code (written in a programming language such as C or Java) into some sort of binary form that can be executed on computer hardware. Compilers are complex, not least because they use a wide variety of sophisticated techniques to generate binary code that will run as fast as possible. If the compiler is somehow faulty then it could introduce errors into the code that is actually executed. People have sought to prove compilers correct using computational logic techniques [30]. But even if we finally obtain a correct compiler, how do we translate it into executable binary code? A further complication is that the semantics of a language like C has little in common with the way C is used in real software, which relies on innumerable undefined behaviours.
Kumar et al. [72] have solved this chicken-and-egg problem. They have designed their own programming language, CakeML, with a clean semantics. They express a simple compiling algorithm as a mathematical function in HOL4 (a descendant of Gordon’s original HOL system). They translate the same algorithm to CakeML and compile it to binary (for the popular x86 architecture) using HOL4 itself, giving high assurance of correctness. This yields a trustworthy version of the compiler in binary form. Their approach is an instance of the well-known technique of bootstrapping, where a compiler is built for a simple but adequate language and then used as a tool to further its own development until the full language is supported. In this case, the bootstrapping process includes verification, and is possible because HOL4 can cope with the full binary code of the CakeML compiler. This is one representative of a large body of research concerned with treating mathematical functions defined within a theorem prover as executable code.
Readers who want to pursue these topics further will find surveys available on formally verified mathematics [73] and on the history of interactive theorem proving [74,75].
Data accessibility
Isabelle can be downloaded from https://isabelle.in.tum.de.
Competing interests
I declare I have no competing interests.
Funding
Much of the research reported here was supported by the EPSRC or its predecessors, or by German and EU funding agencies, going back 40 years.
Acknowledgements
Angeliki Koutsoukou-Argyraki commented on a draft of this paper. The referees also made insightful comments. I would also like to thank Mike Gordon FRS and Robin Milner FRS for their mentorship.
author profile
Lawrence Paulson FRS is Professor of Computational Logic at the University of Cambridge, where he has held established positions since 1983. He has written over 100 refereed conference and journal papers as well as four books. He introduced the popular Isabelle theorem proving environment in 1986, and made contributions to the verification of cryptographic protocols, the formalization of mathematics, automated theorem proving technology, and other fields. He achieved a formal analysis of the ubiquitous TLS protocol, which is used to secure online shopping, and the formal verification of Gödel’s second incompleteness theorem. In 2008, he introduced MetiTarski, an automatic theorem prover for real-valued functions such as logarithms and exponentials.
Footnotes
1 Thus, a form of the axiom of choice is actually provable in this system [14, p. 516].
4 Mathematical proofs also introduce temporary quantities, as when we allow x to denote a root of some polynomial. Therefore, contexts may include a string of bound variables, but the details [32] are too complicated to present here.