Automated theorem proving (ATP) is concerned with the development and use of systems that automate sound reasoning: the derivation of conclusions that follow inevitably from facts. This project is specifically concerned with ATP for classical first-order logic, which has well understood and attractive computational properties. ATP lies at the heart of many important computational tasks, and current ATP systems are capable of solving non-trivial problems, e.g., EQP solved the Robbins problem, NASA uses ATP systems to certify aerospace software, and Microsoft uses ATP to develop and maintain high-quality software. In practice, however, the search complexity for finding proofs of most interesting theorems is enormous, and many problems cannot currently be solved within realistic resource limits. Therefore a key concern of ATP research is the development of more powerful systems, capable of proving more difficult theorems within the same resource limits.

Automated reasoning in large theories - theories in which...

  • There are many predicate and function symbols
  • There are many axioms
  • There are many theorems to be proved from the same set of axioms
  • The theorems are typically provable from a subset of the axioms
- is becoming more prevalent as large knowledge bases, e.g., ontologies and large mathematical knowledge bases, are translated into forms suitable for ATP. In large theories it is useful to develop and apply intelligent reasoning techniques that go beyond "black box" reasoning. Of particular interest is to improve performance of ATP when solving many problems from the same axiom set, but each problem requires use of only a subset of the axioms. The focus of this project is the extraction of a sufficient subset of axioms for proving that a given conjecture is a theorem, using machine learning (ML) to learn from existing proofs which axioms are more likely to be used in a proof of the theorem.

Given a large theory with multiple conjectures to be proved from a common set of axioms, an ATP system, a CPU time limit on each proof attempt, and proofs of some of the conjectures (i.e., some of the conjectures have already been proved to be theorems, but the others have not due to the large number of axioms swamping the ATP system's search space, making it unable to find a proof within the CPU time limit), iteratively use ML to learn from existing proofs which axioms are more likely to be used in proofs of further theorems. The ML is used to suggest small sets of axioms that are likely to be sufficient for finding a proof - more likely than a random selection - so that the ATP system's search space is reduced, and the system is thus able to find a proof within the CPU time limit.

The learning objectives of the project are:
  • Competence in the operation of ATP systems for first-order logic.
  • Understanding how ML can be used to map problem features to axiom sets.
  • Ability to use information output by an ML system to select small axiom sets sufficient for finding a proof of a theorem.
It is hoped that the ML structures built can be analysed to provide insight as to what properties of the selected axioms make them appear useful to the ML system. The ML techniques to be used will include nearest neighbour and Bayesian learning, and may be extended to use neural netowkrs and descision tree learning.
Students must have knowledge of standard concepts in Discrete Mathematics, including classical first-order logic. Knowledge of first-order reasoning techniques, e.g., resolution and superposition, and existing ATP systems, e.g., Vampire, SPASS, Otter, will be advantageous, but can also be learned through the project. The SystemOnTPTP online ATP service (http://www.cs.miami.edu/~tptp/cgi-bin/SystemOnTPTP) may be used to gain experience with ATP tools.

Students must have knowledge of standard concepts of ML. Experience with an ML package such as SNoW (http://l2r.cs.uiuc.edu/~danr/snow.html) or WEKA (http://www.cs.waikato.ac.nz/ml/weka/) will be advantageous, but can also be learned through the project.

For the implementation competence in programming is required. In particular, programming in a scripting language such as perl is required. Competence in other programming languages that can be used to link the various software components may help, e.g., Java, C, Prolog.
To understand the basic concepts of ATP an introductory text is appropriate. For example:
  • Wos, L. and Overbeek, R. and Lusk E.L. and Boyle, J., Automated Reasoning: Introduction and Applications. McGraw-Hill, 1992.
For an introduction to ML, students can read the corresponding chapter in any good AI book. For example, Chapter 18 of: The following book is recommended for the basic principles of ML:
  • Witten, I. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.
The detailed project description is available in the PDF file MLAR.pdf. You will need the free Adobe Acrobat Reader to view this file.
This project is customizable to accommodate different ATP and ML techniques. Extended challenges are included for advanced students.
A sample ATP syllabus used at the University of Miami when this project was assigned is available at:
Syllabus for ATP Course at the University of Miami

A sample AI syllabus used at the University of Miami when this project was assigned is available at:
Syllabus for AI Course at the University of Miami

Additional readings are included in the Background section above.