Overview

Bayesian Networks (BN), also called Belief Networks, are a powerful knowledge representation and reasoning mechanism. A BN represents events and the causal relationships between them as conditional probabilities over random variables. Given the values of a subset of these variables (the evidence variables), a BN can compute the probabilities of another subset of variables (the query variables). A BN can also be created automatically (learnt) from statistical data (examples). The well-known Machine Learning algorithm Naïve Bayes is a special case of a Bayesian Network.
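
As a minimal sketch of the evidence-to-query computation described above, the following Python code performs inference by enumeration on a small network. The network structure (Rain and Sprinkler as parents of WetGrass), the variable names, and every probability value are hypothetical illustration choices and are not taken from the project materials.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
# All numbers are made-up illustration values.
P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(WetGrass = True | Rain, Sprinkler)
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

VARIABLES = ("rain", "sprinkler", "wet")


def joint(rain, sprinkler, wet):
    """Joint probability as the product of each node's conditional probability."""
    p = P_RAIN if rain else 1 - P_RAIN
    p *= P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    p_wet = P_WET[(rain, sprinkler)]
    p *= p_wet if wet else 1 - p_wet
    return p


def query(variable, evidence):
    """Return P(variable = True | evidence) by summing the joint over all worlds."""
    numerator = 0.0
    denominator = 0.0
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        # Skip worlds that contradict the observed evidence.
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world[variable]:
            numerator += p
    return numerator / denominator


# Evidence variable: wet. Query variable: rain.
print(query("rain", {"wet": True}))
# Adding the sprinkler as evidence lowers the belief in rain.
print(query("rain", {"wet": True, "sprinkler": True}))
```

Enumeration is exponential in the number of variables, so it is only suitable for toy networks like this one; software packages for Bayesian Networks typically use more efficient inference algorithms.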

The project allows students to experiment with the Naïve Bayes algorithm and Bayesian Networks and to use them to solve practical problems. This includes collecting data from real domains (e.g. web pages), converting these data into a format from which conditional probabilities can be computed, and using Bayesian Networks and the Naïve Bayes algorithm to compute probabilities and solve classification tasks.
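
The step from tabulated data to conditional probabilities and then to classification can be sketched as follows. This is an illustrative Naïve Bayes implementation, not the code of any package used in the project: the attribute names, the class labels, the four training rows, and the use of add-one (Laplace) smoothing are all assumptions made for the example. Seen as a Bayesian Network, Naïve Bayes makes the class the only parent of every attribute.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    domains = defaultdict(set)           # attribute -> values seen in training
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(label, attribute)][value] += 1
            domains[attribute].add(value)
    return class_counts, value_counts, domains


def classify(model, row):
    """Return the posterior probability of each class for one example."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        log_p = math.log(n / total)  # class prior
        for attribute, value in row.items():
            count = value_counts[(label, attribute)][value]
            # Add-one smoothing keeps unseen values from zeroing the product.
            log_p += math.log((count + 1) / (n + len(domains[attribute])))
        scores[label] = log_p
    # Normalize the log scores into probabilities that sum to 1.
    top = max(scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(unnormalized.values())
    return {label: p / z for label, p in unnormalized.items()}


# Hypothetical web pages described by two yes/no attributes.
rows = [
    {"homework": "yes", "grant": "no"},
    {"homework": "yes", "grant": "no"},
    {"homework": "no", "grant": "yes"},
    {"homework": "no", "grant": "no"},
]
labels = ["course", "course", "research", "research"]

model = train(rows, labels)
print(classify(model, {"homework": "yes", "grant": "no"}))
```

Working in log space avoids numerical underflow when an example has many attributes, which is common for text data such as web pages.
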
The aim of this project is to expose students to two important reasoning and learning algorithms, Naïve Bayes and Bayesian Networks, and to explore their relationship in the context of solving practical classification problems. In particular, the objectives of the project are:
  • Learning the basics of the Bayesian approach to Machine Learning and the Bayesian Networks approach to Probabilistic Reasoning in AI.
  • Gaining experience in using recent software applications in these areas for solving practical problems.
  • Gaining a better understanding of the fundamental concepts of Bayesian Learning and Probabilistic Reasoning and of their relationship in the more general context of knowledge representation and reasoning mechanisms in AI.
Students should have basic knowledge of algebra, discrete mathematics and statistics. A data structures course is also a prerequisite. While not necessary, experience with programming in Java would be helpful, as the project uses Java-based packages. These packages are open source, and students may want to use specific parts of their code to implement stand-alone applications for Bayesian learning or reasoning.

The project is customizable and can accommodate different teaching approaches and different implementations, depending on the particular problems to be solved and the tools to be used. The data collection step can be carried out manually or with software tools. The learning and reasoning steps use implementations of the Naïve Bayes and Bayesian Networks algorithms available in free open source software packages. This allows the project to be extended to building stand-alone applications, depending on the particular teaching goals and the students' programming experience.
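
As one possible way to automate the data conversion step, the sketch below turns raw page text into a table of 0/1 word-presence attributes. It assumes that the target tool is the Weka toolkit cited in [5] and therefore writes Weka's ARFF text format; the function name `to_arff`, the relation name, the sample texts, and the vocabulary are hypothetical.

```python
import re


def to_arff(relation, documents, labels, vocabulary):
    """Render documents as ARFF text with one 0/1 attribute per vocabulary word."""
    classes = sorted(set(labels))
    lines = [f"@relation {relation}", ""]
    for word in vocabulary:
        lines.append(f"@attribute {word} {{0,1}}")
    lines.append(f"@attribute class {{{','.join(classes)}}}")
    lines += ["", "@data"]
    for text, label in zip(documents, labels):
        # Lowercase the text and keep alphabetic tokens only.
        words = set(re.findall(r"[a-z]+", text.lower()))
        flags = ["1" if word in words else "0" for word in vocabulary]
        lines.append(",".join(flags + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical page texts and class labels.
documents = ["Homework due Friday", "Grant proposal funded"]
labels = ["course", "research"]
print(to_arff("pages", documents, labels, ["homework", "grant"]))
```

The sketch assumes that vocabulary words are plain alphabetic tokens and that none of them is the word "class", which would clash with the name of the label attribute.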

The software packages and data sets used in the project are freely available on the Web.
Before starting the project, students are advised to read Chapters 13 and 14 of Russell and Norvig’s book ([1]), Chapter 1, Chapter 3 (Section “Probability-based clustering”) and Chapter 5 (Section “Naïve Bayes Algorithm”) of Markov and Larose’s book ([2]), or Chapter 6 of Mitchell’s book ([3]).

While working on the project, students can consult Witten and Frank’s book ([4]) and Bouckaert’s documentation ([5]) for additional information on how to use the algorithms for Naïve Bayes classification and Bayesian Reasoning.
  1. Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach, 2nd edition. Prentice Hall, Upper Saddle River, NJ, USA, 2003. Chapters 13, 14.
  2. Zdravko Markov and Daniel T. Larose. Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. Wiley, 2007. Chapter 1 is available for download from Wiley.
  3. Tom Mitchell. Machine Learning. McGraw Hill, 1997. Chapter 6.
  4. Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition. Morgan Kaufmann, 2005.
  5. Remco R. Bouckaert. Bayesian Network Classifiers in Weka. Online at http://www.cs.waikato.ac.nz/~remco/weka_bn/index.html, or download a PDF file at http://www.cs.waikato.ac.nz/~remco/weka.bn.pdf.
The detailed project description is available in the PDF file ProbabilisticReasoning.pdf. You will need the free Adobe Acrobat Reader to view this file.
This project is customizable to accommodate different approaches to teaching and different implementations. Additional exercises are also included for students seeking more extended challenges.
A sample syllabus is not available.

Additional readings are included in the Background section above.