











Although this project will focus on SVMs and machine learning techniques, students will learn enough about the biology behind DNA analysis for the project to make sense. Students will gain experience using SVM software and will emerge from this project with an improved understanding of how machine learning can be used to recognize important patterns in vast amounts of data. The specific objectives of the project are to learn:
 The basic concepts and techniques of supervised machine learning.
 Some of the issues involved in the implementation of a learning system.
 The vector space model for representing microarray (and other) data.
 How to design a simple machine learning experiment using our own data set.
 Some of the challenges involved in data mining in general and microarray analysis in particular.
Because microarray data are so esoteric, especially for those without extensive training in genetics, many of these objectives will be addressed through the analysis of a large set of baseball statistics.
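As a rough illustration of the vector space model named in the objectives, the sketch below represents a few ballplayers as fixed-length numeric vectors, each paired with a class label, and classifies them by which side of a hyperplane they fall on. The feature names, the numbers, the labels, and the hand-picked weights are all invented for illustration; they are not taken from the project's data set, and an SVM would learn the weights from training data rather than have them supplied.

```python
# Hypothetical feature order; the project's actual statistics may differ.
FEATURES = ["batting_avg", "home_runs", "stolen_bases"]

# Each example is (label, feature vector). Labels and numbers are made up.
players = [
    (+1, [0.310, 35.0, 12.0]),
    (-1, [0.245, 8.0, 3.0]),
    (+1, [0.298, 41.0, 5.0]),
]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def linear_decision(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hand-picked (not learned) weights: look only at home runs, threshold at 20.
w, b = [0.0, 1.0, 0.0], -20.0
predictions = [linear_decision(w, b, x) for _, x in players]
```

Once every example is a point in the same vector space, the same classification procedure applies whether the coordinates are baseball statistics or microarray expression levels.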













Although the theory underlying SVMs involves linear algebra, Lagrangian multipliers, and other concepts from advanced mathematics, the basic ideas of how SVMs work will be presented in nonmathematical terms. Therefore, to complete the project, students should have a basic knowledge of algebra, discrete mathematics, statistics, and data structures.
We will be using an open source software tool named libsvm, which can be downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm/. The software comes with a variety of supporting documentation, including A Practical Guide to SVM Classification. No special knowledge, beyond what is covered in this module, is required to use the software.
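libsvm reads training data from a plain text file in which each line holds a label followed by sparse index:value pairs, with feature indices starting at 1 and listed in ascending order. The sketch below shows one way such lines might be produced from in-memory vectors. The helper name to_libsvm_line and the sample values are hypothetical and are not part of libsvm or of the project materials.

```python
def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    Indices are 1-based and ascending. Zero-valued features are
    omitted because the format is sparse.
    """
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# Made-up examples: (label, feature vector).
examples = [
    (1, [0.31, 35, 12]),
    (-1, [0.245, 0, 3]),
]

lines = [to_libsvm_line(label, feats) for label, feats in examples]
# lines[0] is "1 1:0.31 2:35 3:12"
# lines[1] is "-1 1:0.245 3:3"  (the zero second feature is skipped)
```

A file made of such lines can serve as the training set passed to libsvm's svm-train command.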













For an introduction to general concepts in machine learning, students can read the corresponding chapter in any good AI book. For example:
 Chapter 18 of Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach, 2nd edition. Prentice Hall, Upper Saddle River, NJ, USA, 2003.
 Part IV, especially Chapter 11, of George Luger's Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 5th edition, Addison-Wesley, Reading, MA, USA, 2005.
There are a number of good online tutorials and primers available on microarrays, especially:
See also:
To understand the basic concepts of SVMs and how they are used in classification problems, students are encouraged to read the following short articles. The first article describes, in nonmathematical terms, how an SVM classifier works.
The following article provides a mathematical introduction to SVMs (for those with advanced math knowledge):
The following tutorial provides a concise explanation of basic concepts in statistics and probability:













The detailed project description is available in the PDF file svm_project.pdf. You will need the free Adobe Acrobat Reader to view this file.


This project is customizable to accommodate different approaches to teaching and different implementations. Additional exercises are also included for students seeking more extended challenges.













