*Projects are posted as they become available

Web User Profiling
Web searches provide large amounts of information about web users. Data mining techniques can be used to analyze this information and create web user profiles. A key application of this approach is in marketing and offering personalized services, an area referred to as "data gold rush". The aim of this project is to develop a system that can be used to develop an intelligent web browser. This project focuses on the use of Decision Tree learning to create models of web users.

Character Recognition and Learning with Neural Networks
The power and usefulness of artificial neural networks have been demonstrated in several applications including speech synthesis, diagnostic problems, medicine, business and finance, robotic control, signal processing, computer vision and many other problems that fall under the category of pattern recognition. The goal of this project is to develop a character recognition system based on a neural network model.

Solving the N-Puzzle Problem
The N-puzzle game provides a good framework for illustrating conceptual AI search in an interesting and motivating way. The objective of this project is to introduce the student to Analytical (Explanation-Based) Learning using the classical AI framework of search. Hands-on experiments with search algorithms combined with an Explanation Based Learning (EBL) component give students a deep, experiential understanding of the basics of EBL.

Solving the Dice Game Pig
The jeopardy dice game Pig is very simple to describe, yet the optimal policy for play is far from trivial and was only recently solved. Using the computation of the optimal solution as a central challenge problem, we give the student a deep, experiential understanding of dynamic programming and value iteration through explanation, implementation examples, and implementation exercises.

Web Document Classification
Along with search engines, topic directories are the most popular sites on the Web. Topic directories organize web pages in a hierarchical structure according to their content. The aim of the project is to investigate the process of tagging web pages using the topic directory structures and apply Machine Learning techniques for automatic tagging. This would help in filtering out the responses of a search engine or ranking them according to their relevance to a topic.

The Game of Clue
The popular board game Clue serves as a fun focus problem for this introduction to propositional knowledge representation and reasoning. After covering fundamentals of propositional logic, students first solve basic logic problems with and without the aid of a satisfiability solver. Students then represent the basic knowledge of Clue in order to solve a Clue mystery. Could the current best stochastic SAT solver, Adaptive Novelty+, benefit from machine learning elements.

Using a Support Vector Machine to Analyze DNA Microarrays
A support vector machine (SVM) is a powerful machine learning technique that is used in a variety of data mining applications, including the analysis of DNA microarrays. A DNA microarray is a small silicon chip that is covered with thousands of spots of DNA of known sequence. Biologists use microarrays to study gene expression patterns in a wide range of medical and scientific applications. The goal of this project is to learn how to use an SVM to recognize patterns in microarray data. Using publicly available data sets, we will train an SVM to distinguish between two forms of leukemia.

Genetic Algorithms
Anyone familiar with the theory of natural selection can imagine how a difficult search problem might be attacked by combining the elements of reasonably good solutions and randomly mutating the resulting "offspring" to preserve variety. At a deeper level, there is often a disconnect between real-world problems and the kind of problems that students work on in an undergraduate AI or machine-learning course. This project focuses on learning and applying two recently developed genetic algorithms for solving real-world problems in multi-objective optimization and evolving behaviors for competitive robotic agents.

Learning Relational Knowledge
Most approaches to machine learning assume the knowledge to be learned can be expressed as a set of attribute-value pairs, i.e., an entity and its properties. However, much richer knowledge can be found in the relationships between multiple entities. This project explores the challenge of learning relational knowledge by experimenting with two relational learning methods, one logic based and one graph based.

Biomedical Term Classification
Due to the explosive growth of knowledge in biotechnologies (about 1500 research abstracts are added every single day to MEDLINE, an electronic repository of biomedical papers) an acute need for knowledge management tools has arisen. Natural Language Processing (NLP) can help fulfilling this acute need. A major problem in BioNLP, the area of research at the intersection of Biotechnologies and NLP, is that same biomedical term can be frequently used with different meanings in biological texts. For instance, SBP2 can refer both to a protein or a gene. In this project, we combine NLP with Machine Learning techniques, namely decision trees and Naïve Bayes, to build software tools that classify biomedical terms based on the surrounding contexts in which they appear. In particular, we work with terms that refer to the following categories: DNA, RNA, protein and cell_line, cell_type.

General-Purpose Problem Solver
Genetic programming (GP) is perhaps the most general of local search algorithms. It is particularly useful in solving design and optimization problems. Strengths of GP include the ability to work with heterogeneous data, and that a relatively low amount of information needs to be specified to achieve success. A number of patents now exist that have been achieved using genetic programming. The goal of this project is to learn about machine learning (problem formulation, search, and knowledge representation) by building a basic genetic programming framework and to use it to solve problems. The framework will be built piece-by-piece, as concepts are introduced.

Supervised Learning of Sign Language Characters
Recognition of images is a key technological advancment that has come out of the field of artificial intelligence. Machine learning technologies can be used to learn and recognize a variety of objects contained in images. The results can be used for face recognition, character recognition, gesture recognition, and a range of additional applications. The goal of this project is to train several different classification algorithms to recognize the alphabet character that is being signed using the American Sign Language (ASL) gesture.

Probabilistic Reasoning with Naïve Bayes and Bayesian Networks
Bayesian (also called Belief) Networks (BN) are a powerful knowledge representation and reasoning mechanism. BN represent events and causal relationships between them as conditional probabilities involving random variables. Given the values of a subset of these variables (evidence variables) BN can compute the probabilities of another subset of variables (query variables). BN can be created automatically (learnt) by using statistical data (examples). The well-known Machine Learning algorithm, Naïve Bayes is actually a special case of a Bayesian Network.

The project allows students to experiment with and use the Naïve Bayes algorithm and Bayesian Networks to solve practical problems. This includes collecting data from real domains (e.g. web pages), converting these data into proper format so that conditional probabilities can be computed, and using Bayesian Networks and the Naïve Bayes algorithm for computing probabilities and solving classification tasks.

Relational Learning for Web Document Classification
Most of the content-based approaches to text and web document classification are based on the bag of words model, well known from the area of Information Retrieval. This model is simple and efficient, but fails to capture many additional document features such as the internal HTML structure, language structure and inter-document link structure. All this however may be a valuable source of information for the classification task. The basic problem with incorporating this information into the classification algorithm is the need for uniform representation. For example, the content-based classification works well with the vector space representation, while hyperlink-based classification can be implemented by using graph models. This project introduces students to Relational (First-Order) Learning that allows various kinds of information to be represented in a uniform way and used for document classification. One of the most successful relational learning systems, FOIL is used to create relational representation of web documents and to solve classification problems.

Machine Learning for Automated Reasoning
Automated theorem proving (ATP) is concerned with the development and use of systems that automate sound reasoning: the derivation of conclusions that follow inevitably from facts. A key concern of ATP research is the development of more powerful systems, capable of proving more difficult theorems. Automated reasoning in large theories is becoming more prevalent as large knowledge bases are translated into forms suitable for ATP. Of particular interest is to improve performance of ATP when solving many problems from the same axiom set, but each problem requires use of only a subset of the axioms. The focus of this project is the extraction of a sufficient subset of axioms for proving that a given conjecture is a theorem, using machine learning (ML) to learn from existing proofs which axioms are more likely to be used in a proof of the theorem.

Robot Defense: Intelligent Behavior in a Real-Time Strategy Game
Interactive real-time games form the basis of many modern research and simulation environments. These dynamic environments allow a wide range of real-world issues to be explored such as the how computation time can negatively impact decision making. In this project, students provide the AI for an interactive real-time game called Robot Defense (similar to the highly popular Flash game Desktop Tower Defense). Assignments leverage search, knowledge representation and reinforcement learning to create low level controls for game entities and high level decision making for an agent that can play the game autonomously.

Generating Dense Boggle Boards with Genetic Algorithms
The problem of generating dense boards (high-scoring boards) in the game of Boggle is a challenging and fun task. This project explores the fundamentals of genetic algorithms, which are well suited for this task for two reasons: first, it is impractical to exhaustively examine or maintain all candidate boards; and second, information gained along one local search path can be integrated in multiple meaningful ways with information gained along other paths.

Application of Associative Matrices to Recognize DNA Sequences in Bioinformatics
Associative matrices are considered a type of neural network topology used to recall and recognize previously known or unknown patterns. For example, the Hebbian linear associative matrices can be trained to recognize a particular DNA sequence into another specimen sequence that may help biologist to identify similarities and other characteristics important in the knowledge and recognition of a particular sequence in another specimen. This approach may result especially beneficial in mutated sequences where mutations or other changes in the sequence as deletions and insertions are present. Associative matrices have been used to recognize characters, shapes, or specific objects from an image. Such changes are considered noisy patterns that are one of the important features of using associative matrices in this field.

The project introduces students to the use of associative matrices concepts in learning and pattern recognition. Students will experiment the use of the matrices to recognize DNA sequences and its possible mutations.

Creating Robotic Intelligent Agents
An intelligent agent approach to the understanding of basic AI principles has become widely accepted. This project uses robotics to create simple reflex, model based, goal based, utility, and learning agents. The simple reflex, model based, goal based, and utility robot agents are programmed to play a modified soccer game. The learning robot agent uses a neural network to learn line following behavior.

Competitive Learning in Checkers
One of the earliest investigations of machine learning was Arthur Samuel's work with Checkers, developing a program that learned a board evaluation function through the experience of playing many games. Starting from Samuel's approach, students will develop Checkers programs that learn to play by competing with other students' programs. Competitions will take place over the internet using a simple server that communicates moves between clients, checks for the validity of moves, keeps track of (and limits) the time spent by each player, and reports the winners and losers.

In addition to the reinforcement learning aspects, the project deals with the problems of search, efficient representation of knowledge, and reasoning about time as a resource. Using the same server guarantees that each checkers program learns and can compete within the same environment. At the end of the semester, the project teams compete in a tournament, which adds yet another level of excitement and provides a straightforward, if relative, comparison of the resulting programs.

Route Planning, State-Space Search, and Case-Based Learning
Route planning is the process of determining a route, or path, to follow to move from a starting location to a goal location. It is a good application to be tackled by simple state-space search methods, at least on the small scale. Re-generating routes from scratch can be expensive, and seems wasteful. Therefore case-based reasoning will be explored as a method for storing and re-using previously generated routes.

Case-based reasoning systems store information, route plans, in this case, according to a set of features that describe the context in which they apply. For route plans, the context might be just the starting and goal locations, but could also include more elaborate features like passing particular obstacles, going past particular landmarks, or being scenic or fastest. This project introduces students to the costs and benefits of state-space search algorithms, and examines CBR as an alternative or adjunct to a brute-force method. It experiments with how to break route plans into cases, how to choose useful indices, how to judge similarity, how to adapt a partial-match to fit a new context, and how to learn by storing new cases in the CBR memory.

Solving the Traveling Salesman Problem
The Traveling Salesman Problem (TSP) is a classic optimization problem: find the least-cost round-trip route amongst N cities, visiting all cities exactly once. In this set of assignments students will develop alternative machine learning approaches to solving the TSP. Some of the possible techniques include: genetic algorithms, simulated annealing, Boltzmann machines, Kohonen SOM, ant colony systems etc. The learning goal of these assignments is to use the TSP as a test-bed problem to study various machine learning techniques individually and comparatively.

Recognizing Images of American Sign Language Letters Using Principal Component Analysis
Digital images are now ubiquitous and easy to acquire. While humans easily recognize the objects and other semantic content in images, it has been much more difficult to do so automatically. To obtain higher-level semantics, a set of features must be computed from each image for comparison to the features of the reference image. Finding the right set of features by trial and error can be time-consuming and difficult. Therefore, in this project, we will use Principal Component Analysis (PCA) to learn an appropriate set of features from a training set of images, and then apply them to a test set of images. The methodology used in this project has been used succesfully for face recognition.


Discovering Optimal Policies with Value Iteration, Policy Iteration, and Q-learning
Value and Policy Iteration provide an excellent means for agents in a nondeterministic environment to determine an optimal series of actions through the solving of a Markov decision process (MDP). However, solving an MDP requires that an agent have a great deal of knowledge about its environment: specifically, the rewards for each state and the transition probabilities between states. When this knowledge is not available to the agent, it can be learned through experience. Reinforcement learning, specifically Q-learning, is a method for doing this. Q-learning is a form of model-free learning; a Q-learning agent can learn an optimal policy without any knowledge about its environment, given enough experience. In this problem, students implement value iteration, policy iteration, and Q-learning to discover optimal policies for both a toy map and for a realistic campus map. Applying these two approaches to the same set of problems provides students with an understanding of how learning can be used to make up for a lack of available domain knowledge.

Understanding Genetic Algorithms through Interactive Play
Genetic algorithms explore the hypothesis space in search of a best hypothesis to solve a problem through the manipulation of a string representation of those hypotheses. Inspired by biological evolution, parts of good hypotheses are combined and used with limited exploration to evolve new hypotheses similar to how different genes could potentially be combined into better chromosomes making a better creature. This project presents an interactive game environment where the player (student) has to evolve members of a tribe of creatures, Modabus, in order to have the correct genetic makeup to overcome an obstacle impacting their society. Through manipulation of crossover, mutation, and selection the player (student) will guide the genetic algorithm to find a solution (a Modabu citizen with the proper level of intelligence, strength, and speed to overcome each obstacle) over the course of three increasingly difficult levels.

Evolution of Language and Intelligence
The aim of this project is to explore the self-organized language and behavior that can develop between and among simple robots in a simulated environment. The project will utilize two general tools from machine learning: the genetic algorithm, and the artificial neural network. You will be provided with the infrastructure for evolving your own robot behaviors. You will then design your own neural network and design a simulated world for which your agents will evolve. Finally, you will analyze the language which results.

Machine Learning for Computer Security
As computers and the Internet become increasingly popular, malicious activities in the cyberspace have increased significantly. Intrusion detection is an area of computer security that focuses on detecting these attacks reliably. Intrusion detection systems (IDS) usually have a knowledge base containing rules that characterize attacks. Building such knowledge base manually can be time consuming. Machine learning can help build such knowledge base in a more efficient manner.

In order to detect attacks, we need to differentiate between instances of normal and attack behavior. Based on previous instances of normal and attack behavior, a machine learning algorithm can gain the knowledge on how to differentiate between the two types of behavior and represent the knowledge in a form than can be used to predict if current instances are malicious or not.


Machine Learning for Games
Computer games are prevalent as entertainment. Machine learning techniques can be used to help the computer player become smarter by learning from its experiences. For example, the computer player can learn which moves to make in different situations.