*Projects are posted as they become available
Web User Profiling
Web searches provide large amounts of information about web users. Data mining techniques can be used to analyze this information and create web user profiles. A key application of this approach is in marketing and personalized services, an area sometimes referred to as the "data gold rush". The aim of this project is to build a system that could serve as the basis for an intelligent web browser. The project focuses on the use of Decision Tree learning to create models of web users.
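As a rough illustration of how decision tree learning chooses which attribute to split on, the sketch below computes entropy and information gain over a handful of made-up user records. The attribute names (`query`, `time`, `clicks_ads`) and the data are hypothetical and are not taken from the project materials.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical user records; the field names are assumptions for illustration only.
users = [
    {"query": "sports", "time": "day",   "clicks_ads": "yes"},
    {"query": "sports", "time": "night", "clicks_ads": "yes"},
    {"query": "news",   "time": "day",   "clicks_ads": "no"},
    {"query": "news",   "time": "night", "clicks_ads": "no"},
]
```

On this toy data a tree learner would split on `query` first, because that attribute separates the two classes completely while `time` carries no information about them.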
Character Recognition and Learning with Neural Networks
The power and usefulness of artificial neural networks have been demonstrated in several applications including speech synthesis, diagnostic problems, medicine, business and finance, robotic control, signal processing, computer vision and many other problems that fall under the category of pattern recognition. The goal of this project is to develop a character recognition system based on a neural network model.
Solving the N-Puzzle Problem
The N-puzzle game provides a good framework for illustrating conceptual AI search in an interesting and motivating way. The objective of this project is to introduce the student to Analytical (Explanation-Based) Learning using the classical AI framework of search. Hands-on experiments with search algorithms combined with an Explanation-Based Learning (EBL) component give students a deep, experiential understanding of the basics of EBL.
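To make the search side of this project concrete, here is a minimal sketch of uninformed breadth-first search on the 8-puzzle (the 3x3 case of the N-puzzle). It covers only the search part; the EBL component is not shown, and the state encoding (a 9-tuple with 0 for the blank) is an assumption made for this example.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank square

def neighbors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    i = state.index(0)
    row, col = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            j = r * 3 + c
            nxt = list(state)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            yield tuple(nxt)

def solve(start, goal=GOAL):
    """Breadth-first search. Returns the shortest list of states from
    start to goal (inclusive), or None if the goal is unreachable."""
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None
```

Breadth-first search is enough for the 8-puzzle, whose reachable state space is small, but it does not scale to larger boards; an informed method such as A* with a distance heuristic would be the usual next step.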
Solving the Dice Game Pig
The jeopardy dice game Pig is very simple to describe, yet the optimal policy for play is far from trivial and was only recently solved. Using the computation of the optimal solution as a central challenge problem, we give the student a deep, experiential understanding of dynamic programming and value iteration through explanation, implementation examples, and implementation exercises.
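The value iteration mentioned above can be sketched roughly as follows. The sketch assumes the standard rules of Pig (rolling a 1 ends the turn and forfeits the turn total, holding banks it) and, as a simplification, does not allow holding before the first roll of a turn. The function name and the small default tolerance are choices made for this example, not part of the project materials.

```python
def pig_value_iteration(goal=100, eps=1e-9):
    """Estimate p[i][j][k]: the probability that the player about to act wins,
    given own score i, opponent score j, and current turn total k."""
    p = [[[0.0] * (goal - i) for j in range(goal)] for i in range(goal)]

    def win(i, j, k):
        # Banking the turn total would already reach the goal.
        return 1.0 if i + k >= goal else p[i][j][k]

    delta = 1.0
    while delta >= eps:
        delta = 0.0
        for i in range(goal):
            for j in range(goal):
                for k in range(goal - i):
                    # Rolling a 1 passes the turn with nothing banked.
                    roll = (1.0 - win(j, i, 0)) / 6.0
                    # Rolling 2..6 adds the roll to the turn total.
                    roll += sum(win(i, j, k + r) for r in range(2, 7)) / 6.0
                    # Holding banks k and passes the turn (assumed illegal at k == 0).
                    hold = 1.0 - win(j, i + k, 0) if k > 0 else 0.0
                    new = max(roll, hold)
                    delta = max(delta, abs(new - p[i][j][k]))
                    p[i][j][k] = new
    return p
```

Calling `pig_value_iteration(goal=10)` finishes almost instantly and is handy for experimenting; the full game with `goal=100` has far more states and takes noticeably longer in pure Python. The optimal action in a state is whichever of `roll` and `hold` attains the maximum.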
Web Document Classification
Along with search engines, topic directories are the most popular sites on the Web. Topic directories organize web pages in a hierarchical structure according to their content. The aim of the project is to investigate the process of tagging web pages using topic directory structures and to apply Machine Learning techniques for automatic tagging. This would help in filtering the responses of a search engine or ranking them according to their relevance to a topic.
The Game of Clue
The popular board game Clue serves as a fun focus problem for this introduction to propositional knowledge representation and reasoning. After covering fundamentals of propositional logic, students first solve basic logic problems with and without the aid of a satisfiability solver. Students then represent the basic knowledge of Clue in order to solve a Clue mystery. Could the current best stochastic SAT solver, Adaptive Novelty+, benefit from machine learning elements?
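The following sketch shows the kind of propositional reasoning involved, using a brute-force truth-table check in place of a real satisfiability solver. The clause encoding (lists of `(symbol, sign)` pairs in conjunctive normal form) and the three-suspect knowledge base are simplifications invented for this example.

```python
from itertools import product

def models(symbols, clauses):
    """Yield every truth assignment that satisfies all clauses.
    Each clause is a list of (symbol, sign) literals; brute force over 2^n assignments."""
    for values in product([False, True], repeat=len(symbols)):
        assignment = dict(zip(symbols, values))
        if all(any(assignment[s] == sign for s, sign in clause) for clause in clauses):
            yield assignment

def entails(symbols, clauses, literal):
    """The knowledge base entails a literal exactly when adding
    the negated literal makes the clause set unsatisfiable."""
    s, sign = literal
    return next(models(symbols, clauses + [[(s, not sign)]]), None) is None

# Toy knowledge base: exactly one of three suspects is the culprit.
suspects = ["mustard", "plum", "scarlet"]
kb = [[(s, True) for s in suspects]]                     # at least one culprit
kb += [[(a, False), (b, False)]                          # at most one culprit
       for i, a in enumerate(suspects) for b in suspects[i + 1:]]

# Facts learned during play (hypothetical): two suspects are ruled out.
kb_after = kb + [[("mustard", False)], [("plum", False)]]
```

With only the rules of the game, `entails(suspects, kb, ("scarlet", True))` is false; once the two facts are added, `entails(suspects, kb_after, ("scarlet", True))` becomes true. Enumeration is exponential in the number of symbols, which is why a full Clue encoding calls for a proper SAT solver.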
Using a Support Vector Machine to Analyze DNA Microarrays
A support vector machine (SVM) is a powerful machine learning technique that is used in a variety of data mining applications, including the analysis of DNA microarrays. A DNA microarray is a small silicon chip that is covered with thousands of spots of DNA of known sequence. Biologists use microarrays to study gene expression patterns in a wide range of medical and scientific applications. The goal of this project is to learn how to use an SVM to recognize patterns in microarray data. Using publicly available data sets, we will train an SVM to distinguish between two forms of leukemia.
Genetic Algorithms
Anyone familiar with the theory of natural selection can imagine how a difficult search problem might be attacked by combining the elements of reasonably good solutions and randomly mutating the resulting "offspring" to preserve variety. At a deeper level, there is often a disconnect between real-world problems and the kind of problems that students work on in an undergraduate AI or machine-learning course. This project focuses on learning and applying two recently developed genetic algorithms for solving real-world problems in multi-objective optimization and evolving behaviors for competitive robotic agents.
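The selection, crossover, and mutation loop described above can be sketched on a toy problem. This is a plain textbook-style genetic algorithm maximizing the number of 1 bits in a string (often called OneMax), not either of the algorithms studied in the project; all parameter values are arbitrary choices for this example.

```python
import random

def one_max(bits):
    """Fitness: the number of 1 bits."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Return (best individual in the final population, best fitness per generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=one_max, reverse=True)
        history.append(one_max(pop[0]))
        parents = pop[: pop_size // 2]        # truncation selection: fitter half breeds
        children = [pop[0][:]]                # elitism: carry the best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)    # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=one_max), history
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness can never decrease. Multi-objective problems need a different notion of "best", which is where more specialized algorithms come in.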
Learning Relational Knowledge
Most approaches to machine learning assume the knowledge to be learned can be expressed as a set of attribute-value pairs, i.e., an entity and its properties. However, much richer knowledge can be found in the relationships between multiple entities. This project explores the challenge of learning relational knowledge by experimenting with two relational learning methods, one logic based and one graph based.
Biomedical Term Classification
Due to the explosive growth of knowledge in biotechnologies (about 1500 research abstracts are added every single day to MEDLINE, an electronic repository of biomedical papers), an acute need for knowledge management tools has arisen. Natural Language Processing (NLP) can help meet this need. A major problem in BioNLP, the area of research at the intersection of Biotechnologies and NLP, is that the same biomedical term is frequently used with different meanings in biological texts. For instance, SBP2 can refer to either a protein or a gene. In this project, we combine NLP with Machine Learning techniques, namely decision trees and Naïve Bayes, to build software tools that classify biomedical terms based on the surrounding contexts in which they appear. In particular, we work with terms that refer to the following categories: DNA, RNA, protein, cell_line, and cell_type.
General-Purpose Problem Solver
Genetic programming (GP) is perhaps the most general of local search algorithms. It is particularly useful in solving design and optimization problems. Strengths of GP include the ability to work with heterogeneous data and the relatively small amount of information that must be specified to achieve success. A number of patents now exist for results obtained using genetic programming. The goal of this project is to learn about machine learning (problem formulation, search, and knowledge representation) by building a basic genetic programming framework and using it to solve problems. The framework will be built piece by piece, as concepts are introduced.
Supervised Learning of Sign Language Characters
Image recognition is a key technological advancement that has come out of the field of artificial intelligence. Machine learning technologies can be used to learn and recognize a variety of objects contained in images. The results can be used for face recognition, character recognition, gesture recognition, and a range of additional applications. The goal of this project is to train several different classification algorithms to recognize the alphabet character that is being signed using the American Sign Language (ASL) gesture.
Probabilistic Reasoning with Naïve Bayes and Bayesian Networks
Bayesian (also called Belief) Networks (BN) are a powerful knowledge representation and reasoning mechanism. BN represent events and the causal relationships between them as conditional probabilities involving random variables. Given the values of a subset of these variables (evidence variables), BN can compute the probabilities of another subset of variables (query variables). BN can be created automatically (learnt) from statistical data (examples). The well-known Machine Learning algorithm Naïve Bayes is actually a special case of a Bayesian Network.
The project allows students to experiment with and use the Naïve Bayes algorithm and Bayesian Networks to solve practical problems. This includes collecting data from real domains (e.g. web pages), converting these data into proper format so that conditional probabilities can be computed, and using Bayesian Networks and the Naïve Bayes algorithm for computing probabilities and solving classification tasks.
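As a small illustration of the classification step, the sketch below trains a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing on a few made-up documents. The function names, the smoothing choice, and the training data are assumptions for this example and do not come from the project materials.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (list_of_words, label) pairs. Returns the counts used by predict()."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def predict(model, words):
    """Return the label with the highest log posterior for the given words."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total_docs)                 # log prior
        total_words = sum(word_counts[label].values())
        for w in words:
            if w not in vocab:
                continue                                 # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the product.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training pages, already split into words.
pages = [
    (["goal", "match", "team"], "sports"),
    (["team", "score", "win"], "sports"),
    (["election", "vote", "party"], "politics"),
    (["party", "policy", "vote"], "politics"),
]
model = train(pages)
```

Here `predict(model, ["team", "win"])` picks `"sports"` and `predict(model, ["vote", "policy"])` picks `"politics"`. Working with sums of logarithms rather than products of probabilities avoids numerical underflow on longer documents.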
This project description will be updated soon.
Relational Learning for Web Document Classification
Most of the content-based approaches to text and web document classification explored in other related projects are based on the bag of words model, well known from the area of Information Retrieval. This model is simple and efficient, but it fails to capture many additional document features such as the internal HTML structure, language structure and inter-document link structure. All of this, however, may be a valuable source of information for the classification task. The basic problem with incorporating this information into the classification algorithm is the need for a uniform representation. For example, content-based classification works well with the vector space representation, while hyperlink-based classification can be implemented by using graph models. This project introduces an approach that allows various kinds of information to be represented in a uniform way and used for document classification. The idea is known as Relational Learning or First-Order Learning. Another term also used in this context is Inductive Logic Programming (ILP), which uses the language of logic programming (or Prolog) as a representation language for learning. Some relational learning techniques have been successfully used for Data Mining applications (Relational Data Mining).
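To see why the bag of words model loses structural information, consider the minimal sketch below. It reduces a text to word counts and compares two texts by cosine similarity in the resulting vector space; the tokenizer (lowercase runs of letters) is a simplification chosen for this example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Map a text to word counts, discarding order, markup, and links."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

first = bag_of_words("dog bites man")
second = bag_of_words("man bites dog")
```

The two sentences mean different things, yet `first == second` holds and their cosine similarity is 1: word order, like HTML structure and hyperlinks, simply has no place in this representation. A relational representation can state such facts explicitly.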
The project allows students to study the basics of relational learning and reasoning in the context of solving practical problems. One of the most successful relational learning systems, FOIL, is used to create a relational representation of web documents and to solve classification problems.
This project description will be updated soon.