|
|
 |
 |
|
 |
|
|
 |
 |
 |
 |
 |
 |
The object of this project is to give the student a deep, experiential understanding
of dynamic programming and value iteration through explanation, implementation examples,
and implementation exercises
- Dynamic Programming:
- Demonstrate the need for dynamic programming through Fibonacci number computation.
- Guide the student through the details of a Java solution to a simple Pig variant
with an acyclic state space.
- Provide problem solving exercises that will ground the student's understanding of
dynamic programming in experience.
- Value Iteration:
- Introduce the method of value iteration.
- Demonstrate its application to a simple coin variant of Pig called Piglet.
- Guide the student through the details of a Java solution to Piglet.
- Provide problem solving exercises that will ground the student's understanding of
value iteration in experience.
|
|
 |
 |
 |
 |
|
 |
 |
 |
 |
 |
 |
The student should understand basic probability and algebraic systems of equations
(although knowledge of linear algebraic solution techniques are not required). The
student should also understand the syntax of Java. Students who program in C++ should
be able to follow most of the Java examples.
Before starting the
project, the student will want to understand the definition
of a Markov decision process, perhaps by covering the recommended background reading below. |
|
 |
 |
 |
 |
|
 |
 |
 |
 |
 |
 |
|
To understand the terminology of Markov decision processes and general application
of value iteration, we recommend that students read sections 17.1 (Sequential Decision
Problems) and 17.2 (Value Iteration) of:
Students are encouraged to play the game of Pig in order to have a
better understanding
of the problem. An applet
with an optimal computer player is available at the
Game
of Pig page.
|
|
 |
 |
 |
 |
|
 |
 |
 |
 |
 |
 |
|
The detailed project description is available in the PDF file pig.pdf. You will need the free Adobe Acrobat Reader to view this file.
|
|
This code was generated from the same source file that generated
pig.pdf using the
literate programming
tool noweb.
We recommend having students do at least one dynamic programming exercise and one
value iteration exercise.
Students seeking
more extended challenges will also find descriptions of related
advanced projects for dynamic programming and value iteration.
|
|
|
 |
 |
 |
 |
|
 |
 |
 |
 |
 |
 |
Here is a sample syllabus used in CS 371 - Introduction to AI at Gettysburg College:
First class: Tuesday 9/28/04
Preparatory Reading: Sections 1-2 of pig.pdf.
Dynamic programming
Homework due Thursday 9/30/04: One complete exercise from section 2.4.
Second class: Thursday 9/30/04
Preparatory Reading: Remaining sections of pig.pdf, Russell & Norvig sections 17.1, 17.2.
Value Iteration
Homework due Thursday 10/7/04: One complete exercise from section 3.5.
(There was a two-day break 10/4-10/5. Without the break, Tuesday 10/5 rather than 10/7 would have been appropriate.)
The following additional coverage is not included in this project, but may provide the instructor additional ideas:
Third class: Thursday 10/7/04:
(no preparatory reading.)
Mountain Car Problem of Example 8.2 of Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: an introduction. MIT Press, pp. 214-215. (Online at http://www.cs.ualberta.ca/~sutton/book/8/node9.html.)
See also: http://www-anw.cs.umass.edu/rlr/domains.html, http://www.cs.ualberta.ca/~sutton/MountainCar/MountainCar.html.
Our approach to this problem is to simply apply value iteration, discretizing the
state space into a 400-by-400 grid of cells, and approximating the dynamics of each
time step as a mapping from one cell to another. In other words, overlay the 2D
state space (position, velocity) with a 400-by-400 grid. Simulate one time-step
from each grid intersection and note which is the nearest intersection to the resulting
state. This defines a mapping to discretely approximate the system dynamics. Then
apply value iteration with a reward of -1 for each step that the system is not at
the rightmost position.
|
|
 |
 |
 |
 |
|
|
 |
|
|