Cost-Based Policy Mapping for Imitation


Imitation is a powerful approach to programming and autonomous learning in robot and computer systems. An important aspect of imitation is the mapping of observations to an executable control strategy, even when the behavioral capabilities of the demonstrator and the imitator differ. This paper addresses this problem by locally optimizing a cost function that represents both the deviation from the observed state sequence and the cost of the selected actions. The resulting strategies resemble the observed behavior of the demonstrating agent as closely as possible. The performance of this approach is illustrated in a simulated multi-agent environment.
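
As a rough illustration of the idea of locally optimizing such a cost function, the following Python sketch selects, at each step, the imitator action that minimizes the sum of the deviation from the observed state and the cost of the action. It is not the method of this paper: the one-dimensional state, the absolute-difference deviation measure, the greedy one-step search, and all names (`greedy_imitation_plan`, `deviation_weight`, the action table) are assumptions made purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def greedy_imitation_plan(
    observed: Sequence[float],
    start: float,
    actions: Dict[str, Tuple[float, float]],
    deviation_weight: float = 1.0,
) -> Tuple[List[str], float]:
    """Greedy one-step cost minimization (illustrative assumption only).

    observed: demonstrator states, one per time step.
    start:    initial imitator state.
    actions:  hypothetical action table, name -> (displacement, action cost).
    Returns the chosen action names and the accumulated cost.
    """
    state = start
    plan: List[str] = []
    total_cost = 0.0
    for target in observed:
        best_name = ""
        best_cost = float("inf")
        best_next = state
        for name, (displacement, action_cost) in actions.items():
            next_state = state + displacement
            # Cost = weighted deviation from the observed state + action cost.
            cost = deviation_weight * abs(next_state - target) + action_cost
            if cost < best_cost:
                best_name, best_cost, best_next = name, cost, next_state
        plan.append(best_name)
        total_cost += best_cost
        state = best_next
    return plan, total_cost


# The demonstrator advances 2 units per step; the imitator can only
# stay, step by 1, or jump by 3, so it cannot copy the motion exactly.
imitator_actions = {"stay": (0.0, 0.0), "step": (1.0, 0.1), "jump": (3.0, 0.5)}
plan, total_cost = greedy_imitation_plan([2.0, 4.0, 6.0], 0.0, imitator_actions)
print(plan, round(total_cost, 3))
```

In this toy setting the imitator lacks the demonstrator's 2-unit move, so it mixes its own actions to stay near the observed trajectory while trading off action cost.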