Learning from Reinforcement and Advice Using Composite Reward Functions


Reinforcement learning is a popular methodology for creating intelligent agents. However, its performance deteriorates as inter-reinforcement times increase. This paper presents an approach that integrates additional advisory feedback with the task rewards to achieve faster learning and policies tuned to the advisor's preferences. The advice is converted to "tuning" rewards that, together with the task rewards, more accurately define the advisor's perception of the task. At the same time, achievement of the original task objective is ensured by formal bounds on the user reward component. The approach is illustrated using a robot navigation task.