Full example: Training ====================== .. |FILE| replace:: examples/q_learning.py This example is a full training process for a very basic agent capable of navigating trivial mazes. Under the hood, it uses a :class:`~amaze.simu.controllers.tabular.TabularController` to map discrete states to discrete actions. Only the most important pieces of the code will be presented here, with the reader being redirected to the |FILE| for the unabridged sources. Configuration ------------- .. kgd-literal-include:: 10-12 Here, we use the verbose version of the :class:`~amaze.simu.robot.Robot.BuildData` initializer to also specify what kind of controller it will use and to provide the necessary parameters. We rely on the simulation to give the list of possible discrete actions and set the exploration rate and seed for the controller's random number generator. The inputs and outputs are specified via the corresponding enumerations instead of single characters for increased readability. Training loop ------------- The training process itself, detailed below, mostly boils down to three things: - pick training (and evaluation) maze(s) - create a controller - simulate a lot of episodes and apply the appropriate training operator .. kgd-literal-include:: 4-6 :pyobject: train This time around, we use the explicit initializer for the :class:`~amaze.simu.maze.Maze.BuildData`. .. kgd-literal-include:: 8-10 :pyobject: train We then tweak it slightly to get different maze for the agents to be evaluated in so that we can ensure some small measure of generalized performance. .. kgd-literal-include:: 12-13 :pyobject: train The robot data is used to instantiate one of the builtin controller to which we provide specific arguments. Using that same robot data we create a simulation with any one maze. .. kgd-literal-include:: 18-19 :pyobject: train Then for a certain number of episodes: .. kgd-literal-include:: 35-38 :pyobject: train we let the agent experience a maze and learn from it ... .. kgd-literal-include:: 42-44 :pyobject: train ... while also monitoring its performance on unseen mazes. Learning -------- .. kgd-literal-include:: :pyobject: q_train In the training process, we can no longer use the helpful :meth:`~amaze.simu.simulation.Simulation.run` function to encapsulate everything as we need to correlate actions to rewards. Instead we apply the policy to the current state to get an action. This action is then used to :meth:`~amaze.simu.simulation.Simulation.step` the simulation, resulting in a reward that we can feed back to the policy. The builtin :class:`~amaze.simu.controllers.tabular.TabularController` has both sarsa and q-learning natively implemented the latter being used here to drive the learning process. Evaluating ---------- .. kgd-literal-include:: :pyobject: q_eval In essence, evaluating the performance of an agent on non-training mazes is very similar to the training process except that we make sure to never use exploration. Thus we instead ask the tabular policy to only use :meth:`~amaze.simu.controllers.tabular.TabularController.greedy_action`. Generalization -------------- .. kgd-literal-include:: :pyobject: evaluate_generalization :emphasize-lines: 20 Finally, we illustrate two methods to evaluate the generalization performance of an AMaze agent. As we no longer need to explore with this policy, we start by setting epsilon to 0, ensuring the agent will always take the greedy action. The first method then consists in generating a large number of random mazes and, for each, creating a simulation and letting it run until completion. Thanks to the :meth:`~amaze.simu.simulation.Simulation.normalized_reward`, we can know if the agent has followed the optimal trajectory by verifying that it is equal to 1. By performing this on a large enough sample, we can get a measure of how well the agent adapts to unseen mazes. The second method is more straightforward (and computationally cheaper): when inputs are discrete (either pre-processed with :attr:`~amaze.simu.types.InputType.DISCRETE`/ :attr:`~amaze.simu.types.OutputType.DISCRETE` or aligned images with :attr:`~amaze.simu.types.InputType.CONTINUOUS`/:attr:`~amaze.simu.types.OutputType.DISCRETE`) it is possible to actually enumerate all possible combinations. Such an approach has advantages compared to the more straightforward maze-navigation as a single error has no potential for catastrophic failure. At the same time, by being more abstract, it only evaluates the subset of the agents capabilities responsible for immediate action. The returned values describe, with various levels of detail, the agents performance. The main ---------- .. kgd-literal-include:: :pyobject: main To tie it all up, the main calls both the training and generalization functions while also showcasing how to save a fully trained controller. The :meth:`~amaze.simu.controllers.control.save` function allows for additional information to be stored alongside the policy's archive for later retrieval.