Stable baselines 3
==================

.. |FILE| replace:: examples/extensions/sb3.py

Training
--------

In this example, we showcase how the built-in Stable Baselines 3 (SB3)
extension can be used to leverage the large collection of algorithms and
policies that library provides.

.. kgd-literal-include:: 1-21
    :emphasize-lines: 13-20

As usual, we start by importing the necessary packages and defining some global
configuration options.
Note that, in addition to the usual amaze classes, we also import
extension-specific items (detailed below).

.. kgd-literal-include:: 7-11
    :pyobject: train

The training function is much shorter than in the hand-written Q-learning case
thanks to the functionality provided by Stable Baselines 3 and the added
wrappers.
While creating mazes and robots should be familiar by now, we see a new
extension-specific function
:meth:`~amaze.extensions.sb3.maze_env.make_vec_maze_env` used to create
vectorized environments (:class:`~stable_baselines3.common.vec_env.VecEnv`).

.. kgd-literal-include:: 13
    :pyobject: train

We sometimes also need access to the underlying environments (regular mazes),
as illustrated below.
Here we compute the average optimal reward by calling
:meth:`~amaze.extensions.sb3.maze_env.MazeEnv.optimal_reward` on every maze
used for intermediate performance evaluation, via
:meth:`~amaze.extensions.sb3.maze_env.env_method`.

.. kgd-literal-include:: 14-28
    :pyobject: train

Next we create a :class:`~amaze.extensions.sb3.callbacks.TensorboardCallback`,
an illustrative built-in callback that uses Tensorboard to provide an overview
of the training process.
In addition to logging numerical data such as the average rewards, it
automatically generates trajectory images whenever the
:class:`~stable_baselines3.common.callbacks.EventCallback` is triggered.
The following lines define such a callback in the traditional SB3 fashion,
adding our own Tensorboard callback and using the optimal reward to stop
training as soon as the agent behaves optimally.

.. kgd-literal-include:: 31-44
    :pyobject: train

Finally, we create the SB3 model using the dedicated wrapper
:meth:`~amaze.extensions.sb3.sb3_controller`, providing the robot data and the
underlying model type (one of :meth:`~amaze.extensions.sb3.compatible_models`)
followed by the usual parameters.
Then, after setting up the logger and letting the training process run its
course, we trigger the callback one last time to render the final trajectories.

Using
-----

.. kgd-literal-include:: 1-2
    :pyobject: evaluate

Once the training process is complete, we evaluate the resulting agent's
generalization capability in the same manner as in :doc:`training`.
The only difference is the use of the dedicated loading function
:meth:`~amaze.extensions.sb3.load_sb3_controller`, which is a verbose alias for
:meth:`~amaze.simu.controllers.control.load`.
The remainder of this function being the same, we refer the reader to the
previous example (:ref:`usage/training:Generalization`), if needed.

.. kgd-literal-include::
    :pyobject: main

Finally, the main function should also be familiar from the previous example.
One thing to note, however, is that, due to incompatibilities between the
current OpenCV and PyQt5 libraries, one should use
:class:`~amaze.extensions.sb3.guard.CV2QTGuard` when combining Stable
Baselines 3 with the native Qt5 components.
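
For readers less familiar with the "traditional SB3 fashion" of stopping on a
reward threshold mentioned in the training section, the pattern can be sketched
with plain Stable Baselines 3 alone.
This is only an illustration under stated assumptions, not the code of the
example script: ``CartPole-v1`` and the threshold of 475 are stand-ins, and the
helper names ``make_eval_callback`` and ``train`` are hypothetical.
In the example script the environments are maze environments and the threshold
is the averaged optimal reward.

.. code-block:: python

    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import (
        EvalCallback,
        StopTrainingOnRewardThreshold,
    )
    from stable_baselines3.common.env_util import make_vec_env


    def make_eval_callback(eval_env, reward_threshold, eval_freq=1_000):
        """Build an evaluation callback that halts training once the mean
        evaluation reward reaches ``reward_threshold`` (hypothetical helper)."""
        # Child callback: invoked by EvalCallback on every new best mean reward
        stop = StopTrainingOnRewardThreshold(
            reward_threshold=reward_threshold, verbose=1
        )
        return EvalCallback(
            eval_env,
            callback_on_new_best=stop,
            eval_freq=eval_freq,  # evaluate every `eval_freq` callback steps
            n_eval_episodes=5,
            deterministic=True,
            verbose=0,
        )


    def train(total_timesteps=5_000, reward_threshold=475.0):
        """Train PPO on a stand-in task with early stopping (hypothetical helper)."""
        # Stand-in environments; the example script uses maze environments instead
        train_env = make_vec_env("CartPole-v1", n_envs=2)
        eval_env = make_vec_env("CartPole-v1", n_envs=1)

        model = PPO("MlpPolicy", train_env, verbose=0)
        model.learn(
            total_timesteps=total_timesteps,
            callback=make_eval_callback(eval_env, reward_threshold),
        )
        return model

Calling ``train()`` runs for at most ``total_timesteps`` steps and returns
earlier if an evaluation reaches the threshold.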