Suggested exercise list:
- Change the epsilon-probability during testing to e.g. 0.001 or 0.05. Which gives the best results? Could you use this value during training? Why or why not? (See the epsilon-greedy sketch after this list.)
- Continue training the agent for the Breakout game using the downloaded checkpoint. Does the agent get better or worse the more you train it? Why? (You should run it in a terminal window as described above.)
- Try changing the game-environment to Space Invaders and re-run this Notebook. The checkpoint can be downloaded automatically. It was trained for about 150 hours, roughly the same as for Breakout, but note that it has processed far fewer states. The reason is that the hyper-parameters such as the learning-rate were tuned for Breakout. Can you make some kind of adaptive learning-rate that would work better for both Breakout and Space Invaders? (See the learning-rate sketch after this list.) What about the other hyper-parameters? What about other games?
- Try different architectures for the Neural Network. You will need to restart the training because the checkpoints cannot be reused for other architectures, and you will need to train the agent for several days with each new architecture to properly assess its performance.
- The replay-memory throws away all data after optimization of the Neural Network. Can you make it reuse the data somehow? The ReplayMemory-class has the function `estimate_all_q_values()`, which may be helpful.
- The reward is limited to -1 and 1 in the function `ReplayMemory.add()` so as to stabilize the training. This means the agent cannot distinguish between small and large rewards. Can you use batch normalization to fix this problem, so you can use the actual reward values?
- Can you improve the training by adding L2-regularization or dropout? (See the regularization sketch after this list.)
- Try using other optimizers for the Neural Network. Does it help with the training speed or stability?
- Let the agent take up to 30 random actions at the beginning of each new episode. This is used in some research papers to further randomize the game-environment, so the agent cannot memorize the first sequence of actions. (See the random-start sketch after this list.)
- Try saving the game at regular intervals. If the agent dies, you can then reload the last saved game. Would this make training faster and better, because the agent does not need to play the game from the beginning?
- There are some invalid actions available to the agent in OpenAI Gym. Does it improve the training if you only allow the valid actions from the game-environment? (See the action-inspection sketch after this list.)
- Does the MotionTracer work for other games? Can you improve on the MotionTracer?
- Try using the last 4 image-frames from the game instead of the MotionTracer. (See the frame-stack sketch after this list.)
- Try larger and smaller sizes for the replay memory.
- Try larger and smaller discount rates for updating the Q-values. (See the Q-value target sketch after this list.)
- If you look closely at the states and actions that are displayed above, you will notice that the agent has sometimes taken actions that do not correspond to the movement of the paddle. For example, the action might be LEFT but the paddle has either not moved at all, or it has moved right instead. Is this a bug in the source-code for this tutorial, in OpenAI Gym, or in the underlying Arcade Learning Environment? Does it matter?
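The following sketches are referenced in the exercises above; they are illustrative only and not part of this tutorial's source-code.

The epsilon-greedy sketch shows how a configurable epsilon-probability selects between a random action and the action with the highest estimated Q-value, so you can compare testing-values such as 0.001 and 0.05. The function name `epsilon_greedy_action` is hypothetical and only assumes you can obtain the Q-values for the current state.

```python
import numpy as np

def epsilon_greedy_action(q_values, epsilon):
    """Return a random action with probability epsilon, otherwise
    the action with the highest estimated Q-value."""
    if np.random.uniform() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

# Compare two testing-values of epsilon on the same Q-values.
q_values = np.array([0.1, 0.5, 0.2, 0.4])
print(epsilon_greedy_action(q_values, epsilon=0.001))
print(epsilon_greedy_action(q_values, epsilon=0.05))
```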
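The learning-rate sketch is one possible way to make the learning-rate adapt to how many game-states have been processed, so the same schedule could scale across different games. The function name and all the numbers are assumptions for illustration, not values tuned for Breakout or Space Invaders.

```python
def adaptive_learning_rate(num_states, start_rate=1e-3,
                           end_rate=1e-5, decay_states=5e6):
    """Linearly anneal the learning-rate from start_rate to end_rate
    over the first decay_states processed game-states."""
    fraction = min(num_states / float(decay_states), 1.0)
    return start_rate + fraction * (end_rate - start_rate)

# The learning-rate shrinks as more game-states are processed.
for num_states in [0, 1e6, 5e6, 10e6]:
    print(num_states, adaptive_learning_rate(num_states))
```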
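The regularization sketch shows how an L2-penalty can be added to a squared-error loss in the TensorFlow 1.x style used around the time this tutorial was written. The placeholder and variable names are hypothetical stand-ins for the tutorial's own Neural Network tensors, and the penalty-scale is an arbitrary example value.

```python
import tensorflow as tf

# Hypothetical stand-ins for the network's estimated Q-values,
# the Q-learning targets, and one of the network's weight-matrices.
q_values = tf.placeholder(tf.float32, shape=[None, 4], name='q_values')
q_targets = tf.placeholder(tf.float32, shape=[None, 4], name='q_targets')
weights = tf.get_variable('weights', shape=[512, 4])

# Squared-error loss with an added L2-penalty on the weights.
mse_loss = tf.reduce_mean(tf.squared_difference(q_values, q_targets))
l2_scale = 1e-4
loss = mse_loss + l2_scale * tf.nn.l2_loss(weights)
```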
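The random-start sketch lets the agent take a random number of random actions after each reset. It assumes the classic OpenAI Gym API where `env.step()` returns `(observation, reward, done, info)`; newer Gym versions use a different signature.

```python
import numpy as np

def reset_with_random_actions(env, max_random_actions=30):
    """Reset the game-environment and take a random number of random
    actions, so each episode starts in a slightly different state."""
    observation = env.reset()
    num_random = np.random.randint(1, max_random_actions + 1)
    for _ in range(num_random):
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            # The game ended during the random actions, so start over.
            observation = env.reset()
    return observation
```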
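The action-inspection sketch shows how to list the actions that the game-environment exposes, which is a starting point for restricting the agent to the valid ones. It assumes a Gym version whose Atari environments provide `get_action_meanings()`.

```python
import gym

env = gym.make('Breakout-v0')

# The size of the action-space and the meaning of each action.
print(env.action_space)
print(env.unwrapped.get_action_meanings())
```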
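The frame-stack sketch keeps the last 4 pre-processed image-frames and stacks them along the channel-axis, as an alternative state-representation to the MotionTracer. It assumes each frame is a 2-dimensional NumPy array; the class name `FrameStack` is not part of this tutorial's source-code.

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the last num_frames image-frames and stack them
    along the channel-axis to form the state."""
    def __init__(self, num_frames=4):
        self.frames = deque(maxlen=num_frames)

    def reset(self, frame):
        # Fill the stack with copies of the first frame of the episode.
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)

    def add(self, frame):
        self.frames.append(frame)

    def get_state(self):
        # Shape: [height, width, num_frames]
        return np.stack(self.frames, axis=-1)

# Example usage with dummy 84x84 frames.
stack = FrameStack(num_frames=4)
stack.reset(np.zeros((84, 84)))
stack.add(np.ones((84, 84)))
print(stack.get_state().shape)  # (84, 84, 4)
```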
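The Q-value target sketch shows where the discount-rate enters the update: the target for a state-action pair is the observed reward plus the discounted maximum Q-value of the following state, and the future term is dropped when the episode has ended. The function name and the discount-value of 0.97 are only for illustration.

```python
import numpy as np

def q_value_targets(rewards, q_values_next, end_episode, discount=0.97):
    """Compute Q-learning targets for a batch of transitions:
    target = reward + discount * max_a Q(next_state, a),
    with the future term set to zero when the episode ended."""
    max_q_next = np.max(q_values_next, axis=1)
    return rewards + discount * max_q_next * (1.0 - end_episode.astype(float))

# Example: two transitions, where the second one ends the episode.
rewards = np.array([1.0, 0.0])
q_values_next = np.array([[0.2, 0.8],
                          [0.5, 0.3]])
end_episode = np.array([False, True])
print(q_value_targets(rewards, q_values_next, end_episode))
```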
Last Revised: March 9, 2018