Part II: Policy Creation
For the next few questions, use DiscountGrid. Produce an optimal policy for each of the following situations by modifying the discount, noise, and living reward parameters. For example:
python gridworld.py -a value -i 100 -g DiscountGrid --discount 0.9 --noise 0.0 --livingReward 0
- Travel to the close exit (+1) and risk the cliffs below (-10)
- Travel to the close exit (+1) and avoid the cliffs
- Travel to the distant exit (+10) and risk the cliffs
- Travel to the distant exit (+10) and avoid the cliffs
- Avoid both the cliffs and the exit
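To see why these parameters steer the policy, it can help to run value iteration by hand on a toy problem. The sketch below is not gridworld.py's code; it is a minimal, self-contained model (all names are illustrative) of a 1-D corridor with a close exit worth +1 on the left and a distant exit worth +10 on the right, showing how discount and living reward trade off the two exits:

```python
# Minimal value-iteration sketch loosely mirroring DiscountGrid:
# a close exit (+1) on the left, a distant exit (+10) on the right.
# Illustrative only -- not the actual gridworld.py implementation.

def value_iteration(n_states=5, close_reward=1.0, far_reward=10.0,
                    discount=0.9, noise=0.0, living_reward=0.0,
                    iterations=100):
    """Return the greedy action at the start state (index 1)."""
    # States 0 and n_states - 1 are terminal exits.
    V = [0.0] * n_states

    def q_value(s, step):  # step is -1 (left) or +1 (right)
        intended, slipped = s + step, s - step

        def outcome(s2):
            if s2 == 0:
                return close_reward          # entering the close exit
            if s2 == n_states - 1:
                return far_reward            # entering the distant exit
            return living_reward + discount * V[s2]

        # With probability `noise` the move slips to the opposite neighbor.
        return (1 - noise) * outcome(intended) + noise * outcome(slipped)

    for _ in range(iterations):
        V = [0.0 if s in (0, n_states - 1)
             else max(q_value(s, -1), q_value(s, +1))
             for s in range(n_states)]

    return "left" if q_value(1, -1) >= q_value(1, +1) else "right"
```

With a high discount the agent prefers the distant +10 exit; with a low discount (or a strongly negative living reward) the nearby +1 exit wins, which is exactly the lever the questions above ask you to pull.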
Part III: Learning Parameters
Load the crawler (python crawler.py). With the default settings, the crawler reaches the end of the screen and wraps around after 1700 steps. Can you change a few values to break the learning process, or to make the agent train faster than it does with the defaults?
Explain in your own words what these settings are:
- discount
- epsilon
- learning rate
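The three settings above all appear in the standard Q-learning update, which the sketch below illustrates on a toy state space. This is illustrative code, not the crawler's actual implementation; all names are assumptions:

```python
# Toy Q-learning sketch showing where discount, epsilon, and
# learning rate enter. Not the crawler's real code.
import random

def q_update(Q, s, a, r, s2, actions, learning_rate=0.8, discount=0.9):
    """Temporal-difference update: blend old estimate with a new sample."""
    # discount: how much future reward is worth relative to immediate reward
    sample = r + discount * max(Q.get((s2, a2), 0.0) for a2 in actions)
    # learning rate: how far the estimate moves toward the new sample
    Q[(s, a)] = (1 - learning_rate) * Q.get((s, a), 0.0) + learning_rate * sample

def epsilon_greedy(Q, s, actions, epsilon=0.2, rng=random):
    """epsilon: probability of exploring a random action instead of exploiting."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```

A learning rate of 0 freezes learning entirely, a discount of 0 makes the agent myopic, and an epsilon of 1 makes it act at random forever, so extreme values of any of the three are good candidates for "destroying" the crawler's training.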
Last Revised: March 19, 2018