Part II: Policy Creation
For the next few questions, use DiscountGrid. Create an optimal policy for each of the following situations by modifying the discount, noise, and living reward. For example:
python gridworld.py -a value -i 100 -g DiscountGrid --discount 0.9 --noise 0.0 --livingReward 0
  1. Travel to the close exit (+1) and risk the cliffs below (-10)
  2. Travel to the close exit (+1) and avoid the cliffs
  3. Travel to the distant exit (+10) and risk the cliffs
  4. Travel to the distant exit (+10) and avoid the cliffs
  5. Avoid both the cliffs and the exit
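To build intuition for how these parameters steer the policy, here is a minimal sketch (not the course's gridworld code — the function names and the one-dimensional layout are my own simplification): value iteration on a chain with a close exit worth +1 at one end and a distant exit worth +10 at the other. A high discount makes the distant +10 worth the walk; a low discount makes the agent grab the nearby +1.

```python
def chain_values(gamma, living_reward=0.0, n=6, iters=100):
    """Value iteration on a 1-D chain of n states.

    State 0 is a close exit worth +1, state n-1 a distant exit worth +10;
    interior states can step left or right, collecting living_reward
    discounted by gamma at each step.
    """
    V = [0.0] * n
    V[0], V[-1] = 1.0, 10.0            # exit values are fixed (terminal)
    for _ in range(iters):
        for s in range(1, n - 1):
            V[s] = max(living_reward + gamma * V[s - 1],
                       living_reward + gamma * V[s + 1])
    return V

def greedy_policy(V, gamma, living_reward=0.0):
    """Greedy action at each interior state: 'L' toward +1, 'R' toward +10."""
    return ['L' if living_reward + gamma * V[s - 1] >=
                   living_reward + gamma * V[s + 1] else 'R'
            for s in range(1, len(V) - 1)]
```

With gamma=0.9 every interior state heads right toward the +10 exit, while with gamma=0.3 the starting state next to the +1 exit turns left instead — the same trade-off DiscountGrid asks you to engineer (noise and the cliff penalty add the risk dimension).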
Part III: Learning Parameters
Load the crawler (python crawler.py). With the default settings, the crawler reaches the end of the screen and wraps around after about 1700 steps. Can you change a few values to break the learning process, or to make the agent train faster than it does with the defaults?
Explain in your own words what these settings are:
  1. discount
  2. epsilon
  3. learning rate
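All three settings appear in the Q-learning update the crawler runs at every step. Below is a minimal sketch of that update and of epsilon-greedy action selection (not the course's crawler code — the function names are my own): gamma is the discount, epsilon the exploration probability, and alpha the learning rate.

```python
import random

def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One Q-learning step: nudge Q(s,a) toward the observed sample.

    alpha (learning rate) sets how far to move toward the new sample;
    gamma (discount) sets how much future reward is worth today.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def epsilon_greedy(q, state, actions, epsilon=0.1):
    """With probability epsilon take a random action (explore);
    otherwise take the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

This is why extreme values break learning: alpha=0 means the update never changes anything, epsilon=1 means the agent never exploits what it has learned, and a discount near 0 makes the crawler blind to delayed reward.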
                                                                                     Last Revised: March 19, 2018