David's Simple Game
This applet shows a simple game on a 5x5 grid. The agent (shown as a circle) can move up, down, left, or right.
The Game
There can be a prize at one of the 4 corners (the prize is shown in cyan when it is there). When the agent lands on a prize, it gets a reward of +10 and the prize disappears. When there is no prize, a prize can appear randomly at one of the 4 corners. The prize stays there until the agent lands on it.
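As a rough illustration, the prize dynamics might be implemented along the following lines. This is a sketch, not the applet's actual SGameEnv code; the appearance probability and all of the names here are assumptions (the text only says the prize appears "randomly"):

    import java.util.Random;

    // Sketch of the prize dynamics; names and the appearance
    // probability are assumptions, not the applet's actual code.
    class PrizeSketch {
        static final int[][] CORNERS = {{0, 0}, {0, 4}, {4, 0}, {4, 4}};
        static final double APPEAR_PROB = 0.3; // assumed; not given in the text
        private final Random rng = new Random();
        private int[] prize = null;            // null means no prize on the grid

        // Called once per time step; returns the prize-related reward.
        double updatePrize(int agentX, int agentY) {
            if (prize == null) {
                if (rng.nextDouble() < APPEAR_PROB) {
                    prize = CORNERS[rng.nextInt(CORNERS.length)]; // random corner
                }
                return 0;
            }
            if (agentX == prize[0] && agentY == prize[1]) {
                prize = null;  // the prize disappears when the agent lands on it
                return 10;     // reward of +10
            }
            return 0;          // otherwise the prize stays where it is
        }
    }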
There are 5 locations where monsters can appear randomly. A monster is shown as red in its square. If a monster appears at the agent's location, the agent becomes damaged if it wasn't already damaged; if it was already damaged, the agent incurs a penalty of 10 (i.e., a reward of -10). Monsters appear at the locations independently at each time step. The agent can get repaired by visiting the repair station (the second location from the left on the top row, shown in magenta). The agent is yellow when it isn't damaged and pink when it is damaged.
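A corresponding sketch of the monster and damage dynamics; the monster locations, the appearance probability, and the repair-station coordinates below are guesses for illustration, not the applet's actual values:

    import java.util.Random;

    // Sketch of the monster/damage dynamics; the locations and the
    // probability are assumptions, not the applet's actual code.
    class MonsterSketch {
        static final int[][] MONSTER_LOCS =          // 5 monster locations (assumed)
            {{0, 2}, {1, 1}, {2, 3}, {3, 0}, {4, 2}};
        static final double MONSTER_PROB = 0.4;      // assumed appearance probability
        static final int REPAIR_X = 1, REPAIR_Y = 0; // second from left, top row (assumed)
        private final Random rng = new Random();
        boolean damaged = false;

        // Called once per time step; returns the monster-related reward.
        double updateMonsters(int agentX, int agentY) {
            if (agentX == REPAIR_X && agentY == REPAIR_Y) {
                damaged = false;                     // repair station fixes the agent
            }
            double reward = 0;
            for (int[] loc : MONSTER_LOCS) {         // each location is independent
                if (agentX == loc[0] && agentY == loc[1]
                        && rng.nextDouble() < MONSTER_PROB) {
                    if (damaged) reward -= 10;       // already damaged: penalty of 10
                    else damaged = true;             // first hit only damages the agent
                }
            }
            return reward;
        }
    }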
There are 4 actions available to the agent: up, down, left and right. If the agent carries out one of these actions, it has a 0.7 chance of going one step in the desired direction and a 0.1 chance of going one step in each of the other three directions. If it bumps into the outside wall or an inside wall (i.e., the square computed as above is outside the grid or reached through an internal wall), there is a penalty of 1 (i.e., a reward of -1) and the agent doesn't actually move.
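The stochastic movement might look roughly like this. It is a minimal sketch under the stated probabilities; the internal-wall checks are omitted and the names are illustrative:

    import java.util.Random;

    // Sketch of the action dynamics: 0.7 chance of the desired direction,
    // 0.1 chance of each of the other three, -1 for bumping into a wall.
    class MoveSketch {
        static final int SIZE = 5;
        static final int UP = 0, DOWN = 1, LEFT = 2, RIGHT = 3;
        static final int[][] DELTA = {{0, -1}, {0, 1}, {-1, 0}, {1, 0}};
        private final Random rng = new Random();
        int x = 2, y = 2;  // agent position, with (0,0) the top-left corner

        // Carries out an action; returns the movement-related reward.
        double move(int action) {
            int dir = action;
            if (rng.nextDouble() >= 0.7) {    // with probability 0.3,
                do {                          // pick one of the other three
                    dir = rng.nextInt(4);     // directions uniformly (0.1 each)
                } while (dir == action);
            }
            int nx = x + DELTA[dir][0], ny = y + DELTA[dir][1];
            if (nx < 0 || nx >= SIZE || ny < 0 || ny >= SIZE) {
                return -1;  // bumped into the outside wall: penalty of 1, no move
            }
            // (checks for internal walls omitted in this sketch)
            x = nx;
            y = ny;
            return 0;
        }
    }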
(You can ignore the blue arrows and the numbers. They are there for when we build a Q-learning controller. I left them in because you might find them useful when you add a reinforcement learning controller.)
The Controller
You can control the agent yourself (using the up, left, right, down buttons) or you can step the agent a number of times. When you step the agent, it follows a simplistic strategy: it acts greedily with the probability given by the "Greedy Exploit" parameter, and otherwise it acts randomly. To act greedily: if the prize is to the left it does the "left" action; otherwise, if the prize is to the right it does the "right" action; otherwise, if the prize is down it does the "down" action; and if the prize is up it does the "up" action. If there is no prize, it acts randomly.
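In code, that strategy might look something like this sketch. The field and method names, and the treatment of the "Greedy Exploit" parameter as a probability, are assumptions rather than the applet's actual SGameController implementation:

    import java.util.Random;

    // Sketch of the built-in "simplistic strategy": greedy towards the
    // prize with probability greedyExploit, otherwise random.
    class GreedySketch {
        static final int UP = 0, DOWN = 1, LEFT = 2, RIGHT = 3;
        private final Random rng = new Random();
        double greedyExploit = 0.8;  // the "Greedy Exploit" parameter (value assumed)

        int chooseAction(int agentX, int agentY, int[] prize) {
            // No prize, or not exploiting this step: act randomly.
            if (prize == null || rng.nextDouble() >= greedyExploit) {
                return rng.nextInt(4);
            }
            // Otherwise test the directions in the fixed order given above.
            if (prize[0] < agentX) return LEFT;
            if (prize[0] > agentX) return RIGHT;
            if (prize[1] > agentY) return DOWN;  // assuming y grows downward
            return UP;
        }
    }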
There are some parameters you can change, but the applet only
uses the "Step" and the "Greedy Exploit" parameters. You may want to
use the other parameters if you write your own controller.
The applet reports the number of steps and the total reward
received. It also reports the minimum accumulated reward (which indicates when the agent has
started to learn) and the point at which the accumulated reward
changes from negative to positive. "Reset" initializes these to zero. "Trace on console" lists the
steps and rewards on the console, in case you want to plot them.
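The bookkeeping behind those statistics is simple. A minimal sketch follows; recording the most recent crossing is one plausible reading of "the point at which the accumulated reward changes from negative to positive":

    // Sketch of the reported statistics: step count, accumulated reward,
    // its minimum, and the step at which it last turned positive.
    class StatsSketch {
        int steps = 0;
        double totalReward = 0, minReward = 0;
        int zeroCrossing = -1;  // -1 until the accumulated reward turns positive

        void record(double reward) {
            boolean wasNonPositive = totalReward <= 0;
            steps++;
            totalReward += reward;
            minReward = Math.min(minReward, totalReward);
            if (wasNonPositive && totalReward > 0) {
                zeroCrossing = steps;  // record the most recent crossing
            }
        }

        void reset() {  // "Reset" initializes these to zero
            steps = 0;
            totalReward = 0;
            minReward = 0;
            zeroCrossing = -1;
        }
    }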
The commands "Brighter" and "Dimmer" change the contrast (the mapping between non-extreme values and colour). "Grow" and "Shrink" change the size of the grid.
Other Applets for the game:
- Hand controller (a rather simplistic rule-based controller)
- Q-learning controller
- Model-based reinforcement learning controller
- Linear function controller
- Adversary controller (Q-learning, but where an adversary chooses the prize location)
You can get the code: SGameGUI.java is the GUI. The environment code is at SGameEnv.java. The controller is at SGameController.java. You can also get the javadoc for a number of my applets. This applet comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions, see the code for more details. Copyright © David Poole, 2010.