This applet shows how Q-learning works for a simple 10x10 grid world. The numbers in each square show the Q-values of that square for each action. The blue arrows show the optimal action based on the current value function (when an arrow looks like a star, all actions are optimal). To start, press one of the four action buttons.
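For illustration, the display can be thought of as one Q-value per (square, action) pair, with the arrow showing the action of highest Q-value. The following is a minimal Java sketch of such a representation (hypothetical names, not the applet's actual code):

    public class QTable {
        static final int SIZE = 10;
        static final int UP = 0, DOWN = 1, LEFT = 2, RIGHT = 3;
        double[][][] q = new double[SIZE][SIZE][4]; // q[x][y][action]

        /** Returns a greedy action for a square. Ties mean several optimal
            actions (drawn as a star when all four are optimal). */
        int greedyAction(int x, int y) {
            int best = UP;
            for (int a = 1; a < 4; a++) {
                if (q[x][y][a] > q[x][y][best]) best = a;
            }
            return best;
        }
    }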
In this example, there are four rewarding states (apart from the walls): one worth +10 (at position (9,8); 9 across and 8 down), one worth +3 (at position (8,3)), one worth -5 (at position (4,5)), and one worth -10 (at position (4,8)). In each of these states the agent receives the reward when it carries out an action in that state (when it leaves the state, not when it enters). (These are the same as in the value iteration applet.)
There are four actions available: up, down, left, and right. If the agent carries out one of these actions, it has a 0.7 chance of going one step in the desired direction and a 0.1 chance of going one step in any of the other three directions. If it bumps into the outside wall (i.e., the square computed as above is outside the grid), there is a penalty of 1 (i.e., a reward of -1) and the agent doesn't actually move. When the agent acts in one of the states with positive reward, it is flung, at random, to one of the four corners of the grid world, no matter what action it carries out. Again, this is the same as in the value iteration applet.
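These dynamics can be made concrete with a small sketch. The following hypothetical Java code (not the actual Q_Env.java) implements the transition probabilities, the wall penalty, and the flinging to a corner:

    import java.util.Random;

    public class GridEnv {
        static final int SIZE = 10;
        final Random rng = new Random();
        int x, y; // agent position, 0..9 in each dimension

        double reward(int x, int y) {
            if (x == 9 && y == 8) return 10;
            if (x == 8 && y == 3) return 3;
            if (x == 4 && y == 5) return -5;
            if (x == 4 && y == 8) return -10;
            return 0;
        }

        /** Carries out an action and returns the reward received. */
        double step(int action) {
            double r = reward(x, y); // reward is received on leaving the state
            if (r > 0) {
                // positive-reward states fling the agent to a random corner
                x = rng.nextBoolean() ? 0 : SIZE - 1;
                y = rng.nextBoolean() ? 0 : SIZE - 1;
                return r;
            }
            // probability 0.7 of the desired direction, 0.1 for each other one
            int actual = action;
            if (rng.nextDouble() >= 0.7) {
                int[] others = new int[3];
                int i = 0;
                for (int a = 0; a < 4; a++) if (a != action) others[i++] = a;
                actual = others[rng.nextInt(3)];
            }
            int nx = x, ny = y;
            switch (actual) {           // 0=up, 1=down, 2=left, 3=right
                case 0: ny--; break;
                case 1: ny++; break;
                case 2: nx--; break;
                case 3: nx++; break;
            }
            if (nx < 0 || nx >= SIZE || ny < 0 || ny >= SIZE) {
                return r - 1; // bumped the outside wall: penalty of 1, no move
            }
            x = nx; y = ny;
            return r;
        }
    }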
The initial discount rate is 0.9. It is interesting to try Q-learning at different discount rates (using the "Increment Discount" and "Decrement Discount" buttons, or just typing a value in).
You can control the agent yourself (using the up, down, left, and right buttons) or you can step the agent a number of times. The agent acts greedily the percentage of the time specified and acts randomly the rest of the time.
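A minimal sketch of such a selection rule, with hypothetical names (the applet's own code may differ):

    import java.util.Random;

    public class ActionSelector {
        final Random rng = new Random();

        /** Act greedily with probability greedyPercent/100, else randomly. */
        int selectAction(double[] qForSquare, double greedyPercent) {
            if (rng.nextDouble() * 100 < greedyPercent) {
                int best = 0; // greedy: the action with the highest Q-value
                for (int a = 1; a < qForSquare.length; a++) {
                    if (qForSquare[a] > qForSquare[best]) best = a;
                }
                return best;
            }
            return rng.nextInt(qForSquare.length); // uniformly random action
        }
    }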
The alpha value, by default, is derived from the counts (so each Q-value is the average of the experiences). You can also make it a fixed value.
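Putting the discount and alpha together, each step could update a Q-value as in the following hypothetical sketch (the actual Q_Controller.java may differ in details):

    public class QUpdater {
        double[][][] q = new double[10][10][4];   // q[x][y][action]
        int[][][] visits = new int[10][10][4];    // visit counts per (square, action)
        double discount = 0.9;                    // the discount rate
        Double fixedAlpha = null;                 // null means "use the counts"

        /** Q-learning update after taking action a in (x,y), receiving
            reward r, and ending up in (nx,ny). */
        void update(int x, int y, int a, double r, int nx, int ny) {
            double best = q[nx][ny][0];           // max over next-state actions
            for (int b = 1; b < 4; b++) best = Math.max(best, q[nx][ny][b]);
            visits[x][y][a]++;
            double alpha = (fixedAlpha != null) ? fixedAlpha
                                                : 1.0 / visits[x][y][a];
            // Q(s,a) += alpha * (r + discount * max_a' Q(s',a') - Q(s,a))
            q[x][y][a] += alpha * (r + discount * best - q[x][y][a]);
        }
    }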
Reset initializes all Q-values to the given "Initial value".
The commands "Brighter" and "Dimmer" change the contrast (the mapping between non-extreme values and colour). "Grow" and "Shrink" change the size of the grid.
You can get the code: Q_GUI.java is the GUI. The controller code is at Q_Controller.java, and the environment simulation is at Q_Env.java. You can get the javadoc for various applets.
This applet comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions, see the code for more details.
Copyright © David Poole, 2003,2004. All rights reserved.