Question 2 [10 marks]
There are monsters at the squares marked with an X. At each time step, each monster randomly and independently checks whether the robot is on its square. If the robot is on the square when the monster checks, the robot receives a reward of -10 (i.e., it loses 10 points). The monster at the centre square checks at each time step with probability p2=0.4; the monsters at the other 4 squares marked with an X each check at each time step with probability p3=0.2.
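As a small illustration of the monster model, the expected immediate reward from a monster square is the check probability times the penalty. The Python sketch below assumes only the numbers stated above; the function and constant names are hypothetical and chosen for this example.

```python
# Hypothetical names; only the probabilities and the -10 penalty come from the question.
MONSTER_PENALTY = -10  # reward received when a monster's check catches the robot


def expected_monster_reward(check_probability, penalty=MONSTER_PENALTY):
    """Expected immediate reward for being on a monster square for one time step."""
    return check_probability * penalty


centre_reward = expected_monster_reward(0.4)  # centre square, p2 = 0.4
other_reward = expected_monster_reward(0.2)   # each of the other 4 squares, p3 = 0.2
```

Squares without a monster contribute no monster penalty, so this term only appears for successor states that are marked with an X.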
The robot has 8 actions, corresponding to moves to the 8 neighbouring squares. The diagonal moves are noisy: there is a p4=0.6 probability of going in the direction chosen and an equal chance of going to each of the 4 neighbouring squares that are closest to the desired direction. The vertical and horizontal moves are also noisy: there is a p5=0.8 probability of going in the requested direction and an equal chance of going to each of the two adjacent diagonal squares. For example, the actions up-left and up have the following result:
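The noise model can be sketched in Python as a function that returns the distribution over the direction actually moved. This is a sketch under stated assumptions: the "4 closest" squares for a diagonal move are taken to be the directions 45 and 90 degrees either side of the chosen one, the leftover probability is split equally among the slip directions, and walls or grid boundaries are ignored. The names `DIRECTIONS` and `move_distribution` are hypothetical.

```python
# The 8 compass directions in clockwise order, so neighbours in the list
# are 45 degrees apart.
DIRECTIONS = [
    "up", "up-right", "right", "down-right",
    "down", "down-left", "left", "up-left",
]


def move_distribution(action, p_diag=0.6, p_straight=0.8):
    """Return {direction: probability} for the direction actually moved.

    Assumption: a diagonal action slips to the directions 45 and 90 degrees
    either side of it; a straight action slips to the two adjacent diagonals.
    Grid boundaries are not modelled here.
    """
    i = DIRECTIONS.index(action)
    if "-" in action:            # diagonal action, e.g. "up-left"
        p, offsets = p_diag, (-2, -1, 1, 2)
    else:                        # vertical or horizontal action, e.g. "up"
        p, offsets = p_straight, (-1, 1)
    slip = (1 - p) / len(offsets)  # leftover probability, split equally
    dist = {action: p}
    for off in offsets:
        dist[DIRECTIONS[(i + off) % len(DIRECTIONS)]] = slip
    return dist


up_result = move_distribution("up")
up_left_result = move_distribution("up-left")
```

Under these assumptions, "up" moves up with probability 0.8 and to up-left or up-right with probability 0.1 each, while "up-left" moves up-left with probability 0.6 and to up, left, up-right, or down-left with probability 0.1 each.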
There is a discount factor of p6=0.9.
Assume that the rewards are immediate on entering a state (i.e., if the robot enters a state where there is a monster, it gets the (negative) reward on entering the state, and if the robot enters the state where there is a treasure it gets the reward on entering the state, even if the treasure arrives at the same time). [Note that there are other possible ways to model the game, but for this exam you are to assume these dynamics and reward model.]
Suppose we are using the inefficient state space representation with 125 states.
Suppose we are running value iteration and have the following values for each state [the number in each square is the value of that state, and empty squares have a value of zero; how these values were obtained is irrelevant to this question]:
Consider the next step of value iteration. For state s13, which is marked by * in the above figure, and the action a2, which is "up", what value is assigned to Q[s13,a2] on the next iteration of value iteration? You need to show all working, but don't need to do any arithmetic (i.e., leave it as an expression). Explain each term in your expression.
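For orientation, the standard value-iteration backup for a state-action pair has the general form below. This is the textbook form, not a model answer: the symbols P, R, and V are notation assumed here, and the concrete successor states and values depend on the figure.

```latex
Q[s,a] \;=\; \sum_{s'} P(s' \mid s, a)\,\bigl( R(s,a,s') + \gamma\, V[s'] \bigr)
```

Here s = s13 and a = a2, the sum ranges over the states s' reachable under the noisy "up" action, P(s' | s, a) is the probability of reaching s', R(s,a,s') is the expected immediate reward received on entering s', γ is the discount factor p6=0.9, and V[s'] is the current value of s'.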