Message no. 1 Posted by David Poole (cpsc_422_term2) on Thursday, December 23, 2004 1:06pm Subject: Welcome to CPSC 422 We will only be using WebCT for discussions and for seeing grades. See the standard web page for all other information (e.g., assignments). I hope you enjoy the course. David |
Message no. 2 Posted by David Burns Cameron (s66878984) on Tuesday, January 11, 2005 1:50pm Subject: queries proved every time point (goals?) I've been playing with the robot control program for the first assignment. It seems like the robot tries to prove the same queries at each time step, specifically: 1. assign(robot_pos,(X,Y),T). 2. assign(to_do,R,T). 3. assign(goal_pos, Coords, T). 4. assign(compass,C,T). Is this actually what is happening? In other words, is the robot actually trying to prove these at each time step? And, is proving them what makes the robot "go" (it certainly seems to be)? My first intuition of how to approach question 1 involved changing one of these queries, or at least adding to this set. Is this even possible? How is this set of goal (is that the right word?) queries determined? My Prolog is a bit weak so apologies if the answers to these are obvious or fully covered in 312. Dave |
Message no. 3 Posted by David Burns Cameron (s66878984) on Tuesday, January 11, 2005 7:37pm Subject: using val/3 is mangling unrelated lists Hi I know that Dr. Poole mentioned in class today that we didn't need to use val, but I found that in answering question 1 assign(goal_pos,_,_) and assign(to_do,_,_) were too closely intertwined to get away with only using was/4. One needs to see the consequences of the other. Unfortunately, when I use val/3 it is somehow mangling unrelated lists in some way. For example, when starting out my to do list looks like [goto(@N1), goto(@N2), goto(@N3), goto(@N4), goto(@N5), goto(@N6)] but after successfully arriving at N1, my to_do list becomes goto(@N2)[goto(@N3), goto(@N4), goto(@N5), goto(@N6)] Note the position of the "["! And I am not even using val/3 to assign to_do, only to assign goal_pos. Unfortunately the mangled list is no longer recognized by any of the clauses and the robot's brain dies. Avoiding the use of val/3 entirely results in a nice tidy list deconstruction [goto(@N1), goto(@N2), goto(@N3), goto(@N4), goto(@N5), goto(@N6)] [goto(@N2), goto(@N3), goto(@N4), goto(@N5), goto(@N6)] [goto(@N3), goto(@N4), goto(@N5), goto(@N6)] But this approach has other problems. I can see workarounds, but they're very untidy and don't seem to be in the spirit of the assignment. I don't know how CILog stores data, but could the tricky way of val/3 guaranteeing things are only calculated once be calculating and storing them incorrectly? Please fix val/3. (code available if it'll help you fix val) Dave |
Message no. 4 Posted by Christopher John Hawkins (s93985018) on Tuesday, January 11, 2005 9:09pm Subject: problems with fluents i am having a hard time getting the robot simulator to recognize fluents that i have created. i tried the code "assign(time,T,T) <- arrived(T)" as a test and it was never executed. is there something i am missing here (it's been a while since i took 312 and my prolog is rusty)? thanks! |
Message no. 5[Branch from no. 4] Posted by David Burns Cameron (s66878984) on Tuesday, January 11, 2005 9:55pm Subject: Re: problems with fluents This is roughly what I was trying to do when I discovered that only those 4 assigns() are proven at each time point (see the queries proved every time thread). And I can't figure out how to sneak an assign() into the body of another assign and still have things make sense. This at least would cause it to be evaluated. Dave |
Message no. 6[Branch from no. 2] Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:27pm Subject: Re: queries proved every time point (goals?) In message 2 on Tuesday, January 11, 2005 1:50pm, David Burns Cameron writes: >I've been playing with the robot control program for the first assignment. It seems like >the robot tries to prove the same queries at each time step, specifically: >1. assign(robot_pos,(X,Y),T). >2. assign(to_do,R,T). >3. assign(goal_pos, Coords, T). >4. assign(compass,C,T). >Is this actually what is happening? In other words, is the robot actually trying to prove >these at each time step? Yes. That is how it implements its belief state. >And, is proving them what makes the robot "go" (it certainly >seems to be)? At the bottom level it just calls the compass and goal_pos and plots them. >My first intuition of how to approach question 1 involved changing one of these queries, >or at least adding to this set. Is this even possible? How is this set of goal (is that the >right word?) queries determined? Think about what it has to remember. And then what it has to do based on what it remembers. >My Prolog is a bit weak so apologies if the answers to these are obvious or fully covered >in 312. They were not covered in 312. But that course gives the logic programming familiarity that I am assuming. >Dave > David |
Message no. 7[Branch from no. 3] Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:29pm Subject: Re: using val/3 is mangling unrelated lists We will have a new version of the controller that doesn't use val available tonight. If you really need val, you can write it yourself. I didn't use val in my solution. David |
Message no. 8[Branch from no. 5] Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:30pm Subject: Re: problems with fluents In message 5 on Tuesday, January 11, 2005 9:55pm, David Burns Cameron writes: >This is roughly what I was trying to do when I discovered that only those 4 assigns() are >proven at each time point (see the queries proved every time thread). And I can't figure >out how to sneak an assign() into the body of another assign and still have things make >sense. This at least would cause it to be evaluated. It proves all assigns at each time step. David |
Message no. 9[Branch from no. 4] Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:37pm Subject: Re: problems with fluents In message 4 on Tuesday, January 11, 2005 9:09pm, Christopher John Hawkins writes: >i am having a hard time getting the robot simulator to recognize fluents >that i have created. i tried the code "assign(time,T,T) <- arrived(T)" >as a test and it was never executed. is there something i am missing >here (it's been a while since i took 312 and my prolog is rusty)? Why do you think it was never executed? It works for me. I had the appropriate assign in the controller log. David |
Message no. 10[Branch from no. 6] Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:41pm Subject: Re: queries proved every time point (goals?) In message 6 on Tuesday, January 11, 2005 10:27pm, David Poole writes: >Yes. That is how it implements its belief state. It proves assign(F,V,T) for every time T. For every solution it remembers the value proved. Then it plots the compass and the robot_pos. David |
Message no. 11[Branch from no. 9] Posted by Christopher John Hawkins (s93985018) on Tuesday, January 11, 2005 10:54pm Subject: Re: problems with fluents when i run it, the assign does not show up in the controller log. maybe i should reinstall the simulator. |
Message no. 12[Branch from no. 10] Posted by David Burns Cameron (s66878984) on Tuesday, January 11, 2005 11:09pm Subject: Re: queries proved every time point (goals?) >It proves assign(F,V,T) for every time T. For every solution it It works for arbitrary F now. I'm very sure that it didn't before. I notice the comments are gone. Maybe there was a glitch in the old controller code that was corrected at the same time? Dave |
Message no. 13[Branch from no. 11] Posted by Christopher John Hawkins (s93985018) on Tuesday, January 11, 2005 11:19pm Subject: Re: problems with fluents if i access the applet directly through the CIspace webpage the code works fine but it will NOT run with the simulator compiled for Win32. another quick question; what is the syntax for negation or NAF? thanks, CJ |
Message no. 14 Posted by David Poole (cpsc_422_term2) on Wednesday, January 12, 2005 12:19pm Subject: Assignment 1 notes We have fixed up the controller. Please make sure you are using version 4.6 (you may need to clear the cache in your browser). This is also available for download. We removed "val" from the controllers. Please don't use it; you can always write your own equivalent function. Also, don't use a predicate name that starts with "val". The solution to question 1 is straightforward. You have to think about "what should the agent remember?" All of the problems I have seen are because the proposed solution gets the agent to remember too much (and they interfere with each other, usually resulting in two locations being taken off the to_do list). There is negation as failure in the controller, but you don't need to use it. (I didn't in my solution). The documentation is wrong about built-in arithmetic. The current version supports "is" and comparisons such as "<". There may still be a problem with a solution to 2a. I will post a revision to that. (I will not change the question, but will change the way it is tested). This assignment is not designed to be really difficult. If you are having real problems, take a step back and think about the two questions: what should the agent remember and what should the agent do. Have fun! That's all I can think of for now. Please post your questions here and we will do our best to answer them. David |
Message no. 15 Posted by David Poole (cpsc_422_term2) on Wednesday, January 12, 2005 12:42pm Subject: Alternate question 2(a) Here is a revised question 2(a) that you can optionally do instead of the current question. Suppose you get a job with "Future Software for Flakey Robots Inc". Your job is to write a controller for a robot that has not yet been released. You have a prototype for the robot, but that is buggy. You do, however, have a specification of how it is supposed to work. You need to be able to deliver a controller that will work when a robot that fulfills the specification becomes available. Unfortunately, you won't be able to do any empirical tests on how it compares to the existing robot until the robot is released. (a) Change the CIspace controller so that it is opportunistic; when it selects the next location to visit, it selects the location that is closest to its current position. It should still visit all of the locations. (b) Explain how you could test your controller. While the robot controller may be buggy, you can assume that it implements the same language as CILog and that CILog is not buggy. What queries could you ask in CILog that will convince you (and your boss) that your controller will work in the to-be-released controller? You need to explain (in English) why you think that your tests are adequate. (c) Show your tests in CILog. (d) Try your controller in the existing applet. Why does it not work? (Be as specific as you can to pinpoint exactly what the bug (or bugs) in the current applet is.) Note that we are not assuming your solution will not work in the current applet. We have a solution that doesn't. There may be solutions that do work. |
Message no. 16 Posted by David Poole (cpsc_422_term2) on Wednesday, January 12, 2005 4:29pm Subject: TA office hours Frank Hutter's office hours are regularly on Wednesdays 1:30-2:30 in room CICSR 341. He will have an extra office hour this week on Friday from 2:00-3:00. Michael Chiang's office hours are regularly on Thursdays 12:50-1:50 (i.e., the hour before class) in the CICSR atrium. He will hold a special office hour on Monday from 2-3. David |
Message no. 17 Posted by Robert McGregor (s92140011) on Wednesday, January 12, 2005 9:06pm Subject: Problems with controller Hi, I created a simple controller similar to the one in class that turns on a "follow_wall" variable when a wall is hit. I can run it in the online applet, but when I try to copy/paste in the identical program from notepad, it will not work as expected. In the debugger for the simple controller, it tries to evaluate the expression: assign(follow_wall,Var1,T) but in the copy/pasted one it tries to evaluate: assign(follow_wall,Var1,Var2) (which of course is true since assign(follow_wall,off,0) is a fact) So it can never assign follow_wall to on because it will always assign it to off first. I sort of solved the problem once by restarting the applet and typing it all bit by bit (starting from scratch each time). Bob |
Message no. 18[Branch from no. 17] Posted by David Poole (cpsc_422_term2) on Wednesday, January 12, 2005 10:15pm Subject: Re: Problems with controller I am not sure what you are asking, but if you replace the code for steer in the middle level controller with the following it will follow a wall when it hits one. Does this work for you?
steer(D,T) <- was(wallfollow,off,_,T) & whisker_sensor(off,T) & goal_is(D,T).
steer(left,T) <- whisker_sensor(on,T).
steer(right,T) <- was(wallfollow,on,_,T) & whisker_sensor(off,T).
assign(wallfollow,on,T) <- whisker_sensor(on,T) & was(wallfollow,off,T1,T).
assign(wallfollow,off,0.0). |
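For readers whose CILog is rusty, the wall-following rules in this message can be read as a small state machine. The following is only a sketch in Python, not part of the controller: the function name `step` and its argument names are hypothetical, and it assumes that a fluent keeps its last assigned value until another `assign` rule fires.

```python
def step(wallfollow_was, whisker_on, goal_direction):
    """Apply the wall-following rules for one time step.

    wallfollow_was: remembered value of the wallfollow fluent ("on" or "off")
    whisker_on:     True if the whisker sensor fires at this time step
    goal_direction: the direction goal_is would give (any label)
    Returns (steer, wallfollow_now).
    """
    if whisker_on:
        # steer(left,T) <- whisker_sensor(on,T).
        steer = "left"
    elif wallfollow_was == "on":
        # steer(right,T) <- was(wallfollow,on,_,T) & whisker_sensor(off,T).
        steer = "right"
    else:
        # steer(D,T) <- was(wallfollow,off,_,T) & whisker_sensor(off,T) & goal_is(D,T).
        steer = goal_direction

    # assign(wallfollow,on,T) fires when the whisker is on and the value was off.
    # No rule turns it back off, so otherwise the old value is assumed to persist.
    wallfollow_now = "on" if whisker_on else wallfollow_was
    return steer, wallfollow_now
```

Read this way, the robot heads for its goal until the first whisker contact, and from then on alternates between turning left (contact) and right (no contact).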
Message no. 19[Branch from no. 13] Posted by Wing Hang Chan (s84098011) on Friday, January 14, 2005 9:55am Subject: Re: problems with fluents Ah I've been using the win32 executable...thanks for the hint. |
Message no. 20[Branch from no. 19] Posted by David Poole (cpsc_422_term2) on Friday, January 14, 2005 10:07am Subject: Re: problems with fluents In message 19 on Friday, January 14, 2005 9:55am, Wing Hang Chan writes: >Ah I've been using the win32 executable...thanks for the hint. Make sure you are using version 4.6 of the applet (look at the header of the applet itself; it looks like the web page version number has not been updated). If you downloaded it a few days ago, you may have to reload it again. David |
Message no. 21 Posted by Danelle Abra Wettstein (s86800018) on Saturday, January 15, 2005 2:20pm Subject: Java errors?? I'm getting a StringIndexOutOfBoundsException when I try running the applet with the code I entered. There aren't even any strings in what I wrote, and why am I getting a JAVA error??? Aughh... this assignment may not be meant to be difficult, but it's sure got me turned inside out and upside down. |
Message no. 22[Branch from no. 21] Posted by Christopher John Hawkins (s93985018) on Saturday, January 15, 2005 3:29pm Subject: Re: Java errors?? I got these messages but only when i had a syntax error in my prolog code, usually a missing "." or "&". |
Message no. 23[Branch from no. 21] Posted by David Poole (cpsc_422_term2) on Saturday, January 15, 2005 8:04pm Subject: Re: Java errors?? I have never seen this error. Remember that the controller is supposed to use CILog syntax. Try loading your controller into CILog and seeing if you get an error. The "check" command in CILog is very useful as it tells you what you need to add to test your code. David |
Message no. 24[Branch from no. 21] Posted by David Burns Cameron (s66878984) on Sunday, January 16, 2005 4:05pm Subject: Re: Java errors?? In message 21 on Saturday, January 15, 2005 2:20pm, Danelle Abra Wettstein writes: >StringIndexOutOfBoundsException when I try running the applet with the code I entered. I saw that last night a lot, along with NullPointerExceptions. They all seemed to be caused by syntax errors. "," instead of "." Missing ")" and all that. Dave |
Message no. 25[Branch from no. 23] Posted by Danelle Abra Wettstein (s86800018) on Sunday, January 16, 2005 6:31pm Subject: Re: Java errors?? What is the command to open cilog (it's been too long)? |
Message no. 26[Branch from no. 25] Posted by David Poole (cpsc_422_term2) on Sunday, January 16, 2005 9:20pm Subject: Re: Java errors?? In message 25 on Sunday, January 16, 2005 6:31pm, Danelle Abra Wettstein writes: >What is the command to open cilog (it's been too long)? See: http://www.cs.ubc.ca/spider/poole/ci/code/cilog/cilog_man.html CILOG User Manual; Section 1 says how to get it and use it. Just download http://www.cs.ubc.ca/spider/poole/ci/code/cilog/cilog_swi.pl and load it using SWI prolog. David |
Message no. 27 Posted by David Burns Cameron (s66878984) on Monday, January 17, 2005 6:51pm Subject: CISpacecraft In question 2aa, How can we go about maintaining the to_do list after the best goal has been chosen. A single predicate that calculates the best goal and the new list, for example % closest is true if Bestgoal is a goal that is closer than all of the goals on Goals. But where % did Bestgoal come from? It couldn't have come from Goals, as would be possible in an % imperative language. closest(Bestgoal,Goals,T). thanks |
Message no. 28[Branch from no. 27] Posted by David Poole (cpsc_422_term2) on Monday, January 17, 2005 8:33pm Subject: Re: CISpacecraft In message 27 on Monday, January 17, 2005 6:51pm, David Burns Cameron writes: >In question 2aa, > >How can we go about maintaining the to_do list after the best goal has >been chosen. A single predicate that calculates the best goal and the >new list, for example > >% closest is true if Bestgoal is a goal that is closer than all of the >goals on Goals. But where >% did Bestgoal come from? It couldn't have come from Goals, as would be >possible in an >% imperative language. >closest(Bestgoal,Goals,T). I don't understand your question. You take in a list and return the smallest element and the rest of the list. (Which imperative language is it easier in?) I could even imagine asking this in the first CPSC312 assignment. David |
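The contract described in this reply (take in a list, return the smallest element and the rest of the list) can be sketched as a pure function. Python is used here only for illustration, since the assignment itself is in CILog. The name `closest`, the representation of goals as coordinate tuples, and the use of Euclidean distance are all assumptions rather than anything taken from the controller.

```python
import math


def closest(position, goals):
    """Return (best, rest): the goal nearest to position and the other goals.

    The input list is not modified; a new list is built for the remainder,
    so the caller still has the original list of goals afterwards.
    """
    if not goals:
        raise ValueError("goals must not be empty")
    best = min(goals, key=lambda g: math.dist(position, g))
    rest = list(goals)   # copy, so the caller's list is left untouched
    rest.remove(best)    # drops only the first occurrence of best
    return best, rest
```

Returning the remainder as a second result, rather than updating the list in place, is the same shape as a four-argument relation with one input list and two outputs.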
Message no. 29 Posted by Ryan Yee (s81483042) on Monday, January 17, 2005 8:45pm Subject: Gripes with cilog. Some things I learned while doing Assignment 1. Comments are buggy. A percent (comment) on the end of the line causes a syntax error (huh what?!). Seriously. some_predicate(stuff_here). % Furthermore, where are the cuts and semicolons?! How else can we do mutual exclusion without rewriting vast blocks of code? (i.e. why can't cilog be more like prolog?) And lastly, debugging is a PITA with the applet. When a syntax error is made, the entire goal + body is transformed into one long line (which usually spans wider than the screen). Kinda discourages me from using meaningful names. |
Message no. 30 Posted by Christopher John Hawkins (s93985018) on Monday, January 17, 2005 9:43pm Subject: The followwall example Was the code from the followwall example given in class last Tuesday (11th) posted online anywhere? Thanks! |
Message no. 31[Branch from no. 28] Posted by David Burns Cameron (s66878984) on Monday, January 17, 2005 9:50pm Subject: Re: CISpacecraft It seems like you're suggesting closest(Goals, BestGoal, GoalsThatAreNotBest, T) which we tried. During the recursive call CILog was backing out like a Mac truck in a runaway lane whenever it needed to assign a value to BestGoal. It seemed that BestGoal needed to be known before any two goals could be compared (since its position needed to be known for comparison), and at the same time couldn't be known until it had been compared to the other goals in Goals. Would a "helper rule" be appropriate here so a worklist could be used? Dave PS, it's easier (for me) in just about every imperative language since closest(BestGoal, Goals, T) could use pass-by-reference/in-out parameter semantics for Goals. On the way in it would be the complete list, on the way out the diminished list. |
Message no. 32[Branch from no. 30] Posted by David Poole (cpsc_422_term2) on Monday, January 17, 2005 10:44pm Subject: Re: The followwall example In message 30 on Monday, January 17, 2005 9:43pm, Christopher John Hawkins writes: >Was the code from the followwall example given in class last Tuesday >(11th) posted online anywhere? > >Thanks! Yes. It was posted on the bulletin board under "Problems with controller". David |
Message no. 33[Branch from no. 31] Posted by David Poole (cpsc_422_term2) on Monday, January 17, 2005 11:00pm Subject: Re: CISpacecraft In message 31 on Monday, January 17, 2005 9:50pm, David Burns Cameron writes: >It seems like you're suggesting > >closest(Goals, BestGoal, GoalsThatAreNotBest, T) > >which we tried. During the recursive call CILog was backing out like a >Mac truck in a runaway lane whenever it needed to assign a value to >BestGoal. I have no idea what that means. Is a Mac truck (in this context) like a MS truck that just crashes less often? >It seemed that BestGoal needed to be known before any two >goals could be compared (since its position needed to be known for >comparison), and at the same time couldn't be known until it had been >compared to the other goals in Goals. Did you try it in CILog itself (see the message "Alternate question 2(a)")? It is easier to debug there. >Would a "helper rule" be appropriate here so a worklist could be used? Perhaps. This is a pretty basic 312-like question. (It would even be a simple assignment in CPSC 124 when we offered that). >Dave > >PS, it's easier (for me) in just about every imperative language since >closest(BestGoal, Goals, T) >could use pass-by-reference/in-out parameter semantics for Goals. On the >way in it would be the complete list, on the way out the diminished list. That is almost always a *bad* thing to do, as you lose the previous list of Goals. What if you add to the program and need to know what used to be the goal? What if you are not actually going to do it, but are just thinking about what would happen if you did do it? Side effects are generally bad things in programs unless there is no way to get around it; they prevent sensible debugging (where you know what the symbols mean) and most optimizations that you would like to be carried out to your code. But that is an entirely different debate. David |
Message no. 34 Posted by Michael Chiang (s27992023) on Wednesday, January 19, 2005 1:11pm Subject: TA hours (Michael C.) Hi everyone, This is just to inform you that I will be moving the location of my TA consultation hour from the CICSR atrium to lab #106, which is only accessible from the atrium itself. The time will be unchanged, Thursdays 12.50pm ~ 1.50pm. Thanks, Michael |
Message no. 35 Posted by David Poole (cpsc_422_term2) on Friday, January 21, 2005 8:01pm Subject: Updated notes on decision-theoretic planning and reinforcement learning There are some more notes available from: http://www.cs.ubc.ca/spider/poole/cs422/2005/slides.html On Tuesday, I am planning on continuing to talk about reinforcement learning. It is really important that you understand the basics, so please come with lots of questions unless it is all very obvious to you. David |
Message no. 36 Posted by David Poole (cpsc_422_term2) on Sunday, January 23, 2005 10:17pm Subject: Assignment 1 solution A solution to assignment 1 (the controller changes) is posted to the course home page: http://www.cs.ubc.ca/spider/poole/cs422/2005/#assignments David |
Message no. 37 Posted by David Poole (cpsc_422_term2) on Sunday, January 23, 2005 10:20pm Subject: Assignment 2 There will be bonus marks to the students who give the smallest number of states for question 1 (assuming that it is correct). Think about it! David |
Message no. 38 Posted by Wing Hang Chan (s84098011) on Friday, January 28, 2005 12:10am Subject: assignment 2: simulating random rewards Hi, I am having trouble simulating the rewards at each of the corners of the grid. The assignment says that they appear with a probability of 0.2, but how do we simulate that? would it be: discount * 10 * 0.2 ? or am I totally wrong. Thanks in advance. |
Message no. 39[Branch from no. 38] Posted by Frank Hutter (s62336011) on Saturday, January 29, 2005 5:34pm Subject: Re: assignment 2: simulating random rewards I misread this part as well the first time, but the wording is actually pretty clear: "When there is no treasure, at each time step, there is a probability P1 = 0.2 that a treasure appears, and it appears with equal probability at each corner." (there is a probability 0.2 for it to appear and if it appears, this happens with probability 0.25 in any of the corners) That means, there can be only one treasure at a time, or no treasure at all. Once a treasure appeared, it remains there until it is collected (and there are no other treasures until this one is collected) This domain is fully observable, i.e. you know at each time step whether there is a treasure or not and if there is one where it is. You also know where you are. (but you can't plan ahead deterministically due to the randomness in the actions) |
Message no. 40 Posted by Frank Hutter (s62336011) on Saturday, January 29, 2005 5:55pm Subject: Some clarifications on assignment 2 I got a few questions about assignment 2 in my office hour, so here are some general clarifications. Question 1 is very important, so think about it before you start programming something. The robot can be in any of the 25 fields; the treasure can be in any of the 4 corners or not there at all; each of the monsters can check or not check. Which of these do you need to represent as states, and which ones can you get around? Recall that the monsters do not move and that they check independently of earlier checks. Also, you can possibly exploit a lot of symmetries in the domain, which reduces the number of states. There will be a bonus on the lowest correct number of states. (but if you do fancy things make sure to explain them well and not just give a number) I guess the standard solution is easiest to implement (I did it this way, too), but if you exploit the symmetries in a clever way the problem may actually get small enough to do it by hand. You're totally not required to hand in any code, just the optimal policy, i.e. which action to perform in each state (not just in every field, the states subsume more than that!). In my office hour, I advised people to check out the Java code for the applet and possibly use this implementation of value iteration as a code base. After implementing it myself I don't really see the need for this anymore. The algorithm can be written in a few lines anyways and the applet code has a lot of special purpose elements to it, so don't get hung up on it. Michael Chiang (the other TA) e.g. uses Matlab. Again, you may be able to do it by hand, too ... Hope that helps, Frank |
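To illustrate the remark that value iteration fits in a few lines, here is a generic sketch in Python. It is not the applet's implementation and not a solution to the assignment: the function names, the callback signatures, and the two-state toy problem are all hypothetical and exist only to make the sketch runnable.

```python
def q_value(s, a, V, transitions, reward, gamma):
    """Expected discounted return of doing action a in state s, then following V."""
    return sum(p * (reward(s, a, s2) + gamma * V[s2])
               for p, s2 in transitions(s, a))


def value_iteration(states, actions, transitions, reward, gamma=0.9, tol=1e-9):
    """Generic value iteration.

    transitions(s, a) returns a list of (probability, next_state) pairs.
    reward(s, a, s2) returns the immediate reward for that transition.
    Returns (V, policy) as dictionaries keyed by state.
    """
    V = {s: 0.0 for s in states}
    while True:
        new_V = {s: max(q_value(s, a, V, transitions, reward, gamma)
                        for a in actions)
                 for s in states}
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    policy = {s: max(actions,
                     key=lambda a: q_value(s, a, V, transitions, reward, gamma))
              for s in states}
    return V, policy


# Hypothetical toy problem: two states, deterministic moves,
# reward 1 whenever the agent ends a step in state "B".
TOY_STATES = ["A", "B"]
TOY_ACTIONS = ["stay", "go"]


def toy_transitions(s, a):
    other = "B" if s == "A" else "A"
    return [(1.0, s if a == "stay" else other)]


def toy_reward(s, a, s2):
    return 1.0 if s2 == "B" else 0.0
```

Calling `value_iteration(TOY_STATES, TOY_ACTIONS, toy_transitions, toy_reward)` returns one value and one action per state. For the assignment, only the state set and the two callbacks would need to change.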
Message no. 41 Posted by Guan Wang (s77942019) on Sunday, January 30, 2005 9:04am Subject: optimal policy For the optimal policy can I assume that the treasure appears in just one corner, and show the actions based on that assumption? |
Message no. 42[Branch from no. 41] Posted by Guan Wang (s77942019) on Sunday, January 30, 2005 9:18am Subject: Re: optimal policy ... but I guess that means that there's more than one optimal policy? Is this correct? Thanks |
Message no. 43[Branch from no. 39] Posted by Wing Hang Chan (s84098011) on Sunday, January 30, 2005 12:21pm Subject: Re: assignment 2: simulating random rewards Ooh thanks for clearing that up. So how would we implement that in our Q function? I am guessing that we don't multiply the reward of 10 by 0.2 and 0.25, as that would make the reward value very small. Thanks in advance. |
Message no. 44[Branch from no. 42] Posted by Frank Hutter (s62336011) on Sunday, January 30, 2005 6:45pm Subject: Re: optimal policy > For the optimal policy can I assume that the treasure appears in just one corner, and show the actions based on that assumption? The treasure is always in only one corner (or not there). In every possible state, the robot needs to know what to do (that's what a policy is for). If you can treat the case "treasure in some corner" in one by using symmetry that's fine. (In general you can't do this though. If the treasure appears in, say the lower left corner and you go there, you already need to take into account that it could next appear in the opposite corner or in one of the adjacent corners, so you need to factor in future uncertainty. But don't bother with this comment for the assignment) > ... but I guess that means that there's more than one optimal policy? Is this correct? There is only one optimal policy. Think about what's part of the state! Frank |
Message no. 45[Branch from no. 43] Posted by Frank Hutter (s62336011) on Monday, January 31, 2005 5:51pm Subject: Re: assignment 2: simulating random rewards > I am guessing that we don't multiply the reward of 10 by 0.2 and 0.25, as that would make the reward value very small. If the treasure is in some corner and you know that, then it's gonna stay there until you get there. So you have the full reward in that case. If there is no treasure you need to reason about all possible future states at once. Here, for each corner you have a 0.2*0.25 = 0.05 probability for the treasure to appear there, and you're reasoning about all future states at once by weighting them by the probability of getting there. Frank |
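The arithmetic in this reply can be written out as a small sketch. It is an illustration in Python, not assignment code: the corner labels, the function names, and the convention that `None` means "no treasure" are assumptions. It also models only the two cases described in the thread: a treasure that is present and not collected this step stays put, and a missing treasure appears with probability 0.2, spread equally over the four corners.

```python
CORNERS = ["NW", "NE", "SW", "SE"]  # hypothetical labels for the four corners
P_APPEAR = 0.2                      # probability that a treasure appears


def next_treasure_distribution(treasure):
    """Distribution over where the treasure is at the next time step.

    treasure is None (no treasure) or a corner label. A present treasure
    is assumed not to be collected during this step.
    """
    if treasure is not None:
        return {treasure: 1.0}            # it stays until collected
    dist = {None: 1.0 - P_APPEAR}         # still no treasure
    for corner in CORNERS:
        dist[corner] = P_APPEAR / len(CORNERS)   # 0.2 * 0.25 per corner
    return dist


def expected_value(dist, value_of):
    """Weight the value of each outcome by its probability and add them up."""
    return sum(p * value_of(outcome) for outcome, p in dist.items())
```

Weighting each possible next state by its probability in this way is what "reasoning about all future states at once" amounts to: one sum over the outcomes, not a separate plan for each.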
Message no. 46 Posted by Frank Hutter (s62336011) on Monday, January 31, 2005 6:54pm Subject: Re: CPSC 422 A2 Question [I'm posting this email question I got and my answer here. Please ask your questions here, not via email] > I do not understand how the optimal policy relates to the position of > the treasure. It seems to me that the optimal action at any square > depends on the current location > of the treasure, but that would mean that there isn't one optimal > policy. Do I want 4 arrays of qvalues, one for each possible position of > treasure, or what am I misunderstanding? > Does it follow the same policy not planning based on the location of the > treasure, and in this way form a plan for what is generally best? > > Thanks, > Blake You're misunderstanding the difference between a robot position and a state. The state can subsume much more, whatever you need. You want one utility value for each state, and one Q value for each state-action pair. You can e.g. code the states as a multidimensional array, where each dimension defines some part of the state. For Q, the action is yet another dimension. No one holds you back from including where the treasure is in your state. Then the state |
Message no. 47 Posted by Frank Hutter (s62336011) on Monday, January 31, 2005 7:03pm Subject: Re: CPSC 422 A2 Question [Sorry for the double posting, I first posted to Main by accident] [I'm posting this email question I got and my answer here. Please ask your questions here, not via email] > I do not understand how the optimal policy relates to the position of > the treasure. It seems to me that the optimal action at any square > depends on the current location > of the treasure, but that would mean that there isn't one optimal > policy. Do I want 4 arrays of qvalues, one for each possible position of > treasure, or what am I misunderstanding? > Does it follow the same policy not planning based on the location of the > treasure, and in this way form a plan for what is generally best? > > Thanks, > Blake You're misunderstanding the difference between a robot position and a state. The state can subsume much more, whatever you need. You want one utility value for each state, and one Q value for each state-action pair. You can e.g. code the states as a multidimensional array, where each dimension defines some part of the state. For Q, the action is yet another dimension. No one holds you back from including where the treasure is in your state. Then the state is different from the state , and you're all good. Frank |
Message no. 48[Branch from no. 47] Posted by Frank Hutter (s62336011) on Monday, January 31, 2005 7:08pm Subject: Re: CPSC 422 A2 Question Oops, apparently WebCT doesn't like it when you enclose text in angle brackets (the "smaller than" and "larger than" keys). I did this with my states. The last sentence was supposed to say: "Then the state [robot in square X, treasure in square Z1] is different from the state [robot in square X, treasure in square Z2], and you're all good." Frank |
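The state encoding Frank describes can be sketched roughly as follows. This is only an illustration: the 5x5 grid, the four corner treasure squares plus "no treasure", and the action names are assumptions made up for the example, not values taken from the assignment.

```python
from itertools import product

# Hypothetical problem sizes (assumed, not from the assignment).
ROBOT_SQUARES = [(x, y) for x in range(5) for y in range(5)]
TREASURE_SPOTS = [None, (0, 0), (0, 4), (4, 0), (4, 4)]  # None = no treasure
ACTIONS = ["up", "down", "left", "right"]

# A state is the pair (robot square, treasure square), not the robot square alone.
STATES = list(product(ROBOT_SQUARES, TREASURE_SPOTS))

# One utility value per state, one Q value per state-action pair.
V = {s: 0.0 for s in STATES}
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}


def greedy_action(state):
    """Return the action with the highest Q value in this state (first one on ties)."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])
```

With this layout there is still a single policy: it simply maps each (robot, treasure) pair to an action, so the same robot square can get different actions for different treasure squares.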
Message no. 49 Posted by Frank Hutter (s62336011) on Monday, January 31, 2005 7:14pm Subject: Change of office hour Sorry, I need to reschedule my office hour. (One of the reading groups I am attending now got scheduled exactly for that time slot.) My new slot is also on Wednesdays, but now from 9:50 to 10:50 am. This is only gonna be updated on the course webpage by next week when David returns. Sorry for any inconvenience ! Frank |
Message no. 50[Branch from no. 45] Posted by Danelle Abra Wettstein (s86800018) on Monday, January 31, 2005 9:52pm Subject: Re: assignment 2: simulating random rewards >If there is no treasure you need to reason about all possible future >states at once. What does this mean? Reason about all possible future states at once? That sounds scary, and like a lot to do in one step. >probability for the treasure to appear there, and you're reasoning about >all future states at once by weighting them by the probability of >getting there. Also confused about what this means... reasoning about all future states at once by weighting them by the probability of getting there. Clarification? TIA |
Message no. 51[Branch from no. 44] Posted by Vivian Luk (s82215013) on Monday, January 31, 2005 10:50pm Subject: Re: optimal policy I'm a little confused as to what format the optimal policy should be expressed in (couldn't find any examples in our notes). Would inference rules suffice? Thanks. |
Message no. 52[Branch from no. 51] Posted by Samuel Douglas Davis (s85850014) on Monday, January 31, 2005 11:56pm Subject: Re: optimal policy I would also like some clarification on this, if it's not too late. Right now I have a *very* crude ascii art output of a multidimensional array (which I suppose I could hand-copy to make clearer), along the lines of the VI applet. Is that what we were expected to produce? Simply making a huge list of rules mapping from states to actions seems like a really bad idea. I'm tempted to hand in my Java code too, just in case. |
Message no. 53 Posted by David Poole (cpsc_422_term2) on Tuesday, February 1, 2005 6:25am Subject: I am not here Greetings from Germany. I forgot to tell everyone that I won't be in my office hour today. Alan Mackworth will be teaching the classes for this week. When I get back next week I will post a message saying what will be on the midterm. I hope you had fun with the assignment, and learnt lots, David |
Message no. 54[Branch from no. 52] Posted by Frank Hutter (s62336011) on Tuesday, February 1, 2005 12:44pm Subject: Re: optimal policy An ascii output of what to do in which state is good. I've also done it like this, along the lines of the applet, but with ascii. If you want, you can also write the rules down in English (e.g. "if there is no treasure, move straight away from walls that are not corners"), but they need to cover all states for full marks. Frank |
Message no. 55[Branch from no. 50] Posted by Frank Hutter (s62336011) on Tuesday, February 1, 2005 12:47pm Subject: Re: assignment 2: simulating random rewards > Also confused about what this means... reasoning about all future states at > once by weighting them by the probability of getting there. > Clarification? Check out the classnotes. This is just in English language what the formula for the update of the Q-values is saying. Frank |
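In symbols, "weighting future states by the probability of getting there" is the sum over next states in the usual Q-value update for a known model. The notation below (P for the transition probability, R for the reward, \gamma for the discount factor) is the common textbook form and is an assumption here; the class notes may write it differently.

$$Q(s,a) \leftarrow \sum_{s'} P(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma \max_{a'} Q(s',a')\bigr]$$

Each possible next state s' contributes its reward plus discounted future value, scaled by how likely it is, so all future states are handled "at once" in a single weighted sum rather than one at a time.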
Message no. 56 Posted by David Poole (cpsc_422_term2) on Sunday, February 6, 2005 8:40pm Subject: Slides page has been updated The slides page at http://www.cs.ubc.ca/spider/poole/cs422/2005/slides.html has been updated. Note that there is a new version of the reinforcement learning draft notes. If you have any comments or questions about these, please ask. These are planned to be expanded into part of a new chapter of the second edition of our book; I'm happy to expand parts that you think need to be expanded now. So let me know what you want to be explained better! Also note that the tentative date for the midterm was 22 Feb, and that would be confirmed two weeks before. So let's discuss it on Tuesday. I will have a more detailed outline of what will be on the midterm later this week. David |
Message no. 57 Posted by Danelle Abra Wettstein (s86800018) on Sunday, February 6, 2005 10:38pm Subject: Chapters What chapters in the textbook should we have read by now? Thanks. |
Message no. 58[Branch from no. 57] Posted by David Poole (cpsc_422_term2) on Monday, February 7, 2005 5:47pm Subject: Re: Chapters In message 57 on Sunday, February 6, 2005 10:38pm, Danelle Abra Wettstein writes: >What chapters in the textbook should we have read by now? We have covered or will have covered by the midterm: Chapter 12 Notes on decision-theoretic planning (see slides page) Notes on reinforcement learning (see slides page) Sections 6.3-6.6 Section 7.3 Chapter 9 David |
Message no. 59 Posted by Stephen Shui Fung Mak (s36743003) on Monday, February 7, 2005 11:32pm Subject: MT and Assignment question 1) Will there be a practice MT posted before the study break? 2) When will we get back our assignment? 3) Will there be office hour (or TA office hour) on Feb 21 Monday? Thanks, Stephen |
Message no. 60[Branch from no. 59] Posted by David Poole (cpsc_422_term2) on Wednesday, February 9, 2005 10:50am Subject: Re: MT and Assignment question In message 59 on Monday, February 7, 2005 11:32pm, Stephen Shui Fung Mak writes: >1) Will there be a practice MT posted before the study break? Yes. >2) When will we get back our assignment? Thursday (tomorrow) for both assignments. >3) Will there be office hour (or TA office hour) on Feb 21 Monday? The midterm is now on Thursday 24th. There will be office hours on the Tuesday, Wednesday and Thursday. David |
Message no. 61 Posted by David Poole (cpsc_422_term2) on Wednesday, February 9, 2005 12:23pm Subject: Midterm, new date As discussed in class yesterday, the midterm will now be on Feb 24th. David |
Message no. 62 Posted by Danelle Abra Wettstein (s86800018) on Wednesday, February 9, 2005 8:38pm Subject: Assignment 3, Question 1 Hi, I don't feel like reading the notes (or the "rough notes") has helped me understand how we're supposed to do the assignment. Could you give an example as to how this question should be done? TIA |
Message no. 63[Branch from no. 62] Posted by David Poole (cpsc_422_term2) on Thursday, February 10, 2005 1:00pm Subject: Re: Assignment 3, Question 1 In message 62 on Wednesday, February 9, 2005 8:38pm, Danelle Abra Wettstein writes: >Hi, > >I don't feel like reading the notes (or the "rough notes") has helped me understand how >we're supposed to do the assignment. Could you give an example as to how this question >should be done? > >TIA For question 1, the formula in the notes (p 411, bottom) specifies when it is guaranteed to converge in theory. Hint: 1/k is guaranteed to converge. (It is only "in theory" because it only guarantees convergence eventually; it does not guarantee how long it takes.) Some of the ways to vary alpha follow these conditions and some do not. Run the applet to find the answer to (b). You may have to change a line in the code to test the 10/(9+k), but I presume you can all read Java. I wanted you to get to look at the code (the core of the algorithm happens at the bottom of do-step). For part (c), you just have to think about it. As to questions 2 and 3, I'd suggest doing question 2 only if you want to get your hands dirty in Java code. 90% of the code is the UI, which doesn't need to be changed. The rest is as given in the notes. Question 3 is a matter of getting you to think about the qualitative notion of the algorithm. It is an extremely simple problem. The numbers are chosen so that it takes a long time to jump out of the states with a low probability of exiting; just think about what happens eventually. This is one of those "ah ha" problems, where being able to explain it will make sure you understand it. I hope that helps. Please ask us if you have more specific questions. David |
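To see what the choice of alpha does in the update v <- v + alpha_k * (x - v), a small sketch may help. It assumes the usual step-size conditions (the alpha_k sum to infinity while their squares sum to a finite value), which is presumably what the formula on p 411 expresses; the function name and the sample values are made up for illustration.

```python
def td_average(values, alpha_fn):
    """Fold values into one estimate using v <- v + alpha_k * (x - v)."""
    v = 0.0
    for k, x in enumerate(values, start=1):
        v += alpha_fn(k) * (x - v)
    return v


samples = [4.0, 8.0, 6.0, 2.0]

# alpha_k = 1/k reproduces the plain running mean of the samples.
mean_like = td_average(samples, lambda k: 1.0 / k)

# alpha_k = 10/(9+k) decays more slowly, so recent samples count for more.
recency = td_average(samples, lambda k: 10.0 / (9.0 + k))
```

Here `mean_like` comes out as the sample mean (5.0), while `recency` ends up closer to the last sample, which is the trade-off between stability and adaptability that part (c) asks about.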
Message no. 64[Branch from no. 63] Posted by Robin McQuinn (s12331039) on Tuesday, February 15, 2005 12:14am Subject: Re: Assignment 3, Question 1 for question 3, aka 2b (I hope) it seems as if you're saying that the actual formula/algorithm to calculate Q(lambda) does not need to be known to answer the question, yet I find it important, in as much as I don't quite understand how it works. I've looked through the draft notes (which seems to be the only place where this is covered) where the concepts are fleshed out much more thoroughly than in class. The concept of the eligibility trace is also only covered in the draft notes, strangely. The closest that the description comes to deriving an algorithm for calculating Q(lambda) is at the top of page 417. This algorithm only obscures the concept more, which leads me to think that I'm barking up a tree a mile from the explanation. More specifically, my question is, to calculate an eligibility trace from a state-action pair, are all possible future paths accounted for? and if so, how? simply added? I'm also wondering if the draft notes are intended for publication in the next version of the book, because there are some minor spelling errors. |
Message no. 65[Branch from no. 60] Posted by Robin McQuinn (s12331039) on Tuesday, February 15, 2005 12:18am Subject: Re: MT and Assignment question I'm just wondering if discussing the posted midterm here is sanctioned? |
Message no. 66[Branch from no. 64] Posted by David Poole (cpsc_422_term2) on Tuesday, February 15, 2005 10:46am Subject: Re: Assignment 3, Question 1 In message 64 on Tuesday, February 15, 2005 12:14am, Robin McQuinn writes: >for question 3, aka 2b (I hope) it seems as if you're saying that the >actual formula/algorithm to calculate Q(lambda) does not need to be >known to answer the question, yet I find it important, in as much as I >don't quite understand how it works. I've looked through the draft >notes (which seems to be the only place where this is covered) where the >concepts are flesched much more thoroughly than in class. The concept >of the eligibility trace, is also only covered in the draft notes, >strangely. >The closest that the description comes to deriving an algorithm for >calculating Q(lambda) is at the top of page 417. This algorithm only >obscures the concept more, which leads me to think that I'm barking up a >tree a mile from the explanation. Use the SARSA(lambda) algorithm for this question. The one that is given on page 418 of the notes. I was going to describe only Q(lambda), but the mathematics doesn't work for Q(lambda), as you are using different values for V(s) in different parts of the algorithm (one for the best action and one for the action the agent is actually following). You only need to know about SARSA(lambda). >More specifically, my question is, to calculate an eligibility trace >from a state-action pair, are all possible future paths accounted for? >and if so, how? simply added? YES! Think of it this way: imagine there was an infinite sequence of people and each person had to get a penny from each of the people after them. The easiest way to implement this is for each person to give a penny to all of the people before them in the sequence. The eligibility trace is an implementation of this, where the e[s,a] number indicates how much Q[s,a] should be updated by the current error. Does this make sense?
>I'm also wondering if the draft notes are intended for publication in >the next version of the book, because there are some minor spelling errors. Yes; we are currently working on the second edition. We'd rather hear about things you don't understand and need to be explained better at the moment rather than minor spelling errors (but we want to hear about these too). Please email these to me or post them here. David |
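The penny analogy can be made concrete with a rough sketch of one SARSA(lambda) step. This follows the standard tabular form with accumulating traces as given in Sutton and Barto, not necessarily the exact listing on page 418 of the notes; the parameter values and the "stay" action are assumptions for illustration only.

```python
from collections import defaultdict


def sarsa_lambda_step(Q, e, s, a, r, s2, a2, alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one SARSA(lambda) update in place for the transition (s, a, r, s2, a2)."""
    delta = r + gamma * Q[(s2, a2)] - Q[(s, a)]  # error observed at the current step
    e[(s, a)] += 1.0                             # the current pair becomes more eligible
    for key in list(e):
        Q[key] += alpha * delta * e[key]         # every earlier pair gets its share ("penny")
        e[key] *= gamma * lam                    # older pairs fade geometrically
    return delta


Q = defaultdict(float)
e = defaultdict(float)
sarsa_lambda_step(Q, e, "B", "left", 0.0, "A", "stay")  # no reward yet, nothing changes
sarsa_lambda_step(Q, e, "A", "stay", 1.0, "A", "stay")  # reward also flows back to (B, left)
```

After the second call, Q[("B", "left")] has moved even though the reward arrived one step later, which is the "give a penny to everyone before you" idea: the current error is handed back to every pair that is still eligible.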
Message no. 67[Branch from no. 65] Posted by David Poole (cpsc_422_term2) on Tuesday, February 15, 2005 10:47am Subject: Re: MT and Assignment question In message 65 on Tuesday, February 15, 2005 12:18am, Robin McQuinn writes: >I'm just wondering if discussing the posted midterm here is sanctioned? Certainly. David |
Message no. 68[Branch from no. 63] Posted by Danelle Abra Wettstein (s86800018) on Tuesday, February 15, 2005 3:05pm Subject: Re: Assignment 3, Question 1 >Run the applet to find the answer to (b). You may have to change a line >in the code to test the 10/(9+k), but I presume you can all read Java. I >wanted you to get to look at the code, (the core of the algorithm >happens at the bottom of do-step). > In regards to the applet, how do we know if it "converges"? |
Message no. 69[Branch from no. 68] Posted by Danelle Abra Wettstein (s86800018) on Tuesday, February 15, 2005 3:15pm Subject: Re: Assignment 3, Question 1 And what is meant by "the environment changes slowly" in part c? |
Message no. 70 Posted by David Poole (cpsc_422_term2) on Wednesday, February 16, 2005 1:50pm Subject: Assignment 3, question 2(a) For updating the applet, you *only* need to change the method doStep and add some global variables. The only trick is to keep it very clear in your mind which is the current state and the previous state, and the current action and the previous action. (Be clear about which S and A in SARSA each variable is referring to). Good luck. If you spend more than a couple of hours on this, you are doing something wrong, and you should step back and rethink what you are doing. Of course, if you are not familiar with Java, I'd recommend not doing this question. David |
Message no. 71 Posted by Frank Hutter (s62336011) on Saturday, February 19, 2005 8:15pm Subject: Q(lambda) description Q(lambda) is mentioned in David's notes, but it's not explained in detail. So if you're having problems with it, you may want to have a look at section 7.6 in the Sutton and Barto book whose HTML version is linked from the course webpage under "slides used in class". You don't need to understand this section in detail, just give it a look - that should already clarify many questions ... Cheers, Frank |
Message no. 72 Posted by Vivian Luk (s82215013) on Sunday, February 20, 2005 11:59am Subject: A3 Ques2B - Eligibility Trace Hi, Does anyone know any good sites that go into more detail about eligibility traces? The class notes, text, and rough notes don't really say much about it. Thanks :) Vivian |
Message no. 73[Branch from no. 72] Posted by Vivian Luk (s82215013) on Sunday, February 20, 2005 12:09pm Subject: Re: A3 Ques2B - Eligibility Trace Ah, nvm. I just noticed Frank's post above. Thanks anyways. |
Message no. 74 Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 2:49pm Subject: Comments on reinforcement notes I just read over the reinforcement notes, and I made some comments as I went. Some of these comments may be unnecessary, as I'm still working on understanding reinforcement learning, but I think these would be trouble areas, particularly for someone who didn't have the benefit of the lectures and was only working from the text. I'd be especially interested to hear if my comments suggest a deeper misunderstanding, what with the midterm coming up. p410: There could be more context for the v values at the beginning of the temporal differences section. p411: "you increase the predicted value in proportion to that difference. If the new value is less than the old prediction, we decrease the predicted value by that amount." makes it sound as if increases are affected by alpha (proportional) but decreases are not (by that amount), but they are both affected by alpha, aren't they? "but it may be a better estimate of the next value if the dynamics is non-stationary." is rather jargon loaded. Could this refer to the environment changing or the real value of vk changing rather than non-stationary dynamics? p414:
Ok, but if all the actions are good, and it's hallucinating that they're all good, then isn't that fairly realistic? At least it isn't mistaking bad actions for good actions. I think this is unclear because the "these" that I've starred refers to the states the actions lead to, but on first reading seems to refer to a second state where the actions lead to good results that is being compared to a state where the actions lead to bad results. At the bottom of the page, the text jumps directly from explaining backups to talking about lookaheads. Some definition or explanation of lookaheads would be good. p415: "However, this is provides an improved estimate of the policy that the agent is actually following. If the agent is following policy p this gives an improved estimate of Qp ." This wasn't clear to me either in class or in the text. What is "the policy that the agent is actually following"? If it's the current function used to select actions, then how can we be estimating it? Don't we know its exact value? And isn't its value changing as we continue to update Q? What is the policy converging on, if it's an estimate of what we're already doing? The following paragraph does a good job of explaining this, but this paragraph standing alone sounds incredibly confusing. p417:
I really don't know where we made what values cancel out, or how we did it. By using all of the n-step lookaheads? There is this passage:
but it doesn't explicitly refer to cancelling out. The notes were certainly a help, particularly with the assignment. Dave |
Message no. 75 Posted by Kaili Elizabeth Vesik (s83834010) on Sunday, February 20, 2005 4:48pm Subject: Assignment 3, Question 2(a) I'm working on Question 2 (a), so I've been wandering around the java code for the Q(lambda) applet, as well as figuring out the differences in implementation between Q(lambda) and SARSA(lambda) according to the algorithms in section 7.5 and 7.6 of the Sutton/Barto online text. Now here's the fun part: I was trying to decide what parts of the code I needed to change in order to convert the applet to do SARSA(lambda), and along the way, realized that it looks to me suspiciously like the java code implements an algorithm that is a hybrid of the two we're concerned with. Has anyone else noticed this, or am I just horribly confused? It'd be nice to know either way. ;) Kaili |
Message no. 76 Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 4:58pm Subject: alpha in Q(lambda) demo I'm checking out the Q(lambda) applet, and noticed this in the instructions: "The alpha value, by default uses the counts (so the value is the average of the experiences). You can also make it a fixed value." But the usual "fixed" checkbox isn't there. Is this an error? Does alpha have to be fixed for Q(lambda)? It seems like you could have an alpha for each state based off a K for each state just like in Q-learning. Alpha doesn't show up in too many of the Q(lambda) equations, but it does make an appearance in the first one on p415: Q[s,a] <- Q[s,a] + alpha*delta-t and the rest of the section doesn't seem incompatible with either fixed or dynamic alphas. I guess I should dive in to the code and find out? Dave |
Message no. 77[Branch from no. 76] Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 7:10pm Subject: Re: alpha in Q(lambda) demo I found it in the code and it's definitely fixed. But the code is also there to count visits. Why would it not allow a variable alpha based on the number of visits? Nothing about SARSA would seem to forbid this. Dave |
Message no. 78[Branch from no. 71] Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 7:24pm Subject: Re: Q(lambda) description It should be SARSA(lambda). Please answer question 2(b) assuming SARSA(lambda). Sorry about that, David In message 71 on Saturday, February 19, 2005 8:15pm, Frank Hutter writes: >Q(lambda) is mentioned in David's notes, but it's not explained in >detail. So if you're having problems with it, you may want to have a >look at section 7.6 in the Sutton and Barto book whose HTML version is >linked of the course webpage under "slides used in class". >You don't need to understand this section in detail, just give it a look >- that should already clarify many questions ... > >Cheers, >Frank |
Message no. 79[Branch from no. 75] Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 7:37pm Subject: Re: Assignment 3, Question 2(a) It looks to me like Q(lambda) because it only looks at max(a') Q[s',a'] for the expected future value (i.e. for the V term in the delta equation (using Poole's notation from his notes)). I'll take a peek at the Sutton/Barto material before I go and modify anything rashly though. Why do you think it's a mixture? If you want to chat about it, I'm on msn at davcamer@hotmail.com Dave |
Message no. 80[Branch from no. 76] Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 8:00pm Subject: Re: alpha in Q(lambda) demo I could not work out how to use alpha=1/k using Q(lambda). [It isn't clear what to use as k]. The naive way of doing it doesn't work. I asked Rich Sutton (who is the world expert on Reinforcement learning) and he said that it wasn't appropriate to use alpha varying. I thought I had removed all references to a varying alpha in the code, but obviously I didn't. In any case, alpha should be fixed. David |
Message no. 81[Branch from no. 74] Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 9:44pm Subject: Re: Comments on reinforcement notes In message 74 on Sunday, February 20, 2005 2:49pm, David Burns Cameron writes:
|
Message no. 82[Branch from no. 75] Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 9:59pm Subject: Re: Assignment 3, Question 2(a) In message 75 on Sunday, February 20, 2005 4:48pm, Kaili Elizabeth Vesik writes: >I'm working on Question 2 (a), so I've been wandering around the java >code for the Q(lambda) applet, as well as figuring out the differences >in implementation between Q(lambda) and SARSA(lambda) according to the >algorithms in section 7.5 and 7.6 of the Sutton/Barto online text. > >Now here's the fun part: I was trying to decide what parts of the code I >needed to change in order to convert the applet to do SARSA(lambda), and >along the way, realized that it looks to me suspiciously like the java >code implements an algorithm that is a hybrid of the two we're concerned >with. Has anyone else noticed this, or am I just horribly confused? It'd >be nice to know either way. ;) Yes. It is. Sutton and Barto describe 3 versions of Q(lambda), none of which actually work (very well). I implemented what they called the naive version (or at least I tried to). Fortunately (for you), you have to implement SARSA(lambda) for which there is a well defined algorithm. Instead of using max_a Q(s',a) like I did, you should use Q(s',a') where a' is the action actually used. This makes the math work. In the next version of the book, Q(lambda) will not be mentioned. I was originally trying to minimize the number of things to present. I didn't want to pretend that Q-learning was the only thing in reinforcement learning. I originally was trying to present averaging over k-step lookaheads without introducing SARSA, but it doesn't really work. You only need to change one method; doStep and some global variables. You don't even need to look at the other methods. Most of which are just doing UI stuff or calling doStep with appropriate arguments. David >Kaili |
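The one-line difference David points to (max_a Q(s',a) versus Q(s',a')) can be sketched as two target computations. The function names, the discount value, and the toy Q table below are hypothetical and chosen only to show the contrast; this is not the applet's doStep code.

```python
def q_learning_target(Q, r, s2, actions, gamma=0.9):
    """Target that backs up the best next action, whatever the agent actually does."""
    return r + gamma * max(Q[(s2, a)] for a in actions)


def sarsa_target(Q, r, s2, a2, gamma=0.9):
    """Target that backs up the next action a2 the agent really takes."""
    return r + gamma * Q[(s2, a2)]


# Toy values: in state "s1", going right currently looks better than going left.
Q = {("s1", "left"): 2.0, ("s1", "right"): 5.0}

off_policy = q_learning_target(Q, 1.0, "s1", ["left", "right"])  # uses Q[s1, right]
on_policy = sarsa_target(Q, 1.0, "s1", "left")                   # agent explored left
```

When the agent explores (here it takes "left"), the two targets differ, which is why SARSA needs both the current and the previous state-action pair kept straight in doStep.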
Message no. 83[Branch from no. 81] Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 10:37pm Subject: Re: Comments on reinforcement notes I just typed in a big long reply using the HTML editor and it didn't seem to work at all!!! |
Message no. 85[Branch from no. 84] Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 10:45pm Subject: Re: Comments on reinforcement notes In message 84 on Sunday, February 20, 2005 10:41pm, David Poole writes: >As far as I can tell the HTML editor doesn't work at all. Here is as >much as I saved of my reply. There were more comments at the end I will >try to reconstruct tomorrow. Let's see if it works better inline... (This is the same as the previous message; it has not been updated). Did I say I hated WebCT? In message 74 on Sunday, February 20, 2005 2:49pm, David Burns Cameron writes:
|
Message no. 86[Branch from no. 83] Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 10:48pm Subject: Re: Comments on reinforcement notes In message 83 on Sunday, February 20, 2005 10:37pm, David Poole writes: >I just typed in a big long reply using the HTML editor and it didn't >seem to work at all!!! > > Here is as much of the message as I can reconstruct. There was more at the end, but my brain isn't working now. I'll try to reconstruct it tomorrow. Did I say I hated WebCT? In message 74 on Sunday, February 20, 2005 2:49pm, David Burns Cameron writes:
|
Message no. 87[Branch from no. 80] Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 11:23pm Subject: Re: alpha in Q(lambda) demo It's mostly gone. There are some lines commented out, so it seemed like you had probably taken it out, but I wasn't sure why. The more confusing part was that the documentation at the bottom of the webpage the applet is served from specifically says that alpha can be fixed, or varied. I'm still curious why a varying alpha won't work, so I'll try to explain how I was thinking it would, which is undoubtedly the broken naive approach. So... Every action taken results in an infinite series of data that will be reported back, but will eventually stop being relevant. The factors (1-lambda)*lambda that these data are discounted by are chosen to sum to 1. This makes them equivalent in magnitude to a one-step backup. An action can be taken again, and when this happens we want to average the resulting infinite series in some way. Because they are equivalent in magnitude to a one-step backup, we can use the same technique of temporal differences. That's where alpha comes in. Alpha is inversely proportional to the number of items in the average, and this can be tracked by counting the number of times the action has been taken. However, there is an implementation issue because the value of alpha changes on subsequent visits, but the eligibility traces generated on each visit are summed and collapsed together. This could be avoided by changing the eligibility values to be tuples of alpha and e, and then updating with an equation like where alpha and eligibility are now subscripted to indicate the visit which first generated them. (alpha,e) pairs could be removed from the list when e decayed past a certain threshold. Storing a list of tuples would be a lot of extra work but I can't see a way to simplify it. Would it be worth it? Q(lambda) in particular seems unstable, and a dynamic alpha is designed to increase stability.
But then that's trading off against adaptability again. Dave |
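The claim that the lookahead weights add up to 1 can be checked numerically. The sketch assumes the usual form of the weights, (1 - lambda) * lambda^(n-1) for the n-step lookahead (the exponent does not appear in the message as posted); the function name and lambda = 0.8 are illustrative choices.

```python
def lookahead_weights(lam, n_terms):
    """Weights (1 - lam) * lam**(n - 1) given to the n-step lookaheads, n = 1..n_terms."""
    return [(1.0 - lam) * lam ** (n - 1) for n in range(1, n_terms + 1)]


first_three = lookahead_weights(0.8, 3)      # 0.2, 0.16, 0.128
total = sum(lookahead_weights(0.8, 200))     # partial sum is 1 - 0.8**200
```

The partial sum after n terms is 1 - lambda^n, so the weights form a proper average of the lookaheads and the combined backup has the same scale as a single one-step backup.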
Message no. 88[Branch from no. 85] Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 11:58pm Subject: Re: Comments on reinforcement notes Oh dear, I didn't realize what I set off by using the HTML Editor. And it was enough trouble getting it to work in the first place! p410
|
Message no. 89[Branch from no. 87] Posted by David Poole (cpsc_422_term2) on Monday, February 21, 2005 9:00am Subject: Re: alpha in Q(lambda) demo As you show, it could be done if you were to store more information for each state. However, storing two numbers for each state-action pair is still too much for real applications. You need to be able to approximate for realistic size applications. Question 1 will help you answer whether it is better if you actually reduce alpha as 1/k. This is how research works. You make conjectures about what will work. Work out the details and then test to see what works in practice. David |
Message no. 90 Posted by Frank Hutter (s62336011) on Monday, February 21, 2005 5:30pm Subject: Detailed hints on question 2(b) Hi everybody, unfortunately, we only clarified today that question 2(b) should be done for SARSA(lambda) instead of Q(lambda) (see David's reply to "Q(lambda) description" above). Given that the assignment is already due tomorrow, I thought it may only be fair to give some hints on how to attack that question. You only need to trace the algorithm, and you're likely to learn more about it doing so. Just trace SARSA(lambda) on page 418 of the draft notes for the following cases (write down the values it assigns to the different variables): (Assume the eligibility traces and all the Qs are initialized to zero.) 1) you're in state B and go left the first time. (Q[B,left] is increased) ... staying in state A for a long time (=> eligibility trace for (B,left) goes to 0) 2) you're in state B and go left the second time. (Q[B,left] is increased more) ... staying in state A for a long time (=> eligibility trace for (B,left) goes to 0) 3) you're in state B and go left the third time. Do you see a pattern? To which value will Q[B,left] converge eventually? Now the slightly more complicated case: 1a) you're in state B and go right the first time. Let's assume you end up in state B this time (this only happens with high probability, you will eventually end up in state A at some point) 1b) you're in state B and go right the second time. Again, let's assume you end up in B. 1c) you're in state B and go right the third time. Again, let's assume you end up in B. The eligibility trace e[B,right] is converging to something close to 4. 1d) You're in state B and go right the Nth time for rather large N. Assume N is large enough such that the eligibility trace converged to 4 already. Now assume finally you end up in A. You're getting a reward, and since your eligibility trace is 4+1, Q[B,right] becomes quite large. ... 
staying in state A for a long time (=> eligibility trace for (B,right) goes to 0) 2a) you're visiting state B the second time and go right. Do the same thing as in 1a) - 1c) Do you see a pattern ? Where do you end up after this ? To which value does Q[B,right] converge ? Is it any different than the Q[B,right] value after 1c) ? Doing the same thing as in 1d) then increases Q[B,right] lots again. After 1d) (after ending up in A when going right in state B), Q[B,right] is indeed larger than what Q[B,left] converges to. But Q[B,right] is not stable. How could you prevent such a funny behaviour of SARSA(lambda) ? Would a change in alpha make any difference ? Hope that helps. It's more than we wanted to reveal but I guess it's just fair since you only got today. Good luck, Frank |
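The "converging to something close to 4" in case 1c can be reproduced with a few lines. The decay factor 0.8 (standing for gamma * lambda) is an assumption chosen so that the limit matches the 4 quoted above; the actual gamma and lambda of the assignment are not given in this thread, and the function name is made up.

```python
def trace_after_repeats(decay, visits):
    """Eligibility of one state-action pair after taking it `visits` times in a row."""
    e = 0.0
    for _ in range(visits):
        e = decay * e + 1.0  # decay the old trace, then add 1 for the new visit
    return e


after_visit = trace_after_repeats(0.8, 200)   # value right after the latest visit
before_next = 0.8 * after_visit               # value carried into the next visit
```

With this decay the trace right after a visit settles near 5 and the value carried into the next visit near 4, matching the "4+1" above: repeated loops through (B,right) multiply the effect of the eventual reward, which is why Q[B,right] jumps so much.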
Message no. 91[Branch from no. 90] Posted by David Poole (cpsc_422_term2) on Monday, February 21, 2005 7:49pm Subject: Re: Detailed hints on question 2(b) In message 90 on Monday, February 21, 2005 5:30pm, Frank Hutter writes: >Hi everybody, > >unfortunately, we only clarified today that question 2(b) should be done >for SARSA(lambda) instead of Q(lambda) (see David's reply to "Q(lambda) >description" above) From message 66 (last Tuesday the 15th): Use the SARSA(lambda) algorithm for this question. The one that is given on page 418 of the notes. David |
Message no. 92[Branch from no. 90] Posted by David Poole (cpsc_422_term2) on Monday, February 21, 2005 9:51pm Subject: Re: Detailed hints on question 2(b) One more clarification for this question. While thinking about Frank's questions will help you answer the question, answering them will not answer question 2(b). You need to actually answer the questions posed in the assignment. We want a qualitative description. What would you say to your friend? What is wrong with your friend's argument? While answering all of Frank's questions may help, they do not provide the answer. The answer is a concise description that would help your friend. Something to get them to say "oh I see; I didn't think about it enough." There are, of course, lots of possible answers to the second question "What does this example show?" You need to write something sensible. This isn't meant to be tricky. This is a simple problem. There are only 2 states, 2 actions and only 2 (deterministic) policies (as it doesn't matter what you do in state A). The two policies are "go right in state B" and "go left in state B". David |
Message no. 93[Branch from no. 88] Posted by David Poole (cpsc_422_term2) on Monday, February 21, 2005 10:16pm Subject: Re: Comments on reinforcement notes I have uploaded a revised version to http://www.cs.ubc.ca/spider/poole/ci2/excerpts/reinforcementlearning.pdf These have not been proofread as carefully as I would like, but I thought I'd post them just the same. Note that you are not expected to know the details of SARSA(lambda), just the general idea, for the midterm. (See the "what is on the midterm" pointer from the homepage). Please post or send me any comments on the revised version. Thanks for your feedback. David |
Message no. 94 Posted by Stephen Shui Fung Mak (s36743003) on Monday, February 21, 2005 10:20pm Subject: Is tomorrow (Tues) going to be a review session? Just wondering...because the material presented relating to reinforcement learning is quite chaotic and it would be great if Professor Poole could go over the key concepts one more time and some examples before the MT. |
Message no. 95[Branch from no. 90] Posted by Vivian Luk (s82215013) on Monday, February 21, 2005 10:53pm Subject: Re: Detailed hints on question 2(b) I'm a little confused on the wording in the 'draft notes'. On pg. 417, it states that when the state-action pair is first visited, the eligibility is set to 1 On pg. 418, it states that e[s,a] is initialized to 0. ? Thanks! |
Message no. 96[Branch from no. 94] Posted by Vivian Luk (s82215013) on Monday, February 21, 2005 10:58pm Subject: Re: Is tomorrow (Tues) going to be a review session? I second a review session. That would help to clarify some (most) concepts. :) |
Message no. 97[Branch from no. 92] Posted by Frank Hutter (s62336011) on Monday, February 21, 2005 11:07pm Subject: Re: Detailed hints on question 2(b) Oops, so it was clear since last week that you should use SARSA(lambda). >While thinking about Frank's questions will help you answer > the question, answering them will not answer question 2(b). True, I guess my posting could be misinterpreted a bit - it was only meant to give you a couple of questions I would suggest answering for yourself in order to understand the problem (and the friend's argument) better. At least, they helped some students I explained stuff to. Clearly, you still need to answer (solely) the actual assignment questions for full marks. Frank |
Message no. 98[Branch from no. 95] Posted by Frank Hutter (s62336011) on Monday, February 21, 2005 11:11pm Subject: Re: Detailed hints on question 2(b) > I'm a little confused on the wording in the 'draft notes'. > On pg. 417, it states that when the state-action pair is first visited, > the eligibility is set to 1 > On pg. 418, it states that e[s,a] is initialized to 0. Yes, the eligibility traces are all initialized to 0. When you first visit a state-action pair, its eligibility trace is set to 1 since you increase its eligibility trace by 1 and 0+1 = 1 So both formulations are equivalent. Frank |
Message no. 99[Branch from no. 96] Posted by Michael Nightingale (s98742018) on Tuesday, February 22, 2005 12:53am Subject: Re: Is tomorrow (Tues) going to be a review session? Yes, perhaps we could go over some of the practice mt questions? |
Message no. 100[Branch from no. 99] Posted by Christopher John Hawkins (s93985018) on Tuesday, February 22, 2005 1:21am Subject: Re: Is tomorrow (Tues) going to be a review session? I agree. Even a bit of time at the beginning of class where we could ask questions would be great. |
Message no. 101[Branch from no. 94] Posted by Guan Wang (s77942019) on Tuesday, February 22, 2005 9:29am Subject: Re: Is tomorrow (Tues) going to be a review session? Yeah, going over sample mt questions is a good idea :-) |
Message no. 102[Branch from no. 94] Posted by David Poole (cpsc_422_term2) on Tuesday, February 22, 2005 9:32am Subject: Re: Is tomorrow (Tues) going to be a review session? In message 94 on Monday, February 21, 2005 10:20pm, Stephen Shui Fung Mak writes: >Just wondering...because the material presented relating reinforcement >learning is quite chaotic and would be great if Professor Poole can go over >the key concepts one more time and some examples before the MT. It wasn't going to be, but I am more than happy to answer questions. David |
Message no. 103[Branch from no. 95] Posted by David Poole (cpsc_422_term2) on Tuesday, February 22, 2005 9:35am Subject: Re: Detailed hints on question 2(b) In message 95 on Monday, February 21, 2005 10:53pm, Vivian Luk writes: >I'm a little confused on the wording in the 'draft notes'. > >On pg. 417, it states that when the state-action pair is first visited, >the eligibility is set to 1 >On pg. 418, it states that e[s,a] is initialized to 0. > >? Yes. What is the problem? When you initialize, no state-action pair has been visited. When you first visit a state-action pair you add 1 to 0. David |
Message no. 104[Branch from no. 100] Posted by David Poole (cpsc_422_term2) on Tuesday, February 22, 2005 9:37am Subject: Re: Is tomorrow (Tues) going to be a review session? In message 100 on Tuesday, February 22, 2005 1:21am, Christopher John Hawkins writes: >I agree. Even a bit of time at the beginning of class where we could >ask questions would be great. There is always time at the start of any lecture to ask as many questions as you like. Please ask lots; if you don't ask questions, I assume that you understand. David |
Message no. 105 Posted by Kaili Elizabeth Vesik (s83834010) on Tuesday, February 22, 2005 9:57pm Subject: Assignment 3 solutions David: When will the solutions for assignment 3 be posted? Thanks Kaili |
Message no. 106 Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, February 23, 2005 1:49am Subject: Sample midterm In section one (robot control) of the sample midterm, question (g) reads "Why don't we run the logical specification of a hierarchical controller using SLD resolution? How can it be implemented efficiently?" My question is this: what does "it" refer to? The logical specification of a hierarchical controller in a general sense? Or using SLD Resolution? Thanks to anyone who can offer some insight on this. |
Message no. 107[Branch from no. 106] Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 9:37am Subject: Re: Sample midterm In message 106 on Wednesday, February 23, 2005 1:49am, Kaili Elizabeth Vesik writes: >In section one (robot control) of the sample midterm, question (g) reads >"Why don't we run the logical specification of a hierarchical controller >using SLD resolution? How can it be implemented efficiently?" > >My question is this: what does "it" refer to? The logical specification >of a hierarchical controller in a general sense? Or using SLD Resolution? A hierarchical controller (like the CIspace applet). It does not use the logical specification of was: was(Fl,V,T0,T) <- assign(Fl,V,T0) & T0 < T & ~ assignedbetween(Fl,T0,T). assignedbetween(Fl,T0,T) <- assign(Fl,V1,T1) & T0 < T1 & T1 < T. Why doesn't it? And how can the controller be implemented efficiently without just running those clauses in a logic programming language? David |
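One way to think about the question is to compare a direct reading of the was clauses with a controller that only remembers the latest value of each fluent. The following is a rough sketch under stated assumptions, not the CIspace applet's code: `was_by_search` and `LatestValues` are hypothetical names, and the sketch assumes assignments arrive in increasing time order.

```python
def was_by_search(history, fluent, t):
    """Direct reading of the was clauses: scan every assignment made so far.

    history is a list of (fluent, value, time) triples. Returns (value, t0)
    for the latest assignment to fluent strictly before t, or None.
    The cost grows with the length of the history.
    """
    best = None
    for f, v, t0 in history:
        if f == fluent and t0 < t and (best is None or t0 > best[1]):
            best = (v, t0)
    return best


class LatestValues:
    """Keeps only the most recent assignment per fluent (constant-time lookup)."""

    def __init__(self):
        self._latest = {}

    def assign(self, fluent, value, t):
        # Assumes assignments arrive in increasing time order, as in a
        # controller that advances one time step at a time.
        self._latest[fluent] = (value, t)

    def was(self, fluent):
        return self._latest.get(fluent)


# Hypothetical history of assignments.
history = [
    ("robot_pos", (0, 0), 1),
    ("compass", "north", 1),
    ("robot_pos", (1, 0), 2),
]
store = LatestValues()
for f, v, t in history:
    store.assign(f, v, t)

print(was_by_search(history, "robot_pos", 3), store.was("robot_pos"))
```

Both give the same answer at the current time step, but the second never looks back over the whole history, which is the kind of saving the question is hinting at.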
Message no. 108[Branch from no. 105] Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 10:20am Subject: Re: Assignment 3 solutions CPSC 422 - Assignment 3 - Solution - Spring 2005 Question 1 (a) (i) and (ii) converge in theory. (iii) doesn't converge. (iv) converges too quickly (10000 steps may not be enough). (b) (i) doesn't converge to the correct answer in any reasonable time (e.g., for 1000000 steps for initializing at 10.0). (ii) converges to within 2 significant digits of the correct answer after 1000000 steps. (It even works if the initial value is set to 100 or -100, after a few million steps). See: http://www.cs.ubc.ca/spider/poole/demos/rl/q10.html (iii) It gets a reasonable approximation reasonably quickly. It is within one significant digit. (iv) It converges, but not to the correct answer. Even if 10,000 is replaced by 1,000,000 it doesn't seem to converge to the right answer. Question 2a See http://www.cs.ubc.ca/spider/poole/demos/rl/sarsaLambda.html Question 2b This was discussed in class. The brief answer is that Q[B,left] converges to 10 independently of the value of the parameters. When following the policy of going right in state B, Q[B,right] has a high value when it ends up in state A, and then it decays to zero. The height of this value is sensitive to the parameters lambda and alpha. If alpha is reduced (as it is supposed to be) there is no problem. |
Message no. 109 Posted by David Burns Cameron (s66878984) on Wednesday, February 23, 2005 10:53am Subject: race a robot car in the desert Hi All Have you heard about the DARPA Grand Challenge? It's a race across the California desert, but only for entirely robotic cars! It was last run in March 2004, and because no teams even came close to completing the course, it is being run next in October 2005. More background on the race and last year's running can be found here: http://en.wikipedia.org/wiki/DARPA_Grand_Challenge UBC has a team, originally started by the Mining Engineering school that has been working on a vehicle since last fall: http://www.ubcthunderbird.com/ . The roboticization of the vehicle is nearly complete, and the Discovery channel will be taping the first teleoperation test this weekend. But, remote-control won't cut it for the actual race and the software for the robot has yet to be written. If you're interested in applying AI techniques, this is a chance to do it on a real world project. The software challenges include the actual decision making, as well as integrating many different hardware and software sensing and actuation systems, and ensuring the whole thing runs in real time. Software team meetings happen Mondays at 5 o'clock in the Frank Forward Building, room 519a: http://www.maps.ubc.ca/PROD/index_detail.php?locat1=562 . If you're interested, post a reply here or track me down before or after class. I'm a keener up in the second row with brown hair, square black glasses and a red backpack. Dave |
Message no. 110 Posted by Guan Wang (s77942019) on Wednesday, February 23, 2005 1:03pm Subject: textbook exercises Hi, I tried doing some exercises: Ex.7.1 in page 278, and came up with {e,g},{h},and {d} as the set of minimal conflicts. Ex.9.1 page 343 a) {hunting},{robbing} are all minimal explanations of get(gun) b) {robbing},{hunting,banking} min. explanations of get(gun) ^goto(bank) c) If observe goto(forest) then you can remove {robbing} Do these look right? Thanks ! |
Message no. 111[Branch from no. 109] Posted by Christopher John Hawkins (s93985018) on Wednesday, February 23, 2005 2:04pm Subject: Re: race a robot car in the desert this sounds really interesting. are there any prerequisites for getting involved? |
Message no. 112[Branch from no. 110] Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, February 23, 2005 3:45pm Subject: Re: textbook exercises In no way am I saying that my answers are correct, but here's what I got: Exercise 7.1 (p 278) - I got the same answers as you did. Exercise 9.1 (p 343) - I got the same answers as you did for (a) and (b), but for (c), I think that if you observe puton(goodShoes), then you can remove {hunting, banking}, not {robbing}. My reasoning is this: If your minimal explanation is {robbing}, then you can infer get(gun) and goto(bank), neither of which could possibly lead to any contradictions. However, if you look at the other explanation, {hunting, banking}, then you can infer get(gun), goto(forest), and goto(bank). I noticed that goto(forest) might lead to a contradiction if we also have puton(goodShoes), so that is what I chose as my answer. If we observe puton(goodShoes) as well as get(gun) ^ goto(bank), then we reach a "false" conclusion, which allows us to remove the {hunting, banking} explanation. Does anyone else have any opinions on these exercises? |
Message no. 113 Posted by Stephen Shui Fung Mak (s36743003) on Wednesday, February 23, 2005 3:54pm Subject: Questions about practice midterm 1. When the question says "show one step of something (Q-Learning,SARSA,etc)", what do I need to answer? Do I just write down the procedure and explain what each step in one iteration would do? Is this how I answer the question? 2. Where can I find more information regarding alpha and lambda in SARSA(lambda)? I read the draft notes already and went through the alpha=1/k proof, yet I don't know how I can describe in words what it does. Also, I can't seem to find a reference related to lambda anywhere... 3. Regarding the question about "explain why alpha_k should be reduced as a function of k. explain why you may not want to reduce alpha", I do not really understand the question. Can someone who has done this question give me some hints or a reference on solving this problem? Thanks, Stephen |
Message no. 114[Branch from no. 113] Posted by Michael Nightingale (s98742018) on Wednesday, February 23, 2005 4:14pm Subject: Re: Questions about practice midterm I would also like to get some feedback on my answers for these two practice midterm questions, below are the questions that were asked above, and my responses to them: Explain why alpha_k should be reduced as function of k. Explain why you may not want to reduce alpha. - if alpha_k is a function of k, then each TD-error is given the same weight, allowing you to weigh more recent values accurately - you may not want to reduce if you do not want alpha_k as a function of k when the dynamics are non-stationary There are a number of parameters in SARSA-lambda: alpha, gamma, lambda. Explain what each does. Which one affects what is the correct answer? Which one affects whether it will converge? - α: learning rate, affects the convergence γ: how much the future value is discounted λ: rate at which the eligibility traces fade with each step, correctness |
Message no. 115[Branch from no. 113] Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 7:38pm Subject: Re: Questions about practice midterm In message 113 on Wednesday, February 23, 2005 3:54pm, Stephen Shui Fung Mak writes: >1. When the question says "show one step of something (Q-Learning,SARSA,etc)", what >do I need to answer? Do I just write down the procedure and explains how each steps in >one iteration would do? Is this how I answer the question? You will be told exactly what is expected. You should expect to show what value is changed and how. We will give some values (Q or V) and ask you to show what values get changed. (I will only ask the details about Value iteration and/or Q-learning; you need to know the general idea of SARSA(lambda) though). >2. Where can I find more information regarding alpha and lambda in SARSA(lambda)? I >read the draft notes already and went through the alpha=1/k proofing yet I don't know >how I can describe in words what it does. Also, I can seem to find reference related to >lambda anywhere... alpha provides a way to average a number of values. For a detailed description of lambda, see the Sutton and Barto book references from the slides web page. >3. Ragarding the question about "explain why alpha_k should be reduced as a function of >k.explain why you may not want to reduce alpha", I do not really understand the >question. Can someone who have done this question give me some hints or reference on >solving this problem? That is what you should have learned doing question 1 of assignment 3. What happens when it is not reduced? That question gave 3 ways it could be reduced all with different properties. David |
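The remark that alpha provides a way to average a number of values can be seen in a small example. This is a sketch with made-up sample values: `td_average` is a hypothetical helper that applies the update v <- v + alpha_k * (x_k - v), once with alpha_k = 1/k and once with a fixed alpha.

```python
def td_average(values, alpha_of_k):
    """Apply v <- v + alpha_k * (x_k - v) for each x_k, starting from v = 0."""
    v = 0.0
    for k, x in enumerate(values, start=1):
        v += alpha_of_k(k) * (x - v)
    return v


samples = [4.0, 8.0, 6.0, 10.0, 2.0]   # hypothetical values, not from the course
mean = sum(samples) / len(samples)

running = td_average(samples, lambda k: 1.0 / k)   # every sample weighted equally
fixed = td_average(samples, lambda k: 0.5)         # recent samples weighted more

print(mean, running, fixed)
```

With alpha_k = 1/k the result is the ordinary sample mean, so every value gets the same weight. With a fixed alpha the estimate keeps tracking recent values, which is what you may want when the dynamics are non-stationary, at the cost of never settling down.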
Message no. 116[Branch from no. 110] Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 7:49pm Subject: Re: textbook exercises Warning - don't read this till you have tried the Ex.7.1 in page 278, and Ex.9.1 page 343, as it gives away the answer..... In message 110 on Wednesday, February 23, 2005 1:03pm, Guan Wang writes: >Hi, > > I tried doing some exercises: > > Ex.7.1 in page 278, and came up with {e,g},{h},and {d} as the set of >minimal conflicts. That is what I got too. > > Ex.9.1 page 343 > a) {hunting},{robbing} are all minimal explanations of get(gun) Yes. > b) {robbing},{hunting,banking} min. explanations of >get(gun) ^goto(bank) Yes. > c) If observe goto(forest) then you can remove {robbing} No, not really. It adds the explanation {robbing,walking}. If you observed puton(goodShoes) then they can't be hunting. So that removes {hunting,banking} - it is no longer consistent. David > Do these look right? >Thanks ! |
Message no. 117[Branch from no. 114] Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 8:25pm Subject: Re: Questions about practice midterm In message 114 on Wednesday, February 23, 2005 4:14pm, Michael Nightingale writes: >I would also like to get some feedback on my answers for these two >practice midterm questions, below are the questions that were asked >above, and my responses to them: > > >Explain why alpha_k should be reduced as function of k. Explain why you >may not want to reduce alpha. > >- if alpha_k is a function of k, then each TD-error is given the same >weight, allowing you to weigh more recent values accurately In assignment 3, question 1, there were 3 different functions of k, only one of which gave each TD-error the same weight. >- you may not want to reduce if you do not want alpha_k as a function of >k when the dynamics are non-stationary Right. (Do you know what that means?) >There are a number of parameters in SARSA-lambda: alpha, gamma, lambda. >Explain what each does. Which one affects what is the correct answer? >Which one affects whether it will converge? > >- α: learning rate, affects the convergence it also affects the correctness. > γ: how much the future value is discounted it affects what is the correct answer (i.e., correctness is defined in terms of gamma). > λ: rate at which the eligibility traces fade with each step, correctness lambda doesn't affect correctness. It only affects convergence. |
Message no. 118[Branch from no. 111] Posted by David Burns Cameron (s66878984) on Wednesday, February 23, 2005 10:05pm Subject: Re: race a robot car in the desert enthusiasm :) |
Message no. 119 Posted by Stephen Shui Fung Mak (s36743003) on Tuesday, March 1, 2005 5:14pm Subject: Anyone needs a group? Or any group needs member? I am looking for a group to join or people to form a group as I don't know anyone in this class. Please leave a message on this discussion board if anyone is interested. Thanks. |
Message no. 120 Posted by Michael Chiang (s27992023) on Tuesday, March 1, 2005 11:07pm Subject: announcement: assignment marks Hi all, Marks for assignments #1 and #2 have been entered in the webct system. Assignment #3 marks as well as that of the midterm will be posted soon. Thanks for your patience, Michael |
Message no. 121 Posted by Samuel Douglas Davis (s85850014) on Wednesday, March 2, 2005 1:55am Subject: Project Description Could the project description please be posted on the website? I think this was handed out today but I was late and forgot to pick one up after class. Thanks, Sam |
Message no. 122[Branch from no. 121] Posted by David Poole (cpsc_422_term2) on Wednesday, March 2, 2005 6:39pm Subject: Re: Project Description In message 121 on Wednesday, March 2, 2005 1:55am, Samuel Douglas Davis writes: >Could the project description please be posted on the website? I think >this was handed out today but I was late and forgot to pick one up after >class. > >Thanks, >Sam It is available from the cs422 home page: http://www.cs.ubc.ca/spider/poole/cs422/2005/#assignments David |
Message no. 123 Posted by Michael Chiang (s27992023) on Wednesday, March 2, 2005 11:48pm Subject: assignment #3 and midterm marks Dear all, These marks are up, enjoy! Michael |
Message no. 124[Branch from no. 123] Posted by Dan Shu-Zan Liu (s80395015) on Thursday, March 3, 2005 1:51am Subject: Re: assignment #3 and midterm marks Is it just me? None of my marks seem to be up. Maybe only the prof has the function in webct to allow the students to view their marks, and the TA's can only enter them in. |
Message no. 125[Branch from no. 124] Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 3, 2005 3:00pm Subject: Re: assignment #3 and midterm marks None of my grades are up, either. (Not that I really want to see them) |
Message no. 126 Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 3, 2005 3:26pm Subject: Midterm average What was the class average on the midterm, and will there be any scaling? |
Message no. 127[Branch from no. 124] Posted by David Poole (cpsc_422_term2) on Thursday, March 3, 2005 10:15pm Subject: Re: assignment #3 and midterm marks In message 124 on Thursday, March 3, 2005 1:51am, Dan Shu-Zan Liu writes: >Is it just me? >Non of my marks seem to be up. >Maybe only the prof has the function in webct to allow the students to >view their marks, and the TA's can only enter them in. Try it now. I changed some of the settings, but I can't really test them. David |
Message no. 128[Branch from no. 126] Posted by David Poole (cpsc_422_term2) on Thursday, March 3, 2005 10:19pm Subject: Re: Midterm average In message 126 on Thursday, March 3, 2005 3:26pm, Danelle Abra Wettstein writes: >What was the class average on the midterm, and will there be any scaling? The stats will be available on the marks page. Yes, there will be some scaling. I am not sure why the marks seem lower this year; it is perhaps that I rearranged the course and misjudged the difficulty of some of the topics. But I did tell you what was on the exam. I will post a solution to the midterm tomorrow. Please look at the solutions. One of the questions (perhaps reworded) will be on the final exam! David |
Message no. 129[Branch from no. 127] Posted by Samuel Douglas Davis (s85850014) on Thursday, March 3, 2005 10:23pm Subject: Re: assignment #3 and midterm marks >Try it now. I changed some of the settings, but I can't really test them. > >David > It still doesn't work. Sam |
Message no. 130 Posted by Robin McQuinn (s12331039) on Friday, March 4, 2005 11:31am Subject: Midterm/Lecture comments After looking over the midterm, and everything that I got wrong, I feel that many of the questions were a bit ambiguous. I realize this is a bit premature, not having the intended answers to compare to my own, but I really do feel like I know many of the concepts that the midterm thinks I don't. For example, it took me several readings of question 1b to incorrectly understand what was being asked. I thought that "communication between layers and communication between time steps" referred to the method and nature of the information passed between the layers of the controller. Now, I'm only sure what's NOT being asked (~horn clause, but incomplete to reason with) In a broader scope, I don't feel like the lectures prepare us well enough for implementing the algorithms in assignments or on tests. A significant portion of lecture time is spent on manipulating the Java applets, and testing them with different scenarios. The applets are very interesting but tend to consume a disproportionate amount of time in the lectures. The actual algorithms upon which the applets are based are covered in much less time than necessary. In short, I would suggest 3 things: Less time on applets in class, More time on Algorithms, Possible examples of specific cases in which the algorithms are applied (actually fitting numbers into the equation to calculate by hand the values for a specific scenario) Hope that helps Robin |
Message no. 131[Branch from no. 130] Posted by David Poole (cpsc_422_term2) on Saturday, March 5, 2005 11:05am Subject: Re: Midterm/Lecture comments In message 130 on Friday, March 4, 2005 11:31am, Robin McQuinn writes: >for example, it took me several readings of question 1b to incorrectly >understand what was being asked. I though that "communication between >layers and communication between time steps" reffered to the method and >nature of the information passed between the layers of the controller. >Now, I'm only sure whats NOT being asked (~horn clause, but incomplete >to reason with) This question was taken directly from the "what is on the midterm" web page. Did you look at this? >In a broader scope, I don't feel like the lectures prepare us well >enough for implementing the algorithms in assignments or on tests. A >significant portion of lecture time is spent on manipulating the Java >applets, and testing them with different scenarios. The applets are >very interesting but tend to consume a disproportionate amount of time >in the lectures. The actual algorithms upon which the applets are based >are covered in much less time than necessary. Most of these algorithms are very short. You also had to use them in the assignments. You were also told what you were expected to do. That same web page said you should be able to do one step of value iteration and steps of Q-learning. It also said you would use the game domain of assignment 2. The most important part of these algorithms is the gestalt part; how a very simple control structure can give rise to complicated behaviour. Unfortunately it is difficult to ask questions about this in an exam. >In short, I would suggest 3 things: >Less time on applets in class, >More time on Algorithms, >Possible examples of specific cases in which the algorithms are applied >(actually fitting numbers into the equation to calculate by hand the >values for a specific scenario) OK. Thank you for your comments. 
I appreciate the feedback. However, the aim of the lectures isn't to help you pass exams! It is to give you some background knowledge, to make you think about the possibilities and to motivate you to learn more. Unfortunately students want marks assigned (if not, they are more than welcome to sit in on lectures), so I have to think up questions which indicate whether they have got the ideas. I even told you the essence of what was on the exam, on the grounds that the details of some things were essential to understanding what was going on, even if for other things you just need to get the main idea. What do others think? David >Hope that helps >Robin > |
Message no. 132[Branch from no. 131] Posted by Vivian Luk (s82215013) on Saturday, March 5, 2005 1:11pm Subject: Re: Midterm/Lecture comments I think Robin brought up many good points. Though we were told what will be covered on the midterm, the problem lies with the amount of practice we had for applying concepts/doing algorithms. In class, you manipulate Java applets. In assignments, we manipulate Java applets. In exams, there are no Java applets. It will be useful, as per Robin's suggestion, to spend some more time on understanding underlying concepts and doing algorithms in class. I also felt that the midterm could have benefited from a second (or third) proofreading to ensure the wording is clear and easy to follow. With regards to Q3 on the midterm, I spent a lot of unnecessary time in understanding why there were 2 graphs instead of 1 and how they were connected (in space/time/etc). Looking at the midterm solutions, it's painfully evident that, "ah, of course that's the right answer". I guess from a professor's perspective, that is often what you feel. But as students who may not have been given adequate practice in class (on doing algorithms/applying concepts/etc), the exams/assignments become more difficult than it should be. Hope this helps, Vivian |
Message no. 133[Branch from no. 131] Posted by Sillard Jake Urbanovich (s82244013) on Saturday, March 5, 2005 6:16pm Subject: Re: Midterm/Lecture comments Midterm Feedback: I liked how the lectures tried to engage us in the subject matter, like the Java applets and when professor Poole brought some toys to class (robot control). In contrast, the midterm seemed very abstract to me. I came to the exam and I felt like I entered a Biology or French 200 exam. I could not use any skills/knowledge that I acquired in any previous computer science courses. Also, I agree with the previous posters about the wording of the questions. I might have gotten some questions completely wrong not because I didn't necessarily know the answer, but because the answer I gave answered a different question than what was asked. |
Message no. 134 Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 2:35pm Subject: Propositions Okay... a bit confused. The glossary of the textbook gives the following statement: "Ground atoms denote propositions." When you look up ground, though, it states this: "A ground atom is one without any variables." This is all fine and dandy but, the notes say this: "A proposition is a boolean formula made from assignments of values to variables." I'm really, really confused how something that should be ground also uses the word 'variables' in it. Could you explain further, and perhaps give an example of a proposition? Thanks. |
Message no. 135[Branch from no. 134] Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 2:55pm Subject: Re: Propositions Okay... I've mostly solved my confusion about this. Page 349 of the textbook was a big help. |
Message no. 136 Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 3:38pm Subject: Understanding independence Could someone who wrote down the answers to questions on page 2 please let me know what they are? Thanks. |
Message no. 137[Branch from no. 119] Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 3:55pm Subject: Re: Anyone needs a group? Or any group needs member? Hi Stephen, if you haven't yet formed a group, I have a group of 2 that needs more members. Let me know by emailing me at miznellie@shaw.ca |
Message no. 138 Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 4:02pm Subject: Chapter 10, Lec 3 Can you explain how the two are independent given fire? It seems to be more like they are dependent, given fire, especially by the phrase "learning one can affect the other by changing your belief in fire". Thanks. |
Message no. 139 Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 4:16pm Subject: d-seperation Can you explain this concept a little further? The B E Z etc is confusing me... what is Z and how is B part of it? Thanks. |
![]() |
![]() |
Message no. 140[Branch from no. 128] Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 4:47pm Subject: Re: Midterm average Have the solutions been posted? |
![]() |
![]() |
Message no. 141 Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 5:26pm Subject: Assignment 4 - 1a For conditional probabilities, do you do a probability for every node, for every value of its parents? (ie, a LOT of probabilities) You have an example in the notes, but I can't tell if it's just very incomplete or if I'm doing this wrong. TIA |
![]() |
![]() |
Message no. 142[Branch from no. 138] Posted by Samuel Douglas Davis (s85850014) on Sunday, March 6, 2005 6:03pm Subject: Re: Chapter 10, Lec 3 In message 138 on Sunday, March 6, 2005 4:02pm, Danelle Abra Wettstein writes: >Can you explain how the two are independent given fire? It seems to be more like they >are dependent, given fire, especially by the phrase "learning one can affect the other by >changing your belief in fire". > >Thanks. If fire is not given, then observing smoke might increase your belief in fire, which in turn increases your belief in alarm, and the 2 are thus dependent. If fire is given, it means you know for certain whether fire is true, so nothing that you learn about alarm or smoke can change your belief in fire, and there is no way that changing your belief in smoke can affect your belief in alarm, or vice versa. To put it another way, if you *know* that fire is true, then you expect alarm to be true with a certain probability, and this probability is not affected by whether or not you observe smoke. Sam |
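Sam's argument can be checked numerically with a minimal sketch. It assumes a three-variable network in which fire is the only parent of both alarm and smoke; the probabilities and the helper names (`joint`, `prob`, `cond`) are made up for illustration and are not from the course notes.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_FIRE = 0.01
P_ALARM_GIVEN_FIRE = {True: 0.95, False: 0.02}
P_SMOKE_GIVEN_FIRE = {True: 0.90, False: 0.05}


def joint(fire, alarm, smoke):
    """P(fire, alarm, smoke) = P(fire) * P(alarm | fire) * P(smoke | fire)."""
    p_f = P_FIRE if fire else 1 - P_FIRE
    p_a = P_ALARM_GIVEN_FIRE[fire] if alarm else 1 - P_ALARM_GIVEN_FIRE[fire]
    p_s = P_SMOKE_GIVEN_FIRE[fire] if smoke else 1 - P_SMOKE_GIVEN_FIRE[fire]
    return p_f * p_a * p_s


def prob(**fixed):
    """Sum the joint over all worlds consistent with the fixed values."""
    total = 0.0
    for fire, alarm, smoke in product([True, False], repeat=3):
        world = {"fire": fire, "alarm": alarm, "smoke": smoke}
        if all(world[name] == value for name, value in fixed.items()):
            total += joint(fire, alarm, smoke)
    return total


def cond(query, given):
    """P(query | given), computed from the joint."""
    return prob(**query, **given) / prob(**given)


# Fire not given: observing smoke raises the belief in alarm.
print(cond({"alarm": True}, {}), cond({"alarm": True}, {"smoke": True}))
# Fire given: observing smoke no longer changes the belief in alarm.
print(cond({"alarm": True}, {"fire": True}),
      cond({"alarm": True}, {"fire": True, "smoke": True}))
```

The first pair of numbers differs and the second pair agrees, which is the dependence without fire and the independence given fire described above.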
![]() |
![]() |
Message no. 143[Branch from no. 139] Posted by Samuel Douglas Davis (s85850014) on Sunday, March 6, 2005 6:17pm Subject: Re: d-seperation In message 139 on Sunday, March 6, 2005 4:16pm, Danelle Abra Wettstein writes: >Can you explain this concept a little further? The B E Z etc is confusing me... what is Z >and how is B part of it? > >Thanks. I found this slide confusing because of the way the definition of a path is sort of inserted into the definition of d-separation; I think it might be clearer if path were defined first. If I understand correctly, X, Y, and Z are sets of variables, and A, B and C are individual variables. Z is the set of variables that are given, so "B in Z" means B is given, and "B not in Z" means B is not given. I think this slide is just stating the ideas of the 3 previous slides more formally. I'm curious what the d in d-separation stands for. HTH, Sam |
![]() |
![]() |
Message no. 144[Branch from no. 134] Posted by David Poole (cpsc_422_term2) on Monday, March 7, 2005 12:44pm Subject: Re: Propositions In message 134 on Sunday, March 6, 2005 2:35pm, Danelle Abra Wettstein writes: >I'm really, really confused how something that should be ground also uses the >word 'variables' in it. Could you explain further, and perhaps give an example of a >proposition? It is because logicians and probabilists use the term "variable" for different things. A random variable is not (what the logicians call) a variable. In probability, a variable (often called a random variable, but this is a misnomer as there is nothing random about them) is like an algebraic variable that can take on a value, such as "today's maximum temperature", which could take on integer values, say. Then "today's maximum temperature = 14" is a proposition that is true or false. This is the same sort of "variable" you saw in CPSC 322 when you did CSPs. This is contrasted with a logical variable, which denotes an individual in a domain. These are then quantified to specify whether you want a formula to be true for all individuals or whether it is true if there exists an individual for which the formula is true. When teaching this we have two choices: (a) we use the traditional notation and try to be clear when we are talking about a random variable or a logical variable or (b) we try to think up different names for the two different concepts. Unfortunately neither choice is very satisfactory as the course doesn't sit in isolation from the rest of what you have learnt or will learn. David |
![]() |
![]() |
Message no. 145[Branch from no. 141] Posted by David Poole (cpsc_422_term2) on Monday, March 7, 2005 2:55pm Subject: Re: Assignment 4 - 1a In message 141 on Sunday, March 6, 2005 5:26pm, Danelle Abra Wettstein writes: >For conditional probabilities, do you do a probability for every node, for every value of its >parents? (ie, a LOT of probabilities) You have an example in the notes, but I can't tell if >it's just very incomplete or if I'm doing this wrong. > >TIA Yes. The number of parameters is exponential in the number of parents. This only works well if there are few parameters (i.e., there are lots of conditional independencies). David |
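To make the counting concrete, here is a minimal sketch of how many numbers one node's conditional probability table needs. The function name `cpt_free_parameters` is hypothetical; the count is the usual one, where each assignment of values to the parents needs one distribution over the node's values and the last entry of each distribution is fixed by the others.

```python
from math import prod


def cpt_free_parameters(node_domain_size, parent_domain_sizes):
    """Number of probabilities that must be specified for one node.

    One distribution per combination of parent values; each distribution
    has (domain size - 1) free entries because it must sum to 1.
    """
    return (node_domain_size - 1) * prod(parent_domain_sizes)


# A Boolean node with k Boolean parents needs 2**k numbers.
for k in range(6):
    print(k, cpt_free_parameters(2, [2] * k))
```

With no parents a Boolean node needs a single prior; every extra Boolean parent doubles the table.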
![]() |
![]() |
Message no. 146[Branch from no. 143] Posted by David Poole (cpsc_422_term2) on Monday, March 7, 2005 9:30pm Subject: Re: d-seperation In message 143 on Sunday, March 6, 2005 6:17pm, Samuel Douglas Davis writes: >In message 139 on Sunday, March 6, 2005 4:16pm, Danelle Abra Wettstein >writes: >>Can you explain this concept a little further? The B E Z etc is >confusing me... what is Z >>and how is B part of it? >> >>Thanks. > >I found this slide confusing because of the way the definition of a path >is sort of inserted into the definition of d-separation; I think it >might be clearer if path were defined first. The trouble with doing it that way is that the notion of a path depends on what is observed (the Z's). Z is the set of observed variables. >If I understand correctly, X, Y, and Z are sets of variables, and A, B >and C are individual variables. Z is the set of variables that are >given, so "B in Z" means B is given, and "B not in Z" means B is not >given. I think this slide is just stating the ideas of the 3 previous >slides more formally. Exactly. >I'm curious what the d in d-separation stands for. d stands for "directed". There is a standard notion of separation for undirected graphs: X and Y are separated by Z, where X, Y, Z are sets of variables, if every path from an element of X to an element of Y contains an element of Z. |
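The undirected notion of separation at the end of this message can be sketched in a few lines. This is only the undirected definition, not d-separation (which also has to treat the direction of the arcs); the function name `separated` and the edge-list representation are assumptions made for the example.

```python
from collections import deque


def separated(edges, xs, ys, zs):
    """True if every path from a node in xs to a node in ys hits a node in zs.

    edges is an iterable of undirected (a, b) pairs. The search walks the
    graph from xs while refusing to step onto any node in zs; if it still
    reaches ys, there is a path that avoids zs.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    blocked = set(zs)
    targets = set(ys) - blocked
    frontier = deque(x for x in xs if x not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in targets:
            return False  # found a path that contains no element of zs
        for neighbour in adjacency.get(node, ()):
            if neighbour not in blocked and neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return True


chain = [("A", "B"), ("B", "C")]
print(separated(chain, {"A"}, {"C"}, {"B"}))  # B lies on the only path
print(separated(chain, {"A"}, {"C"}, set()))  # nothing observed
```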
![]() |
![]() |
Message no. 147[Branch from no. 131] Posted by Kaili Elizabeth Vesik (s83834010) on Tuesday, March 8, 2005 8:26am Subject: Re: Midterm/Lecture comments Snippet from David's message #131: This question was taken directly from the "what is on the midterm" web page. Did you look at this? ------------- While it was rather exciting to be given a superset of the questions that would be on the exam, it was also a fairly large drawback. That is, even though there were many questions that I couldn't do even with all of my notes and unlimited time while practicing (let alone trying to do them in an exam situation), I couldn't very well ask for help, because that would mean getting answers to the test questions. Perhaps a more advantageous way to express your kindness would be to provide a practice exam with solutions, instead of the actual questions on the exam. That way we would still have an idea of the types of things that would be required of us, but if we had any problems, at least we would find out prior to the exam how to fix them. Kaili |
![]() |
![]() |
Message no. 148 Posted by David Poole (cpsc_422_term2) on Tuesday, March 8, 2005 9:03pm Subject: Solution to midterm
CPSC 422 Midterm Solution March 2005
Question 1
(a) The belief state consists of Q[S,A], s, a.
Observe s', r.
One possible control function is:
do a = argmax_a' Q[s',a'] with probability epsilon
random action with probability 1-epsilon
State transition function:
Q[s,a] = Q[s,a] + alpha( r + gamma max_a' Q[s',a'] - Q[s,a] )
s = s'
and remember the action it did (i.e., the "a" it selected in the control function).
[Answers much simpler than this got full marks, as long as they had the right idea]
(b) Between time steps is handled by fluents (using the relations assign and was). Between layers is handled by sharing predicate symbols (predicates defined in one layer can be used in another).
Question 2.
There are 15 possible states that could be entered, depending on which direction the robot actually went (up, left or right) and whether the treasure arrived, and where it arrived. Those that have a non-zero immediate reward and/or a future value give:
Q[s13,a2] = 0.8 * 0.8 * ( 0 + 0.9 * 2) -- up, no treasure
+ 0.8 * 0.2 * 0.25 * ( 0 + 0.9 * 7) -- up, treasure at top right
+ 0.1 * 0.8 * ( 0.2 * -10 + 0.9 * 0) -- left, no treasure
+ 0.1 * 0.2 * ( 0.2 * -10 + 0.9 * 0) -- left, treasure appears
+ 0.1 * 0.2 * 0.25 * (10 + 0.9 * 0) -- right, treasure appears there
Every other value is 0. The most common mistake was confusing the immediate reward and the estimated future value.
Question 3
(a) Q[s4,right] = 10
(b) Q[s2,right] = -10/2 = -5
Q[s3,right] = 0.9*10/2 = 4.5
Q[s4,right] = 10 + 0.5*(10-10) = 10
(c) Q[s1,right], Q[s2,right], Q[s3,right], Q[s4,right] all get their values updated when it received the reward of 10 (i.e., when entering s5).
(d) The first time, it only uses the new value (i.e., when k=1, alpha=1). It is guaranteed to converge (we know that as it is between 1/k and 10*1/k). More recent values are assigned higher weight than old values.
Question 4
(a) {at(2,0)} {at(4,0)} {at(6,0)} {at(7,0)}
(b) {at(2,0),do(right,0),do(right,1)} {at(4,0),do(right,0),do(right,1)}
(c) explain(observe(door,0) & do(right,0) & observe(nodoor,1) & do(right,1) & observe(door, 2) & do(right,2) & at(L,3), E). |
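The arithmetic in Questions 2 and 3(b) can be reproduced with a short sketch. It assumes the standard Q-learning update, in which the old estimate is subtracted inside the bracket, with alpha = 0.5 and gamma = 0.9 as the numbers in 3(b) suggest. The function names, the dictionary representation of Q, and the starting values used in the checks are assumptions, since the exam questions themselves are not reproduced in this thread.

```python
def q_update(q, s, a, r, s_next, actions, alpha, gamma):
    """One Q-learning step: Q[s,a] += alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a]).

    q maps (state, action) pairs to values; missing entries count as 0.
    """
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]


def q2_value(gamma=0.9):
    """Sum of probability * (immediate reward + gamma * future value), Question 2."""
    terms = [
        (0.8 * 0.8, 0, 2),           # up, no treasure
        (0.8 * 0.2 * 0.25, 0, 7),    # up, treasure at top right
        (0.1 * 0.8, 0.2 * -10, 0),   # left, no treasure
        (0.1 * 0.2, 0.2 * -10, 0),   # left, treasure appears
        (0.1 * 0.2 * 0.25, 10, 0),   # right, treasure appears there
    ]
    return sum(p * (r + gamma * v) for p, r, v in terms)


actions = ["right"]
# Question 3(b), each line checked from the values the solution implies.
print(q_update({}, "s2", "right", -10, "s3", actions, 0.5, 0.9))
print(q_update({("s4", "right"): 10.0}, "s3", "right", 0, "s4", actions, 0.5, 0.9))
print(q_update({("s4", "right"): 10.0}, "s4", "right", 10, "s5", actions, 0.5, 0.9))
print(q2_value())
```

The three updates give -5, 4.5 and 10, matching 3(b), and the Question 2 terms sum to about 1.254.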
![]() |
![]() |
Message no. 149 Posted by Stanley Chi Hong Tso (s58635020) on Wednesday, March 9, 2005 8:56pm Subject: assignment 4 - 1a The problem I have is: if the student doesn't know how to do bit carry and bit addition, he should guess, but he should have an even lower probability of getting the answer right. Does that make sense? I have DoProblemWithGuess as a variable; do I have to make another one like 'understandingMaterial' so that it also affects the outcome? I'm not sure what I'm asking, haha. I got everything working fine except the part where a student doesn't know both; what would the effect of that be? Stan |
![]() |
![]() |
Message no. 150[Branch from no. 149] Posted by David Poole (cpsc_422_term2) on Wednesday, March 9, 2005 10:35pm Subject: Re: assignment 4 - 1a In message 149 on Wednesday, March 9, 2005 8:56pm, Stanley Chi Hong Tso writes: >the problem I have is, if the student don't know how to do bit carry and bit addition, he >should do a guess, but he should have the even lower probability getting the answer right. > >is that make sense? Not really. If they guess, they have a 50-50 chance of getting any bit correct. >I have the DoProblemWithGuess as a variable, do I have to make >another one like 'understandingMaterial' so that it also affect the outcome? There are two parts to their understanding: understanding basic arithmetic and understanding the carry. The guess means the conditional probability of an output bit given they don't understand is 0.5. If they do understand, the conditional probability of getting the correct answer is much higher (but not 1, as students do make mistakes even if they know the material). >I'm not sure what I'm asking haha. but well I get all works fine but just the part when a >student doesn't know both, what would be the affect of it? Carrying and adding affect different parts. For example, the value of C_0 is only affected by their understanding of the basic arithmetic. They don't need to understand carrying to get this answer. However, they need to understand carrying to compute the carry bit that is needed to compute C_1 (but given the carry bit, they only need to understand basic arithmetic). I hope this helps. David >Stan |
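As a rough illustration of the conditional probability described here, the following sketch writes the table for C_0 as a function. The slip probability of 0.05 and the names `P_SLIP` and `p_c0` are invented for the example; the only parts taken from the message are that a guess gives 0.5 for either bit and that a student who understands is right with probability high but less than 1.

```python
P_SLIP = 0.05  # hypothetical chance of a mistake despite knowing the material


def p_c0(c0, a0, b0, knows_addition):
    """P(C_0 = c0 | A_0 = a0, B_0 = b0, KnowsAddition = knows_addition)."""
    if not knows_addition:
        return 0.5  # guessing: either bit with uniform probability
    correct = a0 ^ b0  # low-order bit of a0 + b0
    return 1 - P_SLIP if c0 == correct else P_SLIP


for knows in (True, False):
    for a0 in (0, 1):
        for b0 in (0, 1):
            print(knows, a0, b0, p_c0(0, a0, b0, knows), p_c0(1, a0, b0, knows))
```

Each printed row sums to 1, as every row of a conditional probability table must.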
![]() |
![]() |
Message no. 151 Posted by Michael Chiang (s27992023) on Wednesday, March 9, 2005 10:38pm Subject: tomorrow's TA hour moved Hi all, Unfortunately it doesn't seem that I'll be able to hold my TA hour tomorrow (10th March), as I'm feeling quite ill. Once I have recovered, I will post again about running an extra TA hour some time. Sorry for the late notice and any inconvenience this may cause, Michael |
![]() |
![]() |
Message no. 152[Branch from no. 151] Posted by Stephen Shui Fung Mak (s36743003) on Wednesday, March 9, 2005 11:59pm Subject: Re: tomorrow's TA hour moved Would it be possible for you to hold your office hour on Monday then? I just have some questions about the assignment that I want to ask... |
![]() |
![]() |
Message no. 153 Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 10, 2005 9:11pm Subject: "Guess"? How do we emulate the "guess" in our graph? Can we just assume that if they guess, they get it wrong? :) |
![]() |
![]() |
Message no. 154[Branch from no. 153] Posted by David Poole (cpsc_422_term2) on Friday, March 11, 2005 8:15am Subject: Re: "Guess"? In message 153 on Thursday, March 10, 2005 9:11pm, Danelle Abra Wettstein writes: >How do we emulate the "guess" in our graph? Can we just assume that if they guess, >they get it wrong? :) No. Sometimes they guess right. Just make it so that they can pick a 1 or a 0 with uniform probability. The conditional probabilities can model any distribution. David |
![]() |
![]() |
Message no. 155[Branch from no. 154] Posted by Stanley Chi Hong Tso (s58635020) on Saturday, March 12, 2005 10:14pm Subject: Re: "Guess"? In other words, it's about the probability of getting it right. So a guess gives, e.g., C0 a 0.5 chance of being right and a 0.5 chance of being a mistake. Stan |
![]() |
![]() |
Message no. 156 Posted by Stanley Chi Hong Tso (s58635020) on Saturday, March 12, 2005 10:18pm Subject: Probability question. Regarding the notes: in the example David gave us about P(ABCDEFG), we initially break it up as P(G|ABCDEF) * P(F|ABCDE) * P(C|ABDE) * P(D|ABE) * P(E|AB) * P(A|B) * P(B). I wonder why P(D|ABE) or P(E|AB) appear. Why isn't D conditioned only on E? Why is P(D|ABE) != P(D|E)? Stan |
![]() |
![]() |
Message no. 157 Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 13, 2005 2:57pm Subject: Notes on summing What is the difference between P(Z|Y1 = y1, Y2 = y2...,YK = yK) and P(Z,Y1 = y1, Y2 = y2, .., YK = yK)? |
![]() |
![]() |
Message no. 158 Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 13, 2005 3:44pm Subject: Question 2, a I get P(E) = 2 and P(not E) = 2 for this answer... what am I doing wrong? I have followed the chart way of doing this, given in class. |
![]() |
![]() |
Message no. 159 Posted by Guan Wang (s77942019) on Sunday, March 13, 2005 5:57pm Subject: Assignt 4 Question 1: Do I have to model all the letters as nodes (that is, A0, A1, B0, B1, C0, C1, C2) plus the knows guess nodes? Is this the way to solve this question? Question 2b) Querying p(e|~f) means F is false. I don't understand why, according to the online applet, we have the initial factor p(c) instead of p(f|c). Thanks, Guan |
![]() |
![]() |
Message no. 160[Branch from no. 155] Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:39pm Subject: Re: "Guess"? In message 155 on Saturday, March 12, 2005 10:14pm, Stanley Chi Hong Tso writes: >in the other word, its about the probability of getting it right. So guess make eg C0 >become .5 as getting it or make a mistake. > > >Stan Right. For C0, there are two values. If you guess with a 0.5 chance for each there is a 50% chance of getting it right. It is like a multiple choice exam. If there are 4 alternatives, and you guess each one, you would expect to get a grade of 25%. I hope that helps, David |
![]() |
![]() |
Message no. 161[Branch from no. 156] Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:42pm Subject: Re: Probability question. In message 156 on Saturday, March 12, 2005 10:18pm, Stanley Chi Hong Tso writes: >Reguarding the the notes, the example David gave us about P(ABCDEFG), we break it >initially by P(G|ABCDEF) * P(F|ABCDE) * P(C|ABDE) * P(D|ABE) * P(E|AB) * P(A|B) * P(B) This is just a theorem of probability theory. >I wonder how P(D|ABE) or P(E|AB) exist? why D isn't only given the condition on only E? >why P(D|ABE) != P(D|E) ???? This is the assumption made in a belief network (a Bayes net): a node is independent of its predecessors given its parents. David >Stan |
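Written out, the two steps in this reply look as follows. The first line is the chain rule with the ordering from the question; the second shows what the belief-network assumption buys for one factor, under the assumption (made here only for illustration, since the network from the notes is not shown in this thread) that E is the only parent of D.

```latex
\begin{align*}
P(A,B,C,D,E,F,G) &= P(G \mid A,B,C,D,E,F)\, P(F \mid A,B,C,D,E)\, P(C \mid A,B,D,E) \\
                 &\quad \times P(D \mid A,B,E)\, P(E \mid A,B)\, P(A \mid B)\, P(B)
                 && \text{(chain rule, always true)} \\
P(D \mid A,B,E)  &= P(D \mid E)
                 && \text{(if $\mathrm{parents}(D) = \{E\}$)}
\end{align*}
```

So P(D|ABE) is not claimed to differ from P(D|E); it is the general form, and it simplifies to P(D|E) exactly when the network says D is independent of its other predecessors given E.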
![]() |
![]() |
Message no. 162[Branch from no. 157] Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:44pm Subject: Re: Notes on summing In message 157 on Sunday, March 13, 2005 2:57pm, Danelle Abra Wettstein writes: >What is the difference between P(Z|Y1 = y1, Y2 = y2...,YK = yK) and P(Z,Y1 = y1, Y2 = >y2, .., YK = yK)? The first is a conditional probability and the second is the probability of a conjunction. [Do you know what this means? Read the book/notes, and if it still doesn't make sense, please ask.] You can easily compute the first from the second. David |
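"Compute the first from the second" amounts to dividing by the sum over the values of Z, since P(Z | Y=y) = P(Z, Y=y) / sum_z P(Z=z, Y=y). A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def condition(joint_with_evidence):
    """Turn P(Z=z, Y1=y1, ..., Yk=yk) for each z into P(Z=z | Y1=y1, ..., Yk=yk).

    joint_with_evidence maps each value z of Z to the probability of the
    conjunction of Z=z with the (fixed) observations.
    """
    total = sum(joint_with_evidence.values())  # this is P(Y1=y1, ..., Yk=yk)
    if total == 0:
        raise ValueError("the observations have probability zero")
    return {z: p / total for z, p in joint_with_evidence.items()}


# Made-up conjunction probabilities for a Boolean Z.
print(condition({"true": 0.03, "false": 0.09}))
```

The conjunction probabilities need not sum to 1, but the conditional probabilities obtained this way always do.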
![]() |
![]() |
Message no. 163[Branch from no. 158] Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:46pm Subject: Re: Question 2, a In message 158 on Sunday, March 13, 2005 3:44pm, Danelle Abra Wettstein writes: >I get P(E) = 2 and P(not E) = 2 for this answer... what am I doing wrong? I have followed >the chart way of doing this, given in class. I have no idea. But this is wrong. The algorithm I gave *always* produces numbers in the range [0,1], as each is a linear interpolation of numbers in this range. Did you check your calculations using the applet? David |
![]() |
![]() |
Message no. 164[Branch from no. 159] Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:48pm Subject: Re: Assignt 4 In message 159 on Sunday, March 13, 2005 5:57pm, Guan Wang writes: >Question 1:Do I have to model all the letters as nodes(that is A0, A1, >B0, B1, C0, C1, C2) + the knows guess nodes? Is this the way of >solving this question? Yes, this is *a* way of solving the problem. It seems like a reasonable way as then you can set observations easily. >Question 2b)Querying p(e|~f) means F is false. I dont understand why >according to online applet we have initial factor p(c) instead p(f|c). Because you have observed F=false. It is just a function of C. David >Thanks, >Guan |
![]() |
![]() |
Message no. 165 Posted by David Burns Cameron (s66878984) on Sunday, March 13, 2005 10:17pm Subject: Assn 4 Q1 assumptions I'm tempted to assume that the value of a carried digit depends only on whether or not the student knows how to carry, and not whether or not the student knows binary addition. But this doesn't quite seem right, since carrying is the step you do after you have correctly done the addition step, if it is necessary. However, if I model the carry digit as depending on whether or not the student knows binary addition and whether or not the student knows how to carry, then that makes one variable dependent on 5 variables, and therefore having 2^5 probabilities. Many are the same, but this still feels excessive. I guess we are trying to model a person though, so things should get a little complicated. What does everyone else think? Dave |
![]() |
![]() |
Message no. 166 Posted by David Burns Cameron (s66878984) on Sunday, March 13, 2005 10:28pm Subject: Assn 4 Q1 prior probabilities Since we don't know whether the student knows binary addition or not what is a reasonable prior probability for it? Should we just guess, given no further information? Dave |
![]() |
![]() |
Message no. 167[Branch from no. 160] Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 13, 2005 11:17pm Subject: Re: "Guess"? Yes, definitely. Thanks both of you! |
![]() |
![]() |
Message no. 168[Branch from no. 165] Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 13, 2005 11:20pm Subject: Re: Assn 4 Q1 assumptions I made them independent of one another... ie, it's possible to not know how to carry even if they do know how to add, and the carry is only dependent on whether the person knows how to carry, not if he/she knows addition. |
![]() |
![]() |
Message no. 169[Branch from no. 165] Posted by Samuel Douglas Davis (s85850014) on Sunday, March 13, 2005 11:23pm Subject: Re: Assn 4 Q1 assumptions In message 165 on Sunday, March 13, 2005 10:17pm, David Burns Cameron writes: >I'm tempted to assume that the value of a carried digit depends only on >whether or not the student knows how to carry, and not whether or not >the student knows binary addition. But this doesn't quite seem right, >since carrying is the step you do after you have correctly done the >addition step, if it is necessary. I think it depends on what is meant by "knowing how to carry." I assumed that carrying is just a function from the inputs to a carry bit, so you carry always, not just when it is necessary (ie. sometimes you carry a 0). In this case carrying should be independent of whether the student knows how to do binary addition. I suppose "knowing how to carry" could be defined as knowing what to do with the extra bit you get when the sum is greater than 1, in which case things do get messy. It seems like the answer actually depends on how the student does addition (either as 2 independent adding and carrying operations or 1 combined operation), and this could be different for different people. The way the question is phrased, it sounds to me like we should assume they are independent, but I'm not certain. Sam |
![]() |
![]() |
Message no. 170[Branch from no. 169] Posted by Daniel Joseph Anderson (s76045996) on Monday, March 14, 2005 12:12am Subject: Re: Assn 4 Q1 assumptions Consider this: carrying is the operation of, given a value that is too large to fit in one digit, subtracting some amount and compensating it elsewhere to make it fit in the digit in question. Not knowing how to add does not prevent the number under consideration from (correctly or in-) being larger than the digit in question can hold. Thus "I have a number that won't fit, I have to do something with it" can result in CORRECT carrying from addition operations of unknown correctness. If you make different assumptions - eg. can't know how to carry if don't know how to add - it dramatically decreases the size of the model. Since this is a "from real life" example, though, choosing the model is at least as important as getting the numbers right. Which model is right, I can't say - due in no small part to not being absolutely certain. *grin* |
![]() |
![]() |
Message no. 171[Branch from no. 169] Posted by Daniel Joseph Anderson (s76045996) on Monday, March 14, 2005 12:16am Subject: Re: Assn 4 Q1 assumptions Oh, and Samuel: if you assume that carrying is an operation performed even when there is no carry bit, you do realize that with correct addition, 00 + 01 gives a potentially incorrect result, right? (I was originally going to model it that way - nice and simple - but then realized that most people don't think about carrying 0s, and so it's maybe not very realistic to act as if they do.) |
![]() |
![]() |
Message no. 172[Branch from no. 171] Posted by Daniel Joseph Anderson (s76045996) on Monday, March 14, 2005 12:48am Subject: Re: Assn 4 Q1 assumptions And, just to be spammy: note that the problem can be much more efficiently stated if the correct answer is stored. Then info (don't want to give it away entirely) dictates whether or not knowing/not knowing comes into play for carrying, giving a T/F for whether that matters, which then acts upon the true value to give the value that the student in question would achieve. So you get a few extra nodes, roughly the same number of arcs, but a vastly smaller total for entries in probability tables. Too bad I've already done it the long way - that would've been easier. |
![]() |
![]() |
Message no. 173[Branch from no. 170] Posted by Samuel Douglas Davis (s85850014) on Monday, March 14, 2005 1:15am Subject: Re: Assn 4 Q1 assumptions >If you make different assumptions - eg. can't know how to carry if don't know how to >add - it dramatically decreases the size of the model. Since this is a "from real life" >example, though, choosing the model is at least as important as getting the numbers >right. Which model is right, I can't say - due in no small part to not being absolutely >certain. *grin* I don't think there was any suggestion that the student's knowledge of how to carry should be dependent on their knowledge of addition. The question is whether getting a wrong answer in the addition step affects the chance of getting the carry bit right. |
![]() |
![]() |
Message no. 174[Branch from no. 165] Posted by David Poole (cpsc_422_term2) on Monday, March 14, 2005 10:21am Subject: Re: Assn 4 Q1 assumptions In message 165 on Sunday, March 13, 2005 10:17pm, David Burns Cameron writes: >I'm tempted to assume that the value of a carried digit depends only on >whether or not the student knows how to carry, and not whether or not >the student knows binary addition. But this doesn't quite seem right, >since carrying is the step you do after you have correctly done the >addition step, if it is necessary. > >However, if I model the carry digit as depending on whether or not the >student knows binary addition and whether or not the student knows how >to carry, then that makes one variable dependent on 5 variables, and >therefore having 2^5 probabilities. Many are the same, but this still >feels excessive. It is. You only need 4 parents. If the student needs to know both, you could create a parent that says they know both (and has both as parents). Or you could make knowing how to carry depend on knowing addition. Which makes more sense depends on your semantics for the various variables. [Eventually we want to get to the stage that the decisions are based on how the world works, not about the tool.] David >I guess we are trying to model a person though, so things should get a >little complicated. What does everyone else think? > >Dave |
![]() |
![]() |
Message no. 175[Branch from no. 166] Posted by David Poole (cpsc_422_term2) on Monday, March 14, 2005 10:22am Subject: Re: Assn 4 Q1 prior probabilities In message 166 on Sunday, March 13, 2005 10:28pm, David Burns Cameron writes: >Since we don't know whether the student knows binary addition or not >what is a reasonable prior probability for it? Should we just guess, >given no further information? > >Dave For the moment, just guess. We will discuss how to learn probabilities from data later. David |
![]() |
![]() |
Message no. 176[Branch from no. 169] Posted by David Poole (cpsc_422_term2) on Monday, March 14, 2005 10:26am Subject: Re: Assn 4 Q1 assumptions > The way the question is phrased, it sounds to me like >we should assume they are independent, but I'm not certain. I was trying to phrase the question to not prejudge any answer, but for you to think about the domain (as you have done) and to make a choice that seems reasonable. I will post a solution, but there is no right answer. One of the things to learn is that modelling a domain is non-trivial (even a seemingly trivial example), but once modelled, we can answer interesting questions. David |
![]() |
![]() |
Message no. 177 Posted by David Poole (cpsc_422_term2) on Monday, March 14, 2005 11:58am Subject: project proposal feedback Everyone should have received an email commenting on their proposal. If you didn't receive one, please send me an email containing your proposal. In general we want a project that is of manageable size so that you can tell us in the presentation: here is one thing that we tried and it did/didn't work, and we learned... David |
![]() |
![]() |
Message no. 178 Posted by David Poole (cpsc_422_term2) on Wednesday, March 16, 2005 1:14pm Subject: Relevant talk tomorrow There is an invited speaker talking tomorrow immediately before our class. David Forsyth will be talking about tracking people. This is closely related to what we have been covering in class. See: http://www.cs.ubc.ca/~rbridson/EASS/#mar17 Some of you may find this interesting. David |
![]() |
![]() |
Message no. 179 Posted by David Poole (cpsc_422_term2) on Wednesday, March 16, 2005 2:23pm Subject: web page on RL I just came across the following web page on " Common myths and misstatements about reinforcement learning The ambition" http://neuromancer.eecs.umich.edu/cgi-bin/twiki/view/Main/MythsofRL It may make interesting reading given your projects. You should be able to understand much of it (it isn't as technically complicated as many other pages). David |
![]() |
![]() |
Message no. 180 Posted by Michael Chiang (s27992023) on Monday, March 21, 2005 2:41pm Subject: project consultation with TAs Hi all, Below are the available time slots for meeting with either Frank or me for project consultation. Here are instructions for choosing slots: (1) Each group is allowed 2 slots of 20 minutes per week leading up to the due date. The chosen slots will be fixed for this period. (2) Choose two empty slots by putting the name or student number of ONE of your group members in the box to the left, and repost the modified list to this thread using the reply function. We will use the latest completed version of this list, so please do not alter the choices of other groups! ---------------------------------------- Frank's slots: @ room #341 [ ] - Mon 1 ~ 1.20pm [ ] - Mon 1.20 ~ 1.40pm [ ] - Mon 1.40 ~ 2pm [ ] - Mon 2 ~ 2.20pm [ ] - Mon 2.20 ~ 2.40pm [ ] - Mon 2.40 ~ 3pm [ ] - Mon 3 ~ 3.20pm [ ] - Wed 9.50am ~ 10.10am [ ] - Wed 10.10 ~ 10.30am [ ] - Wed 10.30 ~ 10.50am [ ] - Fri 1 ~ 1.20pm [ ] - Fri 1.20 ~ 1.40pm [ ] - Fri 1.40 ~ 2pm [ ] - Fri 2 ~ 2.20pm [ ] - Fri 2.20 ~ 2.40pm [ ] - Fri 2.40 ~ 3pm [ ] - Fri 3 ~ 3.20pm [ ] - Fri 3.20 ~ 3.40pm [ ] - Fri 3.40 ~ 4pm [ ] - Fri 4 ~ 4.20pm ----------------------------------------------- Mike's slots: @ room # 206 (student learning centre) [ ] - Tue 1 ~ 1.20pm [ ] - Tue 1.20 ~ 1.40pm [ ] - Tue 1.40 ~ 2pm [ ] - Tue 2 ~ 2.20pm [ ] - Tue 2.20 ~ 2.40pm [ ] - Tue 2.40 ~ 3pm [ ] - Tue 3 ~ 3.20pm [ ] - Tue 3.20 ~ 3.40pm [ ] - Tue 3.40 ~ 4pm [ ] - Tue 4 ~ 4.20pm [ ] - Thu 1 ~ 1.20pm [ ] - Thu 1.20 ~ 1.40pm [ ] - Thu 1.40 ~ 2pm [ ] - Thu 2 ~ 2.20pm [ ] - Thu 2.20 ~ 2.40pm [ ] - Thu 2.40 ~ 3pm [ ] - Thu 3 ~ 3.20pm [ ] - Thu 3.20 ~ 3.40pm [ ] - Thu 3.40 ~ 4pm [ ] - Thu 4 ~ 4.20pm |
![]() |
![]() |
Message no. 181[Branch from no. 180] Posted by David Burns Cameron (s66878984) on Monday, March 21, 2005 8:35pm Subject: Re: project consultation with TAs ---------------------------------------- Frank's slots: @ room #341 [ ] - Mon 1 ~ 1.20pm [ ] - Mon 1.20 ~ 1.40pm [ ] - Mon 1.40 ~ 2pm [ ] - Mon 2 ~ 2.20pm [ ] - Mon 2.20 ~ 2.40pm [ ] - Mon 2.40 ~ 3pm [ ] - Mon 3 ~ 3.20pm [ ] - Wed 9.50am ~ 10.10am [ ] - Wed 10.10 ~ 10.30am [ ] - Wed 10.30 ~ 10.50am [ ] - Fri 1 ~ 1.20pm [ ] - Fri 1.20 ~ 1.40pm [ ] - Fri 1.40 ~ 2pm [ ] - Fri 2 ~ 2.20pm [ ] - Fri 2.20 ~ 2.40pm [ ] - Fri 2.40 ~ 3pm [ ] - Fri 3 ~ 3.20pm [ ] - Fri 3.20 ~ 3.40pm [ ] - Fri 3.40 ~ 4pm [ ] - Fri 4 ~ 4.20pm ----------------------------------------------- Mike's slots: @ room # 206 (student learning centre) [ ] - Tue 1 ~ 1.20pm [ ] - Tue 1.20 ~ 1.40pm [ ] - Tue 1.40 ~ 2pm [ ] - Tue 2 ~ 2.20pm [ ] - Tue 2.20 ~ 2.40pm [ ] - Tue 2.40 ~ 3pm [ ] - Tue 3 ~ 3.20pm [ ] - Tue 3.20 ~ 3.40pm [ ] - Tue 3.40 ~ 4pm [ ] - Tue 4 ~ 4.20pm [ Dave Cameron ] - Thu 1 ~ 1.20pm [ Dave Cameron ] - Thu 1.20 ~ 1.40pm [ ] - Thu 1.40 ~ 2pm [ ] - Thu 2 ~ 2.20pm [ ] - Thu 2.20 ~ 2.40pm [ ] - Thu 2.40 ~ 3pm [ ] - Thu 3 ~ 3.20pm [ ] - Thu 3.20 ~ 3.40pm [ ] - Thu 3.40 ~ 4pm [ ] - Thu 4 ~ 4.20pm |
![]() |
![]() |
Message no. 182[Branch from no. 181] Posted by Ryan Yee (s81483042) on Tuesday, March 22, 2005 2:57pm Subject: Re: project consultation with TAs ---------------------------------------- Frank's slots: @ room #341 [ ] - Mon 1 ~ 1.20pm [ ] - Mon 1.20 ~ 1.40pm [ ] - Mon 1.40 ~ 2pm [ ] - Mon 2 ~ 2.20pm [ ] - Mon 2.20 ~ 2.40pm [ ] - Mon 2.40 ~ 3pm [ ] - Mon 3 ~ 3.20pm [ ] - Wed 9.50am ~ 10.10am [ ] - Wed 10.10 ~ 10.30am [ ] - Wed 10.30 ~ 10.50am [ ] - Fri 1 ~ 1.20pm [ ] - Fri 1.20 ~ 1.40pm [ ] - Fri 1.40 ~ 2pm [ ] - Fri 2 ~ 2.20pm [ ] - Fri 2.20 ~ 2.40pm [ ] - Fri 2.40 ~ 3pm [ ] - Fri 3 ~ 3.20pm [ ] - Fri 3.20 ~ 3.40pm [ ] - Fri 3.40 ~ 4pm [ ] - Fri 4 ~ 4.20pm ----------------------------------------------- Mike's slots: @ room # 206 (student learning centre) [ ] - Tue 1 ~ 1.20pm [ ] - Tue 1.20 ~ 1.40pm [ ] - Tue 1.40 ~ 2pm [ ] - Tue 2 ~ 2.20pm [ ] - Tue 2.20 ~ 2.40pm [ ] - Tue 2.40 ~ 3pm [ ] - Tue 3 ~ 3.20pm [ ] - Tue 3.20 ~ 3.40pm [ ] - Tue 3.40 ~ 4pm [ ] - Tue 4 ~ 4.20pm [ Dave Cameron ] - Thu 1 ~ 1.20pm [ Dave Cameron ] - Thu 1.20 ~ 1.40pm [ ] - Thu 1.40 ~ 2pm [ Ryan Yee ] - Thu 2 ~ 2.20pm [ Ryan Yee ] - Thu 2.20 ~ 2.40pm [ ] - Thu 2.40 ~ 3pm [ ] - Thu 3 ~ 3.20pm [ ] - Thu 3.20 ~ 3.40pm [ ] - Thu 3.40 ~ 4pm [ ] - Thu 4 ~ 4.20pm |
![]() |
![]() |
Message no. 183[Branch from no. 182] Posted by Dan Shu-Zan Liu (s80395015) on Tuesday, March 22, 2005 3:26pm Subject: Re: project consultation with TAs ---------------------------------------- Frank's slots: @ room #341 [ ] - Mon 1 ~ 1.20pm [ ] - Mon 1.20 ~ 1.40pm [ ] - Mon 1.40 ~ 2pm [ ] - Mon 2 ~ 2.20pm [ ] - Mon 2.20 ~ 2.40pm [ ] - Mon 2.40 ~ 3pm [ ] - Mon 3 ~ 3.20pm [ ] - Wed 9.50am ~ 10.10am [ ] - Wed 10.10 ~ 10.30am [ ] - Wed 10.30 ~ 10.50am [ ] - Fri 1 ~ 1.20pm [ ] - Fri 1.20 ~ 1.40pm [ ] - Fri 1.40 ~ 2pm [ ] - Fri 2 ~ 2.20pm [ ] - Fri 2.20 ~ 2.40pm [ ] - Fri 2.40 ~ 3pm [ ] - Fri 3 ~ 3.20pm [ ] - Fri 3.20 ~ 3.40pm [ ] - Fri 3.40 ~ 4pm [ ] - Fri 4 ~ 4.20pm ----------------------------------------------- Mike's slots: @ room # 206 (student learning centre) [ ] - Tue 1 ~ 1.20pm [ ] - Tue 1.20 ~ 1.40pm [ ] - Tue 1.40 ~ 2pm [ ] - Tue 2 ~ 2.20pm [ ] - Tue 2.20 ~ 2.40pm [ ] - Tue 2.40 ~ 3pm [ ] - Tue 3 ~ 3.20pm [ ] - Tue 3.20 ~ 3.40pm [ ] - Tue 3.40 ~ 4pm [ ] - Tue 4 ~ 4.20pm [ Dave Cameron ] - Thu 1 ~ 1.20pm [ Dave Cameron ] - Thu 1.20 ~ 1.40pm [ ] - Thu 1.40 ~ 2pm [ Ryan Yee ] - Thu 2 ~ 2.20pm [ Ryan Yee ] - Thu 2.20 ~ 2.40pm [ ] - Thu 2.40 ~ 3pm [ ] - Thu 3 ~ 3.20pm [Bob McGregor ] - Thu 3.20 ~ 3.40pm [Bob McGregor] - Thu 3.40 ~ 4pm [ ] - Thu 4 ~ 4.20pm |
![]() |
![]() |
Message no. 184[Branch from no. 183] Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, March 23, 2005 9:16am Subject: Re: project consultation with TAs ---------------------------------------- Frank's slots: @ room #341 [ ] - Mon 1 ~ 1.20pm [ ] - Mon 1.20 ~ 1.40pm [ ] - Mon 1.40 ~ 2pm [ ] - Mon 2 ~ 2.20pm [ ] - Mon 2.20 ~ 2.40pm [ ] - Mon 2.40 ~ 3pm [ ] - Mon 3 ~ 3.20pm [ ] - Wed 9.50am ~ 10.10am [ ] - Wed 10.10 ~ 10.30am [ ] - Wed 10.30 ~ 10.50am [ ] - Fri 1 ~ 1.20pm [ ] - Fri 1.20 ~ 1.40pm [ ] - Fri 1.40 ~ 2pm [ ] - Fri 2 ~ 2.20pm [ ] - Fri 2.20 ~ 2.40pm [ ] - Fri 2.40 ~ 3pm [ ] - Fri 3 ~ 3.20pm [ ] - Fri 3.20 ~ 3.40pm [ ] - Fri 3.40 ~ 4pm [ ] - Fri 4 ~ 4.20pm ----------------------------------------------- Mike's slots: @ room # 206 (student learning centre) [ ] - Tue 1 ~ 1.20pm [ ] - Tue 1.20 ~ 1.40pm [ ] - Tue 1.40 ~ 2pm [ ] - Tue 2 ~ 2.20pm [ ] - Tue 2.20 ~ 2.40pm [ ] - Tue 2.40 ~ 3pm [ ] - Tue 3 ~ 3.20pm [ ] - Tue 3.20 ~ 3.40pm [ ] - Tue 3.40 ~ 4pm [ ] - Tue 4 ~ 4.20pm [ Dave Cameron ] - Thu 1 ~ 1.20pm [ Dave Cameron ] - Thu 1.20 ~ 1.40pm [ Kaili Vesik ] - Thu 1.40 ~ 2pm [ Ryan Yee ] - Thu 2 ~ 2.20pm [ Ryan Yee ] - Thu 2.20 ~ 2.40pm [ ] - Thu 2.40 ~ 3pm [ ] - Thu 3 ~ 3.20pm [Bob McGregor ] - Thu 3.20 ~ 3.40pm [Bob McGregor] - Thu 3.40 ~ 4pm [ ] - Thu 4 ~ 4.20pm |
![]() |
![]() |
Message no. 185 Posted by David Poole (cpsc_422_term2) on Thursday, March 24, 2005 4:00pm Subject: Assignment 5 The assignment and the decision network for question 2 are attached. Have a good weekend. David |
![]() |
![]() |
Message no. 186 Posted by Frank Hutter (s62336011) on Friday, March 25, 2005 9:17am Subject: extra project consultation hours Tuesday, Mar 29 Hi all, when Michael and I came up with project consultation hours, neither of us realized that my primary choices of Monday and Friday both coincided with Easter in the first week. I guess the least I can do to make up for this is to throw in a few extra slots next Tuesday (the day after Easter Monday), so you have a chance to talk about projects in case you're working on them over the weekend. I'm busy until 5 on Tuesday, so here are a few slots after that: [ ] - Tu, Mar 29: 5 ~ 5.20pm [ ] - Tu, Mar 29: 5.20 ~ 5.40pm [ ] - Tu, Mar 29: 5.40 ~ 6pm [ ] - Tu, Mar 29: 6 ~ 6.20pm [ ] - Tu, Mar 29: 6.20 ~ 6.40pm [ ] - Tu, Mar 29: 6.40 ~ 7pm Cheers, Frank |
![]() |
![]() |
Message no. 187[Branch from no. 184] Posted by Kaili Elizabeth Vesik (s83834010) on Saturday, March 26, 2005 5:32pm Subject: Re: project consultation with TAs ---------------------------------------- Frank's slots: @ room #341 [ ] - Mon 1 ~ 1.20pm [ ] - Mon 1.20 ~ 1.40pm [ ] - Mon 1.40 ~ 2pm [ ] - Mon 2 ~ 2.20pm [ ] - Mon 2.20 ~ 2.40pm [ ] - Mon 2.40 ~ 3pm [ ] - Mon 3 ~ 3.20pm [ ] - Wed 9.50am ~ 10.10am [ ] - Wed 10.10 ~ 10.30am [ ] - Wed 10.30 ~ 10.50am [ ] - Fri 1 ~ 1.20pm [ ] - Fri 1.20 ~ 1.40pm [ ] - Fri 1.40 ~ 2pm [ ] - Fri 2 ~ 2.20pm [ ] - Fri 2.20 ~ 2.40pm [ ] - Fri 2.40 ~ 3pm [ Kaili Vesik ] - Fri 3 ~ 3.20pm [ Kaili Vesik] - Fri 3.20 ~ 3.40pm [ ] - Fri 3.40 ~ 4pm [ ] - Fri 4 ~ 4.20pm ----------------------------------------------- Mike's slots: @ room # 206 (student learning centre) [ ] - Tue 1 ~ 1.20pm [ ] - Tue 1.20 ~ 1.40pm [ ] - Tue 1.40 ~ 2pm [ ] - Tue 2 ~ 2.20pm [ ] - Tue 2.20 ~ 2.40pm [ ] - Tue 2.40 ~ 3pm [ ] - Tue 3 ~ 3.20pm [ ] - Tue 3.20 ~ 3.40pm [ ] - Tue 3.40 ~ 4pm [ ] - Tue 4 ~ 4.20pm [ Dave Cameron ] - Thu 1 ~ 1.20pm [ Dave Cameron ] - Thu 1.20 ~ 1.40pm [ ] - Thu 1.40 ~ 2pm [ Ryan Yee ] - Thu 2 ~ 2.20pm [ Ryan Yee ] - Thu 2.20 ~ 2.40pm [ ] - Thu 2.40 ~ 3pm [ ] - Thu 3 ~ 3.20pm [Bob McGregor ] - Thu 3.20 ~ 3.40pm [Bob McGregor] - Thu 3.40 ~ 4pm [ ] - Thu 4 ~ 4.20pm |
![]() |
![]() |
Message no. 188 Posted by Guan Wang (s77942019) on Wednesday, March 30, 2005 12:46pm Subject: Finding policy Hi, I'm confused about how to find the optimal policy. For the car-buying example discussed in class, how do I know which nodes to eliminate first (why eliminate car condition first)? What is the order of elimination? Thanks a lot, Guan |
![]() |
![]() |
Message no. 189[Branch from no. 188] Posted by David Poole (cpsc_422_term2) on Wednesday, March 30, 2005 5:24pm Subject: Re: Finding policy In message 188 on Wednesday, March 30, 2005 12:46pm, Guan Wang writes: >Hi, I'm confused on how to find the optimal policy. For the car-buying example discussed >in class how do I know which nodes to cancel first(why eliminate car condition first), what >is the order of elimination? > >Thanks a lot, >Guan Let's talk about that in class tomorrow. But in general, the elimination order is arbitrary as long as you eliminate a decision variable when it is in a factor that contains only (some of) its parents. That is, you eliminate a decision variable's non-parents first, then the decision variable itself (by maximizing). Apart from this, you can eliminate variables in any order. David |
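The ordering rule in the reply above can be sketched on a tiny one-decision network. This is a minimal illustration, not the car-buying network from class: the variables (Weather, Forecast, Umbrella) and all numbers are assumptions chosen for the example. Weather is not a parent of the decision, so it is summed out first; the remaining factor contains only the decision and its parent, so the decision can then be eliminated by maximizing.

```python
from itertools import product

# Hypothetical network (illustrative numbers, not from the course):
# Weather -> Forecast -> Umbrella (decision); utility depends on Weather and Umbrella.
P_WEATHER = {"rain": 0.3, "norain": 0.7}
P_FORECAST = {  # P(Forecast | Weather)
    "rain": {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.6},
    "norain": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
}
UTILITY = {  # utility(Weather, Umbrella)
    ("rain", "take"): 70, ("rain", "leave"): 0,
    ("norain", "take"): 20, ("norain", "leave"): 100,
}
FORECASTS = ["sunny", "cloudy", "rainy"]
ACTIONS = ["take", "leave"]


def solve():
    # Step 1: sum out Weather, which is not a parent of the decision.
    factor = {}
    for f, a in product(FORECASTS, ACTIONS):
        factor[(f, a)] = sum(
            P_WEATHER[w] * P_FORECAST[w][f] * UTILITY[(w, a)] for w in P_WEATHER
        )
    # Step 2: the factor now mentions only Umbrella and its parent Forecast,
    # so eliminate the decision by maximizing for each parent value.
    policy = {f: max(ACTIONS, key=lambda a: factor[(f, a)]) for f in FORECASTS}
    # Step 3: sum out the remaining variable to get the expected utility.
    expected_utility = sum(factor[(f, policy[f])] for f in FORECASTS)
    return policy, expected_utility


policy, expected_utility = solve()
print(policy, expected_utility)
```

Trying to maximize over Umbrella before Weather has been summed out would make the choice depend on Weather, which the agent cannot observe; that is why the non-parents go first.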
![]() |
![]() |
Message no. 190 Posted by Danelle Abra Wettstein (s86800018) on Wednesday, March 30, 2005 9:02pm Subject: Solving the decision network? I'm having troubles solving the decision network. Try to optimize, and it says to add the no-forgetting arcs. Try to add the arcs, and it tells me to order the decision variables. How do you do that? And how do you give the decision variables policies, if at all? Sorry... feel like I should know this from lecture. Thanks in advance! |
![]() |
![]() |
Message no. 191 Posted by David Poole (cpsc_422_term2) on Wednesday, March 30, 2005 10:37pm Subject: CIspace Bayes net applet We have a new version of the applet available from the CIspace page. It has been fixed so that it works as long as you use verbose mode and eliminate the variables manually. Also, remember to save your graph. Don't add new variables after optimizing. If you do this, it should work fine. (It is on the list to get fixed this summer, but that isn't much help to you.) As an alternative, you can use Netica (downloadable from http://www.norsys.com/). The free version should be good enough for your assignment. Please let us know if you have any problems. David |
![]() |
![]() |
Message no. 192[Branch from no. 190] Posted by David Poole (cpsc_422_term2) on Wednesday, March 30, 2005 10:40pm Subject: Re: Solving the decision network? In message 190 on Wednesday, March 30, 2005 9:02pm, Danelle Abra Wettstein writes: >I'm having troubles solving the decision network. Try to optimize, and it says to add the >no-forgetting arcs. Try to add the arcs, and it tells me to order the decision variables. >How do you do that? Please clear your cache and try again. We have uploaded a new version that should fix this. (Optimize in verbose mode and eliminate variables by clicking on them). >And how do you give the decision variables policies, if at all? By optimizing.... then you can change the policies. >Sorry... feel like I should know this from lecture. Sorry that the applet isn't as bug-free as we would like... >Thanks in advance! You are welcome. I hope it works now, David |
![]() |
![]() |
Message no. 193[Branch from no. 187] Posted by Blake William Edwards (s83251017) on Thursday, March 31, 2005 5:33pm Subject: Re: project consultation with TAs This is for this week right? or did i miss it Frank's slots: @ room #341 [ ] - Mon 1 ~ 1.20pm [ ] - Mon 1.20 ~ 1.40pm [ ] - Mon 1.40 ~ 2pm [ ] - Mon 2 ~ 2.20pm [ ] - Mon 2.20 ~ 2.40pm [ ] - Mon 2.40 ~ 3pm [ ] - Mon 3 ~ 3.20pm [ ] - Wed 9.50am ~ 10.10am [ ] - Wed 10.10 ~ 10.30am [ ] - Wed 10.30 ~ 10.50am [ ] - Fri 1 ~ 1.20pm [ ] - Fri 1.20 ~ 1.40pm [ ] - Fri 1.40 ~ 2pm [ ] - Fri 2 ~ 2.20pm [Blake Edwards ] - Fri 2.20 ~ 2.40pm [Blake Edwards ] - Fri 2.40 ~ 3pm [ Kaili Vesik ] - Fri 3 ~ 3.20pm [ Kaili Vesik] - Fri 3.20 ~ 3.40pm [ ] - Fri 3.40 ~ 4pm [ ] - Fri 4 ~ 4.20pm ----------------------------------------------- Mike's slots: @ room # 206 (student learning centre) [ ] - Tue 1 ~ 1.20pm [ ] - Tue 1.20 ~ 1.40pm [ ] - Tue 1.40 ~ 2pm [ ] - Tue 2 ~ 2.20pm [ ] - Tue 2.20 ~ 2.40pm [ ] - Tue 2.40 ~ 3pm [ ] - Tue 3 ~ 3.20pm [ ] - Tue 3.20 ~ 3.40pm [ ] - Tue 3.40 ~ 4pm [ ] - Tue 4 ~ 4.20pm [ Dave Cameron ] - Thu 1 ~ 1.20pm [ Dave Cameron ] - Thu 1.20 ~ 1.40pm [ ] - Thu 1.40 ~ 2pm [ Ryan Yee ] - Thu 2 ~ 2.20pm [ Ryan Yee ] - Thu 2.20 ~ 2.40pm [ ] - Thu 2.40 ~ 3pm [ ] - Thu 3 ~ 3.20pm [Bob McGregor ] - Thu 3.20 ~ 3.40pm [Bob McGregor] - Thu 3.40 ~ 4pm [ ] - Thu 4 ~ 4.20pm |
![]() |
![]() |
Message no. 194 Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 31, 2005 9:27pm Subject: Project question I know I'm going to get mocked for this question, but does the report have to be a certain length? Can we get a guideline? I just don't want to fall incredibly short of the suggested length. Amount of words? Something? The proposal is one thing, but I don't want to get marks taken off of the project due to poor format :) |
![]() |
![]() |
Message no. 195[Branch from no. 194] Posted by David Poole (cpsc_422_term2) on Thursday, March 31, 2005 10:11pm Subject: Re: Project question In message 194 on Thursday, March 31, 2005 9:27pm, Danelle Abra Wettstein writes: >I know I'm going to get mocked for this question, but does the report have to be a >certain length? Can we get a guideline? I just don't want to fall incredibly short of the >suggested length. Amount of words? Something? The proposal is one thing, but I don't >want to get marks taken off of the project due to poor format :) Good question. I was wondering when someone was going to ask this! I would suggest about 3-8 pages typeset (depending on the size of the group). Here is a suggested outline: Title + Authors Abstract: about 100 words Introduction: what is the problem you are trying to solve Background: what someone needs to know to read this paper (write it so that one of your peers can understand what is going on). Hypothesis: what is it that you are trying to show Methodology: what you actually did to test the hypothesis Results: what you discovered (and why we should believe it). Conclusion and future work: sum up the paper and suggest what other questions may be interesting based on what you did. Acknowledgements and References: reference all sources used. ---- I hope that helps, David |
![]() |
![]() |
Message no. 196 Posted by David Poole (cpsc_422_term2) on Thursday, March 31, 2005 10:13pm Subject: Project Presentation Schedule Here is a schedule for the project presentations. You can switch times, but you can only switch days with an equal number of people. (I want to keep the days balanced). Each person should plan for 3 minutes + 1 minute for questions. Just try to tell us one thing that is interesting. Groups should give a multi-person coordinated talk (i.e., every person should talk, and the whole presentation of the group should be coherent). You can use slides (for use with an overhead projector), powerpoint or pdf. You can bring it on a floppy (remember these?), a USB drive, a CD or you can email it to me before 8:00pm on the previous day. If your name is not on this list please email me ASAP. ******* Tuesday ******** Costa Vlachos, Tom Pospisil David Matheson, Ian Macdonald, Yavar Naddaf Robin McQuinn, Dave Cameron Daniel McLaren Stanley Chiu, Dan Liu, Vivian Luk, Bob McGregor, Sillard Urbanovich, Guan Wang ******* Thursday ******* David Chong, Kaili Vesik Bryan Chua Onur Komili, Ryan Yee Wing Hang David Chan, Stanley Tso Danelle Wettstein, Kevin Irmscher, Stephen Mak Sam Davis Blake Edwards William Fong, Daniel Chang |
![]() |
![]() |
Message no. 197[Branch from no. 195] Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 31, 2005 10:17pm Subject: Re: Project question Most definitely. That was better than expected :) So, a group of 3 people should have about 5 or 6 pages... got it! |
![]() |
![]() |
Message no. 198 Posted by David Poole (cpsc_422_term2) on Thursday, March 31, 2005 10:20pm Subject: Today's logic programming & Bayes net example If you are interested in the example of the multi-digit arithmetic from today's class, see http://www.cs.ubc.ca/spider/poole/ci2/code/cilog/CILog2.html It is the arithmetic.cil that I showed. It is interesting to play with. If you don't want to look at it that's fine. I promise I won't ask anything on the final exam about it or the other stuff I covered in the second half of the class. But I will about the value of information. David |
![]() |
![]() |
Message no. 199[Branch from no. 197] Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 31, 2005 10:27pm Subject: Re: Project question Actually... Double-spaced? And what about w/ images? Should we count those out when doing pages, or do they belong in the pages? No, I'm never happy with an answer ;) |
![]() |
![]() |
Message no. 200[Branch from no. 196] Posted by Christopher John Hawkins (s93985018) on Thursday, March 31, 2005 10:50pm Subject: Re: Project Presentation Schedule My group (C.J. Hawkins and Mike Nightingale) seems to have been left off the list. |
![]() |
![]() |
Message no. 201[Branch from no. 199] Posted by David Poole (cpsc_422_term2) on Friday, April 1, 2005 2:58pm Subject: Re: Project question In message 199 on Thursday, March 31, 2005 10:27pm, Danelle Abra Wettstein writes: >Actually... > >Double-spaced? No. Just make it as readable as possible. >And what about w/ images? Should we count those out when doing pages, or do they >belong in the pages? This is meant to be a rough estimate. Figures are good, but I can't tell how many words a figure should replace. Use enough words to explain clearly what you have done and what you have learned. Use your common sense. >No, I'm never happy with an answer ;) OK. So then I'll give you an answer to keep you unhappy ;^} David |
![]() |
![]() |
Message no. 202[Branch from no. 200] Posted by David Poole (cpsc_422_term2) on Saturday, April 2, 2005 2:34pm Subject: Re: Project Presentation Schedule In message 200 on Thursday, March 31, 2005 10:50pm, Christopher John Hawkins writes: >My group (C.J. Hawkins and Mike Nightingale) seems to have been left off >the list. You can present on Thursday. David |
![]() |
![]() |
Message no. 203 Posted by Kaili Elizabeth Vesik (s83834010) on Saturday, April 2, 2005 10:24pm Subject: Assignment marks David, You mentioned in class that any assignments done prior to the midterm with grades lower than our midterm grade would have their grades increased to the value of our midterm grade. Should we expect to see this reflected in the "grades" section of webct, or will it be considered only when you calculate our final marks? Thanks. Kaili |
![]() |
![]() |
Message no. 204[Branch from no. 203] Posted by David Poole (cpsc_422_term2) on Sunday, April 3, 2005 10:55am Subject: Re: Assignment marks In message 203 on Saturday, April 2, 2005 10:24pm, Kaili Elizabeth Vesik writes: >David, > >You mentioned in class that any assignments done prior to the midterm >with grades lower than our midterm grade would have their grades >increased to the value of our midterm grade. Should we expect to see >this reflected in the "grades" section of webct, or will it be >considered only when you calculate our final marks? > >Thanks. >Kaili It will be reflected in my program to compute grades. (But it might be good to remind me closer to the final exam ;^) David |
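The adjustment confirmed above amounts to taking the larger of the two grades for each pre-midterm assignment. A minimal sketch, assuming both grades are on the same percentage scale; the function name and the sample numbers are illustrative and are not taken from the course's grading program.

```python
def adjusted_assignment_grades(assignment_grades, midterm_grade):
    """Raise each pre-midterm assignment grade to the midterm grade if it is lower."""
    return [max(grade, midterm_grade) for grade in assignment_grades]


# Hypothetical grades (percent) for assignments done before the midterm.
print(adjusted_assignment_grades([62.0, 85.0, 70.0], 75.0))
```

Assignments handed in after the midterm would simply be left out of the list passed in.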
![]() |
![]() |
Message no. 205 Posted by David Poole (cpsc_422_term2) on Sunday, April 3, 2005 11:02am Subject: Assignment 5 I had one student who had a problem with the assignment because they did not put in the no-forgetting arcs. You need arcs from previous decisions and the information available to them into subsequent decisions. Otherwise the algorithm doesn't work: it never gets to the stage where it can maximize. Unfortunately the applet doesn't give very good error messages (It used to, but it was wrong, so we removed it). That is the only tricky part of question 1. David p.s. if you don't know where to start, it may be easier to start with question 2. The questions are in this order in the assignment because logically creating a decision network comes before solving one. But playing with one may make it easier to know how to construct one. |
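The no-forgetting requirement described above can be sketched as a small graph operation: every later decision gets arcs from each earlier decision and from everything that earlier decision could observe. This is only an illustration of the idea, not the applet's code; the node names are hypothetical, and the network is represented simply as a mapping from each decision to its set of parents.

```python
def add_no_forgetting_arcs(decision_order, parents):
    """Return new parent sets in which each decision also has, as parents,
    all earlier decisions and all of their parents.

    decision_order: decisions listed in the order they are made.
    parents: dict mapping a decision name to an iterable of its parents.
    """
    result = {d: set(parents.get(d, ())) for d in decision_order}
    remembered = set()  # everything known when earlier decisions were made
    for decision in decision_order:
        result[decision] |= remembered
        remembered |= result[decision] | {decision}
    return result


# Hypothetical two-decision network: Inspect is decided first, then Buy.
original = {"Inspect": {"Budget"}, "Buy": {"Report"}}
print(add_no_forgetting_arcs(["Inspect", "Buy"], original))
```

With these arcs in place, the last decision's factor eventually contains only that decision and its parents, which is the point at which maximizing becomes possible.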
![]() |
![]() |
Message no. 206 Posted by Daniel Joseph Anderson (s76045996) on Sunday, April 3, 2005 11:42am Subject: assignment 5 There's mention of assignment 5 here, but it's not on the website... what's up? |
![]() |
![]() |
Message no. 207[Branch from no. 206] Posted by David Poole (cpsc_422_term2) on Sunday, April 3, 2005 1:17pm Subject: Re: assignment 5 In message 206 on Sunday, April 3, 2005 11:42am, Daniel Joseph Anderson writes: >There's mention of assignment 5 here, but it's not on the website... what's up? It was given out in class and the text is in message 185 (March 24). David |
![]() |
![]() |
Message no. 208 Posted by Stephen Shui Fung Mak (s36743003) on Monday, April 4, 2005 12:13am Subject: Final Exam Practice Questions? Will there be any given out anytime soon? It would be great if a set of practice final exam questions can be given this week so that we can have more time to ask questions and prepare for it. |
![]() |
![]() |
Message no. 209[Branch from no. 208] Posted by David Poole (cpsc_422_term2) on Monday, April 4, 2005 9:26am Subject: Re: Final Exam Practice Questions? In message 208 on Monday, April 4, 2005 12:13am, Stephen Shui Fung Mak writes: >Will there be any given out anytime soon? It would be great if a set of practice final >exam questions can be given this week so that we can have more time to ask questions >and prepare for it. OK. I will try to get something out this week. But I can't promise it. David |
![]() |
![]() |
Message no. 210[Branch from no. 209] Posted by Wing Hang Chan (s84098011) on Monday, April 4, 2005 12:03pm Subject: Re: Final Exam Practice Questions? It would be great to have the sample questions released early. Also, will we be allowed a cheat-sheet like for the midterm? |
![]() |
![]() |
Message no. 211 Posted by Robert McGregor (s92140011) on Monday, April 4, 2005 8:16pm Subject: Assignment 5 Hi, I'm just wondering if there is an update for the decision networking applet that has not been uploaded. The current applet on CISpace does not allow you to create decision or value nodes... Thanks, Bob |
![]() |
![]() |
Message no. 212[Branch from no. 211] Posted by Samuel Douglas Davis (s85850014) on Monday, April 4, 2005 9:14pm Subject: Re: Assignment 5 You have to select Belief/Decision Mode --> Decision Network Mode in the Network Options menu. Sam |
![]() |
![]() |
Message no. 213[Branch from no. 210] Posted by David Poole (cpsc_422_term2) on Monday, April 4, 2005 9:24pm Subject: Re: Final Exam Practice Questions? In message 210 on Monday, April 4, 2005 12:03pm, Wing Hang Chan writes: >It would be great to have the sample questions released early. I agree; it would be great. > Also, >will we be allowed a cheat-sheet like for the midterm? Yes. One sheet of letter sized paper. You can use as many sides of this one sheet of paper as you like. David |
![]() |
![]() |
Message no. 214 Posted by Daniel Wen-Yen Chang (s81965014) on Monday, April 4, 2005 9:42pm Subject: Assignment 5 Questions Hi, I'm just wondering if anyone could give me an example of a utility function, optimal decision function, and optimal policy? Probably using the car buying question? I tried to look at the notes and the things that David has written on the board, but I'm still confused as to what is expected from us. cheers, dan |
![]() |
![]() |
Message no. 215[Branch from no. 212] Posted by Robert McGregor (s92140011) on Monday, April 4, 2005 10:10pm Subject: Re: Assignment 5 Thanks for the help. Bob |
![]() |
![]() |
Message no. 216[Branch from no. 214] Posted by David Poole (cpsc_422_term2) on Tuesday, April 5, 2005 9:30am Subject: Re: Assignment 5 Questions In message 214 on Monday, April 4, 2005 9:42pm, Daniel Wen-Yen Chang writes: >Hi, > >I'm just wondering if anyone could give me an example of a utility function, optimal >decision function, and optimal policy? Probably using the car buying question? Have a look at question 2. If you click on the utility (diamond shaped) node (in the appropriate mode) it shows you the utility function. It gives the utility for various values of its parents. After you optimize decisions (use verbose mode, and click on the nodes to eliminate), you can view the optimal decision functions by clicking on them. The optimal policy is the set of the optimal decision functions. >I tried to look at the notes and the things that David has written on the board, but I'm >still confused as to what is expected from us. > >cheers, >dan |
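The three terms in the reply above can be pictured as plain lookup tables. The sketch below is an assumption-laden illustration, not the assignment's network: the variable names (Condition, Report, Buy) and the utility numbers are made up. A utility function maps values of the utility node's parents to a number, a decision function maps values of a decision's parents to a choice, and a policy holds one decision function per decision.

```python
from itertools import product

# Hypothetical utility function: utility for each value of its parents (Condition, Buy).
UTILITY = {
    ("good", "buy"): 100, ("good", "pass"): 40,
    ("bad", "buy"): 0, ("bad", "pass"): 60,
}


def all_decision_functions(parent_values, actions):
    """Yield every decision function: a mapping from parent value to action."""
    for choice in product(actions, repeat=len(parent_values)):
        yield dict(zip(parent_values, choice))


# Decision Buy has one parent, Report, with two possible values.
functions = list(all_decision_functions(["good_report", "bad_report"], ["buy", "pass"]))
print(len(functions))  # number of actions ** number of parent values

# A policy: one decision function for each decision in the network.
policy = {"Buy": {"good_report": "buy", "bad_report": "pass"}}
print(policy["Buy"]["bad_report"])
```

Optimizing picks, out of all such decision functions, the one with the highest expected utility for each decision.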
![]() |
![]() |
Message no. 217[Branch from no. 205] Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, April 6, 2005 7:05pm Subject: Re: Assignment 5 I am having some problems with Question One. Here's what I've done (I've tried it both online and after downloading the applet onto my own machine): - created network - filled in probability/utility tables - clicked "add no-forgetting arcs" button - clicked "optimize decisions" button There are two issues. First of all, all the utilities I filled into the table are in the range [0,100], but when I optimize, it says the expected utility is 451 (plus some decimal). This seems rather strange to me. Second problem: after I optimize decisions, when I click the "View/modify decision" button and then a decision variable rectangle, I get an error message saying "Policy has not been defined yet." When I click the "Tell me more" option, it says "the network has not yet been optimized". But didn't I just optimize it? Could anyone be so kind as to point me in the right direction here? Thanks! |
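The suspicion above is well founded: an expected utility is a probability-weighted average of utility values, so it cannot fall outside the range of those values. A minimal sanity check, assuming outcomes are given as (probability, utility) pairs for one fixed policy; the function names and numbers are illustrative only and say nothing about how the applet computes its result.

```python
def expected_utility(outcomes):
    """Return the expected utility of (probability, utility) pairs.

    Raises ValueError if the probabilities do not sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_probability}, not 1")
    return sum(p * u for p, u in outcomes)


def within_utility_range(value, utilities):
    """True if value lies between the smallest and largest utility."""
    return min(utilities) <= value <= max(utilities)


# Hypothetical outcomes with utilities in [0, 100].
outcomes = [(0.2, 0), (0.5, 60), (0.3, 100)]
print(expected_utility(outcomes))
print(within_utility_range(451, [0, 100]))  # a reported 451 cannot be right
```

A result outside the range usually means some intermediate factor was not a proper probability-weighted sum, for example because variables were eliminated in an order the algorithm does not support.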
![]() |
![]() |
Message no. 218[Branch from no. 217] Posted by Samuel Douglas Davis (s85850014) on Wednesday, April 6, 2005 8:28pm Subject: Re: Assignment 5 In message 217 on Wednesday, April 6, 2005 7:05pm, Kaili Elizabeth Vesik writes: >I am having some problems with Question One. Here's what I've done (I've >tried it both online and after downloading the applet onto my own machine): >- created network >- filled in probability/utility tables >- clicked "add no-forgetting arcs" button >- clicked "optimize decisions" button > >There are two issues. > >First of all, all the utilities I filled into the table are in the range >[0,100], but when I optimize, it says the expected utility is 451 (plus >some decimal). This seems rather strange to me. > >Second problem: after I optimize decisions, when I click the >"View/modify decision" button and then a decision variable rectangle, I >get an error message saying "Policy has not been defined yet." When I >click the "Tell me more" option, it says "the network has not yet been >optimized". But didn't I just optimize it? > >Could anyone be so kind as to point me in the right direction here? >Thanks! I had the same problems. Are you using verbose mode and selecting the nodes to eliminate manually? When I first did it, it wouldn't let me select all of the nodes for some reason, but when I started over and created the exact same network from scratch it did work, so you might try that. Sam |
![]() |
![]() |
Message no. 219[Branch from no. 218] Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, April 6, 2005 8:34pm Subject: Re: Assignment 5 Yeah, that's exactly what's happening. Thanks for the suggestion; I'll try again. |
![]() |
![]() |
Message no. 220[Branch from no. 218] Posted by David Poole (cpsc_422_term2) on Wednesday, April 6, 2005 8:49pm Subject: Re: Assignment 5 >I had the same problems. Are you using verbose mode and selecting the >nodes to eliminate manually? When I first did it, it wouldn't let me >select all of the nodes for some reason, but when I started over and >created the exact same network from scratch it did work, so you might >try that. I am not sure what the problem is, but the student who was maintaining this code assured me that this works (see an earlier message). We have hired a student over the summer to fix it, but that doesn't help you. Sorry about that! David |
![]() |
![]() |
Message no. 221[Branch from no. 218] Posted by David Poole (cpsc_422_term2) on Wednesday, April 6, 2005 8:55pm Subject: Re: Assignment 5 >I had the same problems. Are you using verbose mode and selecting the >nodes to eliminate manually? When I first did it, it wouldn't let me >select all of the nodes for some reason, but when I started over and >created the exact same network from scratch it did work, so you might >try that. Before you try to create the problem again from scratch try the following: save the graph (copy the text representation into the clipboard and save it in a text file, or use the save in the downloaded version), quit, then copy the text representation back. This usually works (as it doesn't remember some state that otherwise messes things up). David |
![]() |
![]() |
Message no. 222[Branch from no. 221] Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, April 6, 2005 9:17pm Subject: Re: Assignment 5 Well, I've tried both suggestions that were given (recreating the entire network, as well as saving and restarting), and neither has worked. So it's off to banging my head against the wall some more. Thanks for your suggestions. Kaili |
![]() |
![]() |
Message no. 223[Branch from no. 222] Posted by Onur Komili (s88435045) on Wednesday, April 6, 2005 10:08pm Subject: Re: Assignment 5 Glad to see I'm not alone when it comes to banging my head on the wall. Strange things keep happening when I use the applet, I'm not sure if it's me not understanding the material or if the applet is lying to me and has a bug. Onur |
![]() |
![]() |
Message no. 224 Posted by Guan Wang (s77942019) on Wednesday, April 6, 2005 10:36pm Subject: applet messed up Hi I used the applet to do question 2 last week and it optimized it to give 79, now it's giving me 71. Seems like the way it optimizes changed but which one is correct? |
![]() |
![]() |
Message no. 225[Branch from no. 223] Posted by Wing Hang Chan (s84098011) on Wednesday, April 6, 2005 10:36pm Subject: Re: Assignment 5 the applet works fine for my question 1 graph (and my conditional probabilities) i would recommend that you play with the question 2 graph first. it really helped me :) |
![]() |
![]() |
Message no. 226[Branch from no. 224] Posted by Onur Komili (s88435045) on Wednesday, April 6, 2005 10:44pm Subject: Re: applet messed up The applet optimizes to 79.2 for me still, but I'm not even sure if that's right. It's optimizing to 71 without you making any changes or observing anything? Onur |
![]() |
![]() |
Message no. 227[Branch from no. 225] Posted by Onur Komili (s88435045) on Wednesday, April 6, 2005 10:46pm Subject: Re: Assignment 5 For question 2, why does the probability of Trouble 2 come out to be 0.16? I keep calculating 0.3, and can't seem to figure out how it's getting 0.16, I'm calculating things just like I did the others but it just doesn't seem to add up. Onur |
![]() |
![]() |
Message no. 228[Branch from no. 226] Posted by Guan Wang (s77942019) on Wednesday, April 6, 2005 10:48pm Subject: Re: applet messed up ya, strange. I reloaded the graph it optimizes 79.2 again. I guess no need to panic? :-) |
![]() |
![]() |
Message no. 229[Branch from no. 217] Posted by Onur Komili (s88435045) on Wednesday, April 6, 2005 11:32pm Subject: Re: Assignment 5 I'm having the exact same problems you are now. My Expected Utility comes out to 520, when my utility values range from 0 to 100. Also I'm getting the "policy not defined" messages you're having after trying to optimize it. Did you figure out how to solve your problem yet? Onur |
![]() |
![]() |
Message no. 230[Branch from no. 229] Posted by Danelle Abra Wettstein (s86800018) on Wednesday, April 6, 2005 11:44pm Subject: Re: Assignment 5 Are you doing brief or verbose? I find brief gives me some whacked out answer, so I always use verbose now. |
![]() |
![]() |
Message no. 231[Branch from no. 229] Posted by David Poole (cpsc_422_term2) on Thursday, April 7, 2005 9:57am Subject: Re: Assignment 5 In message 229 on Wednesday, April 6, 2005 11:32pm, Onur Komili writes: >I'm having the exact same problems you are now. > >My Expected Utility comes out to 520, when my utility values range from 0 to 100. Also >I'm getting the "policy not defined" messages you're having after trying to optimize it. > >Did you figure out how to solve your problem yet? > >Onur I looked at your code. All of the decision nodes need to be connected together. And any of the information from a previous decision needs to be available for a next decision. It needs to be "no-forgetting". Otherwise the algorithm doesn't work. You did not do that, and that is why it doesn't work. David |
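The condition described in the reply above can be checked mechanically before optimizing. The sketch below lists the arcs a network is missing; it is an illustration under assumptions (hypothetical node names, decisions given in the order they are made, parents stored as sets) and is not how the applet itself validates a graph.

```python
def missing_no_forgetting_arcs(decision_order, parents):
    """Return sorted (source, decision) arcs needed for no-forgetting.

    A later decision must have every earlier decision, and every parent of
    that earlier decision, among its own parents.
    """
    missing = set()
    for index, later in enumerate(decision_order):
        have = set(parents.get(later, ()))
        for earlier in decision_order[:index]:
            required = {earlier} | set(parents.get(earlier, ()))
            missing |= {(node, later) for node in required - have}
    return sorted(missing)


# Hypothetical network where the second decision forgets the first.
network = {"Inspect": {"Budget"}, "Buy": {"Report"}}
print(missing_no_forgetting_arcs(["Inspect", "Buy"], network))

# After adding the missing arcs, nothing is reported.
fixed = {"Inspect": {"Budget"}, "Buy": {"Report", "Budget", "Inspect"}}
print(missing_no_forgetting_arcs(["Inspect", "Buy"], fixed))
```

An empty result means each decision can see everything earlier decisions saw and chose, which is what the elimination algorithm relies on.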
![]() |
![]() |
Message no. 232 Posted by Wing Hang Chan (s84098011) on Friday, April 8, 2005 1:35pm Subject: Sample Exam Questions? Just wondering when we can expect to see some sample exam questions. The sooner the better as I would like to get a head start on studying for this course. Thank you :) p.s. I enjoyed the class presentations, they were all very interesting |
![]() |
![]() |
Message no. 233 Posted by Stanley Chi Hong Tso (s58635020) on Saturday, April 9, 2005 9:36pm Subject: Assignment solutions? Are there assignment solutions to the recent assignments? Stan |
![]() |
![]() |
Message no. 234 Posted by Vivian Luk (s82215013) on Tuesday, April 12, 2005 9:18pm Subject: March 29th/31st Lecture notes? Are the lecture notes for March 29th and 31st going to be available? Thanks! # 29 Mar. Value of information and control. # 31 Mar. Putting it together. |
![]() |
![]() |
Message no. 235 Posted by David Poole (cpsc_422_term2) on Wednesday, April 13, 2005 12:32pm Subject: practice final exam available There is a practice final exam available from the course web page. I had hoped to post this earlier, but I couldn't; sorry about that. I will post a solution on Monday sometime. David |
![]() |
![]() |
Message no. 236[Branch from no. 235] Posted by Wing Hang Chan (s84098011) on Wednesday, April 13, 2005 3:10pm Subject: Re: practice final exam available thank you for posting the sample exam. will there be solutions posted for this? |
![]() |
![]() |
Message no. 237[Branch from no. 236] Posted by Danelle Abra Wettstein (s86800018) on Wednesday, April 13, 2005 6:33pm Subject: Re: practice final exam available >I will post a solution on Monday sometime. > >David > Wing Hang Chan writes: >thank you for the posting the sample exam. will there be solutions >posted for this? I'm sorry, ask that again? |
![]() |
![]() |
Message no. 238 Posted by Danelle Abra Wettstein (s86800018) on Wednesday, April 13, 2005 11:24pm Subject: Office hours? Are the TA and Prof office hours the same for the exam period? |
![]() |
![]() |
Message no. 239[Branch from no. 237] Posted by Wing Hang Chan (s84098011) on Thursday, April 14, 2005 3:54am Subject: Re: practice final exam available oops i was in such a rush this afternoon i didn't finish reading the post. *waits for monday* |
![]() |
![]() |
Message no. 240 Posted by Michael Chiang (s27992023) on Thursday, April 14, 2005 9:48am Subject: TA hours (Michael) Hi all, I will be running my usual TA hour today at 1pm, in room 106 (accessible from the atrium, just knock on the door). Also, I will run a 2 hour session next on Monday at 11am in the Student Learning Centre, CISCR 206. Good luck with the preparations, Michael |
![]() |
![]() |
Message no. 241 Posted by Frank Hutter (s62336011) on Thursday, April 14, 2005 10:37am Subject: TA hours (Frank) Hi everybody, I'll have extra TA hours this Friday 11-1 and next Tuesday 11-12:30 (just before David's office hour). I'll also have my regular office hour next Wednesday (but better don't wait until then ;). Please notice that all these TA hours will be held in my new office in the new building, office X563. Cheers, Frank |
![]() |
![]() |
Message no. 242 Posted by Vivian Luk (s82215013) on Thursday, April 14, 2005 6:41pm Subject: How long is the final exam? Is the final exam 2hrs or 2.5hrs? Thanks. |
![]() |
![]() |
Message no. 243 Posted by Stephen Shui Fung Mak (s36743003) on Friday, April 15, 2005 12:37am Subject: Assignment Solutions???? When will they be posted??? The exam is next Wednesday...and I feel so lost right now... |
![]() |
![]() |
Message no. 244[Branch from no. 243] Posted by Kaili Elizabeth Vesik (s83834010) on Friday, April 15, 2005 11:32am Subject: Re: Assignment Solutions???? To add to the questions re: assignments, David, will we be able to pick up our marked Assignment 5s sometime before the exam? Regarding the post I'm replying to, is there anyone who's gotten perfect on any of the assignments who happens to feel like being kind to the rest of us and scanning/posting their solutions? I assume that's not against the rules, since the due dates are all past now, right? Thanks! Kaili |
![]() |
![]() |
Message no. 245[Branch from no. 242] Posted by Vivian Luk (s82215013) on Friday, April 15, 2005 12:06pm Subject: Re: How long is the final exam? Any reply soon would be greatly appreciated! (I have to give a presentation at 3:30 right after the 422Final so I need to contact my prof if I can't make it on time...) Thx |
![]() |
![]() |
Message no. 246[Branch from no. 241] Posted by Frank Hutter (s62336011) on Friday, April 15, 2005 12:19pm Subject: Re: TA hours (Frank) By "new building" I mean CS2, not DMP, in case someone was wondering ... |
![]() |
![]() |
Message no. 247[Branch from no. 242] Posted by David Poole (cpsc_422_term2) on Friday, April 15, 2005 1:27pm Subject: Re: How long is the final exam? In message 242 on Thursday, April 14, 2005 6:41pm, Vivian Luk writes: >Is the final exam 2hrs or 2.5hrs? > >Thanks. 2.5 hours. Remember you can bring in 1 letter-sized sheet of paper. David |
![]() |
![]() |
Message no. 248[Branch from no. 234] Posted by David Poole (cpsc_422_term2) on Friday, April 15, 2005 1:46pm Subject: Re: March 29th/31st Lecture notes? In message 234 on Tuesday, April 12, 2005 9:18pm, Vivian Luk writes: >Are the lecture notes for March 29th and 31st going to be available? >Thanks! > ># 29 Mar. Value of information and control. ># 31 Mar. Putting it together. No. I don't think there were any. David |
![]() |
![]() |
Message no. 249[Branch from no. 233] Posted by Vivian Luk (s82215013) on Saturday, April 16, 2005 12:01pm Subject: Re: Assignment solutions? Will there be solutions to the assignments? Thanks. |
![]() |
![]() |
Message no. 250 Posted by Michael Chiang (s27992023) on Sunday, April 17, 2005 1:31am Subject: assignment 4 solutions Hi all, attached are solutions to assignment 4 (one of the many possible sets at least) which Frank and I put together. They should be fairly self-explanatory. If not, feel free to post your questions here and we'll do our best to answer. Michael |
![]() |
![]() |
Message no. 251[Branch from no. 250] Posted by Frank Hutter (s62336011) on Sunday, April 17, 2005 2:29am Subject: Re: assignment 4 solutions I think Mike forgot to post this part of our solution to question 2 of assignment 4 which does the computations by hand:
Question 2a)
Initial factors: f0(A), f1(B), f2(A,B,C), f3(B,D), f4(C,E), f5(C,F)
Eliminate D: sum f3(B,D) over D => new factor f6(B) (all 1s)
Eliminate F: sum f5(C,F) over F => new factor f7(C) (all 1s)
Eliminate A: sum f0(A)*f2(A,B,C) over A => new factor f8(B,C) (t,t: 0.16; t,f: 0.84; f,t: 0.76; f,f: 0.24)
Eliminate B: sum f1(B)*f6(B)*f8(B,C) over B => new factor f9(C) (t: 0.64, f: 0.36)
Eliminate C: sum f7(C)*f9(C)*f4(C,E) over C => new factor f10(E) (t: 0.52, f: 0.48)
This is already normalized, so the result is P(E=t) = 0.52 and P(E=f) = 0.48
Question 2b)
You can reuse most of the computation. f5(C,F) becomes f5(C) (t: 0.8, f: 0.1). Note that this is NOT a CPT anymore, it is not normalized! You also don't need to eliminate F anymore. The last factor is then f9(E) (t: 0.3656, f: 0.1824). Normalization yields the end result: P(E=t|F=f) = 0.6672 and P(E=f|F=f) = 0.3328 |
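The normalization step at the end of question 2b can be checked with a few lines of Python. This is only a sketch: the factor values are the ones quoted in the message above, and the `normalize` helper is a hypothetical name, not something from the course code.

```python
def normalize(factor):
    """Scale a factor over one variable so that its entries sum to 1."""
    total = sum(factor.values())
    return {value: weight / total for value, weight in factor.items()}


# Unnormalized last factor over E from question 2b (values quoted above).
unnormalized = {"t": 0.3656, "f": 0.1824}

posterior = normalize(unnormalized)
print({value: round(p, 4) for value, p in posterior.items()})
```

Rounded to four places this reproduces the 0.6672 and 0.3328 given in the solution.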
![]() |
![]() |
Message no. 252 Posted by Frank Hutter (s62336011) on Sunday, April 17, 2005 2:42am Subject: Assignment 5 solutions Hi everybody, here's the solution of assignment 5. For question 1, David's solution is attached as an xml file. For question 2, here's what Mike and I came up with (the xml file can be found on the course website):
Question 2a)
The optimal decision function for variable Cheat2 is to always cheat, except when you cheated before (Cheat1=t) and got into trouble (Trouble1=t). When you didn't cheat but still got into trouble, it actually doesn't matter whether you cheat in Cheat2 (the utilities are then equal for Cheat2=t and Cheat2=f, and the applet chooses Cheat2=t in that case).
You can compute this decision function with variable elimination as follows. Like in assignment 4 for Bayesian networks, we eliminate one variable at a time. Random variables are eliminated by summing over them, and decision variables are eliminated by taking the maximum of their outcomes (there's an example later on).
Initial factors:
f0(Cheat2, Trouble1, Watched, Trouble2) (from CPT for Trouble2)
f1(Trouble1, Cheat1, Watched) (from CPT of Trouble1)
f2(Watched) (from CPT of Watched)
f3(Trouble2, Cheat2, Utility) (from table for the utility node)
Eliminate Trouble2: sum f0 x f3 over Trouble2 => new factor f4(Trouble1, Watched, Cheat2, Utility)
This is factor f4:
Trouble1  Watched  Cheat2  Utility
t         t        t       30
t         t        f       49
t         f        t       79
t         f        f       49
f         t        t       44
f         t        f       70
f         f        t       100
f         f        f       70
Eliminate Watched: sum f1 x f2 x f4 over Watched => new factor f5(Trouble1, Cheat1, Cheat2, Utility)
This is factor f5:
Trouble1  Cheat1  Cheat2  Utility
t         t       t       9.6
t         t       f       15.68
t         f       t       0
t         f       f       0
f         t       t       63.52
f         t       f       47.6
f         f       t       77.6
f         f       f       70
From this, you can already extract the decision function for Cheat2:
Trouble1  Cheat1  Cheat2
t         t       f       (since 15.68 > 9.6)
t         f       t       (arbitrary, could just as well be f, since 0==0)
f         t       t       (since 63.52 > 47.6)
f         f       t       (since 77.6 > 70)
You read this as follows: "if Trouble1=t and Cheat1=t, I will not cheat the second time (Cheat2=f)" etc.
Question 2b)
Eliminate Cheat2: maximize f5 over Cheat2 => new factor f6(Trouble1, Cheat1, Utility) (the maximization is the same one we did at the end of question 2a: just keep the higher entry)
This is factor f6:
Trouble1  Cheat1  Utility
t         t       15.68
t         f       0
f         t       63.52
f         f       77.6
Eliminate Trouble1: sum f6 over Trouble1 => new factor f7(Cheat1, Utility)
This is factor f7:
Cheat1  Utility
t       79.2
f       77.6
Thus, the decision function for Cheat1 always chooses Cheat1=t. The optimal policy is the combination of the decision functions for Cheat1 and Cheat2.
If you want to compute the expected utility (which was not asked for), you need to eliminate Cheat1 as well: Eliminate Cheat1: maximize f7 over Cheat1 => factor f8(Utility). This yields the expected utility 79.2 when you follow the optimal policy.
Note that the expected utility of a policy merely says how well you do on average with this policy. It is NOT the policy itself. The policy consists of decision functions for each decision variable. (Many people mixed this up) |
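The last three elimination steps (maximize over Cheat2, sum over Trouble1, maximize over Cheat1) can be replayed in Python starting from factor f5. This is a sketch, not the applet's implementation: the numbers are the f5 entries quoted in the solution, while the dictionary layout and variable names are assumptions made for illustration.

```python
from collections import defaultdict

# Factor f5, keyed by (Trouble1, Cheat1, Cheat2); values are the quoted utilities.
f5 = {
    ("t", "t", "t"): 9.6,   ("t", "t", "f"): 15.68,
    ("t", "f", "t"): 0.0,   ("t", "f", "f"): 0.0,
    ("f", "t", "t"): 63.52, ("f", "t", "f"): 47.6,
    ("f", "f", "t"): 77.6,  ("f", "f", "f"): 70.0,
}

# Eliminate the decision variable Cheat2 by maximizing; remember the argmax,
# which is the decision function for Cheat2. Ties keep the first entry seen.
f6 = {}
decision_cheat2 = {}
for (trouble1, cheat1, cheat2), utility in f5.items():
    key = (trouble1, cheat1)
    if key not in f6 or utility > f6[key]:
        f6[key] = utility
        decision_cheat2[key] = cheat2

# Eliminate the random variable Trouble1 by summing.
f7 = defaultdict(float)
for (trouble1, cheat1), utility in f6.items():
    f7[cheat1] += utility

# Eliminate the decision variable Cheat1 by maximizing.
best_cheat1 = max(f7, key=f7.get)
expected_utility = f7[best_cheat1]

print(decision_cheat2)
print(dict(f7), best_cheat1, round(expected_utility, 2))
```

The policy is the pair `decision_cheat2` and `best_cheat1`; `expected_utility` is only the number that policy achieves on average.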
![]() |
![]() |
Message no. 253[Branch from no. 249] Posted by Frank Hutter (s62336011) on Sunday, April 17, 2005 2:46am Subject: Re: Assignment solutions? Ok, now there should be solutions for all assignments: Solutions for assignments 1 and 2 are on the course website. David posted a solution to assignment 3 on February 23. Mike posted a solution to assignment 4 earlier today. I just posted a solution to assignment 5. If you don't find or understand them, please ask. Cheers, Frank |
![]() |
![]() |
Message no. 254[Branch from no. 228] Posted by Frank Hutter (s62336011) on Sunday, April 17, 2005 2:48am Subject: Re: applet messed up 79.2 is correct for the expected utility - see the solution I just posted. I hope there's no new bug with the applet. Frank |
![]() |
![]() |
Message no. 255[Branch from no. 251] Posted by Michael Chiang (s27992023) on Sunday, April 17, 2005 10:10am Subject: Re: assignment 4 solutions Oops, good save Frank! Thanks. M. |
![]() |
![]() |
Message no. 256[Branch from no. 253] Posted by Vivian Luk (s82215013) on Sunday, April 17, 2005 11:04am Subject: Re: Assignment solutions? Thanks!!! |
![]() |
![]() |
Message no. 257[Branch from no. 252] Posted by Kaili Elizabeth Vesik (s83834010) on Sunday, April 17, 2005 11:49am Subject: Re: Assignment 5 solutions There doesn't seem to be an xml file attached. Was there supposed to be? Kaili |
![]() |
![]() |
Message no. 258[Branch from no. 245] Posted by Samuel Douglas Davis (s85850014) on Sunday, April 17, 2005 2:12pm Subject: Re: How long is the final exam? In message 245 on Friday, April 15, 2005 12:06pm, Vivian Luk writes: >Any reply soon would be greatly appreciated! (I have to give a >presentation at 3:30 right after the 422Final so I need to contact my >prof if I can't make it on time...) > >Thx Isn't the 422 final at 3:30? |
![]() |
![]() |
Message no. 259[Branch from no. 258] Posted by Vivian Luk (s82215013) on Sunday, April 17, 2005 4:23pm Subject: Re: How long is the final exam? I meant 5:30 :) |
![]() |
![]() |
Message no. 260 Posted by Samuel Douglas Davis (s85850014) on Sunday, April 17, 2005 6:11pm Subject: SARSA(lambda) I think I've misunderstood something all along. The notes on reinforcement learning say that the algorithm given for SARSA(lambda) "specifies that Q[s, a] is updated for every state s and action a whenever a new reward is received," however it seems to me that it says that Q is updated at each step, regardless of whether there is a reward or not. Am I right in thinking that the algorithm only specifies what to do when r != 0, and if so, what happens when r=0? Thanks, Sam |
![]() |
![]() |
Message no. 261[Branch from no. 260] Posted by David Poole (cpsc_422_term2) on Sunday, April 17, 2005 10:07pm Subject: Re: SARSA(lambda) In message 260 on Sunday, April 17, 2005 6:11pm, Samuel Douglas Davis writes: >I think I've misunderstood something all along. The notes on >reinforcement learning say that the algorithm given for SARSA(lambda) >"specifies that Q[s, a] is updated for every state s and action a >whenever a new reward is received," however it seems to me that it says >that Q is updated at each step, regardless of whether there is a reward >or not. Am I right in thinking that the algorithm only specifies what to >do when r != 0, and if so, what happens when r=0? > >Thanks, >Sam r=0 isn't treated differently from any other case. "A reward is received" includes the case when 0 is received. David |
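To make the point concrete, here is a rough sketch of one SARSA(lambda) step with accumulating eligibility traces. It follows the common textbook form and may differ in detail from the algorithm in the course notes; the function name, the dictionary representation of Q and the traces, and the toy states are all assumptions for illustration. Note that nothing in the step tests whether `r` is zero.

```python
from collections import defaultdict


def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, alpha, gamma, lam):
    """One SARSA(lambda) update with accumulating traces (textbook form)."""
    # Temporal-difference error; r = 0 is handled like any other reward.
    delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    E[(s, a)] += 1.0
    # Every state-action pair with a non-zero trace is updated.
    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam


Q = defaultdict(float)
E = defaultdict(float)
Q[("s1", "right")] = 1.0  # made-up value so the zero-reward step has an effect

sarsa_lambda_step(Q, E, "s0", "right", 0.0, "s1", "right",
                  alpha=0.5, gamma=0.9, lam=0.5)
print(Q[("s0", "right")])
```

Even though the reward is 0, `Q[("s0", "right")]` changes, because the estimate for the next state-action pair is non-zero.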
![]() |
![]() |
Message no. 264[Branch from no. 261] Posted by Samuel Douglas Davis (s85850014) on Sunday, April 17, 2005 10:26pm Subject: Re: SARSA(lambda) In message 261 on Sunday, April 17, 2005 10:07pm, David Poole writes: >r=0 isn't treated differently from any other case. A reward is received, >includes the case when 0 is received. > >David > Ok, I was confused by question 3c of the midterm. I thought it was asking us how part b would be different using SARSA, but it was actually asking us about part a, right? |
![]() |
![]() |
Message no. 265 Posted by David Poole (cpsc_422_term2) on Sunday, April 17, 2005 10:26pm Subject: Solution to the practice final exam There is a solution to the practice final exam at: http://www.cs.ubc.ca/spider/poole/cs422/2005/exams/prfinsol.html or http://www.cs.ubc.ca/spider/poole/cs422/2005/exams/prfinsol.pdf You can also expect something repeated from the midterm, and something that you should have learned from the assignments and the project. David |
![]() |
![]() |
Message no. 266 Posted by Michael Chiang (s27992023) on Monday, April 18, 2005 11:31am Subject: extra TA hour Hi all, I will be offering another hour of TA consultation tomorrow (due to some demand). It will be held in the Student Learning Centre #206, 3.30pm ~ 4.30pm. Michael |
![]() |
![]() |
Message no. 267 Posted by Onur Komili (s88435045) on Monday, April 18, 2005 1:39pm Subject: Midterm Questions Scan I know it's a little late but I just realized I missed the lecture where we got our midterms back and never actually went to pick it up since our grades were posted online. Could someone please scan the midterm (feel free to blank out your answers if you want) and just post the midterm questions or possibly email it to me? David posted the solutions to the midterm but it doesn't have the questions themselves and I can't remember what all the questions were. Thanks in advance, Onur |
![]() |
![]() |
Message no. 268 Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 2:43pm Subject: Final, Question 10c I was going through this question, and have come across what I believe is an error. Going to the 3rd equation, after you sum out S, you get f(A,C). Then maximize over shutdown and get f(c,utility)... so that leaves P(C), f(A,C) and f(C, Utility). Then it sums out C... that leaves f(A, Utility)... but the answer in the solutions just says f(Utility). It's my assumption that A should have been summed out sometime before C, right? (or we can just sum out A now... but the bottom line is A should have been summed out) Let me know if I'm somewhat right :) |
![]() |
![]() |
Message no. 269 Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 2:45pm Subject: Discounted reward I'm a bit confused as to the purpose of discounted reward. My thoughts were that the values got more refined as the algorithms/applets ran, so why are the more refined, future values worth less than the original, unrefined values? I understand that discounted reward solves the problem of infinite rewards, but is there another reason for Discounted rewards besides that? Thanks. |
![]() |
![]() |
Message no. 270[Branch from no. 268] Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 2:55pm Subject: Re: Final, Question 10c Also, is there any specific reasons the variables were summed/maximized in this order? Thanks! |
![]() |
![]() |
Message no. 271 Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 3:29pm Subject: Midterm Q3b For Q[S4,Right], how is the answer 10+0.5(10-10) = 10 generated? Where did the discount of 0.9 go, and how does S5 suddenly have a value of 10? |
![]() |
![]() |
Message no. 272[Branch from no. 271] Posted by Kaili Elizabeth Vesik (s83834010) on Monday, April 18, 2005 4:06pm Subject: Re: Midterm Q3b In message 271 on Monday, April 18, 2005 3:29pm, Danelle Abra Wettstein writes: >For Q[S4,Right], how is the answer 10+0.5(10-10) = 10 generated? Where did the >discount of 0.9 go, and how does S5 suddenly have a value of 10? The discount of 0.9 doesn't show up because it multiplies the current value of s5, which is zero. The expression, written out fully, would be Q[s4,right] <- 10 + 0.5(10 + 0.9*0 - 10). s5 doesn't have a Q-value, but it does have a reward of 10; that's where the first 10 inside the brackets comes from. Take a look at the diagram on the test-- s5 is inside a circle; that is, a reward state. |
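The update written out in the reply above can be checked directly. This is a minimal sketch: the `q_update` name and argument order are made up for illustration, and the numbers are the ones from the thread (old value 10, reward 10, learning rate 0.5, discount 0.9, and a value of 0 for s5).

```python
def q_update(q_old, reward, max_q_next, alpha, gamma):
    """Q-learning update: move q_old toward reward + gamma * max_q_next."""
    return q_old + alpha * (reward + gamma * max_q_next - q_old)


# Q[s4,right] <- 10 + 0.5 * (10 + 0.9 * 0 - 10)
new_q = q_update(q_old=10.0, reward=10.0, max_q_next=0.0, alpha=0.5, gamma=0.9)
print(new_q)
```

The discount does not vanish; it multiplies the 0 value of s5, so the term contributes nothing.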
![]() |
![]() |
Message no. 273[Branch from no. 272] Posted by Samuel Douglas Davis (s85850014) on Monday, April 18, 2005 4:29pm Subject: Re: Midterm Q3b I don't really understand why the value of s5 is 0. We visited it before and must have carried out some action, and if we went right or crashed into a wall it would have a non-zero Q-value, wouldn't it? |
![]() |
![]() |
Message no. 274[Branch from no. 273] Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 5:12pm Subject: Re: Midterm Q3b Oh.. I incorrectly had brackets around a statement... that totally explains why I was confused :) |
![]() |
![]() |
Message no. 275[Branch from no. 273] Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 5:15pm Subject: Re: Midterm Q3b In message 273 on Monday, April 18, 2005 4:29pm, Samuel Douglas Davis writes: >I don't really understand why the value of s5 is 0. We visited it before >and must have carried out some action, and if we went right or crashed >into a wall it would have a non-zero Q-value, wouldn't it? It could have moved down. I guess we just make the assumption that it didn't hit anything, given the information wasn't presented. |
![]() |
![]() |
Message no. 276[Branch from no. 275] Posted by Samuel Douglas Davis (s85850014) on Monday, April 18, 2005 5:48pm Subject: Re: Midterm Q3b In message 275 on Monday, April 18, 2005 5:15pm, Danelle Abra Wettstein writes: >In message 273 on Monday, April 18, 2005 4:29pm, Samuel Douglas Davis writes: >>I don't really understand why the value of s5 is 0. We visited it before >>and must have carried out some action, and if we went right or crashed >>into a wall it would have a non-zero Q-value, wouldn't it? > >It could have moved down. I guess we just make the assumption that it didn't hit anything, given the >information wasn't presented. Actually, I think I was wrong. Even if we went right and received a negative reward, the value of the other actions would still be zero, so the max of those values is still zero. |
![]() |
![]() |
Message no. 277[Branch from no. 268] Posted by David Poole (cpsc_422_term2) on Monday, April 18, 2005 8:25pm Subject: Re: Final, Question 10c In message 268 on Monday, April 18, 2005 2:43pm, Danelle Abra Wettstein writes: >I was going through this question, and have come across what I believe is an error. > >Going to the 3rd equation, after you sum out S, you get f(A,C). Then maximize over >shutdown and get f(c,utility)... so that leaves P(C), f(A,C) and f(C, Utility). Then it sums >out C... that leaves f(A, Utility)... but the answer in the solutions just says f(Utility). It's >my assumption that A should have been summed out sometime before C, right? (or we >can just sum out A now... but the bottom line is A should have been summed out) > >Let me know if I'm somewhat right :) Yes, it is wrong. You need to sum out all of the variables apart from A and Shutdown. You can sum out these variables in any order. Then you need to maximize shutdown. Then you sum out A. Sorry about that. I will correct it. (So if someone loads the solutions later tonight they may not understand this thread). David |
![]() |
![]() |
Message no. 278[Branch from no. 269] Posted by David Poole (cpsc_422_term2) on Monday, April 18, 2005 9:21pm Subject: Re: Discounted reward In message 269 on Monday, April 18, 2005 2:45pm, Danelle Abra Wettstein writes: >I'm a bit confused as to the purpose of discounted reward. My thoughts were that the >values got more refined as the algorithms/applets ran, so why are the more refined, >future values worth less than the original, unrefined values? I understand that discounted >reward solves the problem of infinite rewards, but is there another reason for Discounted >rewards besides that? > >Thanks. This has nothing to do with any algorithms. It is a way to compare $1.00 now with $1.00 in a year's time. Think about how much it is worth to you now to get $1000 in a year. It would be less than $1000, but more than, say, $500. A discount of gamma means that a reward of 1 in one time step is worth gamma to you now. And a reward of 1 in two time steps is worth gamma^2 to you now, etc. Does that make sense? David |
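A small sketch of the same idea in Python, assuming rewards are listed by time step starting from "now"; the function name and the choice of gamma = 0.9 are illustrative, not taken from the course material.

```python
def discounted_value(rewards, gamma):
    """Present value of a reward sequence: sum of gamma**t * reward_t."""
    return sum(gamma ** t * reward for t, reward in enumerate(rewards))


gamma = 0.9
one_step_away = discounted_value([0, 1], gamma)      # reward of 1, one step from now
two_steps_away = discounted_value([0, 0, 1], gamma)  # reward of 1, two steps from now
long_stream = discounted_value([1] * 1000, gamma)    # a long run of reward 1

print(one_step_away, two_steps_away, long_stream)
```

The long stream stays below 1 / (1 - gamma) = 10, which is the "no infinite rewards" property mentioned in the question; the first two values show the gamma and gamma^2 weighting described in the reply.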
![]() |
![]() |
Message no. 279[Branch from no. 271] Posted by David Poole (cpsc_422_term2) on Monday, April 18, 2005 9:37pm Subject: Re: Midterm Q3b In message 271 on Monday, April 18, 2005 3:29pm, Danelle Abra Wettstein writes: >For Q[S4,Right], how is the answer 10+0.5(10-10) = 10 generated? Where did the >discount of 0.9 go, and how does S5 suddenly have a value of 10? S5 doesn't have a value of 10; it has a value of 0. You receive a reward of 10 from entering S5. That was what the bold text at the top of page 4 explained. David |
![]() |
![]() |
Message no. 280[Branch from no. 273] Posted by David Poole (cpsc_422_term2) on Monday, April 18, 2005 9:41pm Subject: Re: Midterm Q3b In message 273 on Monday, April 18, 2005 4:29pm, Samuel Douglas Davis writes: >I don't really understand why the value of s5 is 0. We visited it before >and must have carried out some action, and if we went right or crashed >into a wall it would have a non-zero Q-value, wouldn't it? No. It was only the second time through, and even though one of the Q-values was negative (because it crashed) one of the other Q-values would be zero, so the future value would be zero. David |
![]() |
![]() |
Message no. 281[Branch from no. 257] Posted by Frank Hutter (s62336011) on Monday, April 18, 2005 10:26pm Subject: Re: Assignment 5 solutions Thanks for pointing that out. I was so sure I attached it ... anyhow, it's there now. Frank |
![]() |
![]() |
Message no. 282 Posted by William Hoy Fong (s77957017) on Monday, April 18, 2005 11:14pm Subject: assignment pick up have the assignments been marked yet? is there a chance that we could pick them up tomorrow (tuesday)? if so when and where will they be available? |
![]() |
![]() |
Message no. 283[Branch from no. 282] Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 12:42am Subject: Re: assignment pick up On a related note, have projects been marked? Onur |
![]() |
![]() |
Message no. 284 Posted by Stanley Chi Hong Tso (s58635020) on Tuesday, April 19, 2005 4:59am Subject: May I have a softcopy of the midterm exam? I would like to have a copy of the midterm exam if possible, so I can print it out and redo them for study thanks |
![]() |
![]() |
Message no. 285[Branch from no. 282] Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 9:22am Subject: Re: assignment pick up In message 282 on Monday, April 18, 2005 11:14pm, William Hoy Fong writes: >have the assignments been marked yet? is there a chance that we could >pick them up tomorrow (tuesday)? if so when and where will they be >available? They will be available from outside of my door at 12:30. The projects have not been marked (sorry). David |
![]() |
![]() |
Message no. 286[Branch from no. 284] Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 9:43am Subject: Re: May I have a softcopy of the midterm exam? In message 284 on Tuesday, April 19, 2005 4:59am, Stanley Chi Hong Tso writes: >I would like to have a copy of the midterm exam if possible, so I can print it out and redo >them for study > >thanks I just posted an HTML version at: http://www.cs.ubc.ca/spider/poole/cs422/2005/exams/mid.html David |
![]() |
![]() |
Message no. 287 Posted by Robin McQuinn (s12331039) on Tuesday, April 19, 2005 9:50am Subject: Applet XML Loading I can't get the Belief and Decision Network applet to load local XML files. Is this indeed possible using the entirety of the document path? And if so, is the option the "open location" menu option? How else can the XML files be loaded? Or are we expected to scan through the XML files and extract the info visually, which is what I have been doing. Thanks lots |
![]() |
![]() |
Message no. 288[Branch from no. 287] Posted by Kaili Elizabeth Vesik (s83834010) on Tuesday, April 19, 2005 10:13am Subject: Re: Applet XML Loading Are you using the version on the web, or have you downloaded it? If you download the applet and run it yourself, there is a File menu option called "Load Graph", which opens a file chooser and lets you select your xml file to load. |
![]() |
![]() |
Message no. 289[Branch from no. 288] Posted by Robin McQuinn (s12331039) on Tuesday, April 19, 2005 10:35am Subject: Re: Applet XML Loading Brilliant! Didn't do that yet, because it seems to change so much! That had the side effect though, of learning the XML representation, which is obvious enough! thanks |
![]() |
![]() |
Message no. 290[Branch from no. 286] Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 11:57am Subject: Re: May I have a softcopy of the midterm exam? Excellent, thank you. Onur |
![]() |
![]() |
Message no. 291 Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 12:57pm Subject: Midterm, Question #2 In the solutions, David wrote the following for #2
Q[s13,a2] = 0.8 * 0.8 * (0 + 0.9 * 2) -- up, no treasure
+ 0.8 * 0.2 * 0.25 * (0 + 0.9 * 7) -- up, treasure at top right
+ 0.1 * 0.8 * (0.2 * -10 + 0.9 * 0) -- left, no treasure
+ 0.1 * 0.2 * (0.2 * -10 + 0.9 * 0) -- left, treasure appears
+ 0.1 * 0.2 * 0.25 * (10 + 0.9 * 0) -- right, treasure appears there
I'm assuming he's using the Value Iteration algorithm ( http://www.cs.ubc.ca/spider/poole/ci2/excerpts/decisionprocesses.pdf top of page 7 ) however his solution above doesn't match up with the algorithm. Where does the second 0.8 in the first line come from, and where does the third factor, 0.25, come from in the 2nd and 5th lines? According to the algorithm it says...
P(s'|a,s) * (r(s,a,s') + γ * V_{k-1}(s'))
(pardon my lack of formatting, just look at page 7 for exact algo)
If we're doing the first line, it should be:
P(s'|a,s) = 0.8 (probability of horizontal/vertical move)
r(s,a,s') = 0 (no reward since not in a corner)
γ = 0.9 (given to us)
V_{k-1}(s') = 2 (given on grid)
Wouldn't that mean the first line should be 0.8 * (0 + 0.9 * 2) + ... ? Please help! Thanks in advance, Onur |
![]() |
![]() |
Message no. 292[Branch from no. 291] Posted by Robin McQuinn (s12331039) on Tuesday, April 19, 2005 1:08pm Subject: Re: Midterm, Question #2 Hi Onur, that had me flummoxed for a bit too, but the answer is related to the specific states for which there are non-zero rewards. The probabilities come from the following:
Line 1: P(s'|a,s) = P(moving up) * P(treasure doesn't appear)
Line 2: P(s'|a,s) = P(moving up) * P(treasure appears) * P(treasure in top right)
Line 3: P(s'|a,s) = P(moving left) * P(treasure doesn't appear)
Line 4: P(s'|a,s) = P(moving left) * P(treasure appears)
Line 5: P(s'|a,s) = P(moving right) * P(treasure appears) * P(treasure in top right)
Remember, if a treasure appears there is only a 0.25 probability that it will appear in the top right corner. Hope that helps Robin |
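For anyone who wants to check the arithmetic, the five terms quoted from the midterm solution can be written as (transition probability, immediate reward, value of next state) triples and summed. This is only a sketch of that sum: the probabilities and rewards are copied from the thread, the variable names are made up, and it does not model the grid itself.

```python
gamma = 0.9

# (P(s'|a,s), expected immediate reward, V_{k-1}(s')) for each quoted line.
terms = [
    (0.8 * 0.8,        0.0,         2.0),  # up, no treasure
    (0.8 * 0.2 * 0.25, 0.0,         7.0),  # up, treasure at top right
    (0.1 * 0.8,        0.2 * -10.0, 0.0),  # left, no treasure
    (0.1 * 0.2,        0.2 * -10.0, 0.0),  # left, treasure appears
    (0.1 * 0.2 * 0.25, 10.0,        0.0),  # right, treasure appears there
]

# Each term is P(s'|a,s) * (r(s,a,s') + gamma * V_{k-1}(s')).
q_s13_a2 = sum(p * (r + gamma * v) for p, r, v in terms)
print(round(q_s13_a2, 3))
```

Writing each line this way makes it visible that the extra factors (0.8, 0.2, 0.25) all belong to the transition probability, because the next state includes where the treasure is.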
![]() |
![]() |
Message no. 293[Branch from no. 292] Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 1:13pm Subject: Re: Midterm, Question #2 Ahh... I was starting to think about that after posting. So I assume that for the line... + 0.1 * 0.8 * ( 0.2 * -10 + 0.9 * 0) -- left, no treasure the 0.2 * -10 is the fact that there's a 0.2 probability that this particular monster will check if the person landed, and if so they get the -10 reward. That makes more sense. Thank you. Onur |
![]() |
![]() |
Message no. 294 Posted by Daniel Joseph Anderson (s76045996) on Tuesday, April 19, 2005 1:14pm Subject: reminder for Dr. Poole from before: In message 203 on Saturday, April 2, 2005 10:24pm, Kaili Elizabeth Vesik writes: >David, > >You mentioned in class that any assignments done prior to the midterm >with grades lower than our midterm grade would have their grades >increased to the value of our midterm grade. Should we expect to see >this reflected in the "grades" section of webct, or will it be >considered only when you calculate our final marks? > >Thanks. >Kaili It will be reflected in my program to compute grades. (But it might be good to remind me closer to the final exam ;^) David |
![]() |
![]() |
Message no. 295 Posted by Daniel Joseph Anderson (s76045996) on Tuesday, April 19, 2005 1:21pm Subject: Chapters on the final The course webpage says "6, 7, 9, 10, 11 and 12" but I know there's some stuff in there that we didn't cover in class. Are we expected to know all of each of these chapters, or only some sections of each? Thanks, -Dan |
![]() |
![]() |
Message no. 296[Branch from no. 287] Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 1:57pm Subject: Re: Applet XML Loading In message 287 on Tuesday, April 19, 2005 9:50am, Robin McQuinn writes: >I can't get the Belief and Decision Network applet to load local XML >files. Is this indeed possible using the entirety of the document path? > And if so, is the option the "open location" menu option? >How else can the XML files be loaded? Or are we expected to scan >through the XML files and extract the info visually, which is what I >have been doing. > >Thanks lots If it is run as an application (the windows executable or the jar file), you should be able to load xml files. It works for me. Otherwise you can just copy the whole file into the clipboard and paste into the view/edit text representations. Let us know if this doesn't work. David |
![]() |
![]() |
Message no. 297[Branch from no. 295] Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 1:59pm Subject: Re: Chapters on the final In message 295 on Tuesday, April 19, 2005 1:21pm, Daniel Joseph Anderson writes: >The course webpage says "6, 7, 9, 10, 11 and 12" but I know there's some >stuff in there that we didn't cover in class. Are we expected to know >all of each of these chapters, or only some sections of each? > >Thanks, >-Dan Only what we covered in class. David |
![]() |
![]() |
Message no. 298[Branch from no. 286] Posted by Stanley Chi Hong Tso (s58635020) on Tuesday, April 19, 2005 5:04pm Subject: Re: May I have a softcopy of the midterm exam? Nice, thanks |
![]() |
![]() |
Message no. 299 Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 8:01pm Subject: Practice Final #10c I'm having a really hard time figuring out when/how to do factorization. I know it's probably too late at this point, but if anyone wants to attempt to explain this to me and others that are probably unsure as well but too embarrassed to ask I'd really appreciate it. I understand what maximizing is and how to do it. I understand what "summing out" is. What I don't know is how to choose what needs to be factored and why. I tried reading the notes and the book and it's just not clicking in my brain at all. If anyone wants to explain why the solution for question 10c is the way it is I'd appreciate it. Thanks in advance, Onur |
![]() |
![]() |
Message no. 300[Branch from no. 299] Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 8:26pm Subject: Re: Practice Final #10c In message 299 on Tuesday, April 19, 2005 8:01pm, Onur Komili writes: >I'm having a really hard time figuring out when/how to do factorization. I know it's >probably too late at this point, but if anyone wants to attempt to explain this to me and >others that are probably unsure as well but too embarrassed to ask I'd really appreciate it. > >I understand what maximizing is and how to do it. I understand what "summing out" is. >What I don't know is how to choose what needs to be factored and why. I tried reading >the notes and the book and it's just not clicking in my brain at all. > >If anyone wants to explain why the solution for question 10c is the way it is I'd >appreciate it. > >Thanks in advance, > >Onur Sum out all of the random variables that are not the parent of a decision node. These can be done in any order. You should have a decision variable with its parent; maximize over this decision variable. Repeat till there are no more decision nodes. Then sum out the remaining variables: the resulting number is the expected utility. Does this make sense? David |
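The recipe in the reply above can be tried on a tiny made-up network. Everything in this sketch is hypothetical and is not the network from question 10c: a hidden random variable S, a sensor A that depends on S and is the only parent of the decision D, and a utility that depends on S and D. The probabilities and utilities are invented so the steps can be followed by hand.

```python
values = (True, False)

# Hypothetical factors (all numbers are invented for illustration).
p_s = {True: 0.2, False: 0.8}                             # P(S)
p_a_given_s = {(True, True): 0.9, (False, True): 0.1,     # P(A|S), keys are (a, s)
               (True, False): 0.1, (False, False): 0.9}
utility = {(True, True): 0.0, (True, False): -100.0,      # U(S, D), keys are (s, d)
           (False, True): -10.0, (False, False): 0.0}

# Step 1: sum out S, the random variable that is not a parent of the decision.
f_ad = {(a, d): sum(p_s[s] * p_a_given_s[(a, s)] * utility[(s, d)] for s in values)
        for a in values for d in values}

# Step 2: the decision D is left with its parent A; maximize over D for each A.
policy = {a: max(values, key=lambda d: f_ad[(a, d)]) for a in values}
f_a = {a: f_ad[(a, policy[a])] for a in values}

# Step 3: no decisions remain, so sum out A to get the expected utility.
expected_utility = sum(f_a.values())

print(policy, round(expected_utility, 2))
```

The order matters in one place only: S may be summed out before the maximization because the agent does not observe it, while A must wait until after D is maximized because the decision depends on it.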
![]() |
![]() |
Message no. 301[Branch from no. 300] Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 8:51pm Subject: Re: Practice Final #10c That makes a lot more sense. In your ch10/lect6.pdf page 16 slides you say something similar but it didn't make any sense at the time and even reading it now it doesn't make sense. Perhaps rewording it to say something similar may make it a little more clear for future semesters. I just tested it out on the practice final question and it seems to work out. I just wish I asked this before assignment 5. Ah well, better late than never I suppose. Thanks for clearing that up, Onur |
![]() |
![]() |
Message no. 302 Posted by Danelle Abra Wettstein (s86800018) on Tuesday, April 19, 2005 11:26pm Subject: ?? Is anyone as terrified as me? |
![]() |
![]() |
Message no. 303[Branch from no. 302] Posted by David Burns Cameron (s66878984) on Tuesday, April 19, 2005 11:49pm Subject: Re: ?? yes. |
![]() |
![]() |
Message no. 304[Branch from no. 303] Posted by Vivian Luk (s82215013) on Wednesday, April 20, 2005 12:33am Subject: Re: ?? ditto :( |
![]() |
![]() |
Message no. 305[Branch from no. 304] Posted by Onur Komili (s88435045) on Wednesday, April 20, 2005 12:34am Subject: Re: ?? More so than any other exam I've ever had since high school... Onur |
![]() |
![]() |
Message no. 306[Branch from no. 305] Posted by Ryan Yee (s81483042) on Wednesday, April 20, 2005 12:36am Subject: Re: ?? Game over man!! Game over! |
![]() |
![]() |
Message no. 307[Branch from no. 306] Posted by Stanley Chi Hong Tso (s58635020) on Wednesday, April 20, 2005 1:22am Subject: Re: ?? I'm sure I'll be f***ed after the exam. |
![]() |
![]() |
Message no. 308[Branch from no. 307] Posted by Daniel Wen-Yen Chang (s81965014) on Wednesday, April 20, 2005 1:29am Subject: Re: ?? i just hope that the actual final's difficulty will be like the sample one and yes..I'm terrified too |
![]() |
![]() |
Message no. 309[Branch from no. 308] Posted by Wing Hang Chan (s84098011) on Wednesday, April 20, 2005 2:18am Subject: Re: ?? lol is this the pre-exam anxiety thread? wish everyone good luck this afternoon :) |
![]() |
![]() |
Message no. 310[Branch from no. 306] Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, April 20, 2005 12:15pm Subject: Re: ?? In message 306 on Wednesday, April 20, 2005 12:36am, Ryan Yee writes: >Game over man!! Game over! Expresses my feelings EXACTLY. :) (I don't really know why I added a smiley there... could be that exam-fear-delirium I'm feeling). |
![]() |
![]() |
Message no. 311[Branch from no. 302] Posted by Christopher John Hawkins (s93985018) on Wednesday, April 20, 2005 1:19pm Subject: Re: ?? Game over indeed! impending doom... |
![]() |
![]() |
Message no. 312[Branch from no. 311] Posted by Michael Nightingale (s98742018) on Wednesday, April 20, 2005 1:22pm Subject: Re: ?? and to think that my graduation hinges on decision networks ;( lol |
![]() |
![]() |
Message no. 313[Branch from no. 312] Posted by Stephen Shui Fung Mak (s36743003) on Wednesday, April 20, 2005 2:45pm Subject: Re: ?? In message 312 on Wednesday, April 20, 2005 1:22pm, Michael Nightingale writes: >and to think that my graduation hinges on decision networks ;( > >lol LOL...same feeling here |
![]() |
![]() |
Message no. 314[Branch from no. 313] Posted by Daniel Gayo McLaren (s40871022) on Wednesday, April 20, 2005 3:01pm Subject: Re: ?? I had an instructor that used to tell us to bring Kleenex to his midterms/exams because there would be lots of crying. I'm bringing lots for this one. Good luck everyone! |
![]() |
![]() |
Message no. 315[Branch from no. 313] Posted by Danelle Abra Wettstein (s86800018) on Wednesday, April 20, 2005 3:09pm Subject: Re: ?? In message 313 on Wednesday, April 20, 2005 2:45pm, Stephen Shui Fung Mak writes: >In message 312 on Wednesday, April 20, 2005 1:22pm, Michael >Nightingale writes: >>and to think that my graduation hinges on decision networks ;( >> >>lol > >LOL...same feeling here Me too! And my extended family has already bought non-refundable tickets to Vancouver in May! |
![]() |
![]() |
Message no. 316[Branch from no. 315] Posted by David Burns Cameron (s66878984) on Wednesday, April 20, 2005 7:49pm Subject: Re: ?? I survived! And it wasn't nearly as bad as I thought! Thank You, Practice Final!!! |
![]() |
![]() |
Message no. 317 Posted by Vivian Luk (s82215013) on Monday, April 25, 2005 12:50am Subject: Midterm total Is the midterm out of 55 or 60? (WebCT says 60 but counting up points on midterm add up to 55) Tks :) Vivian |
![]() |
![]() |
Message no. 318[Branch from no. 317] Posted by Kaili Elizabeth Vesik (s83834010) on Tuesday, April 26, 2005 7:38am Subject: Re: Midterm total I'm at my parents' house right now (without my exam), so I don't know for certain, but I'm pretty sure there was a comment during the test about how one of the questions has an incorrect points value. If nobody else has answered this by the time I get back to my place, I will check the exam and let you know. Kaili |
![]() |
![]() |
Message no. 319[Branch from no. 317] Posted by David Poole (cpsc_422_term2) on Tuesday, April 26, 2005 9:51am Subject: Re: Midterm total In message 317 on Monday, April 25, 2005 12:50am, Vivian Luk writes: >Is the midterm out of 55 or 60? (WebCT says 60 but counting up points >on midterm add up to 55) The questions are worth 10, 10, 25, 15 which adds up to 60. If anyone has marks for assignments missing, please let me know. David |
![]() |
![]() |
Message no. 320 Posted by Michael Nightingale (s98742018) on Wednesday, April 27, 2005 12:52pm Subject: Final Marks Just wanted to mention that I noticed final exam marks are up on the SSC, so you can check out how you did. Have a great summer, and if applicable, a fun-filled graduation (see you there) :D |
![]() |
![]() |
Message no. 321[Branch from no. 320] Posted by David Poole (cpsc_422_term2) on Wednesday, April 27, 2005 2:24pm Subject: Re: Final Marks In message 320 on Wednesday, April 27, 2005 12:52pm, Michael Nightingale writes: >Just wanted to mention that I noticed final exam marks are up on the >SSC, so you can check out how you did. > >Have a great summer, and if applicable, a fun-filled graduation (see you >there) :D The final grades have been submitted (yesterday), so you should be able to access them (but I'm not sure when they release them). Thanks all. Have a good summer, David |
![]() |
![]() |
Message no. 322[Branch from no. 320] Posted by David Poole (cpsc_422_term2) on Wednesday, April 27, 2005 2:27pm Subject: Re: Final Marks In message 320 on Wednesday, April 27, 2005 12:52pm, Michael Nightingale writes: >Just wanted to mention that I noticed final exam marks are up on the >SSC, so you can check out how you did. That's interesting, WebCT claims they are not released (and I can't check this). Can you really see them? David |
![]() |
![]() |
Message no. 323[Branch from no. 322] Posted by Stephen Shui Fung Mak (s36743003) on Wednesday, April 27, 2005 3:44pm Subject: Re: Final Marks Yes! It's out! =D Stephen In message 322 on Wednesday, April 27, 2005 2:27pm, David Poole writes: >In message 320 on Wednesday, April 27, 2005 12:52pm, Michael Nightingale >writes: >>Just wanted to mention that I noticed final exam marks are up on the >>SSC, so you can check out how you did. > >That's interesting, WebCT claims they are not released (and I can't >check this). Can you really see them? > >David > |
![]() |
![]() |
Message no. 324[Branch from no. 323] Posted by Daniel Gayo McLaren (s40871022) on Wednesday, April 27, 2005 9:01pm Subject: Re: Final Marks I can't see the final exam mark on WebCT, but I can see my final mark on the Student Service Center website. Happy summer! |
![]() |