6.1.3 Conditional Probability
Typically, we want to know not only the prior probability of some proposition, but also how that belief is updated when an agent observes new evidence.
The measure of belief in proposition h based on proposition e is called the conditional probability of h given e, written P(h|e).
A formula e representing the conjunction of all of the agent's observations of the world is called evidence. Given evidence e, the conditional probability P(h|e) is the agent's posterior probability of h. The probability P(h) is the prior probability of h and is the same as P(h|true) because it is the probability before the agent has observed anything.
The posterior probability involves conditioning on everything the agent knows about a particular situation. All evidence must be conditioned on to obtain the correct posterior probability.
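As an illustration, here is a minimal Python sketch of these definitions, computing a prior and a posterior from a small joint distribution over possible worlds. The propositions (rain, wet_grass) and the numbers are invented for the example; the only fact used is the standard definition P(h|e) = P(h ∧ e)/P(e), defined when P(e) > 0.

```python
# A minimal sketch (not from the text): prior and posterior probabilities
# computed from a finite joint distribution over possible worlds.
# The worlds, propositions, and numbers below are illustrative assumptions.

# Each possible world assigns truth values to "rain" and "wet_grass",
# together with its probability; the measures sum to 1.
worlds = [
    ({"rain": True,  "wet_grass": True},  0.28),
    ({"rain": True,  "wet_grass": False}, 0.02),
    ({"rain": False, "wet_grass": True},  0.18),
    ({"rain": False, "wet_grass": False}, 0.52),
]

def prob(condition):
    """P(condition): the sum of the measures of the worlds where it holds."""
    return sum(p for world, p in worlds if condition(world))

def posterior(h, e):
    """P(h | e) = P(h and e) / P(e), defined only when P(e) > 0."""
    p_e = prob(e)
    if p_e == 0:
        raise ValueError("P(h | e) is undefined when P(e) = 0")
    return prob(lambda w: h(w) and e(w)) / p_e

rain = lambda w: w["rain"]
wet = lambda w: w["wet_grass"]

print(prob(rain))            # prior P(rain)           ≈ 0.30
print(posterior(rain, wet))  # posterior P(rain | wet) ≈ 0.61
```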
Other Possible Measures of Belief
Justifying other measures of belief is problematic. Consider, for example, the proposal that the belief in α ∧ β is some function of the belief in α and the belief in β. Such a measure of belief is called compositional. To see why this is not sensible, consider the single toss of a fair coin. Compare the case where α1 is "the coin will land heads" and β1 is "the coin will land tails" with the case where α2 is "the coin will land heads" and β2 is "the coin will land heads." For these two cases, the belief in α1 would seem to be the same as the belief in α2, and the belief in β1 would be the same as the belief in β2. But the belief in α1 ∧ β1, which is impossible, is very different from the belief in α2 ∧ β2, which is the same as α2.
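The coin example can be checked numerically. The following Python sketch (the representation of propositions as sets of outcomes is just one convenient choice) shows that all four individual beliefs are equal, yet the two conjunctions have very different probabilities, so no function of the individual beliefs alone can determine the belief in a conjunction.

```python
# A small numeric check of the coin example (a sketch, not from the text).
# With a fair coin, P(alpha1) = P(alpha2) = P(beta1) = P(beta2) = 0.5,
# yet the two conjunctions have different probabilities.

outcomes = ["heads", "tails"]          # equally likely outcomes of one toss
p = {o: 0.5 for o in outcomes}

def prob(event):
    """Probability of an event, represented as a set of outcomes."""
    return sum(p[o] for o in event)

alpha1 = {"heads"}; beta1 = {"tails"}   # "lands heads", "lands tails"
alpha2 = {"heads"}; beta2 = {"heads"}   # both are "lands heads"

print(prob(alpha1), prob(beta1), prob(alpha2), prob(beta2))  # all 0.5
print(prob(alpha1 & beta1))   # P(alpha1 and beta1) = 0.0 (impossible)
print(prob(alpha2 & beta2))   # P(alpha2 and beta2) = 0.5 (same as alpha2)
```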
The conditional probability P(f|e) is very different from the probability of the implication P(e → f). The latter is the same as P(¬e ∨ f), which is the measure of the interpretations for which f is true or e is false. For example, suppose you have a domain where birds are relatively rare, and non-flying birds are a small proportion of the birds. Here P(¬flies | bird) would be the proportion of birds that do not fly, which would be low. P(bird → ¬flies) is the same as P(¬bird ∨ ¬flies), which would be dominated by non-birds and so would be high. Similarly, P(bird → flies) would also be high, the probability also being dominated by the non-birds. It is difficult to imagine a situation where the probability of an implication is the kind of knowledge that is appropriate or useful.
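The contrast can be made concrete with made-up proportions. In the following Python sketch the numbers are illustrative assumptions, chosen so that birds are rare and non-flying birds are a small proportion of the birds; the conditional probability comes out low while both implications come out high.

```python
# A sketch of the birds example with invented proportions (the numbers are
# illustrative assumptions, not from the text).  It contrasts the conditional
# probability P(not flies | bird) with the probability of the implication
# P(bird -> not flies) = P(not bird or not flies).

worlds = [
    # (bird, flies, probability): birds are rare, non-flying birds rarer still
    (True,  True,  0.009),
    (True,  False, 0.001),
    (False, True,  0.000),
    (False, False, 0.990),
]

def prob(condition):
    """Sum of the measures of the worlds where the condition holds."""
    return sum(p for (bird, flies, p) in worlds if condition(bird, flies))

# P(not flies | bird): the proportion of birds that do not fly -- low.
p_bird = prob(lambda b, f: b)
p_not_flies_given_bird = prob(lambda b, f: b and not f) / p_bird

# P(bird -> not flies) = P(not bird or not flies): dominated by non-birds.
p_impl_not_flies = prob(lambda b, f: (not b) or (not f))

# P(bird -> flies) = P(not bird or flies): also dominated by non-birds.
p_impl_flies = prob(lambda b, f: (not b) or f)

print(p_not_flies_given_bird)  # ≈ 0.1   (low)
print(p_impl_not_flies)        # ≈ 0.991 (high)
print(p_impl_flies)            # ≈ 0.999 (also high)
```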