7.7 Learning as Refining the Hypothesis Space
So far, learning has been either choosing the best representation - for example, the best decision tree or the best values for the parameters of a neural network - or predicting the value of the target features of a new case from a database of previous cases. This section considers a different notion of learning: learning as delineating the hypotheses that are consistent with the examples. Rather than choosing a single hypothesis, the aim is to find all hypotheses that are consistent with the training examples. This investigation sheds light on the role of a bias and provides a mechanism for a theoretical analysis of the learning problem.
We make three assumptions:
- There is a single target feature, Y, that is Boolean. This is not really a restriction for classification, because any discrete feature can be made into Boolean features using indicator variables (see the sketch after this list).
- The hypotheses make definitive predictions, predicting true or false for each example, rather than probabilistic predictions.
- There is no noise in the data.
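As a minimal sketch of the indicator-variable idea from the first assumption (the discrete feature topic and its domain here are hypothetical, not from the text):

```python
# One Boolean indicator feature per value of a discrete feature (one-hot encoding).
def indicators(value, domain):
    """Map a discrete value to Boolean indicator features, one per domain value."""
    return {f"topic_is_{v}": (value == v) for v in domain}

print(indicators("crime", ["crime", "academic", "music"]))
# {'topic_is_crime': True, 'topic_is_academic': False, 'topic_is_music': False}
```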
Given these assumptions, it is possible to write a hypothesis in terms of a proposition, where the primitive propositions are assignments to the input features.
The goal is to find a proposition on the input features that correctly classifies the training examples.
Example 7.23. Consider a learning agent trying to infer which articles a user reads, based on keywords supplied for each article. Suppose the agent has the following data:

| Article | Crime | Academic | Local | Music | Reads |
|---------|-------|----------|-------|-------|-------|
| a1 | true | false | false | true | true |
| a2 | true | false | false | false | true |
| a3 | false | true | false | false | false |
| a4 | false | false | true | false | false |
| a5 | true | true | false | false | true |
The aim is to learn which articles the user reads.
In this example, Reads is the target feature, and the aim is to find a definition such as

reads ↔ crime ∧ (¬academic ∨ ¬music).
This definition may be used to classify the training examples as well as future examples.
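To make this concrete, here is a minimal sketch (not from the text) that encodes the table as Python dictionaries and checks that the definition above classifies all five training examples correctly:

```python
# Training examples from the table: input features plus the target feature Reads.
examples = {
    "a1": {"crime": True,  "academic": False, "local": False, "music": True,  "reads": True},
    "a2": {"crime": True,  "academic": False, "local": False, "music": False, "reads": True},
    "a3": {"crime": False, "academic": True,  "local": False, "music": False, "reads": False},
    "a4": {"crime": False, "academic": False, "local": True,  "music": False, "reads": False},
    "a5": {"crime": True,  "academic": True,  "local": False, "music": False, "reads": True},
}

def h(e):
    """Candidate definition: reads iff crime and (not academic or not music)."""
    return e["crime"] and (not e["academic"] or not e["music"])

# h is consistent with the data iff its prediction matches Reads on every example.
print(all(h(e) == e["reads"] for e in examples.values()))  # True
```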
Hypothesis space learning assumes the following sets:
- I, the instance space, is the set of all possible examples.
- H, the hypothesis space, is a set of Boolean functions on the input features.
- E ⊆ I is the set of training examples. Values for the input features and the target feature are given for each training example.
If h ∈ H and i ∈ I, we write h(i) to mean the value that h predicts for i on the target feature Y.
The hypothesis space H could be all Boolean combinations of the input features or could be more restricted, such as conjunctions or propositions defined in terms of fewer than three features.
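As a rough illustration (not the book's code) of how much such a restriction matters, compare the sizes of the instance space, the unrestricted hypothesis space, and the space of conjunctions for the four input features of the running example:

```python
from itertools import product

features = ["crime", "academic", "local", "music"]

# The instance space I: every assignment of Boolean values to the input features.
I = [dict(zip(features, vals)) for vals in product([True, False], repeat=len(features))]
print(len(I))  # 16 possible examples

# The unrestricted hypothesis space: every Boolean function on I, that is,
# every way of labelling the 16 instances true or false.
print(2 ** len(I))  # 65536 hypotheses

# Restricted to conjunctions of literals: each feature is either required to be
# true, required to be false, or left unconstrained.
print(3 ** len(features))  # 81 hypotheses
```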
In Example 7.23, the training examples are E = {a1, a2, a3, a4, a5}. The target feature is Reads. The table specifies the value of this feature for only the training examples, and the learner must make predictions for unseen cases, so the learner requires a bias. In hypothesis space learning, the bias is imposed by the hypothesis space.
Hypothesis h is consistent with a set of training examples E if ∀e ∈ E, h accurately predicts the target feature of e. That is, h(e) = val(e, Y): the predicted value is the same as the actual value for each example. The problem is to find the subset of H, or just an element of H, consistent with all of the training examples.
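A brute-force reading of this definition can be sketched as follows (an illustration, not the book's code; representing each hypothesis by a list of (feature, value) literals is an assumption made here): enumerate a conjunction-only hypothesis space and keep the hypotheses consistent with the training examples.

```python
from itertools import combinations, product

features = ["crime", "academic", "local", "music"]

# Training examples from the table: (input-feature values, value of Reads).
E = [
    ({"crime": True,  "academic": False, "local": False, "music": True},  True),
    ({"crime": True,  "academic": False, "local": False, "music": False}, True),
    ({"crime": False, "academic": True,  "local": False, "music": False}, False),
    ({"crime": False, "academic": False, "local": True,  "music": False}, False),
    ({"crime": True,  "academic": True,  "local": False, "music": False}, True),
]

def consistent(h, E):
    """h is consistent with E iff it predicts the target value of every example."""
    return all(h(e) == y for e, y in E)

# H restricted to conjunctions of literals; each hypothesis is paired with the
# literals that define it, so the consistent ones can be printed readably.
H = []
for k in range(len(features) + 1):
    for fs in combinations(features, k):
        for vs in product([True, False], repeat=k):
            lits = list(zip(fs, vs))
            H.append((lits, lambda e, lits=lits: all(e[f] == v for f, v in lits)))

for lits, h in H:
    if consistent(h, E):
        print(lits)
# [('crime', True)]
# [('crime', True), ('local', False)]
```

On this data only the conjunctions crime and crime ∧ ¬local are consistent. The definition reads ↔ crime ∧ (¬academic ∨ ¬music) from the example above is also consistent, but it lies outside this restricted hypothesis space; that exclusion is exactly the bias imposed by the choice of H.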