Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).
7.2 Supervised Learning
An abstract definition of supervised learning is as follows. Assume the learner is given the following data:
- a set of input features, X1,...,Xn;
- a set of target features, Y1,...,Yk;
- a set of training examples, where the values for the input features and the target features are given for each example; and
- a set of test examples, where only the values for the input features are given.
The aim is to predict the values of the target features for the test examples and as-yet-unseen examples. Typically, learning is the creation of a representation that can make predictions based on descriptions of the input features of new examples.
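The abstract definition can be made concrete as a function that takes training examples and returns a predictor for new examples. The sketch below is a minimal illustration. It assumes that examples are Python dictionaries from feature names to values and that there is a single target feature (k = 1). The name `learn_majority` and the toy features are hypothetical. The representation it learns, the most common target value, is deliberately trivial and ignores the input features, so it is a baseline rather than a realistic learner.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical encoding: an example maps feature names to values.
Example = Dict[str, str]
Predictor = Callable[[Example], str]


def learn_majority(training: List[Example], target: str) -> Predictor:
    """Learn the simplest representation: the most common target value.

    The learned predictor ignores the input features, so this is only a
    baseline that makes the train-then-predict interface concrete.
    """
    counts = Counter(example[target] for example in training)
    majority_value = counts.most_common(1)[0][0]

    def predict(example: Example) -> str:
        # A new example supplies only input-feature values; none are used here.
        return majority_value

    return predict


# Toy data with one input feature X1 and one target feature Y1.
training = [
    {"X1": "a", "Y1": "yes"},
    {"X1": "b", "Y1": "yes"},
    {"X1": "a", "Y1": "no"},
]
test = [{"X1": "b"}, {"X1": "c"}]  # target values are not given

predict = learn_majority(training, "Y1")
print([predict(e) for e in test])  # ['yes', 'yes']
```

Any real learner would keep the same shape (training examples in, predictor out) but build a representation that actually depends on the input features.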
If e is an example and F is a feature, let val(e,F) denote the value of feature F in example e.
The following table lists examples of a user's decisions about whether to read articles in a threaded discussion. Examples e1 to e18 are training examples; e19 and e20 are test examples, whose user action (shown as ?) is to be predicted.

| Example | Author | Thread | Length | WhereRead | UserAction |
|---|---|---|---|---|---|
| e1 | known | new | long | home | skips |
| e2 | unknown | new | short | work | reads |
| e3 | unknown | followUp | long | work | skips |
| e4 | known | followUp | long | home | skips |
| e5 | known | new | short | home | reads |
| e6 | known | followUp | long | work | skips |
| e7 | unknown | followUp | short | work | skips |
| e8 | unknown | new | short | work | reads |
| e9 | known | followUp | long | home | skips |
| e10 | known | new | long | work | skips |
| e11 | unknown | followUp | short | home | skips |
| e12 | known | new | long | work | skips |
| e13 | known | followUp | short | home | reads |
| e14 | known | new | short | work | reads |
| e15 | known | new | short | home | reads |
| e16 | known | followUp | short | work | reads |
| e17 | known | new | short | home | reads |
| e18 | unknown | new | short | work | reads |
| e19 | unknown | new | long | work | ? |
| e20 | unknown | followUp | long | home | ? |
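In this data set, val(e11,Author)=unknown, val(e11,Thread)=followUp, and val(e11,UserAction)=skips. The input features are Author, Thread, Length, and WhereRead; the single target feature is UserAction.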
The aim is to predict the user action for a new example given its values for the input features.
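One way to make val(e,F) concrete is to store each example as a dictionary keyed by feature name. The sketch below is a hypothetical encoding of the table of examples. The names `ROWS`, `DATA`, and `val` are illustrative assumptions, and the thread value is written as the single word `followUp` so that each row splits on whitespace. The sketch also separates the training examples, whose UserAction is known, from the test examples marked `?`.

```python
# The table's rows; "?" marks a target value that is to be predicted.
ROWS = """
e1 known new long home skips
e2 unknown new short work reads
e3 unknown followUp long work skips
e4 known followUp long home skips
e5 known new short home reads
e6 known followUp long work skips
e7 unknown followUp short work skips
e8 unknown new short work reads
e9 known followUp long home skips
e10 known new long work skips
e11 unknown followUp short home skips
e12 known new long work skips
e13 known followUp short home reads
e14 known new short work reads
e15 known new short home reads
e16 known followUp short work reads
e17 known new short home reads
e18 unknown new short work reads
e19 unknown new long work ?
e20 unknown followUp long home ?
"""
FEATURES = ["Author", "Thread", "Length", "WhereRead", "UserAction"]

# DATA maps each example name to a dict from feature name to value.
DATA = {}
for line in ROWS.strip().splitlines():
    name, *values = line.split()
    DATA[name] = dict(zip(FEATURES, values))


def val(e, f):
    """Return the value of feature f in example e, i.e. val(e,F) in the text."""
    return DATA[e][f]


# Training examples have a known target value; test examples do not.
training = [e for e in DATA if val(e, "UserAction") != "?"]
test = [e for e in DATA if val(e, "UserAction") == "?"]

print(val("e11", "Author"), val("e11", "Thread"), val("e11", "UserAction"))
# unknown followUp skips
print(len(training), test)  # 18 ['e19', 'e20']
```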
The most common way to learn is to have a hypothesis space of all possible representations. Each possible representation is a hypothesis. The hypothesis space is typically a large finite, or countably infinite, space. A prediction is made using one of the following:
- the best hypothesis that can be found in the hypothesis space according to some measure of quality, such as how well it fits the training examples,
- all of the hypotheses that are consistent with the training examples, or
- the posterior probability of the hypotheses given the evidence provided by the training examples.
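To make the three options concrete, the sketch below uses a small hypothesis space chosen purely for illustration (an assumption, not a space prescribed by the text). Each hypothesis predicts reads if an example satisfies either of two conjunctions of feature=value conditions, and skips otherwise. The code finds (1) a best hypothesis by training accuracy, breaking ties in favour of fewer conditions, (2) all hypotheses consistent with the training examples, and (3) a posterior-weighted prediction. For (3) it assumes a uniform prior and noise-free labels; under those assumptions the posterior is uniform over the consistent hypotheses and zero elsewhere. All names are hypothetical.

```python
from itertools import combinations_with_replacement, product

ROWS = """
e1 known new long home skips
e2 unknown new short work reads
e3 unknown followUp long work skips
e4 known followUp long home skips
e5 known new short home reads
e6 known followUp long work skips
e7 unknown followUp short work skips
e8 unknown new short work reads
e9 known followUp long home skips
e10 known new long work skips
e11 unknown followUp short home skips
e12 known new long work skips
e13 known followUp short home reads
e14 known new short work reads
e15 known new short home reads
e16 known followUp short work reads
e17 known new short home reads
e18 unknown new short work reads
e19 unknown new long work ?
e20 unknown followUp long home ?
"""
INPUTS = ["Author", "Thread", "Length", "WhereRead"]
DOMAINS = {"Author": ["known", "unknown"], "Thread": ["new", "followUp"],
           "Length": ["long", "short"], "WhereRead": ["home", "work"]}

examples = []
for line in ROWS.strip().splitlines():
    name, *values = line.split()
    examples.append((name, dict(zip(INPUTS + ["UserAction"], values))))
training = [e for _, e in examples if e["UserAction"] != "?"]
test = [(name, e) for name, e in examples if e["UserAction"] == "?"]

# A term is a conjunction of (feature, value) conditions; every feature is
# either unconstrained (None) or fixed to one value, giving 3**4 = 81 terms.
terms = [
    tuple((f, v) for f, v in zip(INPUTS, choice) if v is not None)
    for choice in product(*([None] + DOMAINS[f] for f in INPUTS))
]
# A hypothesis predicts "reads" iff an example satisfies either of two terms.
hypotheses = list(combinations_with_replacement(terms, 2))


def predict(h, e):
    satisfied = any(all(e[f] == v for f, v in term) for term in h)
    return "reads" if satisfied else "skips"


def accuracy(h):
    return sum(predict(h, e) == e["UserAction"] for e in training) / len(training)


def size(h):
    # Total number of conditions, counting a repeated term once.
    return sum(len(term) for term in set(h))


# (1) Best hypothesis: highest training accuracy, then fewest conditions.
best = max(hypotheses, key=lambda h: (accuracy(h), -size(h)))

# (2) All hypotheses consistent with the training examples.
consistent = [h for h in hypotheses if accuracy(h) == 1.0]


# (3) Uniform posterior over consistent hypotheses: P(reads) is the
# fraction of consistent hypotheses that predict "reads".
def prob_reads(e):
    return sum(predict(h, e) == "reads" for h in consistent) / len(consistent)


for name, e in test:
    print(name, predict(best, e), round(prob_reads(e), 2))
```

In this space, several hypotheses fit all 18 training examples yet disagree on e19. For instance, (Author=known and Length=short) or (Thread=new and Length=short) predicts skips, whereas (Author=known and Length=short) or (Author=unknown and Thread=new) predicts reads. So the three options can give different answers, and which of the tied hypotheses `max` returns as best depends only on enumeration order.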
One exception to this paradigm is case-based reasoning, which makes predictions from the training examples directly rather than from a hypothesis.
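Nearest-neighbour prediction is one simple form of case-based reasoning: it keeps the training examples as stored cases and predicts from the most similar ones, with no hypothesis built in advance. The sketch below assumes Hamming distance on the four input features and a majority vote among the k = 3 nearest cases; these choices, and all names, are illustrative assumptions. Ties in distance are broken by table order, which is arbitrary.

```python
from collections import Counter

ROWS = """
e1 known new long home skips
e2 unknown new short work reads
e3 unknown followUp long work skips
e4 known followUp long home skips
e5 known new short home reads
e6 known followUp long work skips
e7 unknown followUp short work skips
e8 unknown new short work reads
e9 known followUp long home skips
e10 known new long work skips
e11 unknown followUp short home skips
e12 known new long work skips
e13 known followUp short home reads
e14 known new short work reads
e15 known new short home reads
e16 known followUp short work reads
e17 known new short home reads
e18 unknown new short work reads
e19 unknown new long work ?
e20 unknown followUp long home ?
"""
FEATURES = ["Author", "Thread", "Length", "WhereRead", "UserAction"]
INPUTS = FEATURES[:-1]

cases = {}
for line in ROWS.strip().splitlines():
    name, *values = line.split()
    cases[name] = dict(zip(FEATURES, values))
# The stored cases are the training examples themselves; no model is built.
training = {n: e for n, e in cases.items() if e["UserAction"] != "?"}


def distance(a, b):
    """Hamming distance: number of input features on which a and b differ."""
    return sum(a[f] != b[f] for f in INPUTS)


def nearest(query, k):
    # sorted() is stable, so ties in distance keep table order (arbitrary).
    ranked = sorted(training.items(), key=lambda item: distance(query, item[1]))
    return ranked[:k]


def predict(query, k=3):
    votes = Counter(e["UserAction"] for _, e in nearest(query, k))
    return votes.most_common(1)[0][0]


for name in ["e19", "e20"]:
    print(name, predict(cases[name]))
```

With k = 1, every training example is predicted correctly, because examples in the table with identical input values always share the same user action.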