
7.2 Supervised Learning

An abstract definition of supervised learning is as follows. Assume the learner is given the following data:

  • a set of input features, X1,...,Xn;
  • a set of target features, Y1,...,Yk;
  • a set of training examples, where the values for the input features and the target features are given for each example; and
  • a set of test examples, where only the values for the input features are given.

The aim is to predict the values of the target features for the test examples and as-yet-unseen examples. Typically, learning is the creation of a representation that can make predictions based on descriptions of the input features of new examples.
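As a minimal sketch of this setup, the Python below treats learning as a function that takes training examples and returns a predictor. The names `Example`, `learn_most_common`, and `predict` are illustrative assumptions, not part of the text, and the learner is deliberately naive: its "representation" is just the most common target value. The three training examples use two of the input features of e1 to e3 in Figure 7.1.

```python
from collections import Counter
from typing import Callable, Dict, List

Example = Dict[str, str]  # maps a feature name to its value


def learn_most_common(training: List[Example], target: str) -> Callable[[Example], str]:
    """Build a naive representation: the most common value of the target feature."""
    most_common_value, _ = Counter(e[target] for e in training).most_common(1)[0]

    def predict(example: Example) -> str:
        # Ignores the input features entirely; a real learner would use them.
        return most_common_value

    return predict


# Training examples give values for both input and target features.
training = [
    {"Author": "known", "Length": "long", "UserAction": "skips"},
    {"Author": "unknown", "Length": "short", "UserAction": "reads"},
    {"Author": "unknown", "Length": "long", "UserAction": "skips"},
]
# A test example gives values for the input features only.
test_example = {"Author": "unknown", "Length": "long"}

predictor = learn_most_common(training, "UserAction")
print(predictor(test_example))  # skips
```

Any learner that uses the input features would fit the same shape: training examples in, a function from new examples to predicted target values out.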

If e is an example, and F is a feature, let val(e,F) be the value of feature F in example e.


Example  Author   Thread    Length  Where Read  User Action
e1       known    new       long    home        skips
e2       unknown  new       short   work        reads
e3       unknown  followUp  long    work        skips
e4       known    followUp  long    home        skips
e5       known    new       short   home        reads
e6       known    followUp  long    work        skips
e7       unknown  followUp  short   work        skips
e8       unknown  new       short   work        reads
e9       known    followUp  long    home        skips
e10      known    new       long    work        skips
e11      unknown  followUp  short   home        skips
e12      known    new       long    work        skips
e13      known    followUp  short   home        reads
e14      known    new       short   work        reads
e15      known    new       short   home        reads
e16      known    followUp  short   work        reads
e17      known    new       short   home        reads
e18      unknown  new       short   work        reads
e19      unknown  new       long    work        ?
e20      unknown  followUp  long    home        ?
Figure 7.1: Examples of a user's preferences. These are some training and test examples obtained from observing a user deciding whether to read articles posted to a threaded discussion board depending on whether the author is known or not, whether the article started a new thread or was a follow-up, the length of the article, and whether it is read at home or at work. e1,...,e18 are the training examples. The aim is to make a prediction for the user action on e19, e20, and other, currently unseen, examples.

Example 7.1: Figure 7.1 shows training and test examples typical of a classification task. The aim is to predict whether a person reads an article posted to a bulletin board given properties of the article. The input features are Author, Thread, Length, and Where Read. There is one target feature, User Action. There are eighteen training examples, each of which has a value for all of the features.

In this data set, val(e11,Author)=unknown, val(e11,Thread)=followUp, and val(e11,UserAction)=skips.
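The val notation can be mirrored directly in code. The following is a sketch only, assuming each example is stored as a Python dictionary keyed by feature name; the `examples` table and the choice to return `None` for a missing target value are assumptions made for illustration.

```python
# Each example is a dict from feature name to value; rows copied from Figure 7.1.
examples = {
    "e11": {"Author": "unknown", "Thread": "followUp", "Length": "short",
            "WhereRead": "home", "UserAction": "skips"},
    "e19": {"Author": "unknown", "Thread": "new", "Length": "long",
            "WhereRead": "work"},  # test example: no UserAction value is given
}


def val(e, feature):
    """Return the value of `feature` in the example named `e`, or None if not given."""
    return examples[e].get(feature)


print(val("e11", "Author"))      # unknown
print(val("e11", "UserAction"))  # skips
print(val("e19", "UserAction"))  # None
```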

The aim is to predict the user action for a new example given its values for the input features.

The most common way to learn is to have a hypothesis space of all possible representations. Each possible representation is a hypothesis. The hypothesis space is typically a large finite, or countably infinite, space. A prediction is made using one of the following:

  • the best hypothesis that can be found in the hypothesis space, according to some measure of how good a hypothesis is,
  • all of the hypotheses that are consistent with the training examples, or
  • the posterior probability of the hypotheses given the evidence provided by the training examples.
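The three options can be compared on the training examples of Figure 7.1. The sketch below assumes a very small, hypothetical hypothesis space: two constant hypotheses plus one hypothesis per feature-value pair that predicts reads exactly when the example has that value. The measure of "best" (fewest training errors), the uniform prior, and the label-noise rate `NOISE` are all illustrative assumptions rather than choices made in the text.

```python
# Training examples e1..e18 of Figure 7.1: Author Thread Length WhereRead UserAction
ROWS = """
known new long home skips
unknown new short work reads
unknown followUp long work skips
known followUp long home skips
known new short home reads
known followUp long work skips
unknown followUp short work skips
unknown new short work reads
known followUp long home skips
known new long work skips
unknown followUp short home skips
known new long work skips
known followUp short home reads
known new short work reads
known new short home reads
known followUp short work reads
known new short home reads
unknown new short work reads
"""
INPUTS = ["Author", "Thread", "Length", "WhereRead"]
training = [dict(zip(INPUTS + ["UserAction"], row.split()))
            for row in ROWS.strip().splitlines()]

# Hypothetical hypothesis space: ("always", action) predicts a constant;
# (feature, value) predicts "reads" exactly when the example has feature = value.
hypotheses = [("always", "reads"), ("always", "skips")]
for f in INPUTS:
    for v in sorted({e[f] for e in training}):
        hypotheses.append((f, v))


def predict(h, e):
    """Prediction of hypothesis h for example e."""
    if h[0] == "always":
        return h[1]
    return "reads" if e[h[0]] == h[1] else "skips"


def errors(h):
    """Number of training examples that hypothesis h gets wrong."""
    return sum(predict(h, e) != e["UserAction"] for e in training)


# 1. The best hypothesis, here measured by the number of training errors.
best = min(hypotheses, key=errors)

# 2. All hypotheses consistent with (making no errors on) the training examples.
consistent = [h for h in hypotheses if errors(h) == 0]

# 3. Posterior over hypotheses: uniform prior, and each observed action assumed
#    to be flipped independently with probability NOISE.
NOISE = 0.1
n = len(training)
weight = {h: NOISE ** errors(h) * (1 - NOISE) ** (n - errors(h)) for h in hypotheses}
total = sum(weight.values())
posterior = {h: w / total for h, w in weight.items()}

# Posterior-weighted prediction for test example e19.
e19 = dict(zip(INPUTS, "unknown new long work".split()))
p_reads = sum(p for h, p in posterior.items() if predict(h, e19) == "reads")

print(best, errors(best))  # ('Length', 'short') 2
print(consistent)          # []
print(p_reads)             # posterior probability that e19 is read
```

In this restricted space no hypothesis is consistent with all eighteen training examples, so the second option yields an empty set; a richer hypothesis space would be needed for it to return anything. The first and third options still give predictions because they tolerate training errors.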

One exception to this paradigm is in case-based reasoning, which uses the examples directly.
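To illustrate what using the examples directly can look like, here is a minimal sketch of one simple form of case-based reasoning: predict from the stored cases that are most similar to the query. The distance measure (the number of input features that differ) and the function names are assumptions chosen for illustration; the text does not prescribe them.

```python
from collections import Counter

# Stored cases e1..e18 of Figure 7.1: Author Thread Length WhereRead UserAction
ROWS = """
known new long home skips
unknown new short work reads
unknown followUp long work skips
known followUp long home skips
known new short home reads
known followUp long work skips
unknown followUp short work skips
unknown new short work reads
known followUp long home skips
known new long work skips
unknown followUp short home skips
known new long work skips
known followUp short home reads
known new short work reads
known new short home reads
known followUp short work reads
known new short home reads
unknown new short work reads
"""
cases = [row.split() for row in ROWS.strip().splitlines()]


def distance(case, query):
    """Number of input features on which a stored case and a query differ."""
    return sum(a != b for a, b in zip(case[:4], query))


def nearest_votes(query):
    """Count the user actions of the stored cases closest to the query."""
    closest = min(distance(c, query) for c in cases)
    return Counter(c[4] for c in cases if distance(c, query) == closest)


e19 = ["unknown", "new", "long", "work"]
e20 = ["unknown", "followUp", "long", "home"]
print(nearest_votes(e20))  # Counter({'skips': 4})
print(nearest_votes(e19))  # 3 reads and 3 skips: a tie
```

No hypothesis is constructed in advance here; all the work happens at prediction time. Under this particular distance the closest cases to e19 are evenly split, which shows that the choice of similarity measure and tie-breaking rule matters.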