Artificial Intelligence - foundations of computational agents -- 7.2 Supervised Learning


7.2 Supervised Learning

An abstract definition of supervised learning is as follows. Assume the learner is given the following data:

a set of input features, X₁,...,Xₙ;
a set of target features, Y₁,...,Yₖ;
a set of training examples, where the values for the input features and the target features are given for each example; and
a set of test examples, where only the values for the input features are given.

The aim is to predict the values of the target features for the test examples and as-yet-unseen examples. Typically, learning is the creation of a representation that can make predictions based on descriptions of the input features of new examples.

If e is an example, and F is a feature, let val(e,F) be the value of feature F in example e.
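As a minimal sketch of this notation, assume each example is stored as a mapping from feature names to values. The dictionary layout and the example names used as keys are illustrative assumptions; the two rows are copied from Figure 7.1.

```python
# Hypothetical representation: each example is a dict from feature name to value.
# The two rows below are copied from Figure 7.1.
examples = {
    "e1": {"Author": "known", "Thread": "new", "Length": "long",
           "Where Read": "home", "User Action": "skips"},
    "e2": {"Author": "unknown", "Thread": "new", "Length": "short",
           "Where Read": "work", "User Action": "reads"},
}


def val(e, F):
    """Return the value of feature F in example e."""
    return examples[e][F]


print(val("e1", "Length"))        # long
print(val("e2", "User Action"))   # reads
```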

Example	Author	Thread	Length	Where Read	User Action
e₁	known	new	long	home	skips
e₂	unknown	new	short	work	reads
e₃	unknown	follow Up	long	work	skips
e₄	known	follow Up	long	home	skips
e₅	known	new	short	home	reads
e₆	known	follow Up	long	work	skips
e₇	unknown	follow Up	short	work	skips
e₈	unknown	new	short	work	reads
e₉	known	follow Up	long	home	skips
e₁₀	known	new	long	work	skips
e₁₁	unknown	follow Up	short	home	skips
e₁₂	known	new	long	work	skips
e₁₃	known	follow Up	short	home	reads
e₁₄	known	new	short	work	reads
e₁₅	known	new	short	home	reads
e₁₆	known	follow Up	short	work	reads
e₁₇	known	new	short	home	reads
e₁₈	unknown	new	short	work	reads
e₁₉	unknown	new	long	work	?
e₂₀	unknown	follow Up	long	home	?

Figure 7.1: Examples of a user's preferences. These are some training and test examples obtained from observing a user deciding whether to read articles posted to a threaded discussion board depending on whether the author is known or not, whether the article started a new thread or was a follow-up, the length of the article, and whether it is read at home or at work. e₁,...,e₁₈ are the training examples. The aim is to make a prediction for the user action on e₁₉, e₂₀, and other, currently unseen, examples.

Example 7.1: Figure 7.1 shows training and test examples typical of a classification task. The aim is to predict whether a person reads an article posted to a bulletin board given properties of the article. The input features are Author, Thread, Length, and Where Read. There is one target feature, User Action. There are eighteen training examples, each of which has a value for all of the features.

In this data set, val(e₁₁,Author)=unknown, val(e₁₁,Thread)=follow Up, and val(e₁₁,User Action)=skips.

The aim is to predict the user action for a new example given its values for the input features.
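The data set of Example 7.1 could be set up along the following lines. This is only a sketch: the comma-separated transcription of Figure 7.1 and the names INPUT_FEATURES, TARGET_FEATURE, and load_examples are assumptions made for illustration. Training examples are taken to be those whose target value is known, and test examples those marked "?".

```python
INPUT_FEATURES = ["Author", "Thread", "Length", "Where Read"]
TARGET_FEATURE = "User Action"

# Rows transcribed from Figure 7.1; "?" marks an unknown target value.
FIGURE_7_1 = """\
e1,known,new,long,home,skips
e2,unknown,new,short,work,reads
e3,unknown,follow Up,long,work,skips
e4,known,follow Up,long,home,skips
e5,known,new,short,home,reads
e6,known,follow Up,long,work,skips
e7,unknown,follow Up,short,work,skips
e8,unknown,new,short,work,reads
e9,known,follow Up,long,home,skips
e10,known,new,long,work,skips
e11,unknown,follow Up,short,home,skips
e12,known,new,long,work,skips
e13,known,follow Up,short,home,reads
e14,known,new,short,work,reads
e15,known,new,short,home,reads
e16,known,follow Up,short,work,reads
e17,known,new,short,home,reads
e18,unknown,new,short,work,reads
e19,unknown,new,long,work,?
e20,unknown,follow Up,long,home,?
"""


def load_examples(text):
    """Parse the rows into {example name: {feature name: value}}."""
    examples = {}
    for line in text.strip().splitlines():
        name, *values = line.split(",")
        examples[name] = dict(zip(INPUT_FEATURES + [TARGET_FEATURE], values))
    return examples


examples = load_examples(FIGURE_7_1)

# Training examples have a known target value; test examples do not.
training = {n: e for n, e in examples.items() if e[TARGET_FEATURE] != "?"}
test = {n: e for n, e in examples.items() if e[TARGET_FEATURE] == "?"}

print(len(training), len(test))           # 18 2
print(examples["e11"]["Author"])          # unknown
print(examples["e11"]["Thread"])          # follow Up
print(examples["e11"][TARGET_FEATURE])    # skips
```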

The most common way to learn is to have a hypothesis space of all possible representations. Each possible representation is a hypothesis. The hypothesis space is typically a large finite, or countably infinite, space. A prediction is made using one of the following:

the best hypothesis that can be found in the hypothesis space, according to some measure of which hypotheses are better,
all of the hypotheses that are consistent with the training examples, or
the posterior probability of the hypotheses given the evidence provided by the training examples.
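The three options above can be sketched on the training examples of Figure 7.1. Everything specific in this sketch is an assumption chosen for illustration, not something the text prescribes: the hypothesis space is deliberately tiny (two constant hypotheses plus one "reads iff feature = value" rule per input feature value), "better" is taken to mean fewer training errors, and the posterior uses a uniform prior with a hypothetical noise model in which each observed target is wrong independently with probability NOISE.

```python
# Training rows from Figure 7.1: Author, Thread, Length, Where Read, User Action.
FEATURES = ["Author", "Thread", "Length", "Where Read"]
TRAINING = [row.split(",") for row in """\
known,new,long,home,skips
unknown,new,short,work,reads
unknown,follow Up,long,work,skips
known,follow Up,long,home,skips
known,new,short,home,reads
known,follow Up,long,work,skips
unknown,follow Up,short,work,skips
unknown,new,short,work,reads
known,follow Up,long,home,skips
known,new,long,work,skips
unknown,follow Up,short,home,skips
known,new,long,work,skips
known,follow Up,short,home,reads
known,new,short,work,reads
known,new,short,home,reads
known,follow Up,short,work,reads
known,new,short,home,reads
unknown,new,short,work,reads""".splitlines()]


def make_hypotheses():
    """Assumed hypothesis space: constants plus single-feature rules."""
    hyps = {"always reads": lambda x: "reads",
            "always skips": lambda x: "skips"}
    for i, f in enumerate(FEATURES):
        for v in sorted({row[i] for row in TRAINING}):
            hyps[f"reads iff {f}={v}"] = (
                lambda x, i=i, v=v: "reads" if x[i] == v else "skips")
    return hyps


HYPOTHESES = make_hypotheses()


def errors(h):
    """Number of training examples whose target h predicts incorrectly."""
    return sum(h(row[:4]) != row[4] for row in TRAINING)


# 1. Best hypothesis, with "better" taken to mean fewer training errors.
best = min(HYPOTHESES, key=lambda name: errors(HYPOTHESES[name]))

# 2. All hypotheses consistent with every training example.
consistent = [name for name, h in HYPOTHESES.items() if errors(h) == 0]

# 3. Posterior over hypotheses under a uniform prior and the assumed noise model.
NOISE = 0.1
n = len(TRAINING)
weight = {name: NOISE ** errors(h) * (1 - NOISE) ** (n - errors(h))
          for name, h in HYPOTHESES.items()}
total = sum(weight.values())
posterior = {name: w / total for name, w in weight.items()}


def prob_reads(x):
    """Posterior-weighted probability that the user reads example x."""
    return sum(p for name, p in posterior.items()
               if HYPOTHESES[name](x) == "reads")


e19 = ["unknown", "new", "long", "work"]
print(best)                       # reads iff Length=short
print(HYPOTHESES[best](e19))      # skips
print(consistent)                 # [] in this deliberately tiny space
print(prob_reads(e19))
```

In this small space no hypothesis is consistent with all eighteen training examples, so the second option yields nothing here; the first and third options still produce a prediction for e₁₉.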

One exception to this paradigm is case-based reasoning, which uses the examples directly.
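As a rough illustration of using the examples directly, the sketch below stores the training examples of Figure 7.1 and, for a new example, collects the target values of the stored examples closest to it. The choice of distance (the number of input features on which two examples differ) and the function names are assumptions made for this sketch; the text does not specify how similar cases are found.

```python
from collections import Counter

# Training rows from Figure 7.1: Author, Thread, Length, Where Read, User Action.
TRAINING = [row.split(",") for row in """\
known,new,long,home,skips
unknown,new,short,work,reads
unknown,follow Up,long,work,skips
known,follow Up,long,home,skips
known,new,short,home,reads
known,follow Up,long,work,skips
unknown,follow Up,short,work,skips
unknown,new,short,work,reads
known,follow Up,long,home,skips
known,new,long,work,skips
unknown,follow Up,short,home,skips
known,new,long,work,skips
known,follow Up,short,home,reads
known,new,short,work,reads
known,new,short,home,reads
known,follow Up,short,work,reads
known,new,short,home,reads
unknown,new,short,work,reads""".splitlines()]


def distance(x, y):
    """Number of input features on which x and y differ (assumed measure)."""
    return sum(a != b for a, b in zip(x, y))


def nearest_votes(x):
    """Count the target values of the stored examples closest to x."""
    dists = [(distance(x, row[:4]), row[4]) for row in TRAINING]
    closest = min(d for d, _ in dists)
    return Counter(target for d, target in dists if d == closest)


e20 = ["unknown", "follow Up", "long", "home"]
print(nearest_votes(e20))     # Counter({'skips': 4})
```

For e₁₉ the same function returns three votes for reads and three for skips, so a prediction there would need some additional tie-breaking rule.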