Dynamic Bayesian Networks (DBNs)

Dynamic Bayesian Networks (DBNs) are directed graphical models of stochastic processes. They generalise hidden Markov models (HMMs) and linear dynamical systems by representing the hidden (and observed) state in terms of state variables, which can have complex interdependencies. The graphical structure provides an easy way to specify these conditional independencies, and hence to provide a compact parameterization of the model.

The general inference problem for DBNs is to compute P(X(i,t) | Y(:, t1:t2)), where X(i,t) represents the i'th hidden variable at time t and Y(:,t1:t2) represents all the evidence between times t1 and t2. (Of course, we might want to compute joint distributions over sets of hidden variables, especially if we want to do learning.) There are several special cases of interest:

  • Filtering: computing P(X(i,t) | Y(:,1:t)), i.e., estimating the current state given all the evidence so far.
  • Smoothing: computing P(X(i,t) | Y(:,1:T)), i.e., estimating a past state given an entire sequence of length T.
  • Prediction: computing P(X(i,t+K) | Y(:,1:t)) for some horizon K > 0.

We discuss how to compute these quantities using different engines below.

Note that "temporal Bayesian network" would be a better name than "dynamic Bayesian network", since it is assumed that the model structure does not change, but the term DBN has become entrenched. We also normally assume that the parameters do not change, i.e., the model is time-invariant. However, we can always add extra hidden nodes to represent the current "regime", thereby creating mixtures of models to capture periodic non-stationarities.

There are some cases where the size of the state space can change over time, e.g., tracking a variable, but unknown, number of objects. In this case, we need to change the model structure over time. BNT does not support this, but see the following paper for a discussion of some of the issues:

How to use DBNs with the Bayes Net Toolbox

  • Model specification
  • Inference
  • Learning

    Model specification

    Hidden Markov Models (HMMs)

    The simplest kind of DBN is a Hidden Markov Model (HMM), which has one discrete hidden node and one discrete or continuous observed node per slice. We illustrate this below. As before, circles denote continuous nodes, squares denote discrete nodes, clear means hidden, shaded means observed.

    We have "unrolled" the model for three "time slices" -- the structure and parameters are assumed to repeat as the model is unrolled further. Hence to specify a DBN, we need to define the intra-slice topology (within a slice), the inter-slice topology (between two slices), as well as the parameters for the first two slices. (Such a two-slice temporal Bayes net is often called a 2TBN.)

    We can specify the topology as follows.

    intra = zeros(2);
    intra(1,2) = 1; % node 1 in slice t connects to node 2 in slice t
    
    inter = zeros(2);
    inter(1,1) = 1; % node 1 in slice t-1 connects to node 1 in slice t
    
    We can specify the parameters as follows, where for simplicity we assume the observed node is discrete.
    Q = 2; % num hidden states
    O = 2; % num observable symbols
    
    ns = [Q O];
    dnodes = 1:2;
    bnet = mk_dbn(intra, inter, ns, dnodes);
    for i=1:4
      bnet.CPD{i} = tabular_CPD(bnet, i);
    end
    

    We assume the distributions P(X(t) | X(t-1)) and P(Y(t) | X(t)) are independent of t for t > 1. Hence the CPD for nodes 5, 7, ... is the same as for node 3, so we say they are in the same equivalence class, with node 3 being the "representative" for this class. In other words, we have tied the parameters for nodes 3, 5, 7, ... Similarly, nodes 4, 6, 8, ... are tied. Note, however, that (the parameters for) nodes 1 and 2 are not tied to subsequent slices.

    Above we assumed the observation model P(Y(t) | X(t)) is independent of t for t>1, but it is conventional to assume this is true for all t. So we would like to put nodes 2, 4, 6, ... all in the same class. We can do this by explicitly defining the equivalence classes, as follows.

    We define eclass1(i) to be the equivalence class that node i in slice 1 belongs to. Similarly, we define eclass2(i) to be the equivalence class that node i in slice 2, 3, ..., belongs to. For an HMM, we have

    eclass1 = [1 2];
    eclass2 = [3 2];
    eclass = [eclass1 eclass2];
    
    This ties the observation model across slices, since e.g., eclass(4) = eclass(2) = 2.

    By default, eclass1 = 1:ss, and eclass2 = (1:ss)+ss, where ss = slice size = the number of nodes per slice. But by using the above tying pattern, we now have only 3 CPDs to specify, instead of 4:

    bnet = mk_dbn(intra, inter, ns, dnodes, eclass1, eclass2);
    prior0 = normalise(rand(Q,1));
    transmat0 = mk_stochastic(rand(Q,Q));
    obsmat0 = mk_stochastic(rand(Q,O));
    bnet.CPD{1} = tabular_CPD(bnet, 1, prior0);
    bnet.CPD{2} = tabular_CPD(bnet, 2, obsmat0);
    bnet.CPD{3} = tabular_CPD(bnet, 3, transmat0);
    
    We discuss how to do
    inference and learning on this model below. (See also my HMM toolbox, which is included with BNT.)
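
    As a preview of the inference section below, here is a minimal sketch of offline smoothing on this HMM, using the jtree_dbn engine whose constructor appears later on this page; the sequence length T and the queried node/slice are arbitrary example values.
    onodes = [2]; % the observed node in each slice
    T = 5; % number of time slices
    engine = jtree_dbn_inf_engine(bnet, onodes);
    ev = sample_dbn(bnet, T); % sample a sequence from the model
    evidence = cell(2, T);
    evidence(onodes,:) = ev(onodes,:); % hidden cells are left empty
    [engine, ll] = enter_evidence(engine, evidence);
    marg = marginal_nodes(engine, 1, 3); % P(X(1,3) | y(:,1:T)); marg.T holds the distribution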

    Some common variants on HMMs are shown below. BNT can handle all of these.

    Linear Dynamical Systems (LDSs) and Kalman filters

    A Linear Dynamical System (LDS) has the same topology as an HMM, but all the nodes are assumed to have linear-Gaussian distributions, i.e.,
       x(t+1) = A*x(t) + w(t),  w ~ N(0, Q),  x(0) ~ N(init_x, init_V)
       y(t)   = C*x(t) + v(t),  v ~ N(0, R)
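
    To make the generative process concrete, here is a minimal simulation sketch in plain MATLAB (not a BNT function) that samples a trajectory from such an LDS; the particular matrices are arbitrary example values.
    X = 2; Y = 2; T = 50;
    A = [1 0.1; 0 1]; C = eye(Y);     % example dynamics and observation matrices
    Q = 0.01*eye(X);  R = 0.1*eye(Y); % process and observation noise covariances
    init_x = zeros(X,1); init_V = eye(X);
    x = init_x + chol(init_V)'*randn(X,1);
    xs = zeros(X,T); ys = zeros(Y,T);
    for t=1:T
      xs(:,t) = x;
      ys(:,t) = C*x + chol(R)'*randn(Y,1); % y(t) = C*x(t) + v(t)
      x = A*x + chol(Q)'*randn(X,1);       % x(t+1) = A*x(t) + w(t)
    end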
    
    Some simple variants are shown below.

    We can create a regular LDS in BNT as follows.

    
    intra = zeros(2);
    intra(1,2) = 1;
    inter = zeros(2);
    inter(1,1) = 1;
    n = 2;
    
    X = 2; % size of hidden state
    Y = 2; % size of observable state
    
    ns = [X Y];
    dnodes = [];
    onodes = [2];
    eclass1 = [1 2];
    eclass2 = [3 2];
    bnet = mk_dbn(intra, inter, ns, dnodes, eclass1, eclass2);
    
    x0 = rand(X,1);
    V0 = eye(X); % must be positive semi definite!
    C0 = rand(Y,X);
    R0 = eye(Y);
    A0 = rand(X,X);
    Q0 = eye(X);
    
    bnet.CPD{1} = gaussian_CPD(bnet, 1, x0, V0);
    bnet.CPD{2} = gaussian_CPD(bnet, 2, zeros(Y,1), R0, C0, 'full', 'untied', 'clamped_mean');
    bnet.CPD{3} = gaussian_CPD(bnet, 3, zeros(X,1), Q0, A0, 'full', 'untied', 'clamped_mean');
    
    We discuss how to do
    inference and learning on this model below. (See also my Kalman filter toolbox, which is included with BNT.)
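
    As with the HMM above, here is a minimal smoothing sketch for this LDS. We assume kalman_inf_engine (listed in the engine table at the end of this page) takes the observed nodes as its second argument, like the other engines shown below, and that Gaussian marginals are returned in fields mu and Sigma.
    T = 10;
    engine = kalman_inf_engine(bnet, onodes);
    ev = sample_dbn(bnet, T);
    evidence = cell(2, T);
    evidence(onodes,:) = ev(onodes,:);
    [engine, ll] = enter_evidence(engine, evidence);
    m = marginal_nodes(engine, 1, T); % smoothed estimate of x(T): m.mu, m.Sigma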

    Coupled HMMs

    Here is an example of a coupled HMM with N=5 chains, unrolled for T=3 slices. Each hidden discrete node has a private observed Gaussian child.

    We can make this model using the function mk_chmm, as follows.

    N = 5; % number of chains
    Q = 2; % binary hidden nodes
    discrete_obs = 0; % 0 means the observed nodes are continuous (Gaussian)
    Y = 1; % scalar observed nodes
    [bnet, onodes] = mk_chmm(N, Q, Y, discrete_obs);
    
    We will use this model
    below to illustrate online prediction.

    Water network

    Consider the following model of a water purification plant, developed by Finn V. Jensen, Uffe Kjærulff, Kristian G. Olesen, and Jan Pedersen. (Following Boyen and Koller, we have added discrete evidence nodes.)

    We now show how to specify this model in BNT.
    ss = 12; % slice size
    intra = zeros(ss);
    intra(1,9) = 1;
    intra(3,10) = 1;
    intra(4,11) = 1;
    intra(8,12) = 1;
    
    inter = zeros(ss);
    inter(1, [1 3]) = 1; % node 1 in slice 1 connects to nodes 1 and 3 in slice 2
    inter(2, [2 3 7]) = 1;
    inter(3, [3 4 5]) = 1;
    inter(4, [3 4 6]) = 1;
    inter(5, [3 5 6]) = 1;
    inter(6, [4 5 6]) = 1;
    inter(7, [7 8]) = 1;
    inter(8, [6 7 8]) = 1;
    
    onodes = 9:12; % observed
    dnodes = 1:ss; % discrete
    ns = 2*ones(1,ss); % binary nodes
    eclass1 = 1:12;
    eclass2 = [13:20 9:12];
    eclass = [eclass1 eclass2];
    bnet = mk_dbn(intra, inter, ns, dnodes, eclass1, eclass2);
    for e=1:max(eclass)
      bnet.CPD{e} = tabular_CPD(bnet, e);
    end
    
    We have tied the observation parameters across all slices; more complex parameter-tying patterns can be specified in the same way, by choosing different equivalence classes.

    BATnet

    As an example of a more complicated DBN, consider the following example, which is a model of a car's high level state, as might be used by an automated car. (The model is from Forbes, Huang, Kanazawa and Russell, "The BATmobile: Towards a Bayesian Automated Taxi", IJCAI 95. The figure is from Boyen and Koller, "Tractable Inference for Complex Stochastic Processes", UAI98. For simplicity, we only show the observed nodes for slice 2.)

    Since this topology is so complicated, it is useful to be able to refer to the nodes by name, instead of number.

    names = {'LeftClr', 'RightClr', 'LatAct', ... 'Bclr', 'BYdotDiff'};
    ss = length(names);
    
    We can specify the intra-slice topology using a cell array as follows, where each row specifies a connection between two named nodes:
    intrac = {...
       'LeftClr', 'LeftClrSens';
      'RightClr', 'RightClrSens';
      ...
      'BYdotDiff', 'BcloseFast'};
    
    Finally, we can convert this cell array to an adjacency matrix using the following function:
    [intra, names] = mk_adj_mat(intrac, names, 1);
    
    This function also permutes the names so that they are in topological order. Given this ordering of the names, we can make the inter-slice connectivity matrix as follows:
    interc = {...
       'LeftClr', 'LeftClr';
       'LeftClr', 'LatAct';
       ...
       'FBStatus', 'LatAct'};
    
    inter = mk_adj_mat(interc, names, 0);  
    
    To refer to a node, we must know its number, which can be computed as in the following example:
    obs = {'LeftClrSens', 'RightClrSens', 'TurnSignalSens', 'XdotSens', 'YdotSens', 'FYdotDiffSens', ...
          'FclrSens', 'BXdotSens', 'BclrSens', 'BYdotDiffSens'};
    for i=1:length(obs)
      onodes(i) = stringmatch(obs{i}, names);
    end
    onodes = sort(onodes);
    
    (We sort the onodes since most BNT routines assume that set-valued arguments are in sorted order.) We can now make the DBN:
    dnodes = 1:ss; 
    ns = 2*ones(1,ss); % binary nodes
    bnet = mk_dbn(intra, inter, ns, dnodes);
    
    To specify the parameters, we must know the order of the parents. See the function BNT/general/mk_named_CPT for a way to do this in the case of tabular nodes. For simplicity, we just generate random parameters:
    for i=1:2*ss
      bnet.CPD{i} = tabular_CPD(bnet, i);
    end
    
    A complete version of this example is available in BNT/examples/dynamic/bat1.m.

    Inference

    The general inference problem for DBNs is to compute P(X(i,t) | Y(:, t1:t2)), where X(i,t) represents the i'th hidden variable at time t and Y(:,t1:t2) represents all the evidence between times t1 and t2. (Of course, we might want to compute joint distributions over sets of hidden variables, especially if we want to do learning.) The special cases of interest are the ones listed at the top of this page: filtering, smoothing, and prediction. Not all engines support all of these functions, as we detail below.

    If all the hidden nodes are linear-Gaussian, and the observed nodes are linear-Gaussian, the model is a linear dynamical system, and we can use the Kalman filter to do inference. If all hidden nodes are discrete, we can use the HMM filter, or the junction tree algorithm, to do inference (the observed nodes can be discrete or continuous). BNT does not currently have any engines capable of doing inference in models which do not fall into these two categories (e.g., non-linear, non-Gaussian models); particle filtering is one possible approach.

    Unfortunately, even if all the hidden nodes are discrete and the model is sparse, exact inference takes O(2^n) time, where n is the number of hidden nodes per slice. The basic reason for this is that two nodes become correlated, even if there is no direct connection between them in the 2TBN, by virtue of sharing common ancestors in the past. Hence we need to use approximations. A popular one, known as BK, is described in Boyen and Koller, "Tractable Inference for Complex Stochastic Processes", UAI98.

    This approximates the belief state with a product of marginals on a specified set of clusters. For example, in the water network, we might use the following clusters:
    engine = bk_inf_engine(bnet, { [1 2], [3 4 5 6], [7 8] }, onodes);
    
    This engine can now be used just like the jtree engine. Two special cases of the BK algorithm are supported: 'ff' (fully factored) means each node has its own cluster, and 'exact' means there is 1 cluster that contains the whole slice. These can be created as follows:
    engine = bk_inf_engine(bnet, 'ff', onodes);
    engine = bk_inf_engine(bnet, 'exact', onodes);
    
    The latter is equivalent to junction tree:
    engine = jtree_dbn_inf_engine(bnet, onodes);
    
    A newer approximate inference algorithm, based on loopy belief propagation, is also available. A very flaky implementation of this is provided as
    engine = loopy_dbn_inf_engine(bnet, onodes, max_iter, fast, momentum)   
    

    Offline smoothing

    We can use dynamic inference engines in a similar way to static inference engines, except now the evidence is a 2D cell array of size ss x T, where ss is the number of nodes per slice and T is the number of slices. Also, marginal_nodes takes the time-slice as an extra argument, in addition to the nodes. For example, to compute P(X(i,t) | y(:,1:T)), we proceed as follows:
    ev = sample_dbn(bnet, T);
    evidence = cell(ss,T);
    evidence(onodes,:) = ev(onodes, :); % all cells besides onodes are empty
    [engine, ll] = enter_evidence(engine, evidence);
    marg = marginal_nodes(engine, i, t);
    

    Offline filtering

    "Offline filtering" means computing P(X(i,t) | y(:,1:t)) for t=1:T with a single function call, which requires all of Y to be available. (This is useful for simulating an online algorithm.) This can be done as follows.
    filter = 1;
    [engine, ll] = enter_evidence(engine, evidence, filter);
    marg = marginal_nodes(engine, i, t);
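
    Because filtering conditions only on the evidence up to time t, the filtered and smoothed marginals generally differ. A quick way to see this, reusing the evidence from the smoothing example above (the node and slice are arbitrary):
    i = 1; t = 2;
    [engine, ll] = enter_evidence(engine, evidence);    % smoothing
    m_smooth = marginal_nodes(engine, i, t);
    [engine, ll] = enter_evidence(engine, evidence, 1); % filtering
    m_filt = marginal_nodes(engine, i, t);
    disp([m_smooth.T(:) m_filt.T(:)]); % P(X(i,t)|y(:,1:T)) vs. P(X(i,t)|y(:,1:t))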
    

    Online filtering

    In online filtering, we maintain the "belief state" P(X(:,t) | y(:,1:t)), and update it sequentially as new evidence arrives. We initialise the belief state for slice 1 as follows:
    [engine, ll(1)] = dbn_update_bel1(engine, evidence(:,1));
    
    To update the belief state, we must specify the evidence for the current and previous time slices:
    [engine, ll(t)] = dbn_update_bel(engine, evidence(:,t-1:t));
    
    To extract marginals of the current belief state, we use
    marg{i,t} = dbn_marginal_from_bel(engine, i);
    
    See the file BNT/examples/dynamic/online1.m for a complete example.
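
    Putting these pieces together, a minimal online filtering loop (in the spirit of online1.m, though the file itself may differ) looks like the following, where hnodes denotes the hidden nodes in a slice:
    hnodes = setdiff(1:ss, onodes);
    marg = cell(ss, T);
    for t=1:T
      if t==1
        [engine, ll(1)] = dbn_update_bel1(engine, evidence(:,1));
      else
        [engine, ll(t)] = dbn_update_bel(engine, evidence(:,t-1:t));
      end
      for i=hnodes
        marg{i,t} = dbn_marginal_from_bel(engine, i); % P(X(i,t) | y(:,1:t))
      end
    end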

    Online prediction

    We can predict the belief state K steps ahead, P(X(:,t+K)|y(:,1:t)), using
    engine = dbn_predict_bel(engine, K); 
    
    We can then extract marginals from this prediction using dbn_marginal_from_bel as before. (Note that, for the HMM inference engine, this function is very easy to implement: BNT just raises the transition matrix to the K'th power and multiplies it by the current belief-state vector.)
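
    To make the HMM case concrete, here is a standalone sketch of K-step prediction in plain MATLAB, with an arbitrary transition matrix (independent of BNT):
    A = [0.9 0.1; 0.2 0.8]; % transition matrix, A(i,j) = P(X(t+1)=j | X(t)=i)
    bel = [0.7; 0.3];       % current belief state P(X(t) | y(1:t))
    K = 3;
    pred = (A^K)' * bel;    % predicted belief state P(X(t+K) | y(1:t))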

    As an example of prediction, consider a Coupled HMM with N hidden discrete chains and each hidden node having its own private scalar Gaussian observation node. The K-step ahead predictive density is a mixture of 2^N N-dimensional Gaussians (assuming each hidden node is binary). The marginal predictive density for a single node is a mixture of two 1-dimensional Gaussians. This can be "collapsed" to a single Gaussian using moment matching, which is the best approximation in the KL sense. The following function does this.

    function [pred_obs_mean, pred_obs_std] = mk_predictions(K, bnet, engine, evidence)
    
    [ss T] = size(evidence);
    N = ss/2;
    onodes = (1:N)+N;
    hnodes = 1:N;
    
    pred_obs_mean = zeros(N, T-K);
    pred_obs_std = zeros(N, T-K);
    
    ymu = cell(1,N);
    ySigma = cell(1,N);
    for i=1:N
      o = onodes(i);
      s = struct(bnet.CPD{o}); % violate object privacy
      ymu{i} = s.mean;
      ySigma{i} = s.cov;
    end
    
    engine = dbn_init_bel(engine);
    for t=1:T-K
      % K steps of prediction from previous belief state
      temp = engine;
      engine = dbn_predict_bel(engine, K);
      for i=1:N
        m = dbn_marginal_from_bel(engine, i);
        [mu, Sigma] = collapse_mog(ymu{i}, ySigma{i}, m.T);
        pred_obs_mean(i,t) = mu;
        pred_obs_std(i,t) = sqrt(Sigma);
      end
      engine = temp;
      % update belief state
      if t>1
        engine = dbn_update_bel(engine, evidence(:,t-1:t));
      else
        engine = dbn_update_bel1(engine, evidence(:,t));
      end
    end
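
    For reference, the moment matching performed by collapse_mog amounts, in the scalar case, to matching the mean and variance of the mixture; here is a standalone sketch with an arbitrary example mixture:
    w = [0.3 0.7]; mu = [-1 2]; sigma2 = [0.5 1.0]; % mixture weights, means, variances
    m = sum(w .* mu);                               % matched mean
    v = sum(w .* (sigma2 + mu.^2)) - m^2;           % matched variance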
    
    We can plot these predictions, along with the naive prediction that predicts the future will be the same as the present, as follows:
    function plot_predictions(pred_obs_mean, pred_obs_std, data, lag, i)
    
    minspeed = min(data(:));
    maxspeed = max(data(:));
    [N T] = size(data);
    
    hold on
    h=plot(1:T, data(i,:), 'k-');
    %set(h,'linewidth',2);
    h=plot(lag+1:T, pred_obs_mean(i,:), 'g:');
    errorbar(lag+1:T, pred_obs_mean(i,:), pred_obs_std(i,:), 'b--');
    axis([0 T minspeed maxspeed])
    
    % naive prediction
    h=plot(lag+1:T, data(i,1:T-lag), 'r--');
    set(gca,'fontsize',16);
    hold off
    

    A summary of the inference engines for dynamic networks

    See each engine's documentation for more details, including references. Smooth=1 means the engine supports offline smoothing. Filter=1 means the engine supports offline filtering. Predict=1 means the engine supports online filtering and prediction.

    A canonical DBN is one which can be converted to the graph structure of an HMM/LDS by creating "mega-nodes", i.e., the only intra-slice connections should be from the hidden nodes to the observed, and the only inter-slice connections should be between hidden nodes; furthermore, all observed nodes must be leaves.
    Name                 Exact?   Node type   Topology        Evidence   Smooth   Filter   Predict
    bk                   approx   D,G,CG      all             any        1        1        1
    bk_fast              approx   D           all             fixed      1        1        0
    bk_ff_hmm            approx   D           canonical DBN   leaves     1        1        1
    ff                   approx   D           canonical DBN   leaves     0        1        0
    frontier             exact    D,G,CG      all             any        1        1        0
    frontier_fast        exact    D           all             fixed      1        1        0
    hmm                  exact    D           canonical DBN   leaves     1        1        1
    jtree_dbn            exact    D,G,CG      all             any        1        1        0
    jtree_fast_dbn       exact    D           all             fixed      1        0        0
    jtree_unrolled_dbn   exact    D,G,CG      all             any        1        0        0
    kalman               exact    G           canonical DBN   leaves     1        1        0
    loopy_dbn            approx   D           canonical DBN   leaves     1        0        0

    Learning

    Learning in DBNs can also be done online or offline. Currently only offline learning is implemented in BNT.

    Offline learning is very similar to learning in static networks, except now the training data is a cell array of 2D cell-arrays (this allows us to have variable-length cases). For example,

    T = 10; % number of slices per training sequence (sequences may also have different lengths)
    ncases = 2;
    cases = cell(1, ncases);
    for i=1:ncases
      ev = sample_dbn(bnet, T);
      cases{i} = cell(ss,T);
      cases{i}(onodes,:) = ev(onodes, :); % only the observed cells are filled in
    end
    max_iter = 10; % maximum number of EM iterations
    [engine2, LL] = learn_params_dbn(engine, cases, max_iter);
    

    See the file BNT/examples/dynamic/water1.m for details.
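
    After learning, we can inspect the learned parameters. Here is a hedged sketch, assuming the updated network can be recovered from the engine with bnet_from_engine, and using the same "violate object privacy" trick as in mk_predictions above:
    bnet2 = bnet_from_engine(engine2);
    s = struct(bnet2.CPD{1}); % CPD for equivalence class 1
    disp(s.CPT);              % its learned conditional probability table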




    Documentation last updated on 26 May 2000