load X.mat
[nRows,nCols] = size(X);
nNodes = nRows*nCols;
nStates = 2;

y = int32(1+X);
y = reshape(y,[1 1 nNodes]);
In this case, the features will correspond to a noise-corrupted version of the image:
X = X + randn(size(X))/2;
X = reshape(X,1,1,nNodes);
We now make the edgeStruct, which gives the lattice-structured relationship between the labels:
adj = sparse(nNodes,nNodes);

% Add Down Edges
ind = 1:nNodes;
exclude = sub2ind([nRows nCols],repmat(nRows,[1 nCols]),1:nCols); % No down edge for last row
ind = setdiff(ind,exclude);
adj(sub2ind([nNodes nNodes],ind,ind+1)) = 1;

% Add Right Edges
ind = 1:nNodes;
exclude = sub2ind([nRows nCols],1:nRows,repmat(nCols,[1 nRows])); % No right edge for last column
ind = setdiff(ind,exclude);
adj(sub2ind([nNodes nNodes],ind,ind+nRows)) = 1;

% Add Up/Left Edges
adj = adj+adj';

edgeStruct = UGM_makeEdgeStruct(adj,nStates);
nEdges = edgeStruct.nEdges;
We now want to make the features.
% Add bias and standardize columns
tied = 1;
Xnode = [ones(1,1,nNodes) UGM_standardizeCols(X,tied)];
nNodeFeatures = size(Xnode,2);

% Make nodeMap
nodeMap = zeros(nNodes,nStates,nNodeFeatures,'int32');
for f = 1:nNodeFeatures
    nodeMap(:,1,f) = f;
end
Unlike the previous demo, each node has its own features, since the features depend on local properties of the image at that node's location.
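To make the role of this nodeMap concrete, here is a hedged sketch of the node potentials it yields (assuming UGM's usual convention that zero entries of the map receive no parameter). Only the first state is parameterized, which is enough for a 2-state model since potentials are only defined up to rescaling:

% For a weight vector w (learned below), node n's potential is roughly
%   nodePot(n,:) = [exp(Xnode(1,:,n)*w(1:nNodeFeatures)) 1]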
Note that unlike the previous demo, where we only considered binary features, we now have continuous features. With continuous features, it often makes sense to do some sort of normalization of the feature values. For example, a common approach is to subtract the mean of each feature and scale the values so that they have a standard deviation of 1. The function UGM_standardizeCols does this, transforming the features so that they have a mean of 0 and a standard deviation of 1. Setting tied to 1 indicates that the features are standardized across nodes, as opposed to within nodes. This standardization is optional, and in some applications it may not make sense. Also, if you want to apply the model to a new data set, you must use the same normalization that was used during training (the mean/variance of the new data may differ from those of the training data).
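As a rough illustration of what this standardization does (a minimal sketch of the idea, not UGM's actual implementation), with tied = 1 each feature is standardized using a single mean and standard deviation computed across all nodes:

Xraw = X; % X is 1 x 1 x nNodes here (a single continuous feature)
mu = mean(Xraw(:));
sigma = std(Xraw(:));
Xstd = (Xraw - mu)/sigma; % roughly what UGM_standardizeCols(X,tied) returns here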
For this problem, we will use Ising-like edge potentials. For making the edge potentials, the bias should obviously be considered as a shared feature. However, the intensities of the two nodes on the edge are not a shared feature. So, we make our Xedge and edgeMap as follows:
% Make Xedge
sharedFeatures = [1 0];
Xedge = UGM_makeEdgeFeatures(Xnode,edgeStruct.edgeEnds,sharedFeatures);
nEdgeFeatures = size(Xedge,2);
% Make edgeMap
f = max(nodeMap(:));
edgeMap = zeros(nStates,nStates,nEdges,nEdgeFeatures,'int32');
for edgeFeat = 1:nEdgeFeatures
    edgeMap(1,1,:,edgeFeat) = f+edgeFeat;
    edgeMap(2,2,:,edgeFeat) = f+edgeFeat;
end
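To see why this gives Ising-like potentials, here is a hedged sketch (again assuming that zero map entries receive no parameter). Both diagonal entries share the same weights and the off-diagonal entries get none, so for edge e with edge weights v = w(f+1:f+nEdgeFeatures),

% edgePot(:,:,e) is roughly [exp(Xedge(1,:,e)*v) 1; 1 exp(Xedge(1,:,e)*v)]

which favours the two endpoints taking the same label whenever the exponent is non-negative.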
Since one of the two node features is not shared, each edge will have a total of 3 features: the bias, the intensity of the first node, and the intensity of the second.
We can now train the CRF using the pseudo-likelihood approximation:
nParams = max([nodeMap(:);edgeMap(:)]);
w = zeros(nParams,1);
funObj = @(w)UGM_CRF_PseudoNLL(w,Xnode,Xedge,y,nodeMap,edgeMap,edgeStruct);
w = minFunc(funObj,w);
We can analogously train an MRF with the pseudo-likelihood by making Xnode and Xedge consist of a single bias feature. Below, we plot the results obtained using ICM decoding with random parameters, and then the results obtained using ICM decoding with the parameters estimated under the pseudo-likelihood approximation.
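As a hedged sketch of how such a decoding can be produced (assuming UGM's UGM_CRF_makePotentials and UGM_Decode_ICM helpers), we form the potentials from the weights and run ICM:

[nodePot,edgePot] = UGM_CRF_makePotentials(w,Xnode,Xedge,nodeMap,edgeMap,edgeStruct,1); % instance 1 (our only image)
yICM = UGM_Decode_ICM(nodePot,edgePot,edgeStruct);
figure;
imagesc(reshape(yICM,nRows,nCols));
colormap gray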
In this context we want to pick features that are likely to have an associative effect. In other words, the feature should be large if the nodes are likely to have the same value. The function UGM_makeEdgeFeaturesInvAbsDif implements one possible way to make edge features that might have an associative effect. This function assumes that shared features are non-negative, and for non-shared features it uses the reciprocal of 1 plus the absolute difference between the node features. It can be called as follows:
sharedFeatures = [1 0];
Xedge = UGM_makeEdgeFeaturesInvAbsDif(Xnode,edgeStruct.edgeEnds,sharedFeatures);
nEdgeFeatures = size(Xedge,2);
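Concretely (a sketch of the idea rather than the exact UGM code), for an edge e between nodes n1 and n2, the shared bias is included directly while the non-shared intensity feature is replaced by a similarity score that is large when the two endpoints have similar values:

% Xedge(1,:,e) is roughly [1, 1/(1 + abs(Xnode(1,2,n1) - Xnode(1,2,n2)))]
% so nEdgeFeatures is 2: the bias and the inverse absolute difference.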
Since we have changed the edge features we must also change the edge map:
f = max(nodeMap(:));
edgeMap = zeros(nStates,nStates,nEdges,nEdgeFeatures,'int32');
for edgeFeat = 1:nEdgeFeatures
    edgeMap(1,1,:,edgeFeat) = f+edgeFeat;
    edgeMap(2,2,:,edgeFeat) = f+edgeFeat;
end
To enforce that the edge weights are non-negative, we can use a bound-constrained optimization code. In particular, to minimize the pseudo-likelihood subject to non-negativity of the edge weights, we will use minConf_TMP from the minConf package. This code requires specifying two vectors, LB and UB, containing the lower and upper bounds on the variables. In our case, the node weights are unbounded and the edge weights are bounded below by 0, so we can learn an associative CRF under the pseudo-likelihood approximation as follows:
nParams = max([nodeMap(:);edgeMap(:)]);
w = zeros(nParams,1);
funObj = @(w)UGM_CRF_PseudoNLL(w,Xnode,Xedge,y,nodeMap,edgeMap,edgeStruct); % Make objective with new Xedge/edgeMap
UB = [inf;inf;inf;inf]; % No upper bound on parameters
LB = [-inf;-inf;0;0]; % No lower bound on node parameters, edge parameters must be non-negative
w = minConf_TMP(funObj,w,LB,UB);
Since the estimated parameters are guaranteed to yield associative edge potentials, we can use graph cuts to find the optimal decoding under the model. Below, we plot the optimal decoding:
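This decoding can be produced analogously to the ICM sketch above, swapping in the graph-cut decoder (a sketch, assuming UGM's UGM_Decode_GraphCut):

[nodePot,edgePot] = UGM_CRF_makePotentials(w,Xnode,Xedge,nodeMap,edgeMap,edgeStruct,1);
yGC = UGM_Decode_GraphCut(nodePot,edgePot,edgeStruct);
figure;
imagesc(reshape(yGC,nRows,nCols));
colormap gray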
We proposed training CRFs with non-negative edge features and bound-constrained optimization, so that decoding is guaranteed to be possible with graph cuts, in:
Instead of the pseudo-likelihood, we can also train by minimizing the negative log-likelihood with loopy belief propagation as the approximate inference routine:
w = zeros(nParams,1);
funObj = @(w)UGM_CRF_NLL(w,Xnode,Xedge,y,nodeMap,edgeMap,edgeStruct,@UGM_Infer_LBP);
w = minConf_TMP(funObj,w,LB,UB);
Below, we show the optimal decoding of the model trained with the loopy belief propagation approximate marginals (and approximate log-partition function):
Also note that we could have replaced loopy belief propagation with a different variational inference method. In particular, tree-reweighted belief propagation gives a convex upper bound on the negative log-likelihood, so minimizing it yields the parameters that minimize this upper bound on the exact negative log-likelihood. In contrast, the pseudo-likelihood is convex but is not an upper bound, while loopy belief propagation is neither convex nor an upper bound.
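For example, here is a hedged sketch of swapping in tree-reweighted belief propagation (assuming UGM's UGM_Infer_TRBP follows the same interface as UGM_Infer_LBP):

funObj = @(w)UGM_CRF_NLL(w,Xnode,Xedge,y,nodeMap,edgeMap,edgeStruct,@UGM_Infer_TRBP);
w = minConf_TMP(funObj,zeros(nParams,1),LB,UB);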
It is also possible to enforce the associative condition with more complicated parameterizations of the edge potentials. In particular, we can also parameterize the off-diagonal elements, and enforce that the diagonal elements (in log-space) are non-negative while the off-diagonal elements are non-positive.
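As a hedged sketch of one such parameterization (not part of this demo): give the off-diagonal entries their own parameters, then constrain the diagonal weights to be non-negative and the off-diagonal weights to be non-positive through the bounds passed to minConf_TMP. This keeps the potentials associative when the edge features are non-negative.

f = max(nodeMap(:));
edgeMap = zeros(nStates,nStates,nEdges,nEdgeFeatures,'int32');
for edgeFeat = 1:nEdgeFeatures
    edgeMap(1,1,:,edgeFeat) = f+edgeFeat;
    edgeMap(2,2,:,edgeFeat) = f+edgeFeat;
    edgeMap(1,2,:,edgeFeat) = f+nEdgeFeatures+edgeFeat;
    edgeMap(2,1,:,edgeFeat) = f+nEdgeFeatures+edgeFeat;
end
nParams = max([nodeMap(:);edgeMap(:)]);
LB = [-inf(double(f),1); zeros(nEdgeFeatures,1); -inf(nEdgeFeatures,1)]; % node weights free, diagonal >= 0, off-diagonal <= 0
UB = [ inf(double(f),1);  inf(nEdgeFeatures,1); zeros(nEdgeFeatures,1)];

We would then rebuild the objective with the new edgeMap and call minConf_TMP with these bounds, as before.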