When there are loops in the BN, local propogation will not work, because of double counting evidence. In fact, it can be proved that local propogation is correct if and only if the graph is triangulated, i.e., there are no chordless cycles of length greater than 4. Intuitively, triangulation connects together those nodes that feature in a common term when summing out; in the water sprinkler example, we connect C to W and get the clique CWS, corresponding to the intermediate term T1(c,w,s) created when we summed out R. The order of summing terms out is equivalent to the elimination ordering used to triangulate the graph. Finding an ordering that minimizes the sum of the clique sizes (which determines the computational complexity) is NP-hard.
The maximal cliques in the triangulated graph are the clusters, which can be joined together to form a junction tree. This has the property that if x is a member of jtree nodes i and j, then x must be a member of every node on the path between i and j. This property (called the junction tree property) ensures that local propogation of information leads to global consistency.
The triangulation procedure is only defined for undirected graphs. It is not sufficient to simply ignore the direction of the arcs in the original BN, since directed and undirected graphs have different independence properties. In particular, parents who share a common child might not be independent in a directed graph (because of explaining away), but will be independent in an undirected graph unless the parents are connected (otherwise, the child would separate the parents). Hence we must first "moralize" the BN, i.e., connect together "unmarried" (non-connected) parents who share a common child, and then drop the directionality on the arcs. After moralization, we proceed with triangulation as before.
Once we have the (undirected) junction tree structure, we can either root it, thus converting it into a tree-structured BN, define CPDs for the new cluster nodes, and apply Pearl's tree algorithm, or we can leave it as an undirected tree, define potential functions for the new cluster nodes, and apply a local message algorithm specifically designed for undirected graphs. The latter approach is what is usually meant when people talk of the junction tree algorithm. The undirected formulation is slightly simpler because it is symmetric.
For more details, see