1. Motivation:
  · RNA plays regulatory, catalytic and structural roles in cells.
  · RNA sequences are used in phylogenetic tree reconstruction.
2. To understand the structure and function of an RNA molecule, we have two goals:
  (1) To align RNA sequences.
  (2) To determine the secondary structure of an RNA sequence.
  · Primary structure: the base sequence (e.g. 5'-AUCGUAA......CGU-3').
  · Secondary structure: the set of base pairs (e.g. C-G, A-U) that largely determines the 3D (tertiary) structure of the molecule.
  · Example:
  Given RNA: 5' - AUCCAAAGGAU - 3'
  Denoted by: 5' - S1S2S3.................Sn - 3'
  
  The secondary structure is a set S of base pairs (i, j), with 1 ≤ i < j ≤ n.
  If we are given the following structure:
  [figure not recoverable: secondary structure diagram of the example RNA]
  then we know S = { (1,11), (2,10), (3,9), (4,5) }
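  If instead we are given the following structure:
  [figure not recoverable: secondary structure diagram with crossing base pairs]
  then S = { (4,8), (7,11) }
  Note: In this case S = { (i,j), (i',j') } with i < i' < j < j', that is, the two base pairs cross each other. This is called a "pseudoknot".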
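The crossing condition i < i' < j < j' can be checked mechanically. The following is a minimal sketch in Python, assuming a structure is represented as a set of 1-based (i, j) tuples as in the examples above; the function names `crossing_pairs` and `has_pseudoknot` are made up for illustration.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]

def crossing_pairs(structure: Iterable[Pair]) -> List[Tuple[Pair, Pair]]:
    """Return every two base pairs (i, j), (i2, j2) with i < i2 < j < j2."""
    crossings = []
    # Sorting guarantees that the first pair of each combination starts
    # no later than the second, so only one ordering needs to be tested.
    for (i, j), (i2, j2) in combinations(sorted(structure), 2):
        if i < i2 < j < j2:
            crossings.append(((i, j), (i2, j2)))
    return crossings

def has_pseudoknot(structure: Iterable[Pair]) -> bool:
    """True if at least two base pairs in the structure cross."""
    return bool(crossing_pairs(structure))

# The two example structures from the notes.
nested = {(1, 11), (2, 10), (3, 9), (4, 5)}
knotted = {(4, 8), (7, 11)}
print(has_pseudoknot(nested))   # False
print(has_pseudoknot(knotted))  # True
```

The check compares all pairs of base pairs, which is quadratic in the size of S. That is fine for illustration, but it is not tuned for long sequences.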
3. Goals (1) and (2) are interrelated:
  · Base-pairing interactions in an RNA molecule cause long-range dependencies between nucleotides in the molecule. This implies that the additive scoring systems used in pairwise alignment, which score each position independently, do not work well for RNA.
  · However, if we have information on the secondary structure, then we know where the base pairs occur, and this helps us align the RNA sequences.
  · That is, alignment methods that take the secondary structure into account are preferred.
  · Conversely, RNA sequence alignment can be used to help determine RNA secondary structure. This is called "comparative analysis".
 
  · Comparative analysis (refer to Durbin's BSA, Chap 10):
  · Given several closely related sequences, iterate the following 2 steps:
  1. Align the sequences (based on the current guess at the structure).
  2. Guess the base pairs in the structure (based on the current best guess of the alignment).
  · To accomplish step 2, we analyse the "mutual information" Mij between two aligned columns i and j.
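  · Definition of Mij:
  
  M_{ij} = \sum_{x_i, x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}
  
  where f_{x_i} is the frequency of base x_i in column i, f_{x_j} is the frequency of base x_j in column j, and f_{x_i x_j} is the frequency of the ordered pair (x_i, x_j) in columns i and j. The sum runs over all 16 ordered pairs of bases; pairs that never occur contribute 0.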
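  · How do we calculate f_{x_i x_j}, the frequency of the ordered pair (x_i, x_j) in columns i and j?
  
  For example, given the aligned columns
  
  -- i ------ j --
  --A------U--
  --A------C--
  --A------U--
  --C------G--
  
  fAU = 2/4
  fAC = 1/4
  fCG = 1/4
  fCA = 0/4 (order matters: the first base comes from column i, the second from column j)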
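The counting in this example can be sketched in a few lines of Python. This is only an illustration: the function name `pair_frequencies` is made up, and it assumes the two columns are given as equal-length strings with one character per aligned sequence.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Tuple

def pair_frequencies(col_i: str, col_j: str) -> Dict[Tuple[str, str], Fraction]:
    """Frequency of each ordered pair (x_i, x_j) seen in two aligned columns."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of rows")
    n = len(col_i)
    # zip pairs the base in column i with the base in column j, row by row.
    counts = Counter(zip(col_i, col_j))
    return {pair: Fraction(count, n) for pair, count in counts.items()}

# Columns i and j from the example above, read top to bottom.
freqs = pair_frequencies("AAAC", "UCUG")
print(freqs[("A", "U")])           # 1/2
print(freqs[("C", "G")])           # 1/4
print(freqs.get(("C", "A"), 0))    # 0, since order matters
```

Exact fractions are used so the results can be compared directly with the hand calculation (2/4 prints in lowest terms as 1/2).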
  
  · Intuitively, Mij is the amount of information (in bits) revealed about position j if you are told what is in position i.
  · The following example shows "completely correlated" base pairs:
  i - j
  A-U
  U-A
  C-G
  G-C
  fAU = fUA = fCG = fGC = 1/4, and each base has frequency 1/4 in column i and in column j.
  Mij = 4 * ( 1/4 * log2( (1/4) / ((1/4)(1/4)) ) ) + 12 * (0) = 2
  (Note: there are 16 possible ordered pairs (xi, xj) with xi, xj in { A, U, G, C }. Only 4 of them occur in this example; the other 12 have frequency zero and contribute 0 to the sum.)
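The calculation above can be reproduced with a short Python sketch. It is an illustration rather than the method from the lecture: the function name `mutual_information` is made up, columns are assumed to be equal-length strings, and gap characters are not treated specially.

```python
import math
from collections import Counter

def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two aligned columns."""
    n = len(col_i)
    if n == 0 or n != len(col_j):
        raise ValueError("columns must be non-empty and of equal length")
    count_i = Counter(col_i)             # single-base counts in column i
    count_j = Counter(col_j)             # single-base counts in column j
    count_ij = Counter(zip(col_i, col_j))  # ordered pair counts
    mi = 0.0
    # Only pairs that actually occur are visited; absent pairs contribute 0.
    for (x, y), c in count_ij.items():
        f_xy = c / n
        f_x = count_i[x] / n
        f_y = count_j[y] / n
        mi += f_xy * math.log2(f_xy / (f_x * f_y))
    return mi

# The completely correlated example: rows A-U, U-A, C-G, G-C.
print(mutual_information("AUCG", "UAGC"))  # 2.0
# A column pair with no covariation carries no information.
print(mutual_information("AAAA", "UUUU"))  # 0.0
```

Two bits is the maximum for a four-letter alphabet, so the first call shows the strongest possible evidence that columns i and j covary.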
  
  
RNA secondary structure prediction (for a single-stranded RNA sequence)
  · Use a measure of the stability of a secondary structure associated with the given RNA strand, namely the predicted free energy of the structure.
  · Loops tend to be destabilizing, and they contribute positive free energy.
  · Assuming no pseudoknots, the free energy of a structure is the sum of the free energies of its individual loops and stacked pairs.
  · This turns our problem into finding the secondary structure with minimum free energy, taken over all secondary structures for the input molecule.
  · For details of the 4 free energy functions, please see the notes from Lecture 16 of the Computational Biology course at U of W:
  http://www.cs.washington.edu/education/courses/527/00wi/
  
  Here's a brief summary of what Anne mentioned in class:
  
  1. eS(i,j): free energy of the stacked pair formed by (i,j) and (i+1,j-1), depending on Si, Sj, Si+1, Sj-1.
  2. eH(i,j): free energy of the hairpin loop closed by (i,j), depending on Si, Sj, j-i (length), Si+1 and Sj-1.
  3. eL(i,j,i',j'): free energy of the internal loop closed by (i,j) and (i',j'), depending on Si, Sj, Si', Sj', Si+1, Sj-1, Si'-1, Sj'+1, and the lengths i'-i and j-j'.
  4. eM(i, j, .........., ik, jk): free energy of the multibranched loop closed by (i,j). This one is not well understood.
  
  
  · The dynamic programming approach to finding the optimal secondary structure was briefly mentioned in class. Anne suggested looking up the details in the U of W lecture notes (lecture 16):
  http://www.cs.washington.edu/education/courses/527/00wi/
  Here's what Anne mentioned in class:
  
  Let W(j) be the free energy of the optimal secondary structure associated with
  
  S1S2.........................Sj
  There are 2 possibilities for Sj:
  (1) If Sj is not paired, W(j) = W(j-1), since an unpaired base does NOT contribute to the overall free energy.
  (2) If Sj is paired in the optimal secondary structure, say to Si where i < j, then W(j) = W(i-1) + V(i,j), where V(i,j) is the free energy of the optimal structure of Si...Sj, assuming (i,j) forms a base pair in the structure.
  Since the optimal structure is the one with minimum free energy, W(j) is the smaller of case (1) and the best case (2) taken over all i < j, with W(0) = 0 for the empty prefix.
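The recurrence for W can be sketched as follows. This is a minimal illustration under stated assumptions, not the full algorithm from the lecture notes. V is taken as given and passed in as a function, since computing V itself needs the energy functions eS, eH, eL and eM. The names `fill_W`, `toy_energies` and `toy_V` are hypothetical, and the toy energy values are arbitrary numbers chosen only to exercise the recurrence.

```python
import math
from typing import Callable, List

def fill_W(n: int, V: Callable[[int, int], float]) -> List[float]:
    """Return W[0..n], where W[j] is the minimum free energy for S1..Sj.

    V(i, j) uses 1-based positions and is assumed to return the optimal
    free energy of Si..Sj with (i, j) paired, or math.inf if Si and Sj
    cannot pair.
    """
    W = [0.0] * (n + 1)  # W[0] = 0: the empty prefix has no energy
    for j in range(1, n + 1):
        best = W[j - 1]            # case (1): Sj is unpaired
        for i in range(1, j):      # case (2): Sj is paired with some Si
            best = min(best, W[i - 1] + V(i, j))
        W[j] = best
    return W

# Made-up energies for a sequence of length 8 (illustration only).
toy_energies = {(1, 4): -2.0, (5, 8): -3.0, (1, 8): -4.0}

def toy_V(i: int, j: int) -> float:
    return toy_energies.get((i, j), math.inf)

W = fill_W(8, toy_V)
print(W[4])  # -2.0: pair (1, 4)
print(W[8])  # -5.0: pairs (1, 4) and (5, 8) beat the single pair (1, 8)
```

Each W(j) scans all i < j, so filling the table takes O(n^2) evaluations of V. The table gives only the optimal energy; recovering the base pairs themselves would need a traceback, which is omitted here.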
  
  
  · References for this lecture:
  (1) Durbin's Biological Sequence Analysis, Chap 10.
  (2) Lecture 16 notes from the U of W CSE 527 class: http://www.cs.washington.edu/education/courses/527/00wi/