MITACS Project Website - Project Highlights

We have developed AMR, a system that fully automatically determines the structure of a protein from NMR experiments. Initial experiments with AMR were performed successfully on four small proteins.

Methods for reliable synthesis of long genes offer great promise for protein synthesis via expression of synthetic genes, with applications to improved analysis of protein structure and function, as well as engineering of novel proteins. Current technologies for gene synthesis use computational methods for design of short oligos, which can then be reliably synthesized and assembled into the desired target gene. We have developed efficient algorithms for special cases of this problem, and have shown that the general problem is NP-hard.

The accuracy of secondary structure predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, so a robust parameter estimation scheme must efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should be trained on available experimental free energy data in addition to structural data. We have developed a new constraint generation (CG) method, the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our CG approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. The newly computed energy parameters are then used to update the constraints of the optimization problem, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that our new parameters yield significant improvements in prediction accuracy over current state-of-the-art methods.
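The iterative idea behind constraint generation can be illustrated with a toy sketch. This is not the CG method itself: the feature vectors, the margin, the function names, and the use of cyclic half-space projection in place of a real constrained optimizer are all assumptions made for illustration. Each structure is reduced to a hypothetical vector of feature counts, and its energy is the dot product of that vector with the parameters.

```python
def energy(theta, features):
    """Energy of a structure as a linear function of its feature counts."""
    return sum(t * f for t, f in zip(theta, features))


def satisfy_constraints(theta, constraints, margin, sweeps=200):
    """Stand-in for the constrained optimization step (an assumption):
    cyclically project theta onto each violated half-space theta . d >= margin."""
    theta = list(theta)
    for _ in range(sweeps):
        changed = False
        for d in constraints:
            slack = margin - energy(theta, d)
            if slack > 1e-12:
                norm2 = sum(x * x for x in d)
                theta = [t + slack * x / norm2 for t, x in zip(theta, d)]
                changed = True
        if not changed:
            break
    return theta


def constraint_generation(theta0, examples, margin=1.0, max_iters=20):
    """examples: list of (known_features, [candidate_features, ...]).
    Alternate between predicting with the current parameters and adding
    constraints that push wrongly favoured candidates above the known structure."""
    theta = list(theta0)
    constraints = []
    for _ in range(max_iters):
        new = []
        for known, candidates in examples:
            # "Prediction": the lowest-energy candidate under current parameters.
            best = min(candidates, key=lambda c: energy(theta, c))
            gap = energy(theta, best) - energy(theta, known)
            d = tuple(b - k for b, k in zip(best, known))
            if gap < margin - 1e-9 and any(d) and d not in constraints:
                new.append(d)
        if not new:
            break  # every known structure already wins by the margin
        constraints.extend(new)
        theta = satisfy_constraints(theta, constraints, margin)
    return theta


# Hypothetical toy data: two features per structure.
examples = [
    ((3, 0), [(1, 3), (0, 1)]),
    ((2, 2), [(1, 1), (0, 0)]),
]
theta = constraint_generation([-1.0, -1.0], examples)
```

With the initial parameters, the first known structure has higher energy than one of its candidates; after one round of constraint generation the known structure has the lowest energy in both examples. A real implementation would use a proper optimizer and a full energy model rather than this projection heuristic.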

Improving the accuracy and efficiency of prediction methods is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. State-of-the-art methods, which are based on free energy minimization, have high run-time complexity (typically Theta(n^5) or worse), and can handle (minimize over) only limited types of pseudoknotted structures. We have developed Hfold, a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot-free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot-free structure. Our Hfold algorithm uses two-phase energy minimization to predict hierarchically formed secondary structures in O(n^3) time, matching the complexity of the best algorithms for pseudoknot-free secondary structure prediction via energy minimization.
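To make the notion of overlapping base pairs concrete, the following minimal sketch checks whether a structure is pseudoknotted. It is not part of Hfold. It assumes structures are written in dot-bracket notation with "(" ")" and "[" "]" as two bracket types (a common convention, not one stated above), and the function names are hypothetical. Two base pairs (i, j) and (k, l) are treated as overlapping when i < k < j < l.

```python
def parse_pairs(structure):
    """Return sorted base pairs (i, j) with i < j from dot-bracket notation.
    Assumes two bracket types, () and [], and '.' for unpaired bases."""
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def is_pseudoknotted(pairs):
    """True if any two base pairs overlap, i.e. i < k < j < l."""
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


nested = parse_pairs("((..))")            # pseudoknot-free
knotted = parse_pairs("((..[[..))..]]")   # square-bracket pairs cross the round ones
```

Here `is_pseudoknotted(nested)` is false and `is_pseudoknotted(knotted)` is true. The pairwise check is quadratic in the number of base pairs, which is adequate for illustration.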