CPSC 536A - Bioinformatics Course, Spring 2001
[General Info] ·
[Course Outline] ·
[Course Projects] ·
[Grading] ·
[Assignments & Quizzes] ·
[Resources]
General / Administrative Information
Course number / title: 536A (201):
Topics in Algorithms and Complexity - Bioinformatics
Catalogue number: 40449
Time: Tue 10:00-11:30, Thu 13:00-14:30
Room: FSC 1001 (Tue), CISR 304 (Thu)
Instructors: Anne Condon / Holger Hoos
[Current official information on graduate courses in
2000/01 term 2 can be found
here]
Course Description:
Bioinformatics involves the application of computational methods in
order to address problems in molecular biology. This course will
provide a graduate introduction to algorithms and their applications
in bioinformatics. Topics in molecular biology that will motivate the
algorithmic content of the course include: sequence alignment,
phylogenetic tree reconstruction, prediction of RNA and protein
structure, gene finding and sequence annotation, gene expression,
and biomolecular computing.
This is an interdisciplinary course, and the goal is to involve
students who have either a strong computer science background or a
strong background in molecular biology (such as students in the
genetics graduate program), but not necessarily both. It is understood
that students from these groups will have different skills and
experience, and course lectures and assignments will take this into
account. However, all students should already have a solid background
in computer programming and should be comfortable with mathematical
reasoning, such as can be obtained in a college level course in
Mathematics or Statistics. Background in discrete mathematics or in
probability theory is especially relevant to the course content.
Class assignments will familiarize students with biological data and tools
for understanding this data and will help students gain a solid
understanding of principles for design and analysis of algorithms. Some
assignments will be involve use and extension of software tools, and
others will involve written studies of algorithms and their analysis.
Class projects will bring together students with different backgrounds to
apply ideas from the course to a problem in molecular biology.
Course Outline (Draft)
Module 1: Introduction and Basics (hh; 1.5wk)
- basics of molecular biology
[e.g., BML:1] - 1wk
- foundations of molecular genetics: DNA, RNA, proteins;
transcription, translation, regulation; genome
organisation, sexual and asexual reproduction
- cellular organisation, metabolism, regulatory pathways
- tree of life, phylogeny
- modern biochemical techniques:
cloning, DNA and protein sequencing, genome mapping,
PCR, mutagenesis;
in-vivo / in-vitro / in-silico
- biological models and formalisms: exceptions are the rule
- problems in bioinformatics: overview
[BML:1.5, need more] - 0.5wk
- sequencing genomes:
physical mapping, genome structure
- interpreting genomic sequence data:
sequence alignment, gene finding, structure prediction
(rna and proteins), pattern discovery
- understanding the cell / organisms:
regulatory pathways and networks,
simulations
- relations between organisms and
evolutionary questions:
phylogenetic trees, computational models
of evolution, simulations
- biomolecular computing:
dna computing, inverse folding, self-assembling
structures
- bioinformatics: tools vs. synergistic research
Module 2: Sequence Alignment (hh+ac; 2-2.5wk)
- scoring
- global sa
-> dynamic programming
- local sa
- complex models (affine gap scores etc.)
-> fsa models
- heuristic methods
-> blast, fasta, etc.
- multiple sa
Resources: [BSA:2]
Module 3: Phylogenetic Trees (ac(2)+hh(1); 1.5wk)
- Intro and common algorithms (hh)
- formulation of the tree reconstruction problem [CRS]
- parsimony: a sequence-based algorithm [BSA, SO, Fe78]
- distance-based algorithms [Wa96, SO]
- Probabilistic models and associated algorithms (ac)
- probabilistic models of evolution;
- the maximum likelihood algorithm [BSA, CKR]
- experimental studies of phylogenetic tree algorithms [H92, Wa01]
- Biological perspectives [Wo98]; current research directions (ac)
- the universal tree of life
- properties of molecular sequence data used to construct trees
- fit between models of evolution and real evolution
- synopsis of other topics in phylogeny and current
research questions [Mo00...]
Resources:
[CKR] has notes from a lecture by Felsenstein, a poineer of maximum
likelihood methods for phylogeny. Many of the other web-based course
materials also include notes on phylogeny, with [CRS] being quite
comprehensive.
[Fe78] Felsenstein J. Inferring Phylogenies. (In print yet?)
[Fe78] Felsenstein J. Cases in which parsimony or compatibility methods
will be positively misleading, Syst. Zoology, 27:401-410.
[H92] Hillis, D., Bull, J.J., White, M.E., Badgett, M.R., and
Molineux, I.J.. Experimental phylogenies: generation of a known
phylogeny, Science 255 (1992), 589-592.
[H97] Hillis, D., Inferring Complex Phylogenies,
Nature, Vol. 383, pp 130-131.
[Mo00] Moret, B.E., Wang, L., Warnow, T., and Wyman, S.K.,
Highly Accurate Reconstruction of Phylogenies from Gene Order
Data, Manuscript.
[SO] Swofford D. L. and Olsen, G. J. Phylogeny Reconstruction. Chapter 11,
pp 411-501 in D.M. Hillis and C. Moritz, eds. Molecular Systematics,
Sinauer Associates, Sunderland, Massachisetts.
[Wa96] Warnow, T. Some Combinatorial Optimization Problems in
Phylogenetics. Proc. International Colloquium on Combinatorics and
Graph Theory, Balatonlelle, Hungary (1996), in Vol. 7 of Bolyai
Society Mathematical Studies, A. Gyarfas, L. Lovasz, and L.A. Szekely,
eds., Budapest, 363-413, 1999.
[Wa01] St. John, K., Warnow, T., Moret, B.M.E., and Vawter, L.
Performance Study of Phylogenetic Methods: (Unweighted) Quartet Methods
and Neighbor-Joining. To appear.
[Wo98] Woese, C. The Universal Ancestor. Proc. Natl. Acad. Sci. USA, Vol. 95,
pp 6854-6859, June 1998.
Module 4: RNA and Protein Structure (ac+hh; 2wk)
- RNA secondary structure prediction (ac)
- dynamic programming approaches [Z81, CKR]
- equilibrium partition function for secondary structure [McC90]
- Prediction of pseudoknots in RNA secondary structure (ac) [I00, L00, R99]
- protein structure (hh)
- basics: primary, secondary, tertiary, quartary structure;
forces that stabilise structure
- computational problems
- tertiary structure prediction: approaches, energy minimisation
with genetic algorithms [CBC,RRO]
- secondary structure prediction: simple statisitical methods,
chou-fasman rules, neural network approaches [BML,RRO]
Resources:
[CKR] Karp-Ruzzo lecture notes again.
[I00] Isambert, H. and Siggia, E.D. Modeling RNA folding paths with
pseudoknots: application to hepatitis delta virus ribozyme. Proc.
Nat. Acad. Sci. 97L12, pp 6515-6250, June 6, 2000.
[L00] Lyngso, R.B. and Pedersen, C.N.S., Pseudoknow Prediction in
Energy Based Models, RECOMB 2000.
[Ho94] I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoffer, M. Tacker, and
P.Schuster, ``Fast folding and Comparison of RNA Secondary
Structures.'' Monatshefte f. Chemie, Volume 125, 1994. pages 167-188.
See also http://www.tbi.univie.ac.at/\~ivo/RNA.
[McC90] L.S. McCaskill, ``The Equilibrium Partition Function and Base
Pair Binding Probabilities for RNA Secondary Structure,'' Biopolymers,
Vol 29, 1990, pages 1105-1119.
[R99] E. Rivas, S.Eddy, ``A dynamic Programming Algorithm for RNA
Structure Prediction including Pseudoknots,'' Journal of Molecular
Biology, vol 285. 1999. pages 2053-2068.
[Z81] M. Zuker and P.Stielger,''Optimal computer folding of large RNA
sequences usingthermodynamics and auxiliary information,'' Nucleic
Acids Res 9,(1981), pages 133-148.
See also http://www.ibc.wustl.edu/\~zuker/rna/.
Module 5: Gene Finding and Sequence Annotation (hh; 1-1.5wk)
- gene finding
- basic problem; why is it interesting? why is it hard?
- open reading frames ('the watson method')
- simple pattern search
- weight matrix methods (WAM, etc.)
- signal detection and integration
- HMM-based methods [BML:8.2]
- state-of-the-art gene finders (HMMGene, GenScen)
- evaluating gene finding algorithms;
probabilistic scores, benchmark collections
- combining gene finders
Module 6: Gene Expression Analysis (ac; 1wk)
- technologies for gene expression analysis (microarrays, SAGE)
- cancer diagnosis: algorithms for class prediction and class
discovery
Resources:
Shamir and Sharon,
Algorithmic Approaches to Clustering Gene Expression Data
(available at
http://www.math.tau.ac.il/~rshamir/papers.html - scroll to the end of that
page to find the paper).
Ben-Dor et al. Tissue Classification with Gene Expression Profiles,
Recomb 2000.
Golub et al. Molecular Classification of Cancer: Class Discovery
and Class Prediction by Gene Expression Monitoring. Science 286, 531-537,
1999.
M.J. van der Laan and J. Bryan.
Gene Expression Analysis with the Parametric Bootstrap,
Biostatistics, 2001
(available at
http://www.stat.berkeley.edu/users/jenny/pub.html).
S. Brenner et al.,
Gene expression analysis by massively parallel signature sequencing (MPSS)
on microbead arrays, Nature Biotechnology, Vol. 18, June, 2000.
Module 7: Biomolecular Computing (ac; 2wk)
- Adleman's experiment; "classical" models for DNA computing
- Self-assembly models for DNA computing; Winfree's work
- combinatorial and algorithmic problems arising in biomolecular
- computation: inverse RNA folding, theory of self-assembly and
resource bounded tiling
Resources:
Erik Winfree's web page:
see Erik's thesis and STOC 2000 paper for
more on self-assembly computation.
Slides from first lecture (powerpoint format)
Other Topics (not to be covered this time):
- physical mapping [CRS]
- genome structure / rearrangements [CRS]
- pattern discovery (motives etc.)
Algorithms / approaches to be covered:
- dynamic programming
- basic data structures (e.g. as arise in phylogeny tree reconstruction)
- heuristic search / gradient descent / stochastic search
(including simulated annealing, genetic/evolutionary algorithms)
- HMMs
- clustering, tree inference
- neural networks
- high-level understanding of what NP-completeness means
Course Projects
Projects in Progress
-
Phylogenetic Classification of Organisms Based on Sequence Tags (1)
Anrew Tae-Jun Kwon, Christina Chen, Michael Huggett, Steve Oldridge
-
Phylogenetic Classification of Organisms Based on Sequence Tags (2)
Cydney Nielsen, Sohrab Shah, Brian Winters
-
RNA Secondary Structure Prediction Including Pseudoknots
Jihong Ren, Fenguang Song, Hosen Yung, Xiaodong Zhou
-
Combinatorial Design of Universal DNA Tag Systems
Tim Braun, James Gauthier, Tam Huynh, Louie van de Lagemaat
-
Universal DNA Tag Generation
John Brodoff, Kai S. Juse, Dan C. Tulpan
-
snoRNA U3 Prediction in Archaea
Kevin Chen, Adrian Secord, Sonia Ziesche
If you haven't submitted the HTML version of your (revised) project proposal
yet, please do so ASAP via e-mail to
hoos@cs.ubc.ca, subject 'CPSC 536A Course
Projects'. The prefered method is for your project to setup a webpage which
contains the proposal, reports, and relevant links (see the snoRNA group's
page for an example).
General Information
Students are expected to select and complete a course project according to
the following timetable:
Students should work in groups of two or three, preferably combining different
background and expertise in each team.
The project proposals and reports will be reviewed and evaluatated according
to standard criteria for research proposals / research papers.
As we intend to combine the final reports into a proceedings volume for
the mini-workshop, they need to be formatted uniformly. Please follow
the layout and formatting of this
sample text as closely as possible. We recommend to use LaTeX
for preparing the final document (in which case you can use the
source of the sample text as a template for your report).
Final reports should be submitted as PostScript or PDF files via e-mail to
hoos@cs.ubc.ca.
At least one week before the workshop, the authors will receive
reviews for their reports along with instructions for preparing the
camera-ready copies for the mini-workshop proceedings
(which mainly consist of the page numbers to be used).
Grading
Final grades will be determined from the following components:
- surprise quizzes (on reading assignments etc.) - ca. 10%
- homework assignments (simple problems and questions; hands-on use of tools, literature research; approx. one per module) - ca. 25%
- course project (reports and presentation) - ca. 65%
Assignments & Quizzes
01/04 - due 01/09 |
Reading assignment: Douglas Hofstadter,
The Genetic Code: Arbitrary? (available from the Reading Room) |
01/11 - due 01/16 |
Reading assignment: Baldi and Brunak,
Bioinformatics - a machine learning approach, Chapter 1
(available from the Reading Room)
Remark: Sections 1.4.4 and parts of 1.4.5 are not essential. |
01/16 - due 01/23 |
Assignment 1 (covers Module 1)
|
01/18 - due 01/23 |
Reading assignment:
Durbin, Eddy, Krogh, Mitchison: Biological sequence analysis, Chapter 2,
Section 2.3 (available from the Reading Room) |
02/01 - due 02/08 |
Assignment 2 (covers Module 2)
|
02/09 |
Solutions to Quiz 1
|
02/01 - due 02/08 |
Assignment 3 (covers Module 3)
|
03/06 - due 03/13 |
Reading assignment:
1) Baldi and Brunak,
Bioinformatics - a machine learning approach, Sections 5.1, 6.2
(available from the Reading Room)
2) Schulze-Kremer, Genetic Algorithms and Protein Folding.
(available electronically at http://www.techfak.uni-bielefeld.de/bcd/Curric/ProtEn/proten.html
|
03/20 - due 03/27 |
Assignment 4 (covers Module 4)
|
03/22 - due 03/27 |
Reading assignment:
1) Shamir and Sharon,
Algorithmic Approaches to Clustering Gene Expression Data
(available from the Reading Room and also electronically at
http://www.math.tau.ac.il/~rshamir/papers.html - scroll to the end of that
page to find the paper.
|
Resources
Lecture Notes:
-
Class 1 (01/04) - Basics of Molecular Biology (1), notes by Hamish Carr
(
html version,
pdf version)
Detailed outline and list of resources by Holger Hoos
-
Class 2 (01/09) - Basics of Molecular Biology (2), notes by Sonia Ziesche
Detailed outline and list of resources by Holger Hoos
-
Class 3 (01/11) - Basics of Molecular Biology (3) / Intro to Bioinformatics,
notes by Kai S. Juse
Detailed outline and list of resources by Holger Hoos
-
Class 4 (01/16) - Intro to Sequence Similarity Testing / Pairwise Sequence Alignment (1), notes by Adrian Secord
-
Class 5 (01/18) - Pairwise Sequence Alignment (2),
notes by Tim Braun
-
Class 6 (01/23) - Pairwise Sequence Alignment (3) , notes by Steve Oldridge
-
Class 7 (01/25) - Multiple Sequence Alignment (1),
notes by Tam Huynh
-
Class 8 (01/30) - Multiple Sequence Alignment (2),
notes by Dan Tulpan
- Class 9 (02/01) - Intro to Phylogenetic Tree Reconstruction,
notes by Andrew Tae-Jun Kwon
-
Class 10 (02/06) - Phylogeny 2: Parsimony ,
notes by Louie van de Laagemat
-
Class 11 (02/08) - Phylogeny 3: Maximum Likelihood ,
notes by Brian Winters (postscript format)
-
Class 12 (02/13) - RNA Secondary Structure Prediction (1),
notes by Kevin Chen
-
Class 13 (02/15) - RNA Secondary Structure Prediction (2) ,
notes by Jihong Ren
- Class 14 (03/01) - Protein Structure (1),
notes by James Gauthier
- Class 15 (03/06) - Protein Structure (2),
notes by Xiaodong Zhou
- Class 16 (03/08) - Gene Finding (1),
notes by Louie van de Laagemat
- Class 17 (03/13) - Gene Finding (2),
notes by Sohrab Shah
- Class 18 (03/20) - Gene Expression Analysis (1)
notes by Fengguang Song
- Class 19 (03/22) - Gene Expression Analysis (2)
notes by Ho-Sen Yung
- Class 20 (03/27) - Intro to DNA Computation
notes by Dan Tulpan
- Class 22 (04/5) - Survey of Design of DNA molecules (postscript format)
Lecture notes are also available as hardcopies in the CISCR Reading Room
Books:
-
[ASTS] D.Gusfield: Algorithms on Strings, Trees, and Sequences: Computational Science and Computational Biology. Cambridge University Press, 1997.
-
[BSA] Durbin, Eddy, Krogh, Mitchison: Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998. (Available from CICSR Reading Room)
-- [BSA:12] gives a long list of online resources
-
[BML] P. Baldi, S. Brunak: Bioinformatics: the machine learning approach.
MIT Press, 1998. (Available from CICSR Reading Room)
-
[BIOCH] Garret and Grisham: Biochemistry. Saunders College Publishing, 1995
-- used parts of Chapters 1-7 for Module 1
-
[MBC] Alberts, Bray, Lewis, Raff, Roberts, Watson: Molecular Biology of the Cell.
Garland Publishing, Inc., 1994
-- used parts of Chapters 1-7 for Module 1
- Benjamin Lewin: Genes V. Oxford University Press, 1994.
-
[GA] Griffiths, Miller, Suzuki, Lewontin, Gelbart: An Introduction to Genetic Analysis.
Freeman and Company, 1993.
- Watson, Gilman, Witkowsi, Zoller: Recombinant DNA.
Scientific American Books, 1992.
- Gesteland, Cach, Atkins (Editors): The RNA World.
Cold Spring Harbor Laboratory Press, 1999.
-- used part of Chapter 24 (translation of selenocystein) for Module 1
- T.Brock, M.Madigan, J.Martinko, J.Parker: Biology of Microorganisms.
Prentice Hall, Seventh Edition, 1994
Papers / online articles:
- [DH86] Douglas R. Hofstadter: Metamagical Themas, Chapter 27:
The Genetic Code: Arbitrsry?
Bantam Books, 1986. (Available from CICSR Reading Room)
-- reading assignment in Module 1
- [RRO]
Robert B. Russell: A Guide to Structure Prediction
-- overview of protein structure prediction techniques,
lots of pointers to the literature
Courses / course materials on the web:
-
[CBC]
BioComputing Hypertext Coursebook
-- useful set of tutorial texts, cover several topics from our course
-
[CRS]
http://www.math.tau.ac.il/~shamir/algmb/algmb98.html
Ron Shamir's course on Algorithms in Molecular Biology, Tel Aviv University, CS, 1998
-- very useful, notes + assignments
-
[CNF]
http://www.cs.huji.ac.il/~cbio/
Nir Friedman's course on Computational Methods In Molecular Biology, University of Jerusalem, 1999
-- syllabus, exercises, pointers
-
[CBS]
http://theory.lcs.mit.edu/~mona/18.417-home.html
Bonnie Berger's and Mona Singh's course Introduction to Computational Molecular Biology, MIT, 1998
-- lecture notes, problems, syllabus
-
[CKR]
http://www.cs.washington.edu/education/courses/590bi/98w/
Karp's and Ruzzo's course Algorithms in Molecular Biology, U Wash, 1998
-
[CMG]
http://bioinfo.mbb.yale.edu/mbb447b-99/
Mark Gerstein's course BIOINFORMATICS, Yale, Molecular Biophysics and Biochemistry, 1999
-- see also
bioinfo.mbb.yale.edu/mbb452a/bioinformatics.htm,
has a good list of readings / interesting pointers
Resource lists on the web
Bioinformatics tools / software / databases
last update 01/04/10, hh