(a) What is the predicted secondary structure of the RNA sequence GGCCAAGGCC? (Note: we use the convention that the left end of this string is the 5' end and the right end is the 3' end, as does Zuker's program.) [2 marks]
(b) Can you find a sequence that folds into the secondary structure described by ((((*((((***))))*((((***))))*((((***))))*))))? (Note that, using set notation, this structure, which is for a string of length 45, is {(1,45), (2,44), (3,43), (4,42), (6,16), (7,15), (8,14), (9,13), (18,28), (19,27), (20,26), (21,25), (30,40), (31,39), (32,38), (33,37)}.) [6 marks]
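The correspondence between the bracket string and the set notation above can be checked mechanically. The following is a minimal sketch, assuming 1-based positions and that any character other than '(' or ')' marks an unpaired base; the function name bracket_to_pairs is hypothetical and is not part of Zuker's program.

```python
def bracket_to_pairs(structure):
    """Convert a bracket string into a set of 1-based (i, j) base pairs.

    Assumption: '(' opens a pair, ')' closes the most recently opened
    pair, and every other character (e.g. '*') is an unpaired position.
    """
    open_positions = []  # stack of positions whose partner is not yet seen
    pairs = set()
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.add((open_positions.pop(), position))
    if open_positions:
        raise ValueError(f"unmatched '(' at positions {open_positions}")
    return pairs


if __name__ == "__main__":
    target = "((((*((((***))))*((((***))))*((((***))))*))))"
    print(len(target), sorted(bracket_to_pairs(target)))
```

Running the same function on the structure predicted for a candidate sequence gives a direct way to compare it with the target pair set.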
(a) Design a simple, three-layer feed-forward neural network with two binary input units A and B and a binary output unit C such that C=1 if either A=1 and B=0, or A=0 and B=1 (logical XOR). Use as few hidden units as possible. Specify the network structure, connection weights, and transfer functions for all units. [3 marks]
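A proposed design can be checked against the specification by evaluating it on all four input combinations. The sketch below assumes the network is wrapped as a Python callable taking A and B and returning C; the name is_xor_network is hypothetical. It only tests a candidate and does not suggest any particular weights.

```python
from itertools import product


def is_xor_network(network):
    """Return True if network(a, b) equals XOR for all binary inputs.

    Assumption: `network` is any callable that maps two inputs in {0, 1}
    to an output in {0, 1}, e.g. a hand-built forward pass.
    """
    return all(network(a, b) == (a ^ b) for a, b in product((0, 1), repeat=2))


if __name__ == "__main__":
    # Sanity checks with plain Boolean functions instead of a real network.
    print(is_xor_network(lambda a, b: a ^ b))  # the specification itself
    print(is_xor_network(lambda a, b: a | b))  # OR fails on A=1, B=1
```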
(b) When using a simple multi-layer perceptron (MLP) for secondary structure prediction, how is the input sequence presented to the network? Illustrate your answer with a simple example. [2 marks]
(c) Explain how secondary structure prediction can benefit from
using evolutionary information based on your knowledge of the
PHD approach (max 200 words). [3 marks]
Hint: You might find the following paper useful as an additional
reference: Burkhard Rost & Chris Sander, "3rd Generation Prediction of
Secondary Structure",
http://www.columbia.edu/~rost/Papers/1999_humana/paper.html
BONUS QUESTION:
Is it possible to construct a three-layer feed-forward
neural network that computes the XOR function as specified in 3a
using only units with linear transfer functions?
Justify your answer!
(Might require further literature research!)
(a) Retrieve the protein sequence for the plant seed protein Crambin
(1CRN) from the Protein Data Bank (PDB) at www.pdb.org and
report the primary sequence. [1 mark]
Hint: Use the search tool from the PDB frontpage and specify the
PDB ID '1CRN'. Retrieve 'Sequence Details' and download the sequence
in FASTA format.
(b) Check out the secondary structure annotation for the Crambin
PDB entry (1CRN) and annotate the primary sequence from part (a)
using the letters 'H' for alpha-helix and 'E' for beta-sheet.
[3 marks]
Attention: This is not the secondary structure as listed
under 'Sequence Details'!
Hint: Use 'Download/Display File' and select 'Display the Structure File',
PDB / HTML format; click on 'HELIX' / 'SHEET' for explanation of annotations.
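Once the structure file has been downloaded, the HELIX and SHEET records can also be turned into an H/E string programmatically. This is a minimal sketch, assuming the fixed-column layout of the PDB file format (HELIX: chain in column 20, start residue in columns 22-25, end residue in columns 34-37; SHEET: chain in column 22, start residue in columns 23-26, end residue in columns 34-37) and assuming residue numbering starts at first_residue without gaps or insertion codes. The function name and its parameters are hypothetical.

```python
def annotate_secondary_structure(pdb_lines, length, chain=None, first_residue=1):
    """Build an annotation string with 'H', 'E' or '-' for each residue.

    Assumptions: records follow the fixed PDB column layout, residues are
    numbered consecutively from `first_residue`, and `chain=None` accepts
    records from any chain.
    """
    annotation = ["-"] * length
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "HELIX":
            chain_id, start, symbol = line[19], int(line[21:25]), "H"
        elif record == "SHEET":
            chain_id, start, symbol = line[21], int(line[22:26]), "E"
        else:
            continue  # ignore ATOM, REMARK and all other record types
        end = int(line[33:37])  # end residue number shares columns 34-37
        if chain is not None and chain_id != chain:
            continue
        for residue_number in range(start, end + 1):
            index = residue_number - first_residue
            if 0 <= index < length:
                annotation[index] = symbol
    return "".join(annotation)
```

For example, with the downloaded file saved locally under a name of your choice, `annotate_secondary_structure(open(path), len(sequence))` returns a string that can be printed directly beneath the primary sequence.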
(c) Use the NNPREDICT program (based on a feed-forward neural network) available at http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html to predict the secondary structure of Crambin. Use the 'all alpha' and 'none' tertiary structure classes and compare the results to the secondary structure information from the PDB entry. [3 marks]
(d) Submit the Crambin sequence to the PredictProtein server at http://www.embl-heidelberg.de/predictprotein/predictprotein.html to perform a PHD secondary structure prediction. Compare the results to the PDB annotated secondary structure and to the NNPREDICT predictions from part (c) and discuss the differences you observe. [3 marks]
Warning: Part (d) of this exercise requires waiting for an automated e-mail response from the PredictProtein server. Although this can be very fast, to be safe you should allow at least 24 hours for processing.
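When comparing the predictions from parts (c) and (d) with the PDB annotation, a simple per-residue agreement score can support the discussion. The sketch below assumes that all annotations have been written as equal-length strings over the same alphabet (for example 'H', 'E' and '-'); the function name per_residue_agreement is hypothetical, and the score is only a rough summary that does not replace looking at where the strings differ.

```python
def per_residue_agreement(observed, predicted):
    """Return the fraction of positions where two annotations agree.

    Assumption: both strings use the same alphabet (e.g. 'H', 'E', '-')
    and describe the same residues in the same order.
    """
    if len(observed) != len(predicted):
        raise ValueError("annotations must have the same length")
    if not observed:
        raise ValueError("annotations must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def mismatch_positions(observed, predicted):
    """Return the 1-based positions where the two annotations differ."""
    return [i for i, (o, p) in enumerate(zip(observed, predicted), start=1) if o != p]
```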
BONUS QUESTION: Perform the same analysis for the oxygen-binding protein Myohemerythrin (PDB ID '2MHR'). What do you observe?
(a) Describe the difference between the MUTATE and the VARIATE operators in the evolutionary algorithm, and how the application of these operators changes over a run of the algorithm. Explain the motivation for the mechanism as proposed in Schulze-Kremer, Genetic Algorithms and Protein Folding, Section 1.2.1.4. [3 marks]
(b) Explain how a vector fitness function can be used to combine different fitness criteria. Furthermore, explain how in this case, generation replacement is different from the elitist replacement used for simple energy minimisation. (Your answer should be based on the assigned reading and not exceed 200 words.) [2 marks]
(c) How can information on secondary structure (e.g., from a secondary structure prediction algorithm) be used for improving an evolutionary algorithm for tertiary structure prediction based on the energy minimisation approach? [2 marks]