Module 4: Protein Structure (2)

Idea:

Given an energy function (force field) f, we want to use a genetic algorithm to find its global minimum.

Genetic Algorithm:

Optimization:

To minimize f over a space of candidate solutions:

Maintain a population of such candidate solutions
Use genetic operators (mutation, crossover) to generate offspring
Use f to evaluate the fitness of all individuals
Select the individuals that constitute the next generation (survivors)

Can choose:

Mutation rate, how to mutate, and how to cross over
Initial population
Selection (the population size, i.e. the number of individuals, is usually kept constant; the choice mechanism can be deterministic or probabilistic)

Important points:

Historically, GA is defined to operate on binary data.

Thus, the individuals are bit vectors, and a good encoding scheme is needed.
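The loop described above can be sketched on bit vectors as follows. This is only a minimal illustration, not the method from the lecture: the objective `f` is a toy stand-in for an energy function, the fixed-point encoding in `decode` and all parameter values (population size, mutation rate, number of generations) are assumptions chosen for the example.

```python
import random

def decode(bits, lo=-5.0, hi=5.0):
    """Map a bit vector to a real number in [lo, hi] (assumed encoding)."""
    value = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)

def f(x):
    """Toy objective standing in for an energy function; minimum at x = 1."""
    return (x - 1.0) ** 2

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]

def crossover(a, b, rng):
    """One-point crossover: head of parent a, tail of parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def run_ga(pop_size=20, n_bits=16, generations=100, rate=0.05, seed=0):
    rng = random.Random(seed)
    # Random initial population of bit vectors.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(pop, 2)
            offspring.append(mutate(crossover(p1, p2, rng), rate, rng))
        # Deterministic selection with constant population size:
        # keep the pop_size fittest of parents and offspring.
        pop = sorted(pop + offspring, key=lambda ind: f(decode(ind)))[:pop_size]
    return pop[0]

best = run_ga()
print(decode(best), f(decode(best)))
```

The encoding matters here: with this fixed-point scheme, flipping a high-order bit moves the decoded value far away, while flipping a low-order bit makes a small local change.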

Evolutionary algorithm: like GA, but works directly on non-binary data.

Outline of applying an EA to tertiary structure prediction:

Initialization: starting population is randomly chosen (could also be based on statistics from protein database, e.g. PDB)
Evaluate initial population
Generate new individuals

Mutate: replace a torsion angle with a randomly selected value (same as above)
Variation: increment/decrement torsion angle by
Crossover:

Two-point crossover (helps to keep changes in the structure reasonably local)
Uniform crossover (50%)

Generation replacement (selection): elitist
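The outline above can be sketched as follows, with an individual represented as a list of torsion angles in degrees. This is a hedged illustration only: `energy` is a toy function and not a real force field, the step size in `vary` is an assumed value, and picking operators at random with equal probability is a design choice made for the example.

```python
import math
import random

def energy(angles):
    """Toy stand-in for a force field (NOT a real energy model):
    minimal when every torsion angle equals -60 degrees."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)

def mutate(angles, rng):
    """Replace one torsion angle with a randomly selected value."""
    child = list(angles)
    child[rng.randrange(len(child))] = rng.uniform(-180.0, 180.0)
    return child

def vary(angles, rng, step=5.0):
    """Increment or decrement one angle by `step` degrees (assumed value),
    wrapping the result into [-180, 180)."""
    child = list(angles)
    i = rng.randrange(len(child))
    child[i] = (child[i] + rng.choice((-step, step)) + 180.0) % 360.0 - 180.0
    return child

def two_point_crossover(a, b, rng):
    """Copy parent a, but take the segment between two cut points from b."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]

def uniform_crossover(a, b, rng):
    """Take each angle from either parent with probability 50%."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def evolve(n_angles=20, pop_size=10, generations=200, seed=0):
    rng = random.Random(seed)
    # Initialization: random starting population.
    pop = [[rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
           for _ in range(pop_size)]
    history = [min(map(energy, pop))]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(pop, 2)
            cross = rng.choice((two_point_crossover, uniform_crossover))
            child = rng.choice((mutate, vary))(cross(p1, p2, rng), rng)
            offspring.append(child)
        # Elitist replacement: the best of parents and offspring survive,
        # so the best energy never gets worse from one generation to the next.
        pop = sorted(pop + offspring, key=energy)[:pop_size]
        history.append(energy(pop[0]))
    return pop[0], history

best, history = evolve()
print(history[0], history[-1])
```

Because two-point crossover exchanges one contiguous run of angles, the child differs from a parent only in one stretch of the chain, whereas uniform crossover can change angles anywhere along it.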

Results (how good is this?)

Crambin:

46 residues
structure is difficult to predict
high-resolution structure known (1.5 Å)

Applying EA (1000 generations, 10 individuals) gives very bad results (structures found are quite different from native Crambin).

Reason:

The energy model is not good enough. According to the energy model, the energy of the resulting structure turns out to be much lower than the energy of the native structure.

Note: The approach is much more successful for side-chain packing

Note: The EA can generally be improved by using more sophisticated (problem-specific) search operators (here: "local twist").

Protein Secondary Structure Prediction

Algorithmic approaches:

Simple statistical methods

Based on the probability of encountering a given amino acid (AA) in a given secondary structure (estimated from PDB)

This method gives only about 50% prediction accuracy
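A minimal sketch of such a single-residue method is given below. The training pairs are hypothetical toy data invented for the example (real counts would be estimated from PDB), and the state labels H (helix), E (strand) and C (coil) as well as the fallback to coil for unseen residues are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (sequence, structure) pairs.
# H = helix, E = strand, C = coil.
TRAINING = [
    ("AELKAGVV", "HHHHCCEE"),
    ("MAELGVIV", "HHHHCEEE"),
    ("GPAEVKVG", "CCHHEEEC"),
]

def estimate(training):
    """Count how often each amino acid occurs in each structure state."""
    counts = defaultdict(Counter)
    for seq, ss in training:
        for aa, state in zip(seq, ss):
            counts[aa][state] += 1
    return counts

def predict(seq, counts, default="C"):
    """Assign each residue the state it was most often observed in,
    ignoring its neighbours entirely."""
    return "".join(
        counts[aa].most_common(1)[0][0] if counts.get(aa) else default
        for aa in seq
    )

counts = estimate(TRAINING)
print(predict("AELGV", counts))
```

Each residue is classified without regard to its neighbours, which is the main limitation of this approach.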

Chou-Fasman method (a better one)

Don’t look at single AAs in isolation; look at the context (a window of AAs).
Normalize frequency counts by the frequency of the AA in a family or database of proteins.
Based on normalized frequency counts use rules that predict structure elements based on local contexts.

e.g. Predict a segment of 6 residues as an α-helix if:

E[P_α] > 1.03

E[P_α] > E[P_β]

The segment does not include proline.

Accuracy ≈ 63%
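The example rule can be sketched as a sliding-window test. The propensity table below contains illustrative values for a handful of amino acids and should be treated as an assumption, not as an authoritative parameter set; `propensity` shows one common way to normalize the frequency counts, and both function names are hypothetical.

```python
# Illustrative propensities (P_alpha, P_beta) for a few amino acids.
# Treat these numbers as assumptions made for this sketch.
PROPENSITY = {
    "A": (1.42, 0.83), "E": (1.51, 0.37), "L": (1.21, 1.30),
    "M": (1.45, 1.05), "K": (1.16, 0.74), "V": (1.06, 1.70),
    "G": (0.57, 0.75), "P": (0.57, 0.55), "S": (0.77, 0.75),
    "T": (0.83, 1.19),
}

def propensity(n_aa_in_state, n_aa, n_state, n_total):
    """Normalized frequency: fraction of this AA found in the state,
    divided by the fraction of all residues found in the state."""
    return (n_aa_in_state / n_aa) / (n_state / n_total)

def helix_windows(seq, window=6, threshold=1.03):
    """Return start indices of windows that satisfy the helix rule."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if "P" in segment:
            continue  # the rule excludes segments containing proline
        mean_a = sum(PROPENSITY[aa][0] for aa in segment) / window
        mean_b = sum(PROPENSITY[aa][1] for aa in segment) / window
        if mean_a > threshold and mean_a > mean_b:
            hits.append(start)
    return hits

print(helix_windows("AEAEAELGSG"))
```

A propensity above 1 means the amino acid occurs in that structure more often than expected from its overall frequency, which is why the threshold sits just above 1.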

Neural Network approaches

A neural network is a nature-inspired method.

Neural networks are typically organized in layers. Layers are made up of a number of interconnected 'nodes', which contain an 'activation function'. Patterns are presented to the network via the 'input layer', which communicates to one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'. The hidden layers then link to an 'output layer' where the answer is output as shown in the graph above.
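The layered structure described above can be sketched as a forward pass through one hidden layer. Everything specific here is an assumption made for illustration: the one-hot encoding of a 5-residue window as input, the sigmoid activation, the layer sizes, and the random (untrained) weights, which means the printed prediction carries no meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = "HEC"  # helix, strand, coil

def one_hot_window(seq, center, half_width=2):
    """Encode a window of residues as a flat 0/1 input vector.
    Positions beyond the sequence ends stay all-zero."""
    vec = []
    for pos in range(center - half_width, center + half_width + 1):
        slot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            slot[AMINO_ACIDS.index(seq[pos])] = 1.0
        vec.extend(slot)
    return vec

def sigmoid(z):
    """Activation function applied at every node."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(seq, center, hidden, output):
    """Input layer -> hidden layer -> output layer (one score per state)."""
    x = one_hot_window(seq, center)
    h = layer(x, *hidden)
    return layer(h, *output)

rng = random.Random(0)
hidden = random_layer(5 * len(AMINO_ACIDS), 8, rng)
output = random_layer(8, len(STATES), rng)
scores = forward("MAELKAGV", 3, hidden, output)
print(STATES[scores.index(max(scores))])
```

The predicted state for the central residue is the output node with the highest score; a useful predictor would first need its weights fitted to sequences with known structure.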

The parameters are learned from data whose correct secondary structure is known.