> > | 09/01/10
Finished removing SAM generation from all mappers, and made SAM generation an optional post process:
- As mentioned yesterday, all mappers now return positions
- To generate SAM entries, the following is done:
  - Using the position, a subsequence is extracted from the reference
  - This subsequence is typically slightly longer than the read itself, to allow for indels, if present
  - An alignment is conducted, and the aligned read and reference (with dashes) are returned
  - The dashed alignments are used to generate SAM and MD
- Some major problems:
  - LocalAligner has a tendency to lop off mismatches at the end of reads to generate a higher score
  - This is highly evident when aligning with mismatch_test.fq
  - Setting the mismatch penalty to 0 can prevent the removal of mismatches at one end, but not both...
  - When this is done, the CIGAR breaks by inserting a lot of S operations (soft-clipped bases, which are "free bases" in the alignment)
  - The indel preference of the SW algorithm is more similar to BWA's, as described on 08/06/10, and thus most of the solutions provided in indel_test.fq are incorrect...
  - Fixing these requires some changes to all aligners, and also to the CIGAR and MD generation functions inside SamEntry
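For reference, a rough sketch of the post-process described above: cut a padded window out of the reference, then turn a dashed alignment into CIGAR and MD strings. This is not the actual SamEntry code. The function names, the 0-based position, the `slack` padding and the `left_clip`/`right_clip` parameters are all assumptions made for illustration.

```python
def reference_window(reference, pos, read_len, slack=5):
    """Slice the reference at 0-based `pos`, padded by `slack` bases for indels.

    `slack` is a hypothetical parameter; the real padding is not given in the notes.
    """
    end = min(len(reference), pos + read_len + slack)
    return reference[pos:end]


def cigar_and_md(aligned_ref, aligned_read, left_clip=0, right_clip=0):
    """Build (CIGAR, MD) from two equal-length dashed alignment strings.

    left_clip / right_clip are counts of read bases left outside the local
    alignment; they are reported as S (soft clip) in the CIGAR only.
    """
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned strings must have equal length")

    ops = []          # run-length encoded CIGAR operations: [op, count]
    md = []           # pieces of the MD string
    match_run = 0     # matching reference bases since the last MD event
    in_deletion = False

    for r, q in zip(aligned_ref.upper(), aligned_read.upper()):
        if r == "-" and q == "-":
            raise ValueError("alignment column with two gaps")
        # Dash in the reference = insertion, dash in the read = deletion.
        op = "I" if r == "-" else "D" if q == "-" else "M"

        if ops and ops[-1][0] == op:
            ops[-1][1] += 1
        else:
            ops.append([op, 1])

        if op == "M":
            if r == q:
                match_run += 1
            else:
                # Mismatch: emit the match count, then the reference base.
                md.append(str(match_run) + r)
                match_run = 0
        elif op == "D":
            if not in_deletion:
                # Start of a deletion: match count, then '^' before the bases.
                md.append(str(match_run) + "^")
                match_run = 0
            md.append(r)
        # Insertions are not recorded in MD at all.
        in_deletion = (op == "D")

    md.append(str(match_run))

    if left_clip:
        ops.insert(0, ["S", left_clip])
    if right_clip:
        ops.append(["S", right_clip])

    cigar = "".join(f"{count}{op}" for op, count in ops)
    return cigar, "".join(md)
```

For example, `cigar_and_md("ACGTAC-GT", "ACCT-CAGT")` gives `("4M1D1M1I2M", "2G1^A3")`. If the local aligner drops two mismatching bases at the start of the read, passing `left_clip=2` puts `2S` at the front of the CIGAR, which is the kind of S run mentioned above.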
Other changes to mapper:
- LFPair and LFAll are used in all possible places
- bowtie and quality mappers work completely in numberspace
- bwa still works in letter space... and converts when required
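A minimal sketch of what converting between letter space and number space could look like. The A=0, C=1, G=2, T=3 encoding and both function names are assumptions for illustration; the encoding the mappers actually use is not stated here.

```python
# Assumed encoding, not taken from the mapper code.
BASE_TO_NUM = {"A": 0, "C": 1, "G": 2, "T": 3}
NUM_TO_BASE = "ACGT"


def to_number_space(seq):
    """Convert a letter-space read to a list of ints (raises KeyError on N)."""
    return [BASE_TO_NUM[base] for base in seq.upper()]


def to_letter_space(nums):
    """Convert a number-space read back to letters."""
    return "".join(NUM_TO_BASE[n] for n in nums)
```

A mapper that stays in letter space would only call `to_number_space` at the points where it has to index into number-space structures.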
And assuming that last infinite loop in getPosition isn't a mapper problem, I guess that's it  |