Specifications
File Formats
- Reads input as FASTQ
- Alignments output as SAM
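A minimal C++ sketch of reading one FASTQ record. It assumes the common four-line, unwrapped layout: a header starting with '@', the sequence, a '+' separator, and a quality string of the same length. `FastqRecord` and `readFastq` are hypothetical names used only for illustration, not part of any existing aligner code. Multi-line FASTQ and colour-space FASTQ variants would need extra handling.

```cpp
#include <cassert>   // assert(), used when exercising the reader
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding test input
#include <string>

// One FASTQ record: @header, sequence, '+' separator, qualities.
struct FastqRecord {
    std::string name;      // header line without the leading '@'
    std::string sequence;  // read bases
    std::string quality;   // one quality character per base
};

// Reads the next record from 'in'.
// Returns false at end of input or when the record is malformed.
// Assumes unwrapped, letter-space FASTQ (each field on a single line).
bool readFastq(std::istream& in, FastqRecord& rec) {
    std::string header, plus;
    if (!std::getline(in, header)) return false;
    if (header.empty() || header[0] != '@') return false;
    if (!std::getline(in, rec.sequence)) return false;
    if (!std::getline(in, plus) || plus.empty() || plus[0] != '+') return false;
    if (!std::getline(in, rec.quality)) return false;
    // In letter space every base must have exactly one quality value.
    if (rec.quality.size() != rec.sequence.size()) return false;
    rec.name = header.substr(1);
    return true;
}
```

Wrapping a `std::istringstream` around a short test string is a convenient way to exercise the reader without touching the file system.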
General Needs
- Works in letter space with RNA-Seq data
- Works in colour space with SOLiD data
- Fast
- Small memory footprint
- Align against a reference with a dynamic splice graph
- Minimize the size of the driver (e.g. by abstracting functionality into classes)
- Feed discovered SNPs back into the reference (feedback loop)
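Colour-space (SOLiD) reads encode each pair of adjacent bases as one of four colours. Assuming the usual 2-bit codes A=0, C=1, G=2, T=3, the colour of a base pair is the XOR of the two codes, so converting a letter-space sequence (for example a reference window) to colour space is short. `baseCode` and `toColourSpace` are hypothetical helpers; ambiguous bases such as N are simply rejected in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Maps A/C/G/T to the 2-bit codes 0/1/2/3; anything else is rejected.
int baseCode(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("non-ACGT base");
    }
}

// Converts letter space to SOLiD di-base colours ('0'..'3').
// Each colour is the XOR of the codes of two adjacent bases,
// so n bases produce n - 1 colours.
std::string toColourSpace(const std::string& bases) {
    std::string colours;
    for (std::size_t i = 1; i < bases.size(); ++i) {
        int colour = baseCode(bases[i - 1]) ^ baseCode(bases[i]);
        colours += static_cast<char>('0' + colour);
    }
    return colours;
}
```

This also shows why colour space matters for SNP handling: `ACGTA` encodes as `1313` while `ACCTA` (one substituted base) encodes as `1023`, so a single true SNP changes two adjacent colours, whereas an isolated single-colour change is more likely a sequencing error.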
Read Specific Needs
- Works with 75 bp reads, and with reads up to 128 bp
- Rescue transcriptome reads containing large unaligned chunks of intronic sequence (e.g. NNNIIIII; N = base, I = intron)
- Split transcriptome reads to align across an intron junction
- Handle residual introns within exome reads (e.g. NNNIINNN)
- Handle residual introns flanking exome reads (e.g. NNNNNNII)
- Prevent clipping from losing SNPs at the end of reads (e.g. NNNNNXN; N = base, X = SNP)
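For split reads, SAM's extended CIGAR uses the N operation for a skipped reference region such as an intron. The sketch below rests on simplifying assumptions: the read has already been divided into ungapped blocks that are contiguous in the read, sorted by reference position, and free of insertions and deletions. `ExonBlock` and `splicedCigar` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation of one ungapped piece of a split read.
struct ExonBlock {
    long refStart;  // 0-based reference position of the first aligned base
    int length;     // number of aligned bases in this block
};

// Builds a CIGAR string for a read split across intron junctions.
// The reference gap between consecutive blocks is emitted as N;
// blocks that abut on the reference are merged into one M run.
std::string splicedCigar(const std::vector<ExonBlock>& blocks) {
    std::string cigar;
    int runLength = 0;  // length of the M run being accumulated
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (i > 0) {
            long prevEnd = blocks[i - 1].refStart + blocks[i - 1].length;
            long gap = blocks[i].refStart - prevEnd;
            assert(gap >= 0 && "blocks must be sorted and non-overlapping");
            if (gap > 0) {  // intron: flush the M run, then the skipped region
                cigar += std::to_string(runLength) + "M" + std::to_string(gap) + "N";
                runLength = 0;
            }
        }
        runLength += blocks[i].length;
    }
    if (runLength > 0) cigar += std::to_string(runLength) + "M";
    return cigar;
}
```

For example, a 75 bp read whose first 40 bases align at position 100 and whose last 35 bases align at position 1340 yields `40M1200N35M`.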
Alignment Specific Needs
- User-specified method of handling multimapped reads (i.e. reads that map equally well to multiple positions)
- Support extended CIGAR encoding (including P for padding)
- Confirm whether traceback ever needs to return multiple best alignments
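A sketch of parsing extended CIGAR strings, including P for padding, and of computing how many read and reference bases an alignment consumes. Following the SAM specification, M, I, S, = and X consume the read; M, D, N, = and X consume the reference; H and P consume neither. The function names are hypothetical, and the "unavailable" placeholder `*` is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using CigarOp = std::pair<int, char>;  // (length, operation)

// Parses an extended CIGAR string with operations M, I, D, N, S, H, P, = and X.
std::vector<CigarOp> parseCigar(const std::string& cigar) {
    std::vector<CigarOp> ops;
    int len = 0;
    bool haveDigits = false;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            len = len * 10 + (c - '0');
            haveDigits = true;
        } else {
            // Every operation must be preceded by a length and be a known code.
            if (!haveDigits || std::string("MIDNSHP=X").find(c) == std::string::npos)
                throw std::invalid_argument("malformed CIGAR: " + cigar);
            ops.emplace_back(len, c);
            len = 0;
            haveDigits = false;
        }
    }
    if (haveDigits) throw std::invalid_argument("trailing length in CIGAR: " + cigar);
    return ops;
}

// Read bases accounted for by the alignment (M, I, S, =, X).
int queryLength(const std::vector<CigarOp>& ops) {
    int n = 0;
    for (const auto& [len, op] : ops)
        if (op == 'M' || op == 'I' || op == 'S' || op == '=' || op == 'X') n += len;
    return n;
}

// Reference bases spanned (M, D, N, =, X); P and H consume neither sequence.
long referenceSpan(const std::vector<CigarOp>& ops) {
    long n = 0;
    for (const auto& [len, op] : ops)
        if (op == 'M' || op == 'D' || op == 'N' || op == '=' || op == 'X') n += len;
    return n;
}
```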
Quality Specific Needs
- Alignment Score (from alignment)
- Mapping Quality (as defined by MAQ, using Mosaik's implementation)
- Uniqueness (100% divided by the number of best-mapped positions)
- Fragment Quality (paired-end specific value: probability that the location of the mapped fragment, flanked by the paired-end reads, is correct/incorrect)
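The uniqueness value and the Phred scaling behind mapping quality can be sketched as follows. SAM defines MAPQ as -10 log10 of the probability that the mapping position is wrong, rounded to an integer; how that probability is estimated (MAQ's model, as implemented in Mosaik) is not reproduced here. The function names and the `maxQuality` cap are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Uniqueness: 100% divided by the number of equally best mapping positions.
double uniquenessPercent(int numBestPositions) {
    assert(numBestPositions > 0);
    return 100.0 / numBestPositions;
}

// Converts the probability that the reported position is wrong into a
// Phred-scaled quality, -10 * log10(pWrong), rounded and clamped to
// [0, maxQuality]. 'maxQuality' is an assumed cap, not a MAQ or Mosaik value.
int phredFromErrorProbability(double pWrong, int maxQuality) {
    if (pWrong <= 0.0) return maxQuality;  // avoid log10(0)
    double q = -10.0 * std::log10(pWrong);
    return std::clamp(static_cast<int>(std::lround(q)), 0, maxQuality);
}
```

For example, a read with four equally good positions has a uniqueness of 25%, and an error probability of 0.001 corresponds to a quality of 30.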
Build and Compilation
- Have CMake detect CPU architecture and link appropriate libraries
- Build as "Release" (set CMAKE_BUILD_TYPE to Release in ccmake) for full optimization
Miscellaneous Questions
- Do we ever have to assume that the reference is circular (i.e. how should negative and out-of-range positions be handled)?
- Do we ever have to support positive penalty values?
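If the reference does turn out to be circular, negative and out-of-range positions can be wrapped back onto it. In C++ the remainder of a negative operand is negative (e.g. `-1 % 100 == -1`), so a plain `%` is not enough. `wrapPosition` is a hypothetical helper sketched under that assumption:

```cpp
#include <cassert>

// Maps any position onto a circular reference of the given length,
// returning a value in [0, referenceLength).
long wrapPosition(long pos, long referenceLength) {
    assert(referenceLength > 0);
    long r = pos % referenceLength;  // may be negative when pos < 0
    return r < 0 ? r + referenceLength : r;
}
```

For a linear reference, the same call sites would instead reject or clamp such positions.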
Topic revision: r5 - 2010-05-20 - jujubix