NGSAlignerToDo

Specifications

File Formats

Reads input as FASTQ
Alignments output as SAM
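
As a rough illustration of the two formats, the sketch below reads four-line FASTQ records and writes each one as an unmapped SAM line with the 11 mandatory tab-separated fields. The names FastqRecord, read_fastq and unmapped_sam_line are hypothetical and are not part of the aligner; multi-line FASTQ records and SAM header lines are deliberately ignored.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """Hypothetical container for one FASTQ record."""
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream that uses four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header)
        # Keep only the read identifier, dropping any free-text description.
        yield FastqRecord(header[1:].split()[0], sequence, quality)


def unmapped_sam_line(record: FastqRecord) -> str:
    """Format a record as an unmapped SAM alignment line (FLAG 4)."""
    fields = [
        record.name,      # QNAME
        "4",              # FLAG: read unmapped
        "*",              # RNAME
        "0",              # POS
        "0",              # MAPQ
        "*",              # CIGAR
        "*",              # mate reference name
        "0",              # mate position
        "0",              # template/insert size
        record.sequence,  # SEQ
        record.quality,   # QUAL
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    demo = io.StringIO("@read1 example\nACGT\n+\nIIII\n")
    for rec in read_fastq(demo):
        print(unmapped_sam_line(rec))
```

A real writer would fill in FLAG, RNAME, POS, MAPQ and CIGAR from the alignment; the unmapped form is used here only because it needs no alignment result.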

General Needs

Works in letter space with RNA-Seq data
Works in colour space with SOLiD data
Fast
Small memory footprint
Align against reference with dynamic splice graph
Minimize the size of the driver (i.e. abstract functions into classes)
Feed discovered SNPs back into the reference (feedback loop)
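
To make the letter space versus colour space distinction concrete, here is a minimal sketch of SOLiD-style di-base encoding. It assumes the usual 2-bit codes A=0, C=1, G=2, T=3, under which the colour of two adjacent bases is the XOR of their codes. The function names are hypothetical, and keeping the first base as a prefix is a simplification of how real colour-space reads carry a primer base.

```python
# Assumed 2-bit base codes; the colour of a base pair is the XOR of its codes.
_BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_CODE_BASE = "ACGT"


def encode_colours(bases: str) -> str:
    """Convert a letter-space sequence to first base plus colour digits."""
    if not bases:
        return ""
    codes = [_BASE_CODE[b] for b in bases]
    colours = [str(left ^ right) for left, right in zip(codes, codes[1:])]
    return bases[0] + "".join(colours)


def decode_colours(colour_read: str) -> str:
    """Convert first base plus colour digits back to letter space."""
    if not colour_read:
        return ""
    current = _BASE_CODE[colour_read[0]]
    decoded = [colour_read[0]]
    for colour in colour_read[1:]:
        # Each colour flips the running code to give the next base.
        current ^= int(colour)
        decoded.append(_CODE_BASE[current])
    return "".join(decoded)


if __name__ == "__main__":
    print(encode_colours("ATGGC"))   # A3103
    print(decode_colours("A3103"))   # ATGGC
```

Because each decoded base depends on all colours before it, a single colour error corrupts the rest of a naively decoded read. That is one reason to align in colour space directly instead of decoding first.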

Read Specific Needs

Works with 75 bp reads, and with reads up to 128 bp
Rescue transcriptome reads that contain large stretches of mismatching intronic sequence (e.g. NNNIIIII; N = base; I = intron)
Split transcriptome reads to align across an intron junction
Handle residual introns within exome reads (e.g. NNNIINNN)
Handle residual introns flanking exome reads (e.g. NNNNNNII)
Prevent losing SNPs at the ends of reads to clipping (e.g. NNNNNXN; N = base; X = SNP)
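
One way a read split across an intron junction could be reported is with the SAM CIGAR operation N (skipped region on the reference). The sketch below builds such a CIGAR from a list of aligned blocks. The (reference start, length) block representation and the name spliced_cigar are assumptions made for illustration, not the aligner's data model.

```python
from typing import List, Tuple


def spliced_cigar(blocks: List[Tuple[int, int]]) -> str:
    """Build a CIGAR string from gapless aligned blocks.

    Each block is (reference_start, length). Consecutive blocks must be
    sorted and separated by a gap on the reference, which is written as
    an N operation (for example, an intron between two exons).
    """
    if not blocks:
        raise ValueError("at least one aligned block is required")
    parts = []
    previous_end = None
    for start, length in blocks:
        if length <= 0:
            raise ValueError("block length must be positive")
        if previous_end is not None:
            gap = start - previous_end
            if gap <= 0:
                raise ValueError("blocks must be sorted and separated by a gap")
            parts.append(f"{gap}N")
        parts.append(f"{length}M")
        previous_end = start + length
    return "".join(parts)


if __name__ == "__main__":
    # A 75 bp read: 40 bp in one exon, 35 bp in the next, 1000 bp intron between.
    print(spliced_cigar([(100, 40), (1140, 35)]))  # 40M1000N35M
```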

Alignment Specific Needs

User-specified method of handling multimapped reads (i.e. reads that map equally well to multiple positions)
Supports extended CIGAR encoding (including P for Padding)
Confirm whether traceback ever needs to return multiple best alignments
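
For reference, a minimal sketch of parsing an extended CIGAR string, including P for padding, and working out how many read and reference bases it covers. The function names are hypothetical. The operation set follows the SAM specification, where M, I, S, = and X consume the read, M, D, N, = and X consume the reference, and H and P consume neither.

```python
import re
from typing import List, Tuple

# One CIGAR element: a positive count followed by an operation letter.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (count, operation) pairs."""
    if cigar == "*":
        return []  # CIGAR unavailable
    operations = []
    position = 0
    for match in _CIGAR_ELEMENT.finditer(cigar):
        if match.start() != position:
            break  # unparseable characters between elements
        operations.append((int(match.group(1)), match.group(2)))
        position = match.end()
    if position != len(cigar) or not operations:
        raise ValueError("malformed CIGAR: " + repr(cigar))
    return operations


def cigar_lengths(cigar: str) -> Tuple[int, int]:
    """Return (read bases consumed, reference bases consumed)."""
    read_length = 0
    reference_length = 0
    for count, op in parse_cigar(cigar):
        if op in "MIS=X":
            read_length += count
        if op in "MDN=X":
            reference_length += count
    return read_length, reference_length


if __name__ == "__main__":
    print(cigar_lengths("3S30M2I10M1P5D30M"))  # (75, 75)
```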

Quality Specific Needs

Alignment Score (from alignment)
Mapping Quality (as defined by MAQ, using Mosaik's implementation)
Uniqueness (100% divided by the number of best-mapped positions)
Fragment Quality (paired-end (PE) specific value: probability that the mapped location of the fragment flanked by the paired ends is correct/incorrect)
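
Two of these values are simple enough to sketch. Uniqueness is 100% divided by the number of equally best positions, and a mapping quality in the MAQ sense is a Phred-scaled error probability, Q = -10 log10(P(mapping is wrong)). The code below shows only that scaling, not how MAQ or Mosaik estimate the probability itself. The function names and the cap of 99 are assumptions.

```python
import math


def uniqueness_percent(best_position_count: int) -> float:
    """Return 100% divided by the number of equally best mapped positions."""
    if best_position_count < 1:
        raise ValueError("a mapped read has at least one best position")
    return 100.0 / best_position_count


def phred_quality(error_probability: float, cap: int = 99) -> int:
    """Phred-scale an error probability: Q = -10 * log10(P(wrong)).

    The cap is a hypothetical upper bound that keeps the value finite when
    the error probability is zero or vanishingly small.
    """
    if not 0.0 <= error_probability <= 1.0:
        raise ValueError("error probability must lie in [0, 1]")
    if error_probability == 0.0:
        return cap
    return min(cap, int(round(-10.0 * math.log10(error_probability))))


if __name__ == "__main__":
    print(uniqueness_percent(4))   # 25.0
    print(phred_quality(0.001))    # 30
```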

Build and Compilation

Have CMake detect CPU architecture and link appropriate libraries
Build as "Release" in ccmake for full optimization

Miscellaneous Questions

Do we ever have to assume that the reference is circular (i.e. how should negative and out-of-range positions be handled)?
Do we ever have to support positive penalty values?

Topic revision: r5 - 2010-05-20 - jujubix
