Difference: NGSAlignerDiscussion (1 vs. 5)

Revision 5 2010-05-28 - jayzhang

Line: 1 to 1

META TOPICPARENT	name="NGSAlignerProject"

Concerning Multimaps

Line: 66 to 66

-- Main.jujubix - 26 May 2010

Added:

>
>

Here's my take on the current UML.

Some points to note:

The Mappers are now specific to their index.
Construction of an aligner is still the same. You just have to be careful with which Mapper you choose to use with which Index.

-- Main.jayzhang - 28 May 2010

META FILEATTACHMENT	attr="h" comment="" date="1274935288" name="Drawing2.jpg" path="Drawing2.jpg" size="69712" user="jujubix" version="1.1"

Added:

>
>

META FILEATTACHMENT	attr="h" comment="" date="1275072194" name="uml.png" path="uml.png" size="37492" user="jayzhang" version="1.1"

Revision 4 2010-05-27 - jujubix

Line: 1 to 1

META TOPICPARENT	name="NGSAlignerProject"

Concerning Multimaps

Line: 38 to 38

-- Main.jujubix - 26 May 2010

Changed:

<
<

>
>

The main core of any tool is a Driver
A typical driver requires:

Line: 46 to 46

- SamFile to write output
- Mapper for mapping reads to the reference
  - Contains an Index holding the reference

Changed:

<
<

    - The Index requires the reference from a FastaFile (not shown)

>
>

    - The Index requires the reference from a FastaFile

  - May optionally use an Aligner
- MatchMaker for post-processing paired-end reads, if applicable
  - Requires access to an Index and an Aligner

Line: 66 to 66

-- Main.jujubix - 26 May 2010

Changed:

<
<

META FILEATTACHMENT	attr="" comment="" date="1274919547" name="Drawing1.jpg" path="Drawing1.jpg" size="61356" user="jujubix" version="1.1"

>
>

META FILEATTACHMENT	attr="h" comment="" date="1274935288" name="Drawing2.jpg" path="Drawing2.jpg" size="69712" user="jujubix" version="1.1"

Revision 3 2010-05-27 - jujubix

Line: 1 to 1

META TOPICPARENT	name="NGSAlignerProject"

Concerning Multimaps

Line: 38 to 38

-- Main.jujubix - 26 May 2010

Added:

>
>

The main core of any tool is a Driver
A typical driver requires:
- FastqFile to read input
- SamFile to write output
- Mapper for mapping reads to the reference
  - Contains an Index holding the reference
    - The Index requires the reference from a FastaFile (not shown)
  - May optionally use an Aligner
- MatchMaker for post-processing paired-end reads, if applicable
  - Requires access to an Index and an Aligner

The typical way to build an aligner executable is as follows:
1. create a FastaFile to read the reference
2. create an Index by loading or building a new one
3. create an Aligner with user-set penalty values
4. create a Mapper using the above two objects
5. create a MatchMaker only if dealing with paired-end (PE) reads
6. create a FastqFile (or two in PE), using files with reads
7. create a SamFile
8. create a Driver using the above three (or four, for PE)
9. Run() the Driver

I don't see anything horribly wrong with it... as it's built various usable aligners, but I don't see anything wonderful about it either, being built without much planning.
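
For reference, the nine build steps above can be sketched roughly as follows. This is only an illustration: every class here is a hypothetical stand-in named after the components in the list, not the library's real interface. The reference is a plain string instead of a FastaFile, reads are a list of (name, sequence) pairs instead of a FastqFile, output is a list instead of a SamFile, the Mapper does naive exact matching, and the paired-end step (MatchMaker) is left out.

```python
from typing import List, Optional, Tuple


# Hypothetical stand-ins for the components named in the steps above.
class Index:
    """Holds the reference (steps 1-2)."""

    def __init__(self, reference: str) -> None:
        self.reference = reference


class Aligner:
    """Carries user-set penalty values (step 3)."""

    def __init__(self, mismatch_penalty: int, gap_penalty: int) -> None:
        self.mismatch_penalty = mismatch_penalty
        self.gap_penalty = gap_penalty


class Mapper:
    """Maps reads to the reference using an Index and, optionally, an Aligner (step 4)."""

    def __init__(self, index: Index, aligner: Optional[Aligner] = None) -> None:
        self.index = index
        self.aligner = aligner

    def map(self, read: str) -> List[int]:
        # Placeholder logic: exact substring search, including overlapping hits.
        positions = []
        start = self.index.reference.find(read)
        while start != -1:
            positions.append(start)
            start = self.index.reference.find(read, start + 1)
        return positions


class Driver:
    """Runs every read through the Mapper and records the result (steps 8-9)."""

    def __init__(self, mapper: Mapper, reads: List[Tuple[str, str]], output: list) -> None:
        self.mapper = mapper
        self.reads = reads
        self.output = output

    def run(self) -> None:
        for name, sequence in self.reads:
            self.output.append((name, self.mapper.map(sequence)))


def build_and_run(reference: str, reads: List[Tuple[str, str]]) -> list:
    index = Index(reference)                               # steps 1-2
    aligner = Aligner(mismatch_penalty=1, gap_penalty=2)   # step 3
    mapper = Mapper(index, aligner)                        # step 4
    output: list = []                                      # step 7 (SamFile stand-in)
    driver = Driver(mapper, reads, output)                 # step 8 (step 6: reads passed in)
    driver.run()                                           # step 9
    return output
```

The point of the sketch is only the wiring order: the Driver sees just a Mapper plus input and output, while the Index and Aligner are hidden behind the Mapper.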

-- Main.jujubix - 26 May 2010

META FILEATTACHMENT	attr="" comment="" date="1274919547" name="Drawing1.jpg" path="Drawing1.jpg" size="61356" user="jujubix" version="1.1"

Revision 2 2010-05-26 - jujubix

Line: 1 to 1

META TOPICPARENT	name="NGSAlignerProject"

Concerning Multimaps

Line: 10 to 10

-- Main.jujubix - 21 May 2010

Added:

>
>

Concerning the Class Hierarchy

As the library starts to take shape, we have to decide on the class hierarchy upon which the project will be built. I imagine that changing the hierarchy down the road will be difficult, so in hopes of avoiding that, let's commit ourselves to a single hierarchy.

Some history about the existing hierarchy directories:

Originally, there were only IO, alignment, and index
- IO would read in the reference and reads
- The index (Kmer) would return positions in the reference that matched the first k bases of a read
- The aligner would align the entire read to the reference at the specified position
Then the index was swapped... the aligner was completely replaced when searching for exact reads
- The index would "locate" the position in the reference where the entire read was found
Inexact reads were supported, leading to the need for Mapper classes
- Would "map" reads to the reference, but allowed some form of variation (e.g. mismatches, gaps, etc...)
- Some required aligner classes, bringing back the need for them
To reduce the code seen in /tools/, Drivers were created
- Essentially, took in a mapper, input and output classes, and ran through every read in the given file
Pairend classes were introduced to handle the post-processing to make reads paired...
- These were fed into some specific Drivers, and work independently from the index and mappers

As you can see, the entire hierarchy wasn't carefully planned, but rather extended when the need arose... so I wouldn't be surprised if there was room for improvement... or a complete restructuring.

Some personal concerns:

Some classes in IO are actually Types... this could be pulled out
The creation of every Mapper class requires the addition of a new Locate function in the Index class
- Should the index simply be a container? And the mapper classes take care of the actual "locating", using the index?
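
To make the second concern concrete, here is a minimal sketch of the "index as container" alternative. All names (KmerIndex, positions, ExactMapper, MismatchMapper, locate) are hypothetical and chosen for illustration; this is not how the existing classes are written. The index only stores k-mer positions, and each mapper owns its own locating logic, so adding a mapper does not touch the index.

```python
from collections import defaultdict
from typing import List


class KmerIndex:
    """Container only: maps each k-mer to its positions in the reference."""

    def __init__(self, reference: str, k: int) -> None:
        self.reference = reference
        self.k = k
        self.table = defaultdict(list)
        for i in range(len(reference) - k + 1):
            self.table[reference[i:i + k]].append(i)

    def positions(self, kmer: str) -> List[int]:
        # .get() avoids inserting empty entries for unseen k-mers.
        return self.table.get(kmer, [])


class ExactMapper:
    """Locates reads that match the reference exactly."""

    def __init__(self, index: KmerIndex) -> None:
        self.index = index

    def locate(self, read: str) -> List[int]:
        ref, k = self.index.reference, self.index.k
        return [p for p in self.index.positions(read[:k])
                if ref[p:p + len(read)] == read]


class MismatchMapper:
    """Locates reads allowing mismatches after an exactly matching k-base seed."""

    def __init__(self, index: KmerIndex, max_mismatches: int) -> None:
        self.index = index
        self.max_mismatches = max_mismatches

    def locate(self, read: str) -> List[int]:
        ref, k = self.index.reference, self.index.k
        hits = []
        for p in self.index.positions(read[:k]):
            window = ref[p:p + len(read)]
            if len(window) != len(read):
                continue  # read would run off the end of the reference
            mismatches = sum(a != b for a, b in zip(window, read))
            if mismatches <= self.max_mismatches:
                hits.append(p)
        return hits
```

The trade-off is that each mapper then depends on the index's internal layout (here, the k-mer table), which is the same coupling in the other direction.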

-- Main.jujubix - 26 May 2010

Revision 1 2010-05-21 - jujubix

Line: 1 to 1

Added:

>
>

META TOPICPARENT	name="NGSAlignerProject"

Concerning Multimaps

I sort of came to my conclusion about this already, but I'll spam it here anyway for posterity.

When faced with multimaps, there are three modes of resolution: randomly select 1, report all, or report none.

Currently, it seems that by default I find all possible mappings, and only during the output phase do I filter to one of the above three (in reality... the latter 2) cases. This isn't very computationally efficient, so I suspect we'll have to adapt something like a report variable found in readaligner.
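
As a rough sketch of what applying the report mode during the search (rather than at output) could look like: the names below (Report, resolve) are made up for illustration and are not taken from readaligner or from our code. It assumes candidate positions arrive lazily from the search, and that "report none" still reports a read with exactly one hit.

```python
import random
from enum import Enum
from typing import Iterable, List, Optional


class Report(Enum):
    RANDOM_ONE = "random"
    ALL = "all"
    NONE = "none"


def resolve(candidates: Iterable[int], mode: Report,
            rng: Optional[random.Random] = None) -> List[int]:
    """Resolve the hits for one read according to the report mode."""
    hits = iter(candidates)

    if mode is Report.ALL:
        return list(hits)

    if mode is Report.NONE:
        # Only uniqueness matters, so stop searching after the second hit.
        first = next(hits, None)
        if first is None:
            return []
        second = next(hits, None)
        return [first] if second is None else []

    # RANDOM_ONE: reservoir sampling keeps one uniformly chosen hit without
    # storing the rest. It still visits every candidate.
    rng = rng or random.Random()
    chosen = None
    for seen, position in enumerate(hits, start=1):
        if rng.randrange(seen) == 0:
            chosen = position
    return [] if chosen is None else [chosen]
```

Only the "report none" mode actually cuts the search short here; a uniformly random pick still needs to see every hit, unless we accept "first hit found" as good enough.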

-- Main.jujubix - 21 May 2010
