Concerning Multimaps

I sort of came to my conclusion about this already, but I'll spam it here anyway for posterity.

When faced with multimaps, there are three modes of resolution: randomly select one, report all, or report none.

Currently, it seems that by default I find all possible mappings, and only during the output phase do I filter down to one of the above three cases (in reality, the latter two). This isn't very computationally efficient, since every mapping is located even when none will be reported, so I suspect we'll have to adopt something like the report variable found in readaligner.
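
As a rough illustration of the three modes, here is a minimal Python sketch that applies a report setting while candidates are still being collected, instead of at output time. The names `ReportMode` and `resolve_multimap` are hypothetical; they are not taken from this library or from readaligner, whose report variable may work differently.

```python
import random
from enum import Enum
from itertools import islice


class ReportMode(Enum):
    RANDOM = "random"  # randomly select one mapping
    ALL = "all"        # report every mapping
    NONE = "none"      # report nothing for a multimapped read


def resolve_multimap(candidates, mode, rng=random):
    """Return the positions to report for one read.

    `candidates` is any iterable of positions. It may be a lazy generator,
    which lets the NONE mode stop as soon as a second hit is seen.
    """
    if mode is ReportMode.NONE:
        # Two hits are enough to know the read is a multimap.
        first_two = list(islice(candidates, 2))
        return first_two if len(first_two) == 1 else []
    hits = list(candidates)
    if mode is ReportMode.ALL or len(hits) <= 1:
        return hits
    return [rng.choice(hits)]
```

In this sketch only the "report none" mode avoids enumerating every mapping; "report all" and the random pick still collect the full list.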

-- Main.jujubix - 21 May 2010

Concerning the Class Hierarchy

As the library starts to take shape, we have to decide upon a class hierarchy which the project will be built upon. I imagine that changing the hierarchy down the road will be difficult, so in hopes of avoiding that, let's commit ourselves to a single hierarchy.

Some history about the existing hierarchy directories:

  • Originally, there was only IO, alignment, and index
    • IO would read in the reference and reads
    • The index (Kmer) would return positions in the reference that matched the first k bases of a read
    • The aligner would align the entire read to the reference at the specified position
  • Then the index was swapped... the aligner was completely replaced when searching for exact reads
    • The index would "locate" the position in the reference where the entire read was found
  • Inexact reads were supported, leading to the need for Mapper classes
    • Would "map" reads to the reference, but allowed some form of variation (e.g. mismatches, gaps, etc...)
    • Some required aligner classes, bringing back the need for them
  • To reduce the code seen in /tools/, Drivers were created
    • Essentially, took in a mapper, input and output classes, and ran through every read in the given file
  • Pairend classes were introduced to handle the post-processing to make reads paired...
    • These were fed into some specific Drivers, and work independently of the index and mappers

As you can see, the entire hierarchy wasn't carefully planned, but rather extended when the need arose... so I wouldn't be surprised if there was room for improvement... or a complete restructuring.

Some personal concerns:

  • Some classes in IO are actually Types... this could be pulled out
  • The creation of every Mapper class requires the addition of a new Locate function in the Index class
    • Should the index simply be a container? And the mapper classes take care of the actual "locating", using the index?
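
To picture the "index as a container" idea from the last concern, here is a minimal Python sketch, not the library's actual design. `KmerIndex.lookup` and `ExactMapper.locate` are hypothetical names: the index exposes a single generic query, and the mapper owns the locating logic.

```python
from collections import defaultdict


class KmerIndex:
    """Pure container: maps each k-mer of the reference to its start positions."""

    def __init__(self, reference, k):
        self.reference = reference
        self.k = k
        self._positions = defaultdict(list)
        for i in range(len(reference) - k + 1):
            self._positions[reference[i:i + k]].append(i)

    def lookup(self, kmer):
        # The only query the index offers; no per-mapper Locate functions.
        return self._positions.get(kmer, [])


class ExactMapper:
    """Owns the 'locating' logic and only reads from the index."""

    def __init__(self, index):
        self.index = index

    def locate(self, read):
        k = self.index.k
        if len(read) < k:
            return []
        ref = self.index.reference
        # Seed with the first k bases, then verify the whole read.
        return [p for p in self.index.lookup(read[:k])
                if ref[p:p + len(read)] == read]
```

Under this split, adding a new Mapper would mean writing a new class against `lookup`, without touching the index.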

-- Main.jujubix - 26 May 2010

Drawing2.jpg

  • The main core of any tool is a Driver
  • A typical driver requires:
    • FastqFile to read input
    • SamFile to write output
    • Mapper for mapping reads to the reference
      • Contains an Index holding the reference
        • The Index requires the reference from a FastaFile
      • May optionally use an Aligner
    • MatchMaker for post-processing pair-end reads, if applicable
      • Requires access to an Index and an Aligner

  • The typical way to build an aligner executable is as follows:
    1. create a FastaFile to read the reference
    2. create an Index by loading or building a new one
    3. create an Aligner with user-set penalty values
    4. create a Mapper using the above two objects
    5. create a MatchMaker only if dealing with pair-end (PE) reads
    6. create a FastqFile (or two in PE), using files with reads
    7. create a SamFile
    8. create a Driver using the above three (or four, for PE)
    9. Run() the Driver
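
The build order above can be sketched in Python as follows. Every class here is a hypothetical in-memory stand-in that borrows the names from the list; the constructors, the `map`, `write` and `run` methods, and the penalty values are assumptions made only to show the wiring, not the library's real interfaces. The single-end case is shown, so the MatchMaker step is skipped.

```python
class FastaFile:
    def __init__(self, sequence):
        self.sequence = sequence  # stand-in for reading a FASTA file


class Index:
    def __init__(self, fasta):
        self.reference = fasta.sequence


class Aligner:
    def __init__(self, mismatch_penalty, gap_penalty):
        self.mismatch_penalty = mismatch_penalty
        self.gap_penalty = gap_penalty


class Mapper:
    def __init__(self, index, aligner=None):
        self.index, self.aligner = index, aligner

    def map(self, read):
        # Placeholder: exact substring search stands in for real mapping.
        return self.index.reference.find(read)


class FastqFile:
    def __init__(self, reads):
        self.reads = reads  # stand-in for reading a FASTQ file

    def __iter__(self):
        return iter(self.reads)


class SamFile:
    def __init__(self):
        self.records = []  # stand-in for writing SAM output

    def write(self, read, position):
        self.records.append((read, position))


class Driver:
    def __init__(self, mapper, fastq, sam):
        self.mapper, self.fastq, self.sam = mapper, fastq, sam

    def run(self):
        # Run through every read in the input and record its mapping.
        for read in self.fastq:
            self.sam.write(read, self.mapper.map(read))


def build_and_run(reference, reads):
    fasta = FastaFile(reference)         # 1. read the reference
    index = Index(fasta)                 # 2. build (or load) an index
    aligner = Aligner(1, 2)              # 3. penalty values are arbitrary here
    mapper = Mapper(index, aligner)      # 4. mapper from the two objects above
    # 5. no MatchMaker: single-end reads only
    fastq = FastqFile(reads)             # 6. input reads
    sam = SamFile()                      # 7. output
    driver = Driver(mapper, fastq, sam)  # 8. driver from mapper, input, output
    driver.run()                         # 9. run
    return sam.records
```

For example, `build_and_run("ACGTACGT", ["GTA"])` records the read `GTA` at position 2 of the stand-in reference.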

I don't see anything horribly wrong with it, as it has built various usable aligners, but I don't see anything wonderful about it either, since it was built without much planning.

-- Main.jujubix - 26 May 2010

Jay's take on the UML

Here's my take on the current UML.

Some points to note:

  • The Mappers are now specific to their index.
  • Construction of an aligner is still the same. You just have to be careful with which Mapper you choose to use with which Index.
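
One way to take the "be careful" out of pairing a Mapper with an Index is to have each Mapper declare which index type it accepts and check it at construction. This is only a Python sketch of that idea, not what the UML prescribes; `SuffixIndex`, `SuffixMapper`, `KmerMapper` and the `index_type` attribute are hypothetical names.

```python
class KmerIndex:
    pass


class SuffixIndex:  # hypothetical second index type
    pass


class Mapper:
    index_type = object  # subclasses narrow this to the index they need

    def __init__(self, index):
        if not isinstance(index, self.index_type):
            raise TypeError(
                f"{type(self).__name__} requires {self.index_type.__name__}, "
                f"got {type(index).__name__}"
            )
        self.index = index


class KmerMapper(Mapper):
    index_type = KmerIndex


class SuffixMapper(Mapper):
    index_type = SuffixIndex
```

With this check, a mismatched pairing such as `SuffixMapper(KmerIndex())` fails immediately with a `TypeError` instead of misbehaving later during mapping.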

-- Main.jayzhang - 28 May 2010

Topic revision: r5 - 2010-05-28 - jayzhang
 