Difference: JaysJournal (69 vs. 70)

Revision 70 2010-08-12 - jayzhang

Line: 1 to 1
 
META TOPICPARENT name="NGSAlignerProject"
May 2010 archive
Line: 88 to 88
 
  • Benchmarks! I might be able to integrate the index into saligner without too much trouble, so we can get some pretty accurate comparisons?
  • Implement saving/loading of the whole index. Currently, I support saving/loading of the rank structure, and the FM index is just a few more data structures so it shouldn't be too bad.
Changed:
<
<

08/11/10

>
>

08/10/10

 Implemented a save/load feature for the FM index. I'm also thinking of implementing a "partial" load feature, where only the BWT string is loaded and the other data structures are rebuilt from it. The reason is that the BWT string is the one that takes the most memory (and maybe time, too) to build and should be the same for all indexes, while the other structures will differ depending on sampling rates and memory restrictions. So the BWT string can easily be passed around between machines (with different memory restrictions), while the other data structures can't.
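
As a rough illustration of the "partial" load idea, here is a minimal Python sketch (not the project's actual code). All names in it (`build_bwt`, `save_bwt`, `partial_load`, `build_checkpoints`, `rank`) are hypothetical, the BWT is built naively by sorting suffixes, and the "other data structures" are reduced to a single sampled rank table. It only shows that one saved BWT can be reloaded with different sampling rates.

```python
import os
import tempfile

def build_bwt(text):
    """Naive BWT via sorted suffixes; text must end with a unique '$' sentinel."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0

def build_checkpoints(bwt, rate):
    """Cumulative symbol counts stored every `rate` positions.
    This is the part that depends on the memory budget of the machine."""
    counts = {c: 0 for c in sorted(set(bwt))}
    checkpoints = []
    for i, c in enumerate(bwt):
        if i % rate == 0:
            checkpoints.append(dict(counts))
        counts[c] += 1
    return checkpoints

def rank(bwt, checkpoints, rate, c, i):
    """Number of occurrences of symbol c in bwt[:i]."""
    block = min(i // rate, len(checkpoints) - 1)
    return checkpoints[block].get(c, 0) + bwt[block * rate:i].count(c)

def save_bwt(bwt, path):
    """Save only the BWT string (hypothetical on-disk format: plain text)."""
    with open(path, "w") as f:
        f.write(bwt)

def partial_load(path, rate):
    """Load the BWT and rebuild the sampled structure at the requested rate."""
    with open(path) as f:
        bwt = f.read()
    return bwt, build_checkpoints(bwt, rate)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.bwt")
    save_bwt(build_bwt("ACGTACGTTGCA$"), path)
    # Same file, two different memory profiles.
    for rate in (2, 8):
        bwt, cps = partial_load(path, rate)
        print(rate, len(cps), rank(bwt, cps, rate, "A", len(bwt)))
```

A real implementation would presumably store the BWT in a packed binary form rather than as text, but the split between the fixed part (the BWT) and the rebuilt part (the sampled tables) is the same.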

I also did a few preliminary benchmarks, and the times for the Locate function were not great. I think this might be because we don't implement Locate the "proper" way, which guarantees a successful locate within at most sampling-rate backtracks. Following Chris' suggestion, I graphed the number of backtracks it takes before a location is found on a random readset, and here are the results:

Line: 107 to 107
 To do:
  • Make a "proper" locate structure and compare.
Added:
>
>

08/11/10

Finished implementing the "proper" locate structure, which guarantees a hit within sampling-rate backtracks. I did a few small tests first, with the DNA rank structure sampling rate fixed at 64 bits. For aligning 2 million reads a maximum of 1 time each, the "proper" version outperforms the previous version by quite a bit, even at roughly the same memory usage (6.9 seconds at sr=64, memory=37MB, vs. 25.9 seconds at sr=16, memory=39MB). Even if I raise the sampling rate of the "proper" locate to 256 bases (memory usage = 34MB), it performs at comparable speeds.
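
To make the guarantee concrete, here is a small Python sketch of one standard way to get it: sample the suffix array at text positions that are multiples of the sampling rate, so that walking backwards with the LF mapping always reaches a sample in fewer than `rate` steps. This is an illustration under assumptions, not the project's implementation. The journal does not say how the earlier version sampled, so sampling every `rate`-th row is assumed here purely for contrast. All function names are hypothetical, and a Python dict stands in for the mark bits and sample array a real index would use.

```python
import random

def build_index(text):
    """Naive suffix array and LF mapping; text must end with a unique '$'."""
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]
    # first[c] = first row whose suffix starts with c (the usual C array).
    first = {}
    for row, i in enumerate(sa):
        first.setdefault(text[i], row)
    seen = {}
    lf = []
    for c in bwt:
        lf.append(first[c] + seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    return sa, lf

def sample_by_row(sa, rate):
    """Keep SA values for every rate-th row (no bound tied to `rate`)."""
    return {row: sa[row] for row in range(0, len(sa), rate)}

def sample_by_text(sa, rate):
    """Keep SA values whose text position is a multiple of `rate`."""
    return {row: pos for row, pos in enumerate(sa) if pos % rate == 0}

def locate(row, lf, samples, n):
    """Walk backwards with LF until a sampled row is hit.
    Returns (text position, number of backtracks)."""
    steps = 0
    while row not in samples:
        row = lf[row]
        steps += 1
    return (samples[row] + steps) % n, steps

if __name__ == "__main__":
    rng = random.Random(1)
    text = "".join(rng.choice("ACGT") for _ in range(2000)) + "$"
    sa, lf = build_index(text)
    rate = 16
    for name, samples in (("row", sample_by_row(sa, rate)),
                          ("text", sample_by_text(sa, rate))):
        worst = max(locate(r, lf, samples, len(text))[1] for r in range(len(text)))
        print(name, "samples:", len(samples), "worst-case backtracks:", worst)
```

With text-position sampling, a suffix at position p needs exactly p mod `rate` backtracks, which is where the bound comes from. With row sampling both schemes store about the same number of samples, but the number of backtracks depends on where sampled rows happen to fall along the walk.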

To do:

  • Benchmark more thoroughly
  • Test more thoroughly
  • Implement a feature to load/save only the BWT string instead of the whole index, since the rest of the index could change based on different memory usage profiles.
  • Get ready for integration?
 
META FILEATTACHMENT attr="h" comment="Rank graph" date="1278719684" name="rank-graph.png" path="rank-graph.png" size="32863" user="jayzhang" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1278982205" name="rank-graph2.png" path="rank-graph2.png" size="26249" user="jayzhang" version="1.1"
META FILEATTACHMENT attr="h" comment="" date="1280528212" name="rank-graph3.png" path="rank-graph3.png" size="23549" user="jayzhang" version="1.1"
 