> > | 08/06/10
Yesterday and today, I did a bunch of cleanup on both the rank structure and FM index classes. I pulled some methods out of StaticDNAStructure::Init so the code can be cleaner. I also rethought the naming scheme of a lot of variables to make the code a little more readable. For the FM index, I also pulled out methods from the FMIndex::Init method and overloaded the method to take in C++ strings (which are used by saligner currently). I also added the building of the BWT string directly into the FM index class instead of making it built externally. Currently, only one sequence is supported, but later on, there will be multi-sequence support.
Chris suggeted in an email that we prefetch ep ranks while doing both counts and locates, which might make it faster. I'll be discussing this with him later.
Finally, I finished the FMIndex::Locate function today. It works as expected, and I've tested it pretty thoroughly. Locate requires a bunch of LF mappings of each position in the BWT string to travel "backwards" until a position is reached whose position has been indexed. However, I can't actually get the specific character at the given position in the BWT to do the LF mapping. The FM index paper suggests doing an LF on both position i and i-1 for each character in the alphabet, since the only one that will differ in LF value is the character I'm currently on. I'm also thinking I could just implement a select function in my rank structure, which shouldn't be too hard.
To do:
- Benchmarks! I might be able to integrate the index into
saligner without too much trouble, so we can get some pretty accurate comparisons?
- Implement saving/loading of the whole index. Currently, I support saving/loading of the rank structure, and the FM index is just a few more data structures so it shouldn't be too bad.
|