06/01/10

Here are a few more times, this time with all bin and data files in the /tmp/ folder, with times taken from /usr/bin/time.
Using commit
Using BWA with flags
Using readaligner with flags
Bowtie with version info: 64-bit Built on sycamore.umiacs.umd.edu Sat Apr 24 15:56:29 EDT 2010 Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-46) Options: -O3 -m64 -Wl,--hash-style=both Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
Also had a meeting with Chris today, where he clarified some of his previous communications. Our first priority should now be to compare Bowtie's exact matching functions to saligner or readaligner exact matching. Daniel took on the job of writing a better locate function for exact matches. I will be looking for more areas to optimize, fixing some bugs, and writing a better saligner wrapper that actually takes in arguments. Also, since mismatches in CIGARs are recorded no differently from matches, both mismatch and exact alignments can have their own predefined CIGAR (e.g. 35M), instead of us having to compare reads to references to generate our own CIGAR. To do:
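Since a CIGAR records mismatches no differently from matches, the shortcut amounts to one line. A sketch (match_cigar is a hypothetical helper for illustration, not actual saligner code):

```cpp
#include <cstddef>
#include <string>

// A full-length alignment, exact or with mismatches, is all "M"
// columns, so its CIGAR is just "<read length>M". No comparison
// against the reference is needed.
std::string match_cigar(std::size_t read_length) {
  return std::to_string(read_length) + "M";
}
```

So a 35nt read, whether it matched exactly or with mismatches, always gets the CIGAR 35M.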
06/02/10

I did a bunch of minor changes to the IO files to optimize them a little further. Most of the changes were just reserving space for strings/vectors beforehand, removing a few unneeded lines, etc. I also looked into ticket 53df76 (buggy CIGAR output on indels), and found that it's not really a bug in the CIGAR generator. The bug actually comes from using the LocalAligner after a gap alignment to generate the "dashes" that the CIGAR generator needs. However, the behaviour of LocalAligner is unpredictable when used in this context; sometimes it will clip the start or end of the read and throw in mismatches instead of just inserting the gaps (which is actually what it's supposed to do, given that it's local). So there actually aren't bugs in either the CIGAR generator or the LocalAligner. Unfortunately, we can't really fix the bug given the current situation, because there's no other way to put dashes into the aligned ref/reads. The only solution is to either figure out how IndelQuery does its CIGAR generation or write our own IndelQuery (which we might just do anyway). Since indel support is lower priority, I'm going to table this bug for later.
Another optimization I did today was adding a convenience method. Finally, I added support for arguments in saligner, so we can specify the reference, reads, and output files without having to recompile every time we need to use a different read file. Anyways, I decided to run some tests on my newest changes using the 2622382-read file, just for completeness and for more accuracy, since I didn't do any tests on that file yesterday either.
Under commit
And with Bowtie (same version info as above):
To do:
06/03/10

Worked on more optimizations, this time guided by Valgrind. From KCacheGrind, I found that about 10-15% of our cycles were being used up in IO, especially making/writing SAM entries. So I went in and did a few more minor optimizations, such as using ostringstream instead of stringstream. Preliminary results (tested on my machine) show it runs something like 10% faster now.
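For illustration, the SAM-entry change looks something like this (a sketch with hypothetical field handling, not saligner's actual writer; the point is that ostringstream is output-only, so it skips the input-side buffer machinery that stringstream carries around):

```cpp
#include <sstream>
#include <string>

// Build one SAM line with a single output-only stream. The unused
// mate/insert fields are written as "*\t0\t0" per the SAM format.
std::string BuildSamLine(const std::string& qname, int flag,
                         const std::string& rname, long pos, int mapq,
                         const std::string& cigar, const std::string& seq,
                         const std::string& qual) {
  std::ostringstream out;
  out << qname << '\t' << flag << '\t' << rname << '\t' << pos << '\t'
      << mapq << '\t' << cigar << '\t' << "*\t0\t0\t" << seq << '\t' << qual;
  return out.str();
}
```

The same swap applies anywhere we only ever write to the stream.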
Up until now, I've really only been testing exact matches with saligner over the 125,000-read file. While doing one final test on the 2M-read file (to verify output and to do a preliminary speed test), I noticed a lot of disk activity after about 40k reads! I'm guessing this is what cache misses look like (it could just be it reading from the reads file; I'm not really sure, and will have to confirm), so I'm going to have to do some more profiling with Callgrind. Granted, my system doesn't have as much memory. Other than that, I think it might be a good idea to start reading up on FM indexes, because I really can't do much else without knowing how the aligner works. So tomorrow, I might start on my reading (probably following in Daniel's footsteps). To do:
Update: Ran a cachegrind over 2M reads, and didn't get very many cache misses (something like <1%), so I guess it was actually reading from input? I'll have to check out the log a bit more tomorrow. On a side note, the profiling took about 2.5 hours with 2M reads on my machine (don't even know the time because the built-in timer went negative), so it's not a very practical thing to do...
06/04/10

I feel like I've done all the minor optimizations on the I/O classes at this point, and probably won't be improving speeds much more. So I spent the day reading up on FM indexes, and now I have a better idea of how the count functions work on the index. I haven't really gotten through locate yet, but it shouldn't be too bad.
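What I took away from the reading is that count boils down to the backward-search recurrence sp' = C[c] + Occ(c, sp), ep' = C[c] + Occ(c, ep). A naive, self-contained sketch (illustration only: Occ here rescans the BWT on every call, which is exactly the O(n) work the index's rank structure replaces):

```cpp
#include <map>
#include <string>

// Naive FM-index "count" via backward search over a BWT string.
// Occ(c, i) = number of occurrences of c in bwt[0, i).
int FmCount(const std::string& bwt, const std::string& pattern) {
  // C[c] = number of characters in the text strictly smaller than c.
  std::map<char, int> counts;
  for (char ch : bwt) ++counts[ch];
  std::map<char, int> C;
  int total = 0;
  for (auto& kv : counts) {
    C[kv.first] = total;
    total += kv.second;
  }

  // Linear-scan Occ; a real index answers this in O(1) with rank.
  auto Occ = [&bwt](char c, int i) {
    int n = 0;
    for (int j = 0; j < i; ++j)
      if (bwt[j] == c) ++n;
    return n;
  };

  // Extend the match one pattern character at a time, last to first,
  // narrowing the half-open suffix range [sp, ep).
  int sp = 0;
  int ep = static_cast<int>(bwt.size());
  for (int i = static_cast<int>(pattern.size()) - 1; i >= 0 && sp < ep; --i) {
    char c = pattern[i];
    sp = C[c] + Occ(c, sp);
    ep = C[c] + Occ(c, ep);
  }
  return ep - sp;
}
```

For example, the BWT of "ACAACG$" is "GC$AAAC", and counting "AC" over it gives 2, matching the two occurrences in the text.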
With my new-found knowledge, I took another look at the exact matching implementation in readaligner, and now I feel like I get it a bit more. I tried to look up some papers on wavelet trees (I think that's what readaligner's index uses). So for later:
06/07/10

Chris suggested finding the ratios between locating with 1 mismatch vs doing an exact locate, for both bowtie and saligner. So here it is:
Run on
Afternoon update: Well, I managed to shave another 5 or so seconds off saligner. A brief benchmark, with the same conditions as above: saligner, max_results = 1: 73.58s for exact alignment.
It's a lot closer to readaligner now. I ended up changing all the
Chris also mentioned that something else I can do is to replace the For tomorrow:
06/08/10

Decided to do one last benchmark with 32-bit bowtie instead of 64-bit bowtie, which might be a more accurate comparison, given that readaligner doesn't really take advantage of 64-bit registers (I don't think...). Anyways, here are the results:
Also had a meeting with Daniel and Chris today (see Daniel's journal, June 8 entry, for details). After the meeting, I did a bit of tinkering with the FASTQ reader. After some discussion with Daniel, we decided it might be better to trade robustness for speed in the reader, as most FASTQ output these days is more standardized (for example, no wrap-around for sequence or quality strings). By taking all those checks out, I was able to get another roughly 10-15% increase in speed. I also started doing some work on converting our C++-style strings to C strings. I started with the FASTQ reader again, but it's going to end up being a huge change that'll affect almost all of our classes. At the same time, I think I should also start looking into using regular arrays instead of vectors where applicable. To do:
06/09/10

Experimented a bunch with char arrays vs C++ strings. I did a quick and dirty replacement of strings with char arrays in the FastqEntry and FastqFile classes, then fixed all the compile errors that came with it, but there actually wasn't too big a difference. In fact, I couldn't really see a difference at all. Granted, I did a pretty poor job of the replacement (I just wanted results), so it might not have been as optimized as it could be, but it seems that changing over wouldn't make that much of a difference.
I also benchmarked my changes from yesterday (swapping out the Fastq read function). It seems saligner is now faster than readaligner (at least in exact matching). Anyways, here are the times:
The mismatch:exact ratio is still a bit off from bowtie's and readaligner's, but that might also be because mismatches still unnecessarily go through the CIGAR generator (which I should fix sometime). There was quite a bit of discussion on some points from yesterday, detailed in Daniel's journal, June 10 entry. Finally, I spent the rest of my time converting strings to char arrays, though I'm still not sure if it's doing anything. Regardless, it should be more future-proof.
There's also one point I'd like to look at in the To do:
06/10/10

Bahhhhh, spent all day converting strings to C strings. I've decided to convert only things having to do with sequences and qualities. To make it easier to see, I put up some typedefs:

#define NGSA_READLEN 128
typedef char SequenceBase;
typedef char QualityBase;
typedef SequenceBase SequenceString[NGSA_READLEN+1];
typedef QualityBase QualityString[NGSA_READLEN+1];

I'm about 75% done, I think. I finished the I/O classes and the drivers, as well as half the mappers. I just need to convert the rest, then start on the pair-end matcher and some other stuff. I also optimized some of the lower-level algorithms (such as ngsa::ReverseComplement) to make better use of char arrays. I hope the end result is a speed-up, or at least not a slow-down! Some parts still aren't very efficient, because other methods still take in C++ strings, so I end up having to do a conversion (which takes O(n) time). I think strlen is also O(n), versus O(1) for std::string::size, since std::string can just update its size every time the string grows/shrinks, so I should avoid calling strlen too often.
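As an illustration of the char-array style (my sketch, not the actual ngsa::ReverseComplement), the function can reuse the typedefs above and take the length as a parameter precisely so it never has to call the O(n) strlen:

```cpp
#include <cstddef>

#define NGSA_READLEN 128
typedef char SequenceBase;
typedef SequenceBase SequenceString[NGSA_READLEN + 1];

// Complement one base; anything unrecognized maps to 'N'.
static SequenceBase Complement(SequenceBase b) {
  switch (b) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return 'N';
  }
}

// In-place reverse complement over a C-string sequence. The caller
// passes the length, since unlike std::string a char array doesn't
// track its own size.
void ReverseComplement(SequenceBase* seq, std::size_t len) {
  if (len == 0) return;
  for (std::size_t i = 0, j = len - 1; i < j; ++i, --j) {
    SequenceBase tmp = Complement(seq[i]);
    seq[i] = Complement(seq[j]);
    seq[j] = tmp;
  }
  if (len % 2 == 1) seq[len / 2] = Complement(seq[len / 2]);
}
```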
To do:
06/11/10

Finally finished converting almost all the NGSA library classes to C strings, except the local aligners. I'm going to forgo those for now because I'm planning to do some more optimization on them later, so I'll have to change that code anyway. I'm still not sure whether all my changes work, though, because I still can't seem to compile! There seems to be a problem with my common header, where I store all the constants and typedefs. For some reason, when it gets to linking all the libraries, it keeps giving errors in whatever headers single_end_aligner.cc includes that also use my typedefs, saying it can't find the definitions. I spent a long time on Google looking up forward-declaring headers and typedefs, etc., but I still can't find a solution...maybe I'll have to ask Chris on Monday or Tuesday.
I also did a few benchmarks on a bunch of string vs char array functions, and I've developed the following guidelines for any future compatibility issues or conversions:
Finally, I think I might take Chris up on his offer for taking a break to study for my MCAT. I'll see how far I get on the weekend, but I might take Monday off for an extra study day. I guess we'll see after the weekend. To do:
06/18/10

Back from my MCAT! Finally got my code to compile for the C-string conversion. Now it's off to fix bugs and segmentation faults! I managed to clear up all the segmentation faults during the actual alignment process, but the output's still not quite right. My new build seems to miss most of the reads that are alignable in an old build. Also, I get a segmentation fault during the index-building process, which is really strange because I haven't even touched the code for that. No new code is even called until after the index has been built! Maybe I'll have to run the build through the debugger...
I also spent a large portion of the day experimenting with an idea I had last night: when calculating the rank value for a position i, instead of going to the next smallest pre-calculated position and counting from there.

Edit: Just noticed Daniel's started to implement a different rank structure, so my change might not even be relevant :(. Finally, I noticed Daniel's made a lot of changes...not looking forward to the huge merge later... To do:
06/22/10

I realized I forgot to make a post yesterday, so here it is:
Still worked on moving over to C strings. I managed to fix all the segmentation faults, even in the index-building step. Also fixed most of the bugs that arose from the conversion; now the exact aligner returns exactly the same output. Also worked on merging the branches, which might take a bit of time. There are so many new changes!

Entry for today: Had a meeting with Anne and Chris today (see Daniel's journal for notes). Anne and Chris have been discussing giving me a bigger aspect of the project to work on, which is great. Chris suggested three things off the top of his head:
So far, option 2 seems the most interesting to me, but all of them are pretty good! Anyways, I managed to finish merging everything. Mismatch and exact matching work fine so far. The jumpstart mapper doesn't work (segfaults somewhere), but I think that might have something to do with the jump table not saving to the right place. Daniel also made some bug fixes, so I think I'll have to do some final (minor) merges, and then I'll be done with the conversion! To do:
06/23/2010

FINALLY finished porting everything over to char arrays. I'm pretty sure everything's solid: I've done as much as possible to prevent buffer overflows, and the output looks the same as the old aligner's using saligner and most mappers. Speed-wise, it's pretty similar; I really can't tell if there's even been an improvement. I guess this port was more about future-proofing (with packed strings) than anything.
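The overflow prevention mostly amounts to bounded copies into the fixed-size buffers. A sketch (CopySequence is a hypothetical helper for illustration, not actual NGSA code):

```cpp
#include <cstddef>

#define NGSA_READLEN 128
typedef char SequenceString[NGSA_READLEN + 1];

// Copy at most NGSA_READLEN bases and always NUL-terminate, so an
// over-long record in the input can't overflow the fixed buffer.
// Returns the number of bases actually copied.
std::size_t CopySequence(SequenceString dest, const char* src) {
  std::size_t i = 0;
  while (src[i] != '\0' && i < NGSA_READLEN) {
    dest[i] = src[i];
    ++i;
  }
  dest[i] = '\0';
  return i;
}
```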
Anyways, some outstanding bugs:
The similarities of the above two cases seem to point to something that they have in common going wrong. They both use a
Chris also suggested I take a look at threading the aligner in the meantime, while he's still thinking up something for me to do. He suggested looking into Boost threads, which I did. I don't know...from the To do:
06/24/10

Did some last-minute verifying of my C-string conversions (still a bit nervous about it!), and it all looks fine, aside from the KmerMapper and MutantMapper issues detailed above. So I finally pushed the changes into dev.
Also began looking at multi-threading. It doesn't look too hard, and I've verified that the only places requiring locks should be with
On another note, I tried integrating the Boost.Thread libraries in, but I really don't know how with CMake. I tried extracting the Thread library using boost's To do:
06/25/10

Finally got the Boost.Thread library to compile with NGSA. Interesting thing...to test whether linking worked, I did an #include <boost/thread/thread.hpp> in the saligner code, but after all our other includes. When I tried to compile, I would get compile errors from within Boost. But when I moved the include statement to the top of the include list, it compiled perfectly fine! None of the other NGSA classes include the Boost thread library either, so that was quite strange. I guess there's yet more incentive to follow the Google guidelines!
Anyways, I've added options in the To do:
06/26/10

It works! The multi-threaded driver works, at least with a minimal CMakeLists.txt and no bells and whistles. For some reason, I had a bit more trouble linking, and the test file I was able to compile on Friday wouldn't compile, despite being pulled directly from the website! So I had to reinstall Boost, and it seems to work now...I feel like I've spent more time trying to get this thing to build than actually coding it!
I noticed something strange while running the threaded driver: when I specify 2 threads, it doesn't use both cores of my CPU, but when I specify >=3 threads, it starts utilizing both cores.
The threaded driver does work, though; at least, it aligns all the 125k reads fine. I still have to add locking to the driver files (single-end driver and pair-end driver), because I'm still running into a minor race condition when approaching the end of the file. Also, I still have to add alignment status reporting to the threaded driver (e.g. report how many reads were aligned, not aligned, failed, etc.). Finally, I need to verify the output of my threaded aligner, which I'm not really sure how to do; it seems to me that the threaded aligner might print reads out of order, so a simple line-by-line comparison with the old output won't work. To do:
06/29/10

Finally got the multi-threaded aligner working, with pretty much the same functionality as before. I'm pretty sure I've solved all the race conditions (with the exception of the two Query classes from readaligner), but I still don't really know how to compare the output to the old version, since everything is out of order. For now, I just took a quick glance through the output to check that every line lines up at the end properly (since I'm using the exact aligner and each read is 35nt, every line should have pretty much the same length, given 8 spaces for each tab). It looks pretty good, but even going through the file like that is a huge job.
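The locking scheme for the shared reader can be sketched like this; I'm using std::thread/std::mutex here just to keep the example self-contained (the project uses Boost.Thread, whose interface is very similar), and the worker body is a counting stub rather than a real aligner:

```cpp
#include <atomic>
#include <istream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Workers pull the next read under a single lock. The EOF check and
// the read happen inside the same critical section; doing them in
// separate critical sections is the end-of-file race described above.
class LockedReader {
 public:
  explicit LockedReader(std::istream* in) : in_(in) {}

  bool NextRead(std::string* read) {
    std::lock_guard<std::mutex> lock(mu_);
    return static_cast<bool>(std::getline(*in_, *read));
  }

 private:
  std::mutex mu_;
  std::istream* in_;
};

// Spawn num_threads workers that drain the input; returns how many
// reads were processed in total.
int ProcessAll(std::istream* reads, int num_threads) {
  LockedReader reader(reads);
  std::atomic<int> processed(0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&reader, &processed] {
      std::string read;
      while (reader.NextRead(&read)) {
        ++processed;  // a real worker would align the read here
      }
    });
  }
  for (std::thread& w : workers) w.join();
  return processed.load();
}
```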
Finally, I got a new project from Chris: I'm to build a new rank structure he proposed, which should utilize the cache more efficiently and do away with the wavelet tree. Basically, we will be using two arrays for the popcount, so that we can accurately represent the four bases (as opposed to one array, which would only give us two values). The implementation will involve:
To do:
06/30/2010

Finished up the multi-threading implementation. I've still yet to verify the output (I'm kind of putting it off...). I also ran some tests on skorpios, as follows:
Tested on commit
Same as above, but using
Notes:
Shifting gears, I've also started to implement the new rank structure, as discussed yesterday. I've completed step 1 and can now build one 64-bit block from a 64-character sequence of A/T/G/C. I'm using Daniel's representation for now, where A=0, C=1, G=2, T=3. It's interesting that one rank operation is slightly faster than the other three, so it may be advantageous to assign the faster encoding to whichever base is most common. Determining the most common base is probably too slow, though, and/or the speedup probably wouldn't be worth it. To do:
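Step 1, plus the rank lookup it enables, can be sketched like this (my illustration of the two-bit-plane idea, not the project's actual code; in practice the popcount loop would be a single __builtin_popcountll):

```cpp
#include <cstdint>

// One 64-base block stored as two bit-planes: hi holds the high bit of
// each 2-bit code, lo the low bit. A=00, C=01, G=10, T=11.
struct RankBlock {
  std::uint64_t hi;
  std::uint64_t lo;
};

// Portable popcount; real code would use a hardware/builtin popcount.
static int PopCount(std::uint64_t x) {
  int n = 0;
  for (; x; x &= x - 1) ++n;
  return n;
}

// Build a block from up to 64 bases.
RankBlock BuildBlock(const char* seq, int len) {
  RankBlock b = {0, 0};
  for (int i = 0; i < len; ++i) {
    int code;
    switch (seq[i]) {
      case 'A': code = 0; break;
      case 'C': code = 1; break;
      case 'G': code = 2; break;
      default:  code = 3; break;  // 'T'
    }
    if (code & 2) b.hi |= 1ULL << i;
    if (code & 1) b.lo |= 1ULL << i;
  }
  return b;
}

// Occurrences of `base` among the first i bases of the block. Note that
// T (code 11) needs no complements at all, which may be why one base's
// rank comes out slightly faster than the other three.
int Rank(const RankBlock& b, char base, int i) {
  std::uint64_t mask = (i >= 64) ? ~0ULL : ((1ULL << i) - 1);
  std::uint64_t h = b.hi, l = b.lo;
  switch (base) {
    case 'A': h = ~h; l = ~l; break;
    case 'C': h = ~h; break;
    case 'G': l = ~l; break;
    default: break;  // 'T'
  }
  return PopCount(h & l & mask);
}
```

One caveat of this sketch: unused positions past the sequence length decode as A (00), so rank queries should stay within the built length.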