> > | 01/20/2011
I managed to implement a basic version of the Smith-Waterman on the GPU. The version is linear matching only (not Affine). Currently, I just create a matrix, where the reference is on the x axis and the read is on the y axis. I then create one thread per read base and align one sequence in each thread block. The alignment just runs through the matrix horizontally with a diagonal vector, then spits out the maximum score. It's also not completely optimized yet (have some __syncthreads() calls where I don't need them and don't do a parallel reduction for the max call), but it works!
The limitation with doing an alignment this way is that I'm limited by the number of threads I can have. I think the current limit is 512 threads/block (768 total max), which means I'm limited to aligning only 512-base reads, maximum. Of course, this isn't such a big problem, but Chris suggested a method to align reads so that I don't get as much "waste" in the padding regions, which may require more threads. More on this later, I guess.
To do:
- Implement Affine
- Optimize!
- Figure out the efficiency of my current implementation
|