Tags:
create new tag
view all tags
Assume for all instances below that a single FASTA reference ref.fasta and a single FASTQ read file reads.fastq are in the same directory as the program executable

Running bowtie:

  • Build the index
    • bowtie-build ref.fasta reference
  • Single-end alignments
    • bowtie -S reference reads.fastq output.sam (-S required for SAM output)

Running bwa

  • Build the index
    • bwa index ref.fasta
  • Single-end alignment
    • bwa aln ref.fasta reads.fastq > output.sai
  • Convert to SAM
    • bwa samse ref.fasta output.sai reads.fastq > output.sam

Running Mosaik

  • Build the index
    • MosaikBuild -fr ref.fasta -oa ref.dat
  • Convert the reads
    • MosaikBuild -q reads.fastq -out reads.dat -st illumina (illumina is the machine the reads were produced by)
  • Single-end alignments
    • MosaikAligner -in reads.dat -out output.dat -ia ref.dat -hs 14 -act 17 -mm 2 unique
  • Convert to SAM
    • MosaikText -in output.dat -sam output.sam

Note that the above was adapted from the Build and Align scripts from in the data directory of the Mosaik download.

Running readaligner

  • Build the index
    • builder ref.fasta
  • Single-end alignments
    • readaligner -k 2 -v -s -o output.sam ref.fasta -q reads.fastq
      • -k 2 for up to 2 mismatches
      • you can change -k to -i for 2 gaps (in my experience, runs much slower)
      • -v for verbose
      • -s for sam output (I don't think that readaligner output is tview friendly yet...)
      • -o to specify the output file
      • -q to specify that input is FASTQ

If all went well, you should obtain an output.sam file, using any of the above programs...

You can then display this using SAMTools:

  • Index the reference
    • samtools faidx ref.fasta
  • Convert SAM to BAM
    • samtools import ref.fasta output.sam output.bam
  • Sort the BAM (orders the reads by positions in the reference)
    • samtools sort output.bam sorted (will create sorted.bam)
  • Index the sorted BAM (unsorted BAM files cannot be indexed)
    • samtools index sorted.bam
  • Display
    • samtools tview sorted.bam ref.fasta

On a side note, you can convert a BAM file to a SAM file like so:

  • samtools view sorted.bam > sorted.sam

As typing these can be a real pain, I actually automate the process using batch files...

Some notes about output... The below three are considered "identical" for our purposes

baligner
SLXA-EAS1_89:1:1:672:654/1   0   gi|170079663|ref|NC_010473.1|   4017473   65   35M   *   0   0   GCTACGGAATAAAACCAGGAACAACAGACCCAGCA   cccccccccccccccccccc]c``cVcZccbSYbY   NM:i:0   MD:Z:35   AS:i:35

bwa
SLXA-EAS1_89:1:1:672:654   0   gi|170079663|ref|NC_010473.1|   4017473   37   35M   *   0   0   GCTACGGAATAAAACCAGGAACAACAGACCCAGCA   cccccccccccccccccccc]c``cVcZccbSYbY   XT:A:U   NM:i:0   X0:i:1   X1:i:0   XM:i:0   XO:i:0   XG:i:0   MD:Z:35

bowtie
SLXA-EAS1_89:1:1:672:654/1   0   gi|170079663|ref|NC_010473.1|   4017473   255   35M   *   0   0   GCTACGGAATAAAACCAGGAACAACAGACCCAGCA   cccccccccccccccccccc]c``cVcZccbSYbY   XA:i:0   MD:Z:35   NM:i:0

Some notes about the columns:

  1. name: may vary slightly only
  2. flag: should be identical
  3. reference: should be identical
  4. position: identical for unique reads only (not for multimap, to see if this is true, check the XT tag at the end bwa output
    • XT:A:U means the read is a single Unique read, which XT:A:R means it's a Randomly selected multimap read
  5. mapping quality: different (our numbers should be nearly identical to MOSAIK, another aligner)
  6. cigar: identical for unique reads only (an encoding of the aligned read)
  7. mate reference: * for single end
  8. mate position: 0 for single end
  9. insert size: 0 for single end
  10. sequence: should be identical
  11. quality: should be identical
  12. everything beyond is a tag, which vary greatly between programs but when present...
    • NM is the number of mismatches, and should be identical
    • MD is an encoding of the aligned reference, and should be identical

A more interesting example:

baligner
SLXA-EAS1_89:1:1:713:81/1   0   gi|170079663|ref|NC_010473.1|   4058875   5   17M1D18M   *   0   0   AAAAGCTGGGTGAGTGGCGATGACGCGCAATAAAA   cccccccccccccccccccccccccccJcc^^bb^   NM:i:1   MD:Z:17^G18   AS:i:32

bwa
SLXA-EAS1_89:1:1:713:81   0   gi|170079663|ref|NC_010473.1|   4058875   37   15M1D20M   *   0   0   AAAAGCTGGGTGAGTGGCGATGACGCGCAATAAAA   cccccccccccccccccccccccccccJcc^^bb^   XT:A:U   NM:i:1   X0:i:1   X1:i:0   XM:i:0   XO:i:1   XG:i:1   MD:Z:15^G20

bowtie
SLXA-EAS1_89:1:1:713:81/1   4   *   0   0   *   *   0   0   AAAAGCTGGGTGAGTGGCGATGACGCGCAATAAAA   cccccccccccccccccccccccccccJcc^^bb^   XM:i:0

In this case, bowtie returns an unmapped hit (it doesn't allow indels by default), while bwa returns an identical unique hit. Note that the CIGAR and MD are slightly different, but upon further investigation, the different is due to where the gap is introduced: we place the gap on the last G in the triplet of G's, while bwa places the gap on the first.

And finally, a multimap example, when allow_multimap is true:

baligner
SLXA-EAS1_89:1:1:721:668/1.1   0   gi|170079663|ref|NC_010473.1|   521562   62   35M   *   0   0   GCTGTAGATCTGGAAATCGCAACGGAGGAAGAAAG   ccccccccccccccccccccccccccccccbbbbb   NM:i:0   MD:Z:35   AS:i:35
SLXA-EAS1_89:1:1:721:668/1.2   0   gi|170079663|ref|NC_010473.1|   634822   62   35M   *   0   0   GCTGTAGATCTGGAAATCGCAACGGAGGAAGAAAG   ccccccccccccccccccccccccccccccbbbbb   NM:i:0   MD:Z:35   AS:i:35

bwa
SLXA-EAS1_89:1:1:721:668   0   gi|170079663|ref|NC_010473.1|   521562   0   35M   *   0   0   GCTGTAGATCTGGAAATCGCAACGGAGGAAGAAAG   ccccccccccccccccccccccccccccccbbbbb   XT:A:R   NM:i:0   X0:i:2   X1:i:2   XM:i:0   XO:i:0   XG:i:0   MD:Z:35   XA:Z:gi|170079663|ref|NC_010473.1|,+634822,35M,0;gi|170079663|ref|NC_010473.1|,+633775,35M,1;gi|170079663|ref|NC_010473.1|,+520515,35M,1;

bowtie
SLXA-EAS1_89:1:1:721:668/1   0   gi|170079663|ref|NC_010473.1|   521562   255   35M   *   0   0   GCTGTAGATCTGGAAATCGCAACGGAGGAAGAAAG   ccccccccccccccccccccccccccccccbbbbb   XA:i:0   MD:Z:35   NM:i:0

Note that our programs finds two exact match hits, and lists them both with unique names.

bwa is slightly interesting:

  • it randomly selects one of the two positions we found
  • XT:A:R means that it's a Random multimap
  • X0:i:2 means that it found 2 exact matches (0 mismatches)
  • X1:i:2 means that it found 2 matches with 1 mismatch
  • XA:Z:gi|170079663|ref|NC_010473.1|,+634822,35M,0;gi|170079663|ref|NC_010473.1|,+633775,35M,1;gi|170079663|ref|NC_010473.1|,+520515,35M,1; lists the three other matches (the 1 other perfect match + the two 1 mismatch hits)

bowtie just picks one, and doesn't tell you that it's a multimap at all... (consequently, bowtie runs twice as fast as bwa)

Edit | Attach | Watch | Print version | History: r4 < r3 < r2 < r1 | Backlinks | Raw View |  Raw edit | More topic actions
Topic revision: r4 - 2010-05-20 - jujubix
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback