Which de novo assembly algorithm is best for my data?

Several de novo assemblers are available in Geneious. Some are bundled with Geneious; others can be installed as optional plugins from Tools -> Plugins. Which assembler works best depends on your data. Below is a brief overview of the advantages and disadvantages of each.
 
Geneious
 
Advantages:
      - Produces large contigs
      - Produces contigs containing reads
      - Can produce list of unused reads
      - Can produce circular contigs

Disadvantages:
      - Slow (not feasible to use on genomes over 100 Mbp)
      - High memory usage


Tadpole (Geneious R9 onwards)
 
Advantages:
      - Extremely fast
      - Very low mis-assembly rate
 
Disadvantages:
      - Produces only consensus sequences
      - May produce shorter contigs
      - May not work as well on gappy error models (e.g. 454, IonTorrent, PacBio)
      - Does not produce scaffolds

 

SPAdes (Geneious 10.1 onwards)

Advantages:
      - Produces long and accurate contigs
      - Works with many data types (note: Oxford Nanopore, PacBio, and Sanger reads can only be used in hybrid assemblies with higher quality short read data)
      - Supports RNA and metagenome data
 
Disadvantages:
      - Produces only consensus sequences
      - Doesn't work with low coverage
      - Not designed for large genomes

 

Flye (plugin for Geneious Prime 2020.1 onwards)

Advantages:
      - Fast
      - Designed for PacBio and Nanopore data
      - Works with metagenome data

Disadvantages:
      - Doesn't support short reads (under 1000 bp)
      - Produces consensus sequences only


Velvet (plugin)
 
Advantages:
      - Fast
      - Widely used  
      - More efficient on larger genomes than Geneious assembler

Disadvantages:
      - Produces only consensus sequences
      - May not work as well on gappy error models (e.g. 454, IonTorrent, PacBio)

MIRA (plugin)

Advantages:
      - Works very well on bacterial genomes
      - Produces large contigs
      - Produces contigs containing reads 

Disadvantages:
      - Not feasible to use on large genomes     
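
If it helps to compare the options side by side, the overview above can be written down as a small lookup table and filtered by what your project needs. The Python sketch below is only an illustration and is not part of Geneious: the `Assembler` class, its field names, and the `shortlist` function are hypothetical, and each flag simply restates a point from the lists above. In particular, "not flagged for large genomes" only means that no such disadvantage is listed above, not that the assembler has been shown to handle large genomes well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembler:
    name: str
    keeps_reads: bool                 # contigs contain the reads, not only a consensus
    designed_for_long_reads: bool     # listed above as designed for PacBio/Nanopore data
    flagged_for_large_genomes: bool   # listed above as slow or unsuitable for large genomes


# Each row restates the advantages/disadvantages listed in this article.
ASSEMBLERS = [
    Assembler("Geneious", keeps_reads=True, designed_for_long_reads=False, flagged_for_large_genomes=True),
    Assembler("Tadpole", keeps_reads=False, designed_for_long_reads=False, flagged_for_large_genomes=False),
    Assembler("SPAdes", keeps_reads=False, designed_for_long_reads=False, flagged_for_large_genomes=True),
    Assembler("Flye", keeps_reads=False, designed_for_long_reads=True, flagged_for_large_genomes=False),
    Assembler("Velvet", keeps_reads=False, designed_for_long_reads=False, flagged_for_large_genomes=False),
    Assembler("MIRA", keeps_reads=True, designed_for_long_reads=False, flagged_for_large_genomes=True),
]


def shortlist(need_reads_in_contigs=False, long_reads_only=False, large_genome=False):
    """Return the names of assemblers that pass every requested filter."""
    names = []
    for a in ASSEMBLERS:
        if need_reads_in_contigs and not a.keeps_reads:
            continue
        if long_reads_only and not a.designed_for_long_reads:
            continue
        if large_genome and a.flagged_for_large_genomes:
            continue
        names.append(a.name)
    return names


if __name__ == "__main__":
    print(shortlist(need_reads_in_contigs=True))
    print(shortlist(long_reads_only=True))
    print(shortlist(large_genome=True))
```

A shortlist produced this way is a starting point only. Points such as coverage, read error model, and the need for circular contigs or scaffolds are not encoded here and should still be checked against the lists above.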
