What are the hardware requirements for assembly of NGS data using the Geneious de novo Assembler?

The Geneious de novo assembler is intended for use with bacterial genomes and small eukaryote genomes. For larger eukaryote genomes we recommend using Velvet, which is available as a Geneious plugin. For more information on which assembler to use, see the following Knowledge Base article: Which-de-novo-assembly-algorithm-is-best-for-my-data-

Storage: The Geneious de novo assembler has very high disk space requirements, so large assemblies may need tens to hundreds of gigabytes of temporary storage. Storing your database on a solid state drive (SSD) will greatly improve speed and performance.

Processor: The Geneious de novo assembler is partially multithreaded, so adding processors will speed it up somewhat. In general, the faster the processor and the more cores it has, the quicker the assembly will run.

RAM: How much RAM you require depends primarily on the size and quality of your data set. The graph below provides a rough guide to the minimum and maximum amount of RAM you might need, depending on your data set size and type.

The Geneious de novo Assembler always provides a minimum/maximum estimate of how much RAM will be required to assemble the selected data set, and it will not run if it expects to have too little RAM to complete the assembly. If in doubt, simply select your data and start the Assembler to see the predicted memory requirements.

General rules of thumb for RAM requirements:

  • Assembling data with higher coverage depth will require more RAM – aim for coverage between 50 and 100x

  • Assembling lower quality data, with more miscalls, indels and gaps, will require more RAM

  • Doubling the size of your dataset (total nucleotides) will roughly double RAM requirements

  • For Illumina data, roughly 1 GB of RAM will be required to assemble a data set of 1 million reads (with an average read length of 100 nucleotides)
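
The coverage target in the first rule of thumb can be checked before starting an assembly. The following is a minimal Python sketch, not part of Geneious: the function names and the example numbers are hypothetical, and it assumes coverage is simply total sequenced nucleotides divided by the expected genome size.

```python
def coverage_depth(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average coverage = total sequenced nucleotides / expected genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


def subsample_fraction(coverage: float, target: float = 100.0) -> float:
    """Fraction of reads to keep so that coverage does not exceed `target`."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return min(1.0, target / coverage)


# Illustrative numbers only: 5 million 100-nt reads from a 2 Mbp genome.
cov = coverage_depth(5_000_000, 100, 2_000_000)
print(f"coverage: {cov:.0f}x")                          # coverage: 250x
print(f"keep {subsample_fraction(cov):.0%} of reads")   # keep 40% of reads
```

A result well above 100x suggests assembling only a subset of the reads, which should lower both RAM use and run time.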

[Graph: geneious_de_novo_100.png]

Note that the Geneious de novo assembler lets you trade speed for lower RAM requirements (choose Custom Sensitivity & More Options to reduce RAM requirements). The lowest setting will reduce your RAM requirements by roughly half.
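
Taken together, the Illumina rule of thumb, the linear scaling with total nucleotides, and the roughly halved requirement at the lowest memory setting give a quick back-of-the-envelope figure. The Python sketch below is an illustration under those assumptions only. The function and constant names are hypothetical, it ignores coverage and data quality, and the estimate Geneious reports for your selected data should always take precedence.

```python
# Rule of thumb from this article: ~1 GB of RAM per 1 million Illumina reads
# of 100 nucleotides, i.e. per 100 million nucleotides of input.
NUCLEOTIDES_PER_GB = 1_000_000 * 100


def estimate_ram_gb(num_reads: int, mean_read_length: float,
                    lowest_memory_setting: bool = False) -> float:
    """Rough RAM estimate in GB, scaling linearly with total nucleotides."""
    total_nucleotides = num_reads * mean_read_length
    ram_gb = total_nucleotides / NUCLEOTIDES_PER_GB
    if lowest_memory_setting:
        # The lowest memory setting roughly halves the requirement.
        ram_gb *= 0.5
    return ram_gb


# Illustrative numbers only: 4 million reads averaging 150 nucleotides.
print(f"{estimate_ram_gb(4_000_000, 150):.1f} GB")                              # 6.0 GB
print(f"{estimate_ram_gb(4_000_000, 150, lowest_memory_setting=True):.1f} GB")  # 3.0 GB
```

Lower quality data or very high coverage can push real usage above this figure, so treat it as a starting point rather than a guarantee.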

 

How long will my de novo assembly take? See the following link.

The time required for the Geneious de novo assembler to complete will depend on your hardware, the size of your dataset, and the settings used for assembly.  Reducing the “Sensitivity” setting will increase the stringency of the assembler and reduce the time required to complete assembly.

 


Comments

  • Jeanmaire Molina
    I have about 1 million contigs of a plant genome pre-assembled in CLC. Can I improve the assembly (get bigger contigs) in Geneious? How much RAM would I need?
  • Matt Kearse

    The Geneious de novo assembler handles reads of any length, so yes, you can use it to produce bigger contigs from the output of other assemblers, although you'll often get better results just using the original data when possible.

    Memory usage depends quite a bit on the read lengths and settings used. The best way to estimate memory usage is to select the data in Geneious and click de novo assemble. Geneious will then provide you with an estimate based on your data and the settings currently selected.

    Data sets often have excessive coverage (over 100 fold) which can cause the assembler to run slower and use more memory than required. In this case we recommend you select the top assembly option to use a subset of your data so that your expected genome size lies within the suggested range.

  • Saeid Kadkhodaei

    For a eukaryote RNA-seq project with around 60 million reads per treatment, would the following specifications be acceptable for analysing such data?

    - RAM = 16 GB

    - Processor: 3.5 GHz

  • Hilary Miller

    Hi Saeid, we don't recommend the Geneious de novo assembler for assembling RNA-seq data, as the algorithm is not specifically designed for transcriptomes.

    However, the Geneious Mapper can be used for RNA-seq if you have a reference genome, and in version 9 we have released an RNA-seq specific algorithm for the mapper. The RAM requirements for mapping are much lower than those for de novo assembly. Whether 16 GB of RAM will be sufficient really depends on the size of your reference genome. For a genome similar to the human genome we would normally recommend a machine with 32 GB of RAM, but you may be able to get away with only 16 GB. If you have multiple CPUs, this will also speed things up.