What’s the difference between Pairwise/Multiple alignment, de novo Assembly, and Map to Reference?

Pairwise/Multiple alignment should be used when you want to determine the homology between a set of sequences that span the same gene or genomic region. This function aims to minimize gaps and maximize the overall homology between sequences. It is therefore not appropriate for assembling short sequences into one longer sequence, or for aligning sets of primers into a longer sequence: it will try to put all the sequences on top of each other to minimize gaps in the alignment, and is likely to produce an incorrect result.
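To illustrate what "minimize gaps and maximize homology" means in practice, here is a minimal sketch of a textbook global alignment (Needleman-Wunsch). It is not the algorithm Geneious uses internally; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Toy Needleman-Wunsch global alignment.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative
    assumptions, not the defaults of any particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalized: both sequences are aligned end to end.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix with the best score for each pair of prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two bases
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))  # two versions of the same region
```

Because every gap is penalized, including gaps at the ends, two fragments that only overlap at their tips are pulled on top of each other rather than laid end to end, which is why this kind of alignment is the wrong tool for assembly.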

Assembly is the correct function to use when you wish to merge overlapping fragments of a DNA sequence into a longer contig. This can be done either with a known reference sequence to assist the assembly (Map to Reference) or without a reference sequence (de novo assembly). De novo assembly is generally more computationally intensive than Map to Reference and can require large amounts of RAM.
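The idea of merging overlapping fragments into a contig can be sketched with a toy greedy assembler. This is only an illustration under strong assumptions (exact overlaps, no sequencing errors, no reverse complements) and does not represent how the Geneious assembler works; the function names and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the longest exact overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        # Compare every ordered pair; this all-against-all step is one reason
        # de novo assembly gets expensive as the number of reads grows.
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps; keep remaining contigs apart
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


fragments = ["ATGGCGTAC", "GTACCTTGA", "TTGAGGCAT"]
print(greedy_assemble(fragments))  # ['ATGGCGTACCTTGAGGCAT']
```

Mapping to a reference avoids the all-against-all comparison because each read only needs to be placed against one known sequence.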

Comments

  • jf x

    How robust is "Map to Reference" when the sample differs from the best available reference? For example, the sample might have a large 100,000 bp insertion/deletion/duplication event, or a rearrangement/inversion, such as might happen in cancer genomes. How will Geneious handle this large discrepancy during "Map to Reference"? Will those reads be ignored? Will they be annotated as a large insertion/deletion? How does it use the consensus information in the reads to create a new reference?

  • Matt Kearse

    Map to reference won't handle insertions longer than the read length. For these, it is best to de novo assemble first and then map the contigs to the reference sequence.

    Map to reference can handle large deletions if you increase the maximum gap size, although it won't be that efficient. You can run the variant finder after mapping to annotate the deletions and insertions. Duplication events are noticeable if you select the option to map all reads to all matching locations. Inversions are easiest to identify with paired reads that have wrong directions or insert sizes around the breaks.

    Map to reference also has an iterative mode where it forms the consensus sequence from the first iteration and maps reads to the consensus on successive iterations. This will allow reads to better align to each other around indels.

    The upcoming release of R9 will probably include some improvements we're working on for the mapper, so that it can efficiently identify large deletions/rearrangements/inversions as part of the mapping process and then map the reads spanning them. It will also annotate these variants on the reference sequence.

  • AF L

    When aligning to a reference, is the reference sequence 'included' in the resulting consensus sequence?

    If yes, what needs to be done to get a consensus sequence based only on the assembled reads?

  • Hilary Miller

    The consensus sequence is the consensus of the reads only and does not include the reference.
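
To make the last answer concrete, here is a minimal sketch of a reads-only consensus: each column is decided by a majority vote over the mapped reads, and the reference contributes only the coordinate system. The function name, the `(start, sequence)` read representation, and the `?` placeholder for uncovered positions are all assumptions for illustration, not Geneious behaviour or settings.

```python
from collections import Counter


def reads_only_consensus(reference_length, mapped_reads, no_coverage="?"):
    """Majority-vote consensus built from the reads alone.

    `mapped_reads` is a list of (start, sequence) pairs, where `start` is the
    0-based reference position of the read's first base (gapless mapping is
    assumed). The reference bases themselves are never consulted.
    """
    columns = [Counter() for _ in range(reference_length)]
    for start, sequence in mapped_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < reference_length:
                columns[position][base] += 1
    return "".join(
        column.most_common(1)[0][0] if column else no_coverage
        for column in columns
    )


# Hypothetical reference "ACGTACGTAC" (length 10); the reads carry an A
# where the reference has T at position 3.
reads = [(0, "ACGA"), (2, "GAAC"), (2, "GAAC")]
print(reads_only_consensus(10, reads))  # ACGAAC????
```

In this example the consensus shows A at position 3 because that is what the reads say, and positions with no read coverage are not filled in from the reference.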