Reduce the tedium of intron/CDS annotations

First, my situation: I had a dataset of 32 full mitochondrial genomes, unannotated. They were riddled with introns (98 total across 15 genes) that vary considerably between genomes, and these introns are huge (1000-5000 bp). I aligned the genomes by hand in Geneious, giving a total alignment length of ~225,000 bp.

While aligning them, I annotated genes, exons, and introns as I identified them. Now I need to annotate CDS for each gene, and I feel it's much more tedious than it needs to be.

(1) It would be great if CDS could automatically be annotated within a gene, either by automatically selecting all 'exon' annotations within a 'gene' annotation and adding a 'CDS' annotation, or by selecting all regions of a 'gene' annotation where there is not also an 'intron' annotation and adding a 'CDS' annotation. With a dataset this size, the best I could come up with was to create a consensus genome (which included all possible introns among the 32 genomes), add it back to the alignment and annotate it, then annotate each individual CDS by hand by ctrl+clicking the exons for each gene and adding a CDS annotation. Then I used transfer annotation (one at a time for each of the 32 genomes) to transfer those CDS to every other genome. But transferring annotations does not close the gaps in a CDS where introns are absent, which leads me to point #2.
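To make the second variant of suggestion (1) concrete, here is a minimal Python sketch of "gene minus introns". It is not Geneious functionality or its API; the function name `subtract_introns` and the 0-based, half-open `(start, end)` interval convention are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed convention: 0-based, half-open (start, end) intervals.
Interval = Tuple[int, int]


def subtract_introns(gene: Interval, introns: List[Interval]) -> List[Interval]:
    """Return the parts of `gene` not covered by any intron, in order."""
    gene_start, gene_end = gene
    cds: List[Interval] = []
    cursor = gene_start  # first position not yet assigned to CDS or intron
    for start, end in sorted(introns):
        # Clip the intron to the gene boundaries.
        start, end = max(start, gene_start), min(end, gene_end)
        if start >= end:
            continue  # intron lies entirely outside the gene
        if start > cursor:
            cds.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < gene_end:
        cds.append((cursor, gene_end))
    return cds


print(subtract_introns((100, 1000), [(300, 500), (700, 800)]))
# [(100, 300), (500, 700), (800, 1000)]
```

Because the CDS is derived per sequence from that sequence's own intron annotations, a genome that lacks a given intron simply gets one continuous interval there, with no gap to close afterwards.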

(2) It'd be great if you could at least ctrl+click two annotation intervals (or individual annotations) and 'merge' them together from the right-click menu. At this point, I had to go in and join every CDS region where there was no intron, one at a time, by hand. This is also harder than it should be: the CDS needs to be double-clicked, then the exact region found in the list of regions (some of these genes have as many as 17 introns, so that's a lot of CDS regions) and deleted, then the preceding region stretched by hand to cover the deleted region. Or, what about being able to join any intervals that have only gaps (or nothing) between them?
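The last idea in point (2), joining intervals that have only gaps (or nothing) between them, could work roughly like the Python sketch below. It is an illustration only, not an existing Geneious feature; `merge_across_gaps`, the `-` gap character, and the 0-based, half-open alignment coordinates are all assumptions.

```python
from typing import List, Tuple

# Assumed convention: 0-based, half-open (start, end) alignment columns.
Interval = Tuple[int, int]


def merge_across_gaps(aligned_seq: str, intervals: List[Interval],
                      gap: str = "-") -> List[Interval]:
    """Join neighbouring intervals separated only by gap characters (or nothing)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged:
            prev_start, prev_end = merged[-1]
            between = aligned_seq[prev_end:start]  # empty if the intervals touch or overlap
            if set(between) <= {gap}:
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


# The first two intervals are separated only by "----" and are joined;
# the third is separated by real bases ("AC") and is left alone.
row = "ATGC----GGTAACCTT"
print(merge_across_gaps(row, [(0, 4), (8, 12), (14, 17)]))
# [(0, 12), (14, 17)]
```

Applied to each row of the alignment after a CDS has been transferred, a check like this would collapse exactly the regions where that genome lacks an intron.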

Either of these would have saved hours of time, and I can imagine datasets several times larger than this but can't imagine how they could practically be worked with. Even the annotation merging feature would be a huge stress- and time-saver, for both this and other applications. But if suggestion (1) were implemented, i.e. automatic CDS annotating for each sequence, it would avert this entire process considering each sequence already has gene, exon, and intron annotations, all three of which transfer over fine as long as the alignment is good.

The bottom line is that the accurate transfer of CDS from a reference to a sequence that significantly differs in introns is problematic.

Thanks!

Chase Mayers

1 comment

Hi Chase, if you've already got your exon annotations where you want them, then you can quickly create a multi-interval CDS by doing the following:

1. Control-click (command-click on Mac) all the exon annotations to select them. Then right-click somewhere in the viewer and go to Annotation->Add.

2. Uncheck "create separate annotations for each interval", and configure the rest of the annotation as you want.

This will then create a single, multi-interval annotation spanning the exon intervals.  
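For readers who go on to export such annotations, a multi-interval CDS corresponds to a `join(...)` location in the GenBank feature table, which uses 1-based, inclusive coordinates. The short Python sketch below shows that mapping; the helper name `to_genbank_location` and the 0-based, half-open input intervals are assumptions for illustration and are not part of Geneious.

```python
from typing import List, Tuple

# Assumed input convention: 0-based, half-open (start, end) intervals.
Interval = Tuple[int, int]


def to_genbank_location(intervals: List[Interval], strand: int = 1) -> str:
    """Format intervals as a GenBank location string (1-based, inclusive)."""
    if not intervals:
        raise ValueError("at least one interval is required")
    parts = [f"{start + 1}..{end}" for start, end in sorted(intervals)]
    location = parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"
    return f"complement({location})" if strand == -1 else location


print(to_genbank_location([(100, 300), (500, 700), (800, 1000)]))
# join(101..300,501..700,801..1000)
```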

You can also select the exons via the Annotations table (select them in the Table, then switch back to the Viewer and they should be selected on the sequence).  This is sometimes easier than control-clicking to select them. 

If you have exons that are not separated by an intron (but are still two separate exon annotations), you can select the whole region using shift-click instead of control-click, then do the same "Add Annotation" operation to add a single-interval CDS spanning the two exons.

 

Hilary Miller