First, my situation: I had a dataset of 32 unannotated complete mitochondrial genomes. They were riddled with introns (98 in total across 15 genes), which vary considerably between genomes, and these introns are huge (1,000-5,000 bp). I aligned them by hand in Geneious, giving a total alignment length of ~225,000 bp.
While aligning them, I annotated genes, exons, and introns as I identified them. Now I need to annotate the CDS for each gene, and the process is much more tedious than it needs to be.
(1) It would be great if CDS could be annotated automatically within a gene, either by selecting all 'exon' annotations within a 'gene' annotation and adding a 'CDS' annotation, or by selecting every region of a 'gene' annotation that is not covered by an 'intron' annotation and adding a 'CDS' annotation. With a dataset this size, the best I could come up with was to create a consensus genome (containing every intron present in any of the 32 genomes), add it back to the alignment, annotate it, and then annotate each CDS by hand by Ctrl+clicking the exons of each gene and adding a CDS annotation. I then used transfer annotation (one genome at a time, for each of the 32 genomes) to transfer those CDS annotations to every other genome. However, transferring annotations does not close the gaps in a CDS where an intron is absent in the target genome, which leads me to point (2).
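The second option in suggestion (1), taking every part of a gene not covered by an intron, is a simple interval subtraction. Below is a minimal sketch of that logic, assuming each annotation is a plain `(start, end)` pair in one sequence's ungapped, 1-based, inclusive coordinates. The function name `cds_from_gene` is hypothetical and this is not the Geneious API; a real plugin would have to read and write Geneious's own annotation objects instead of tuples.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 1-based, inclusive on both ends.
Interval = Tuple[int, int]


def cds_from_gene(gene: Interval, introns: List[Interval]) -> List[Interval]:
    """Return the parts of `gene` not covered by any intron, in ascending order."""
    start, end = gene
    pieces: List[Interval] = []
    cursor = start  # first position not yet assigned to a CDS piece or an intron
    for i_start, i_end in sorted(introns):
        # Clip the intron to the gene; skip introns lying entirely outside it.
        i_start, i_end = max(i_start, start), min(i_end, end)
        if i_start > i_end:
            continue
        if i_start > cursor:
            pieces.append((cursor, i_start - 1))
        # max() keeps overlapping or nested introns from moving the cursor backwards.
        cursor = max(cursor, i_end + 1)
    if cursor <= end:
        pieces.append((cursor, end))
    return pieces


# Hypothetical gene at 1..3000 containing two introns.
print(cds_from_gene((1, 3000), [(201, 1400), (1601, 2800)]))
# -> [(1, 200), (1401, 1600), (2801, 3000)]
```

Because the result depends only on the gene and intron annotations already present on each sequence, a genome that lacks a given intron simply gets one longer CDS piece, with no gap to close afterwards. For a minus-strand gene the same pieces apply; only the reading order for translation is reversed.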
(2) It'd be great if you could at least Ctrl+click two annotation intervals (or individual annotations) and merge them via the right-click menu. As it was, I had to go in and join by hand, one at a time, every CDS region where there was no intron. This is also harder than it should be: the CDS has to be double-clicked, then the exact interval found in the list of intervals (some of these genes have as many as 17 introns, so that's a lot of CDS intervals) and deleted, and then the preceding interval stretched by hand to cover the deleted one. Alternatively, what about an option to join any intervals that have only gaps (or nothing) between them?
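The "join any intervals that have only gaps (or nothing) between them" idea can be sketched in a few lines. This assumes intervals are `(start, end)` pairs in alignment columns (1-based, inclusive) and that the aligned sequence is a string using `-` for gaps; the function name `merge_across_gaps` is hypothetical and does not correspond to any existing Geneious feature or API.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) in alignment columns, 1-based, inclusive.
Interval = Tuple[int, int]


def merge_across_gaps(aligned_seq: str, intervals: List[Interval], gap: str = "-") -> List[Interval]:
    """Join consecutive intervals separated only by gap characters (or by nothing)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged:
            prev_start, prev_end = merged[-1]
            # Columns strictly between the two intervals, converted to 0-based slicing.
            # The slice is empty when the intervals touch or overlap.
            between = aligned_seq[prev_end:start - 1]
            if all(c == gap for c in between):
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


# Intron absent in this genome: only gap columns separate the two CDS pieces.
print(merge_across_gaps("ATGAAA----CCCTAA", [(1, 6), (11, 16)]))
# -> [(1, 16)]

# Intron present: residues lie between the pieces, so they stay separate.
print(merge_across_gaps("ATGAAAGTAAGCCCTAA", [(1, 6), (12, 17)]))
# -> [(1, 6), (12, 17)]
```

Merging in alignment coordinates means the joined interval spans some gap columns, which is harmless because gap columns carry no residues in that sequence.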
Either of these would have saved hours, and I can imagine datasets several times larger than this one but can't imagine how they could practically be handled. Even the annotation-merging feature alone would be a huge stress- and time-saver, both for this and for other applications. But if suggestion (1) were implemented, i.e. automatic CDS annotation for each sequence, it would avoid this entire process, since each sequence already has gene, exon, and intron annotations, all three of which transfer over fine as long as the alignment is good.
The bottom line is that accurately transferring CDS annotations from a reference to a sequence whose introns differ significantly from the reference's is problematic.