Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences

ECU Author/Contributor (non-ECU co-authors, if there are any, appear on document)
Miranda Raymond (Creator)
Institution
East Carolina University (ECU)
Web Site: http://www.ecu.edu/lib/

Abstract: The genomics revolution introduced affordable technology capable of rapidly analyzing and comparing massive amounts of biological sequence data. A highly expressed gene sequence obtained from the plant Leptosiphon jepsonii was analyzed using the Basic Local Alignment Search Tool (BLAST) program on the National Center for Biotechnology Information (NCBI) website. This sequence was compared against other sequences archived in the NCBI database for similarities. These comparisons encompassed a wide range of taxa, including other green plants, fungi, metazoans, algae, and single-celled organisms. The original query sequence was first compared to inferred protein sequences. The mRNA sequences corresponding to these proteins were then analyzed against complete nucleotide accessions through reciprocal BLAST searches to ensure the accuracy of the results. The most similar sequences from these reciprocal BLAST searches were rRNA rather than mRNA sequences. This result indicates that numerous accessions in NCBI are inappropriately characterized as mRNAs and proteins rather than as ribosomal sequences. To explore the breadth of this misannotation issue, sequences from a wide range of organisms, including model genomes, were also examined. This study indicates that rapid, automated computational analyses of massive amounts of sequence data, combined with a heightened focus on novel findings, have led to a sizable influx of erroneous data within even the most reputable databases.
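
The reciprocal-search check described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual pipeline: it assumes the BLAST results have already been exported as records, and the `Hit` fields, the function names, and the example accessions are all hypothetical. The idea is simply to keep the best-scoring nucleotide hit for each mRNA query and flag the queries whose best hit is annotated as rRNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One reciprocal-search result (hypothetical record layout)."""
    query_id: str      # accession of the mRNA used as the query
    subject_id: str    # accession of the matched nucleotide record
    subject_type: str  # annotated molecule type of the match, e.g. "rRNA"
    bit_score: float   # alignment score; higher means more similar


def best_hits(hits):
    """Return the highest-scoring hit for each query accession."""
    best = {}
    for hit in hits:
        current = best.get(hit.query_id)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query_id] = hit
    return best


def flag_suspect_mrnas(hits):
    """List queries whose best reciprocal hit is annotated as rRNA."""
    return sorted(
        query for query, hit in best_hits(hits).items()
        if hit.subject_type == "rRNA"
    )


# Made-up example data, not taken from the thesis.
example_hits = [
    Hit("mRNA_A", "nuc_1", "rRNA", 950.0),
    Hit("mRNA_A", "nuc_2", "mRNA", 400.0),
    Hit("mRNA_B", "nuc_3", "mRNA", 800.0),
    Hit("mRNA_B", "nuc_4", "rRNA", 120.0),
]

print(flag_suspect_mrnas(example_hits))  # prints ['mRNA_A']
```

Here `mRNA_A` is flagged because its strongest match is a ribosomal record, while `mRNA_B` is not, since its weaker rRNA match is outscored by an mRNA match. A real analysis would also need thresholds on alignment length and identity, which this sketch leaves out.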

Additional Information

Publication
Thesis
Language: English
Date: 2017
Keywords
NCBI, BLAST, misannotation, rRNA, proteins
This item references:

Title: Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences
Location & Link: http://hdl.handle.net/10342/6562
Type of Relationship: The described resource references, cites, or otherwise points to the related resource.