Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences
- ECU Author/Contributor (non-ECU co-authors, if there are any, appear on document)
- Miranda Raymond (Creator)
- Institution
- East Carolina University (ECU)
- Web Site: http://www.ecu.edu/lib/
Abstract: The genomics revolution introduced affordable technology capable of rapidly analyzing and comparing massive amounts of biological sequence data. Using the Basic Local Alignment Search Tool (BLAST) program on the National Center for Biotechnology Information (NCBI) website, a highly expressed gene sequence obtained from the plant Leptosiphon jepsonii was analyzed. This sequence was compared against other sequences archived in the NCBI database for similarities. These comparisons encompassed a broad range of taxa, including other green plants, fungi, metazoans, algae and single-celled organisms. The original sequence query was first compared to inferred protein sequences. The mRNA sequences corresponding to these proteins were then analyzed against complete nucleotide accessions through reciprocal BLAST searches to ensure the accuracy of the results. The most similar sequences from these reciprocal BLAST searches were rRNA rather than mRNA sequences. This result indicates that numerous accessions in NCBI are inappropriately characterized as mRNAs and proteins rather than as ribosomal sequences. To explore the breadth of this misannotation issue, sequences from a wide range of organisms, including model genomes, were also examined. This study indicates that rapid, automated computational analyses of massive amounts of sequence data, combined with a heightened focus on novel findings, have led to a sizable influx of erroneous data within even the most reputable databases.
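The final step of the workflow described in the abstract, checking whether the best reciprocal BLAST hit for an accession annotated as mRNA or protein is in fact an rRNA sequence, can be illustrated with a minimal Python sketch. This is not the thesis author's code. The record type `ReciprocalHit`, the function `flag_possible_misannotations`, the description-matching pattern, the 90% identity cutoff, and the example accessions and descriptions are all hypothetical and chosen only for illustration; real BLAST output would first need to be parsed into this form.

```python
import re
from dataclasses import dataclass

# Assumed pattern for recognizing ribosomal RNA in a hit description.
# Real NCBI definition lines vary, so this would need tuning in practice.
RRNA_PATTERN = re.compile(r"\b(rRNA|ribosomal RNA)\b", re.IGNORECASE)


@dataclass
class ReciprocalHit:
    """Best hit from a reciprocal nucleotide BLAST search (hypothetical record)."""
    query_accession: str      # accession annotated as mRNA or protein
    top_hit_description: str  # description line of the best nucleotide hit
    percent_identity: float   # identity of the best alignment, in percent


def flag_possible_misannotations(hits, min_identity=90.0):
    """Return accessions whose best reciprocal hit looks like an rRNA sequence.

    min_identity is an illustrative cutoff, not a value taken from the thesis.
    """
    return [
        hit.query_accession
        for hit in hits
        if hit.percent_identity >= min_identity
        and RRNA_PATTERN.search(hit.top_hit_description)
    ]


# Placeholder records; these are not real NCBI accessions or results.
example_hits = [
    ReciprocalHit("EXAMPLE_A", "Hypothetical plant 26S ribosomal RNA gene", 98.5),
    ReciprocalHit("EXAMPLE_B", "Hypothetical ribosomal protein L3 mRNA", 99.0),
    ReciprocalHit("EXAMPLE_C", "Hypothetical 18S rRNA, partial sequence", 72.0),
]

flagged = flag_possible_misannotations(example_hits)
print(flagged)
```

Matching on "rRNA" or "ribosomal RNA" rather than on "ribosomal" alone keeps genuine ribosomal protein mRNAs from being flagged, and the identity cutoff keeps weak matches out of the list. Any accession flagged this way would still need manual inspection before being treated as misannotated.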
Additional Information
- Publication
- Thesis
- Language: English
- Date: 2017
- Keywords
- NCBI, BLAST, misannotation, rRNA, proteins
Title | Location & Link | Type of Relationship |
Genomic Database Conundrum: Widespread Misannotation of rRNA Sequences as Protein Sequences | http://hdl.handle.net/10342/6562 | The described resource references, cites, or otherwise points to the related resource. |