AN ALIGNMENT-FREE METHOD FOR SEQUENCE IDENTIFICATION USING CHAOS GAME REPRESENTATION

ECSU Author/Contributor (non-ECSU co-authors, if there are any, appear on document)
Matthew D. Hill, student (Creator)
Institution
Elizabeth City State University (ECSU)
Web Site: https://www.ecsu.edu/academics/library/index.html

Abstract: Recent events in the area of public health have led to the need for advancements in techniques to better understand viruses. A method of graphically representing biological sequences known as chaos game representation (CGR) was proposed by H.J. Jeffrey in 1990 [1] and has proved useful even today in the field of bioinformatics. CGR repeatedly applies the midpoint formula to transform a sequence of characters into a graph that can help distinguish between biological sequences through pattern recognition. Initially, CGR was applied to DNA sequences, but in our case, we apply it to protein sequences. For this report, CGR is used to assign several hundred protein sequences to their respective viral groups through feature extraction using the Python programming language. These features include the CGR centroid, amino acid frequency, compounded frequency, Shannon entropy, and Kullback-Leibler discrimination information. In turn, better classification and identification of viruses are achieved.
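
As an illustration of the midpoint construction and several of the features named in the abstract, the following is a minimal Python sketch and not the author's implementation. It assumes details the abstract does not state: the 20 standard amino acids are placed in alphabetical order on the vertices of a regular 20-sided polygon on the unit circle, the walk starts at the polygon's center, and entropy and Kullback-Leibler values are measured in bits. All function names are hypothetical.

```python
import math
from collections import Counter

# Assumption: alphabetical one-letter codes, one per polygon vertex.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: vertices evenly spaced on the unit circle.
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_points(seq):
    """Each CGR point is the midpoint between the previous point
    and the vertex of the current residue."""
    x, y = 0.0, 0.0  # assumed starting point: center of the polygon
    points = []
    for aa in seq:
        if aa not in VERTICES:
            continue  # skip ambiguous or non-standard codes such as X
        vx, vy = VERTICES[aa]
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


def centroid(points):
    """Mean x and mean y of a non-empty list of CGR points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def frequencies(seq):
    """Relative frequency of each standard amino acid in a non-empty sequence."""
    counts = Counter(aa for aa in seq if aa in VERTICES)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def shannon_entropy(freqs):
    """Shannon entropy in bits; zero-frequency residues contribute nothing."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def kl_divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence KL(p || q) in bits.
    The eps smoothing is an assumption to avoid division by zero."""
    return sum(pi * math.log2(pi / (q[aa] + eps)) for aa, pi in p.items() if pi > 0)
```

Under these assumptions, a sequence's feature vector could combine `centroid(cgr_points(seq))`, `frequencies(seq)`, and `shannon_entropy(frequencies(seq))`, while `kl_divergence` compares the frequency profile of one sequence against that of a reference group.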

Additional Information

Publication
Dissertation
Language: English
Date: 2021
Keywords
public health, viruses, CGR, bioinformatics, biological sequences
