AN ALIGNMENT-FREE METHOD FOR SEQUENCE IDENTIFICATION USING CHAOS GAME REPRESENTATION
- ECSU Author/Contributor (non-ECSU co-authors, if there are any, appear on document)
- Matthew D. Hill, student (Creator)
- Institution
- Elizabeth City State University (ECSU)
- Web Site: https://www.ecsu.edu/academics/library/index.html
Abstract: Recent events in the area of public health have led to the need for advancements in techniques to better understand viruses. A method of graphically representing biological sequences known as chaos game representation (CGR) was proposed by H.J. Jeffrey in 1990 [1] and has proved useful even today in the field of bioinformatics. CGR uses the midpoint formula to transform a sequence of characters into a graph that can help distinguish between biological sequences through pattern recognition. Initially, CGR was applied to DNA sequences, but in our case, we apply it to protein sequences. For this report, CGR is used to identify the viral groups of several hundred protein sequences through feature extraction using the Python programming language. These features include the CGR centroid, amino acid frequency, compounded frequency, Shannon entropy, and Kullback-Leibler discrimination information. In turn, better classification and identification of viruses is achieved.
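The midpoint construction and three of the features named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the dissertation. The vertex layout (20 amino acids evenly spaced on the unit circle, in alphabetical one-letter order), the starting point at the origin, and all function names are assumptions made for illustration only; the author's actual layout and feature definitions may differ.

```python
import math
from collections import Counter

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: one CGR vertex per amino acid, evenly spaced on the unit circle.
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_points(sequence):
    """Return the CGR point for each residue, starting from the origin."""
    x, y = 0.0, 0.0
    points = []
    for residue in sequence:
        if residue not in VERTICES:
            continue  # skip ambiguous or non-standard residues such as 'X'
        vx, vy = VERTICES[residue]
        # Midpoint formula: move halfway from the current point to the vertex.
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


def centroid(points):
    """Mean x and y of a non-empty list of CGR points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def frequencies(sequence):
    """Relative frequency of each symbol that occurs in the sequence."""
    counts = Counter(sequence)
    return {aa: c / len(sequence) for aa, c in counts.items()}


def shannon_entropy(sequence):
    """Shannon entropy of the symbol distribution, in bits."""
    return -sum(p * math.log2(p) for p in frequencies(sequence).values())
```

Under these assumptions, calling `centroid(cgr_points(seq))`, `frequencies(seq)`, and `shannon_entropy(seq)` on a protein string `seq` gives one small feature vector per sequence. Compounded frequency and Kullback-Leibler discrimination information are left out because the abstract does not define them.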
PDF (Portable Document Format)
21059 KB
Created on 8/8/2021
Additional Information
- Publication
- Dissertation
- Language: English
- Date: 2021
- Keywords
- public health, viruses, CGR, bioinformatics, biological sequences