Feature extraction and feature reduction for spoken letter recognition

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Tyler James Wendell (Creator)
Institution
The University of North Carolina at Greensboro (UNCG)
Web Site: http://library.uncg.edu/
Advisor
Shanmugathasan Suthaharan

Abstract: Finding relevant features for classifying spoken letters is difficult because many letters are phonetically similar and the audio data are high-dimensional. In the machine learning literature, successful spoken letter classification has often relied on very convoluted algorithms. The success of this work lies in its high classification rate and in the relatively small amount of computation required between signal retrieval and feature selection. The relevant features come from an analysis of the sequential properties of the vectors produced by a Fourier transform. The study focuses mainly on classifying the fricative letters f and s, the letters m and n, and the E-set (b, c, d, e, g, p, t, v, z), which are very hard to distinguish, especially when transmitted over modern VoIP digital devices. Another feature of this research is that the dataset was produced without noise-reducing signal processing, which is shown to yield equivalent and sometimes better results. All pops and static noise were kept as part of the sound files. This contrasts with other research, in which datasets were recorded with high-grade equipment and processed with noise reduction algorithms. The audio files were classified with the random forest algorithm. This algorithm was successful because the features produced were largely separable in relatively few dimensions. Classification accuracies ranged from 92% to 97%, depending on the dataset.
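
The abstract does not specify which sequential properties of the Fourier vectors are used, so the following is only an illustrative sketch of the general idea, not the thesis's method. It assumes a hypothetical feature: the Euclidean distance between the magnitude spectra of consecutive frames. The function names, the frame length, the hop size, and the synthetic two-tone signal are all assumptions made for this example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N//2 of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; samples that do not fill a last frame are dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def sequential_features(signal, frame_len=64, hop=32):
    """Hypothetical feature: distance between each pair of consecutive magnitude spectra."""
    spectra = [magnitude_spectrum(f) for f in split_frames(signal, frame_len, hop)]
    return [math.dist(a, b) for a, b in zip(spectra, spectra[1:])]


def tone(freq_hz, n_samples, rate_hz=8000):
    """Synthetic sine tone standing in for recorded audio (an assumption, not thesis data)."""
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz) for t in range(n_samples)]


# A 500 Hz tone followed by a 2000 Hz tone: the feature should peak near the change.
signal = tone(500, 256) + tone(2000, 256)
features = sequential_features(signal)
peak_index = features.index(max(features))
```

In this sketch the feature stays near zero while the spectrum is steady and rises where the signal changes, giving a short vector per recording that a classifier such as a random forest could take as input. In practice a library FFT such as `numpy.fft.rfft` would replace the naive DFT.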

Additional Information

Publication
Thesis
Language: English
Date: 2016
Keywords
Audio Analysis, Feature Extraction, Feature Reduction, Spoken Word Recognition
Subjects
Automatic speech recognition -- Data processing
Speech processing systems
Human-computer interaction
