Disambiguating human spoken diary entries using context information

UNCW Author/Contributor (non-UNCW co-authors, if there are any, appear on document)
Daniel Rayburn-Reeves (Creator)
Institution
The University of North Carolina Wilmington (UNCW)
Web Site: http://library.uncw.edu/
Advisor
Curry Guinn

Abstract: The EPA has commissioned studies to gather fine-grained time, activity, location, and exposure data from a diverse cross-section of the population. The information is recorded in digital voice diaries and transcribed by a human for classification into a standard representational system, the Consolidated Human Activity Database (CHAD). Analyzing the diary entries is a long and tedious process for a human encoder. Automating the process and providing useful information can greatly assist the encoder in classifying the diary entries correctly. This paper discusses using Natural Language Processing (NLP) techniques to analyze spoken diary entries and classify the locations and activities into semantic categories. Three main foci form the hypotheses of the study: improving diary classification accuracy using context information, using thresholds to balance the tradeoff between precision and recall, and using the CHAD database structure to improve accuracy by generalizing the semantic ontologies. The word- and context-based system shows the relevance of context information for CHAD code classification: it uses the surrounding diary entry context to augment the word analysis of each diary entry. The threshold-based system shows that the relative difference between the top-scoring CHAD codes can be used to balance the tradeoff between precision and recall. The semantic ontology system shows that generalizing semantic ontologies through the CHAD database structure can improve classification accuracy by reducing granularity.
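
The threshold idea in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name, the score dictionary, the placeholder labels ("code_A", "code_B"), and the exact definition of relative difference are all assumptions made for illustration. The sketch accepts the top-scoring code only when it leads the runner-up by at least a chosen relative margin, and otherwise abstains so the entry can be left to a human encoder.

```python
from typing import Dict, Optional


def classify_with_threshold(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the top-scoring code, or None when the lead over the runner-up is too small.

    `scores` maps a candidate code label to a non-negative score (hypothetical format).
    `threshold` is the minimum relative difference (top - second) / top required
    to accept the top code; this definition is an assumption, not the thesis's.
    """
    if not scores:
        return None

    # Rank candidates from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_code, top_score = ranked[0]

    # With a single candidate there is no runner-up to compare against.
    if len(ranked) == 1:
        return top_code

    # Avoid dividing by zero (or a negative score) when no candidate has support.
    if top_score <= 0:
        return None

    second_score = ranked[1][1]
    relative_diff = (top_score - second_score) / top_score
    return top_code if relative_diff >= threshold else None


# Placeholder labels, not real CHAD codes.
clear_case = {"code_A": 0.9, "code_B": 0.3}
close_case = {"code_A": 0.5, "code_B": 0.45}

print(classify_with_threshold(clear_case, 0.5))
print(classify_with_threshold(close_case, 0.5))
```

Under these assumptions, raising the threshold makes the classifier abstain more often, which tends to favor precision over recall; lowering it does the opposite.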

Additional Information

Publication
Thesis
A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment of the Requirements for the Degree of Master of Science
Language: English
Date: 2009
Keywords
Computer programs, Natural language processing (Computer science)
Subjects
Natural language processing (Computer science)
Computer programs