NEural models for ontology annotations - NEMO

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Pratik Devkota (Creator)
Institution
The University of North Carolina at Greensboro (UNCG)
Web Site: http://library.uncg.edu/
Advisor
Somya Mohant

Abstract: Rapid technological progress has significantly increased the pace of modern scientific experimentation. Important results from these experiments are often buried in lengthy documents, which makes information retrieval difficult. To facilitate retrieval and knowledge discovery, domain experts use ontologies (formal representations of knowledge within a domain) to annotate important entities. These annotations are generally curated manually, a slow and laborious process that does not scale. Named Entity Recognition (NER), the task of recognizing ontology concepts in text, is critical to scalable ontology annotation. Traditionally, entity recognition relied on syntactic analysis, lexical approaches, and traditional machine learning. In recent years, deep learning has shown improved results in concept recognition. This research explores different approaches to improving state-of-the-art deep learning models for automated ontology annotation. CRAFT (a manually curated biomedical corpus for ontologies) serves as the gold standard corpus for training and evaluating different deep learning architectures. We augment the information from CRAFT with several existing knowledge bases. This study demonstrates that the prediction accuracy of existing deep learning models can be improved by feeding additional information into existing architectures as extra input pipelines. Additionally, ontologies are hierarchical and encode semantic relations between concepts. Deep learning models generally fail to take this hierarchy into account; our work explores making the models ontology-aware and shows improvement over baseline models. Furthermore, we implement a novel technique called Ontology Boosting, which boosts the prediction accuracy of pre-trained models through post-processing steps.
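
Illustrative note: to make the annotation task in the abstract concrete, the sketch below shows what token-level ontology annotation can look like. It is not the thesis's method, which relies on deep learning models trained on CRAFT. It shows only a simple lexical (dictionary lookup) baseline of the kind the abstract calls traditional. The toy lexicon, the function name tag_tokens, and the use of IOB labels are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Toy lexicon for illustration only (an assumption, not data from the thesis).
# A real system would load terms and identifiers from ontology files.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("apoptotic", "process"): "GO:0006915",
    ("neuron",): "CL:0000540",
}


def tag_tokens(tokens: List[str]) -> List[str]:
    """Label each token with an IOB tag using greedy longest-match lookup."""
    max_len = max(len(term) for term in LEXICON)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            concept = LEXICON.get(span)
            if concept is not None:
                tags[i] = "B-" + concept  # first token of the concept mention
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + concept  # continuation tokens
                i += n
                matched = True
                break
        if not matched:
            i += 1  # no lexicon entry starts here; token stays "O"
    return tags


tokens = "The apoptotic process was observed in each neuron".split()
for token, tag in zip(tokens, tag_tokens(tokens)):
    print(f"{token}\t{tag}")
```

A lookup like this only finds exact surface matches, which is one reason learned models are attractive for this task: they can use context to recognize concept mentions that are phrased differently from the ontology's term names.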

Additional Information

Publication
Thesis
Language: English
Date: 2022
Keywords
Automated Ontology Curation, Biontology Annotation, Deep Learning, Named Entity Recognition, Natural Language Processing
Subjects
Ontologies (Information retrieval)
Natural language processing (Computer science)
Deep learning (Machine learning)
