Investigating Heart Disease Datasets and Building Predictive Models

ECSU Author/Contributor (non-ECSU co-authors, if there are any, appear on document): Brandon Simmons, student (Creator); Julian A. D. Allagan, Associate Professor (Contributor)
Institution: Elizabeth City State University (ECSU); Web Site: https://www.ecsu.edu/academics/library/index.html

Abstract: We investigate several heart disease datasets commonly found on popular data sites such as Kaggle, Dataport, and the UCI Machine Learning Repository. In our attempts to authenticate these medical datasets, we discovered many issues related to human error (encoding) and sometimes negligence (duplicates); these underlying issues have undoubtedly weakened many inferences or predictive models built on some of the datasets that are already published. We addressed these issues through feature analysis. Further, using random forest and logistic regression, we determine the best dataset for machine learning and statistical analysis: the Cleveland data on a reduced set of six features. Three of these features are statistically significant in explaining or classifying patients as 'Heart Disease': thalach (maximum heart rate), oldpeak, and cp (chest pain).

PDF (Portable Document Format)
2394 KB
Created on 9/8/2021

Additional Information

Publication: Dissertation; Language: English; Date: 2021
Keywords: heart disease, medical datasets, machine learning, Kaggle, Dataport
