Investigating Heart Disease Datasets and Building Predictive Models
- ECSU Author/Contributor (non-ECSU co-authors, if there are any, appear on document)
- Brandon Simmons, student (Creator)
- Julian A. D. Allagan, Associate Professor (Contributor)
- Institution
- Elizabeth City State University (ECSU)
- Web Site: https://www.ecsu.edu/academics/library/index.html
Abstract: We investigate several heart disease datasets commonly found on popular data sites such as Kaggle, Dataport, and the UCI Machine Learning Repository. In attempting to authenticate these medical datasets, we discovered many issues that stem from human error (encoding) and sometimes negligence (duplicates); these underlying issues have undoubtedly weakened many inferences or predictive models built on some of the datasets that are already published. We addressed these issues through feature analysis. Further, using random forest and logistic regression, we determine the best dataset for machine learning and statistical analysis: the Cleveland data on a reduced set of six features. Three of these features are statistically significant in explaining or classifying patients as ’Heart Disease’: thalach (maximum heart rate), oldpeak, and cp (chest pain).
PDF (Portable Document Format)
2394 KB
Created on 9/8/2021
Additional Information
- Publication
- Dissertation
- Language: English
- Date: 2021
- Keywords
- heart disease, medical datasets, machine learning, Kaggle, Dataport