An Empirical Exploration of Python Machine Learning API Usage

ECU Author/Contributor (non-ECU co-authors, if there are any, appear on document)
Aleksei Vilkomir (Creator)
Institution
East Carolina University (ECU)
Web Site: http://www.ecu.edu/lib/

Abstract: Machine learning is becoming increasingly important in many domains, both inside and outside of computer science. As a result, more developers are learning to write machine learning applications in languages like Python, using application programming interfaces (APIs) such as pandas and scikit-learn. However, these APIs are complex and can be challenging to learn, especially for new programmers. To create better tools for assisting developers with machine learning APIs, we need to understand how these APIs are currently used. In this thesis, we present a study of machine learning API usage in Python code, based on a corpus of machine learning projects hosted on Kaggle, a machine learning education and competition community site. We first identified the most frequently used machine-learning-related libraries and the sub-modules of those libraries. Next, we studied which calls developers use to solve machine learning tasks. We also determined which libraries are used in combination and discovered a number of cases where libraries were imported but never used. We end by discussing potential next steps for further research and development based on our results.
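
As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch of how imported libraries and imported-but-unused libraries might be detected in a single Python script, using the standard library `ast` module. It is not the thesis's actual tooling; the function names `imported_and_used` and `library_counts` and the sample script are hypothetical, and the "used" check is a simple heuristic (a name bound by an import counts as used if it appears anywhere else in the script).

```python
import ast
from collections import Counter


def imported_and_used(source):
    """Return (imported, used) sets of top-level library names for one script.

    Heuristic sketch: a library counts as used if any name bound by one of
    its import statements appears as an identifier elsewhere in the script.
    """
    tree = ast.parse(source)  # parses only; nothing is imported or executed
    imported = set()
    bound = {}  # local name -> top-level library it came from
    star_imports = set()

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                top = alias.name.split(".")[0]
                imported.add(top)
                # "import a.b" binds "a"; "import a.b as c" binds "c"
                bound[alias.asname or top] = top
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            top = node.module.split(".")[0]
            imported.add(top)
            for alias in node.names:
                if alias.name == "*":
                    # Assumption: usage cannot be tracked, so treat as used.
                    star_imports.add(top)
                else:
                    bound[alias.asname or alias.name] = top

    identifiers = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    used = {lib for local, lib in bound.items() if local in identifiers}
    return imported, used | star_imports


def library_counts(sources):
    """Count in how many scripts each top-level library is imported."""
    counts = Counter()
    for source in sources:
        imported, _ = imported_and_used(source)
        counts.update(imported)
    return counts


# Hypothetical sample script: numpy and seaborn are imported but never used.
example = """
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import seaborn as sns

df = pd.DataFrame({"x": [1, 2], "y": [2, 4]})
model = LinearRegression().fit(df[["x"]], df["y"])
"""

imported, used = imported_and_used(example)
unused = imported - used
```

Applying `library_counts` to every script in a corpus would give a per-library import frequency, and counting pairs drawn from each script's `imported` set would give a rough picture of which libraries are used in combination.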

Additional Information

Publication
Thesis
Language: English
Date: 2020
Keywords
Machine Learning API, Python Machine Learning, Machine Learning exploratory

This item references:

Title: An Empirical Exploration of Python Machine Learning API Usage
Location & Link: http://hdl.handle.net/10342/8796
Type of Relationship: The described resource references, cites, or otherwise points to the related resource.