Security problems and challenges in a machine learning-based hybrid big data processing network systems

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Shanmugathasan "Shan" Suthaharan, Associate Professor (Creator)
Jeffrey N. Whitworth (Creator)
The University of North Carolina at Greensboro (UNCG)

Abstract: A data source that continuously produces data in high volume and at high velocity, with a large variety of data types, creates Big Data and poses problems and challenges for the Machine Learning (ML) techniques that help extract, analyze and visualize important information. To overcome these problems and challenges, we propose a hybrid networking model that consists of multiple components, such as the Hadoop distributed file system (HDFS), a cloud storage system, a security module and an ML unit. Processing Big Data in this networking environment with ML techniques requires user interaction and additional storage; hence, some artificial delay between the arrivals of data domains, introduced through external storage, can help HDFS process the Big Data efficiently. To address this problem, we suggest using a public cloud for data storage, which induces a meaningful time delay in the data while making use of its storage capability. However, the use of a public cloud exposes data transmission and storage to security vulnerabilities. Therefore, we need a security algorithm with a flexible key-based encryption technique that can provide tradeoffs between time delay, security strength and storage risks. In this paper, we propose a model that uses public cloud provider trust levels to select encryption types for data storage within a Big Data analytics network topology.
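
The trust-level idea in the abstract can be illustrated with a small sketch. It is not the authors' model: the three trust levels, the scheme names, the key sizes and the relative delay figures are all hypothetical placeholders, chosen only to show how a lower provider trust level might map to stronger (and slower) encryption.

```python
from dataclasses import dataclass
from enum import IntEnum


class TrustLevel(IntEnum):
    """Hypothetical trust rating assigned to a public cloud provider."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class EncryptionChoice:
    """Illustrative encryption profile; all values are assumptions."""
    name: str
    key_bits: int
    relative_delay: float  # unitless cost relative to the lightest profile


# Assumed policy table: less trust means a longer key and more induced delay.
POLICY = {
    TrustLevel.HIGH: EncryptionChoice("light", 128, 1.0),
    TrustLevel.MEDIUM: EncryptionChoice("standard", 192, 1.5),
    TrustLevel.LOW: EncryptionChoice("strong", 256, 2.0),
}


def select_encryption(trust: TrustLevel) -> EncryptionChoice:
    """Return the encryption profile assumed for a given provider trust level."""
    return POLICY[trust]


if __name__ == "__main__":
    for level in TrustLevel:
        choice = select_encryption(level)
        print(level.name, choice.name, choice.key_bits, choice.relative_delay)
```

A table lookup keeps the tradeoff between time delay, security strength and storage risk explicit in one place; a real system would derive these values from measurements rather than fixed constants.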

Additional Information

ACM SIGMETRICS Performance Evaluation Review, vol. 41, no. 4, pp. 82-85
Language: English
Date: 2014
Keywords: Big Data, hybrid Cloud, encryption, retrievability, machine learning
