Efficient Evaluation of Sparse Data Cubes

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Lixin Fu, Associate Professor (Creator)
Institution
The University of North Carolina at Greensboro (UNCG )
Web Site: http://library.uncg.edu/

Abstract: Computing data cubes requires the aggregation of measures over arbitrary combinations of dimensions in a data set. Efficient data cube evaluation remains challenging because of the potentially very large sizes of input datasets (e.g., in the data warehousing context), the well-known curse of dimensionality, and the complexity of queries that need to be supported. This paper proposes a new dynamic data structure called SST (Sparse Statistics Trees) and a novel, in-teractive, and fast cube evaluation algorithm called CUPS (Cubing by Pruning SST), which is especially well suitable for computing aggregates in cubes whose data sets are sparse. SST only stores the aggregations of non-empty cube cells instead of the detailed records. Furthermore, it retains in memory the dense cubes (a.k.a. iceberg cubes) whose aggregate values are above a threshold. Sparse cubes are stored on disks. This allows a fast, accurate approximation for queries. If users desire more refined answers, related sparse cubes are aggregated. SST is incrementally maintainable, which makes CUPS suitable for data warehousing and analysis of streaming data. Experiment results demonstrate the excellent performance and good scalability of our approach.

Additional Information

Publication
Advances in Web-Age Information Management: 5th International Conference (WAIM'04), Dalian, China, July 15-17, 2004: Springer-Verlag GmbH, 2004, pp. 336-345
Language: English
Date: 2004
Keywords
Computing data cubes, Data set