Fast Data Analysis for Scientific Big Data Applications

Friday, 22 November 2013 - 11:00am
SL 109

Dr. Yong Chen, Texas Tech University

Many scientific computing and high-performance computing applications have become increasingly data intensive. Recent studies have started to use indexing, subsetting, and data reorganization to manage the increasingly large datasets. In this talk, I will present our recent study of Fast Analysis with Statistical Metadata (FASM), which aims to boost data analytics performance by subsetting the data and integrating a small amount of statistics into the original datasets. The added statistical information describes the shape of the data and provides knowledge of its distribution, so the original scientific libraries can use this statistical metadata to perform fast queries and analyses. We will also introduce the concepts of segmented analysis, pre-analysis, and a decoupled execution paradigm for reducing data movement and speeding up data analysis in scientific big data applications. These concepts and ideas can potentially lead to new data analytics methodologies and can have an impact on scientific discovery productivity.
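
As a rough illustration of how per-segment statistics can speed up a query, the sketch below stores a minimum and maximum alongside each fixed-size segment of a dataset and uses them to answer a range-count query without scanning every segment. This is not the FASM implementation described in the talk: the names Segment, build_segments, and count_in_range are hypothetical, and min/max is only one assumed choice of statistical metadata.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A chunk of the dataset plus a small amount of statistical metadata."""
    values: List[float]
    lo: float  # minimum value in the segment (assumed metadata)
    hi: float  # maximum value in the segment (assumed metadata)


def build_segments(data: List[float], size: int) -> List[Segment]:
    """Split data into fixed-size segments and record min/max for each."""
    segments = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        segments.append(Segment(chunk, min(chunk), max(chunk)))
    return segments


def count_in_range(segments: List[Segment], low: float, high: float) -> Tuple[int, int]:
    """Count values v with low <= v <= high.

    Returns (count, segments_scanned). A segment is scanned only when its
    metadata cannot decide the answer on its own.
    """
    count = 0
    scanned = 0
    for seg in segments:
        if seg.hi < low or seg.lo > high:
            continue  # metadata shows no overlap: skip the segment
        if low <= seg.lo and seg.hi <= high:
            count += len(seg.values)  # fully inside the range: no scan needed
            continue
        scanned += 1  # partial overlap: fall back to reading the values
        count += sum(1 for v in seg.values if low <= v <= high)
    return count, scanned


if __name__ == "__main__":
    segs = build_segments([float(i) for i in range(100)], size=10)
    print(count_in_range(segs, 25.0, 47.0))
```

In this toy case only the segments that partially overlap the query range are read; the others are resolved from their metadata alone, which is the general effect the abstract attributes to statistical metadata.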

Yong Chen is an Assistant Professor and Director of the Data-Intensive Scalable Computing Laboratory in the Computer Science Department of Texas Tech University. His research interests include data-intensive/big data computing, parallel and distributed computing, and high-performance computing. His research group has been funded by NSF, DOE/ANL, ORAU, Dell, and NVIDIA. Prior to joining Texas Tech, he was a postdoctoral researcher in the Future Technologies Group of Oak Ridge National Laboratory. He has received the Ralph E. Powe Junior Faculty Enhancement Award and the Best Paper Award of the 9th IEEE International Symposium on Parallel and Distributed Processing with Applications, and was a Best Paper finalist and Best Student Paper finalist at the ACM/IEEE Supercomputing Conference 2008.