BigDAWG Polystore: programmer productivity for complex, heterogeneous big data applications

Thursday, September 22, 2016 - 7:00pm
32-G449 (Kiva)
Tim Mattson, Intel

If every algorithm looked like "map reduce" and all data naturally fit a single data store, solving Big Data problems would be straightforward. The real world, however, is not so simple. Most big data problems require complex analytics over data that is spread out among multiple data stores. Current technology could be force-fit to address these problems, but only by sacrificing programmer productivity.

Research at the Intel Big Data Science and Technology Center (based at MIT with support from four other universities) is addressing this problem. Our central idea is a concept we call a "polystore". In a polystore system, multiple database systems with potentially different data models are exposed to the programmer through a single framework. Middleware supports location transparency and semantic completeness through a uniform interface. Our reference implementation for this concept is the BigDAWG stack (Big Data Analytics Working Group). In this talk, we will discuss the motivations and vision for BigDAWG, the current state of its architecture, and the progress we have made in implementing it, and we will highlight the major challenges that lie ahead.

An overview of some of the work is contained in

Tim Mattson is a parallel programmer (Ph.D. Chemistry, UCSC, 1985). Tim has been with Intel since 1993, where he has worked with brilliant people on great projects such as (1) the first TFLOP computer (ASCI Red), (2) the OpenMP API for shared memory programming, (3) the OpenCL programming language for heterogeneous platforms, (4) Intel's first TFLOP chip (the 80-core research chip), and (5) Intel's 48-core SCC research processor. Currently Tim is working in the Parallel Computing lab. He is (1) the PI for our Big Data