SciDB: Big Analytics on Big Data

When: Wednesday, November 17, 2010 - 7:00pm
Room: Auditorium
Lecturer(s): Michael Stonebraker

Serious analytic applications (Big Analytics) entail clustering and transforming data as well as non-trivial computations such as finding eigenvalues and curve fitting. Traditionally, such applications have been found in the natural sciences, but increasingly they are required by large web properties for personalization of ranking functions, optimization of advertising placement and budgets, and similar tasks. Big Analytics should be distinguished from traditional business intelligence (small analytics), which is well served by standard SQL aggregates and grouping functions.

When performing big analytics on small (main-memory) data, a customer is well served by standard statistical packages such as R or MATLAB. This talk focuses on big (disk-based) data and presents SciDB, a new approach in this application area that adopts an array data model and a query language with both DBMS and analytical primitives. In addition, SciDB supports uncertain data and version control, and it allows arrays to be “chunked” across multiple nodes in a cluster, perhaps with overlap among the chunks. Early benchmarking results show SciDB to be 10-100 times faster than an RDBMS on applications that combine big analytics with big data.
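The abstract does not say how SciDB sizes or places its chunks, so the following is only a hypothetical sketch of the general idea of overlapping chunks. It assumes a 2-D array, fixed rectangular chunks, and a simple round-robin assignment of chunks to nodes. The names `Chunk`, `chunk_array`, and `holds_neighbourhood` are invented for illustration and are not part of SciDB.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Chunk:
    """One rectangular chunk of a 2-D array, using half-open [start, stop) bounds."""
    row_start: int
    row_stop: int
    col_start: int
    col_stop: int
    node: int  # cluster node holding this chunk (hypothetical round-robin policy)


def chunk_array(shape, chunk_shape, overlap, num_nodes):
    """Split a 2-D array of `shape` cells into chunks of `chunk_shape` cells.

    Each chunk is widened by `overlap` cells on every side (clipped to the
    array bounds), so neighbouring chunks share a border region. Chunks are
    assigned to nodes round-robin; this is an illustrative assumption, not
    SciDB's actual placement rule.
    """
    rows, cols = shape
    chunk_rows, chunk_cols = chunk_shape
    origins = product(range(0, rows, chunk_rows), range(0, cols, chunk_cols))
    chunks = []
    for i, (r0, c0) in enumerate(origins):
        chunks.append(Chunk(
            row_start=max(r0 - overlap, 0),
            row_stop=min(r0 + chunk_rows + overlap, rows),
            col_start=max(c0 - overlap, 0),
            col_stop=min(c0 + chunk_cols + overlap, cols),
            node=i % num_nodes,
        ))
    return chunks


def holds_neighbourhood(chunk, r, c, radius):
    """True if `chunk` stores every cell within `radius` of cell (r, c).

    Assumes the whole neighbourhood lies inside the array.
    """
    return (chunk.row_start <= r - radius and r + radius < chunk.row_stop
            and chunk.col_start <= c - radius and c + radius < chunk.col_stop)


if __name__ == "__main__":
    for ch in chunk_array((8, 8), (4, 4), overlap=1, num_nodes=2):
        print(ch)
```

With one cell of overlap, the chunk covering the top-left corner of an 8x8 array also stores the 3x3 neighbourhood of its border cell (3, 3), so a windowed operation such as smoothing can run on that node without fetching cells from other nodes. Without overlap, the same computation would need data held elsewhere. The cost of this choice is that the shared border cells are stored more than once.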

We sketch the design of SciDB, discuss its implementation status and open source business model, and contrast our approach with other options, including RDBMSs, Hadoop, and statistical packages.

Mike Stonebraker is Adjunct Professor of Computer Science at M.I.T. and CTO of three companies that are commercializing new database-related technologies that he initiated. He is widely recognized as one of the world's foremost experts in database technology and is noted for his insight into operating systems and expert systems. Mike received a Bachelor of Science degree from Princeton University and Master of Science and Doctor of Philosophy degrees from the University of Michigan. He has held visiting professorships at the Pontifícia Universidade Católica (PUC), Rio de Janeiro, Brazil; the University of California, Santa Cruz; and the University of Grenoble, France.

Dr. Stonebraker was the main architect of the INGRES relational DBMS and the object-relational DBMS POSTGRES. These prototypes were developed at the University of California at Berkeley, where Stonebraker was a Professor of Computer Science for twenty-five years. More recently, at M.I.T., he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction engine, the Morpheus search engine, and the SciDB complex analytics engine, all of which have been commercialized. He currently serves as Chief Technology Officer of Zetics (commercial SciDB), VoltDB (commercial H-Store), and Goby (commercial Morpheus).

Professor Stonebraker is the author of scores of research papers on database technology, operating systems, and the architecture of system software services. He received the ACM Software System Award in 1992 for his work on INGRES. Additionally, he received the first annual Innovations Award from ACM SIGMOD in 1994, was elected to the National Academy of Engineering in 1997, and received the IEEE John von Neumann Medal in 2005. At M.I.T., he is working on a variety of future-generation data-oriented projects.
See http://en.wikipedia.org/wiki/Michael_Stonebraker and http://www.csail.mit.edu/user/1547 for more details.