Database Decay and What To Do About It

When: 
Tuesday, October 25, 2016 - 7:00pm
Room: 
Auditorium on 1st floor
Lecturer(s): 
Michael Stonebraker
Lecturer Photo

The traditional wisdom for designing database schemas is to use a design tool (typically based on a UML or ER model) to construct an initial data model for one's data and its instantiation as a collection of relational tables. Then applications are coded against this relational schema. When business circumstances change (as they do frequently) one should run the tool to produce a new data model and a new collection of tables. The new schema is populated from the old schema, and the applications are altered to work on the new schema, using relational views whenever possible to ease the migration. In this way, the database remains in 3rd normal form, which represents a "good" schema, as defined by DBMS researchers.

In a survey of 20 DBAs at three large companies in the Boston area, we found that this traditional wisdom is rarely-to-never followed for large, multi-department applications. Instead DBAs try very hard not to change the schema when business conditions change, preferring to "make things work" without schema changes. If they must change the schema, they work directly from the relational tables in place. Using these tactics, the ER or UML model (if it ever existed) diverges quickly from reality. Moreover, over time, the actual semantics of the data tend to drift farther and farther from a 3rd normal form data model.

We term this divergence of reality from 3rd normal form principles database decay. This talk explains why database decay occurs in large applications and presents a collection of ideas on how to fight it. These include defensive schemas, defensive applications, and a non-traditional model for application development.

Dr. Stonebraker has been a pioneer of data base research and technology for more than forty years. He was the main architect of the INGRES relational DBMS, and the object-relational DBMS, POSTGRES. These prototypes were developed at the University of California at Berkeley where Stonebraker was a Professor of Computer Science for twenty five years. More recently at M.I.T. he was a co-architect of the Aurora/Borealis stream proce