Database Decay and What To Do About It

Tuesday, October 25, 2016 - 7:00pm
Auditorium on 1st floor
Michael Stonebraker
Lecturer Photo

The traditional wisdom for designing database schemas is to use a design tool (typically based on a UML or ER model) to construct an initial data model for one's data and its instantiation as a collection of relational tables. Then applications are coded against this relational schema. When business circumstances change (as they do frequently) one should run the tool to produce a new data model and a new collection of tables. The new schema is populated from the old schema, and the applications are altered to work on the new schema, using relational views whenever possible to ease the migration. In this way, the database remains in 3rd normal form, which represents a "good" schema, as defined by DBMS researchers.

In a survey of 20 DBAs at three large companies in the Boston area, we found that this traditional wisdom is rarely-to-never followed for large, multi-department applications. Instead DBAs try very hard not to change the schema when business conditions change, preferring to "make things work" without schema changes. If they must change the schema, they work directly from the relational tables in place. Using these tactics, the ER or UML model (if it ever existed) diverges quickly from reality. Moreover, over time, the actual semantics of the data tend to drift farther and farther from a 3rd normal form data model.

We term this divergence of reality from 3rd normal form principles database decay. T