Modeling Denormalization - The Speed You Need, the Order You Crave
Location: Salon 4
Audience level: Intermediate
Most developers have been trained to normalize data wherever possible. When business requirements start walking all over your nicely normalized data, your queries can grow complex and inefficient.
Denormalization of data can ease the pressure when your queries get out of hand, but it shouldn’t be handled as an afterthought. Creating first-class representations of your denormalized data makes it easy to keep data in sync and developers on the same page.
As a dataset grows it becomes more and more costly to retrieve records from it, and as your application requirements change, new queries are introduced against existing datasets. Applying more indices to the dataset to support these new queries decreases both the clarity (why is this index here again?) and scalability (additional indices increase write times) of your application.
Instead of shoehorning denormalization functionality into your core models, break the behavior out into models that express answers to the kinds of questions you want to ask.
Denormalization models built on top of ActiveRecord provide a number of benefits:
- AR callbacks make keeping your denorm models up-to-date a cinch
- AR associations and association extensions provide a natural and expressive grammar for working with denormalized data
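As a rough illustration of the idea, here is a minimal plain-Ruby sketch of a canonical model that keeps a separate denormalized model up to date. It does not use ActiveRecord itself: `Comment`, `PostCommentTally`, and their methods are hypothetical names, and the `after_create` method only stands in for what would be an ActiveRecord `after_create` callback in a Rails app.

```ruby
# Plain-Ruby stand-in for an ActiveRecord setup; all class names are hypothetical.

# Denormalized model: answers "how many comments does each post have?"
class PostCommentTally
  @counts = Hash.new(0)

  class << self
    # Relative, in-place adjustment, mirroring
    #   UPDATE ... SET comments_count = comments_count + delta
    def adjust(post_id, delta)
      @counts[post_id] += delta
    end

    def for_post(post_id)
      @counts[post_id]
    end
  end
end

# Canonical model: the one place responsible for notifying the tally.
class Comment
  attr_reader :post_id, :body

  def self.create(post_id:, body:)
    new(post_id, body).tap(&:after_create)
  end

  def initialize(post_id, body)
    @post_id = post_id
    @body = body
  end

  # Stand-in for an ActiveRecord after_create callback.
  def after_create
    PostCommentTally.adjust(post_id, 1)
  end

  # Stand-in for an after_destroy callback.
  def destroy
    PostCommentTally.adjust(post_id, -1)
  end
end

Comment.create(post_id: 1, body: "First!")
second = Comment.create(post_id: 1, body: "Nice talk")
Comment.create(post_id: 2, body: "Slides?")
second.destroy

puts PostCommentTally.for_post(1)
```

The core model stays unaware of how the tally is stored; it only announces that something changed, which keeps the responsibility for syncing in one visible place.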
Key Speaking Points
- be explicit about which objects are responsible for keeping data in sync
- use commutative updates
- rolling-window coalescing can be achieved with data overlap, that is, further denormalization
Coalescing data discretely is simpler and more efficient than using rolling windows. Stick to it if your business requirements allow.
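One way to read the difference between the two approaches is sketched below in plain Ruby. The event data and the helper names `coalesce_daily` and `coalesce_rolling` are assumptions made for illustration. Discrete coalescing writes each event into exactly one bucket, while the rolling variant writes each event into every window that covers it, which is the overlap that makes it costlier.

```ruby
require "date"

# Discrete coalescing: each event lands in exactly one per-day bucket.
def coalesce_daily(views)
  views.each_with_object(Hash.new(0)) { |day, buckets| buckets[day] += 1 }
end

# Rolling-window coalescing via data overlap: each event is written into
# every window (keyed by the window's last day) that covers it.
def coalesce_rolling(views, window_days)
  views.each_with_object(Hash.new(0)) do |day, windows|
    window_days.times { |offset| windows[day + offset] += 1 }
  end
end

# Hypothetical event stream: one Date per page view.
views = [
  Date.new(2024, 1, 1), Date.new(2024, 1, 1),
  Date.new(2024, 1, 2),
  Date.new(2024, 1, 4), Date.new(2024, 1, 4), Date.new(2024, 1, 4)
]

daily   = coalesce_daily(views)
rolling = coalesce_rolling(views, 3)

puts daily[Date.new(2024, 1, 4)]    # views on Jan 4 alone
puts rolling[Date.new(2024, 1, 4)]  # views in the 3 days ending Jan 4
```

With a three-day window every event costs three writes instead of one, but reading a rolling total becomes a single lookup.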
Leveraging the DB
Commutative updates that change existing row values in place are valuable because they eliminate race conditions for row-specific changes. Broad-reaching queries that update multiple rows simultaneously are helpful as well, since they can update several denormalized models of different granularities residing in the same table.
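The sketch below imitates both points in plain Ruby, with an array of hashes standing in for a table. The `stats` layout, the column names, and the helpers `build_stats` and `apply_delta` are all hypothetical; the SQL in the comment shows the kind of statement the helper is meant to mimic.

```ruby
# Hypothetical stats table: rows of different granularities share one table.
def build_stats
  [
    { granularity: "day",   period: "2024-01-04", views: 0 },
    { granularity: "month", period: "2024-01",    views: 0 },
    { granularity: "year",  period: "2024",       views: 0 },
    { granularity: "day",   period: "2024-01-05", views: 0 }
  ]
end

# Stand-in for one broad UPDATE such as:
#   UPDATE stats SET views = views + :delta
#   WHERE period IN ('2024-01-04', '2024-01', '2024')
# Every matching row is adjusted relative to its current value.
def apply_delta(stats, periods, delta)
  stats.each { |row| row[:views] += delta if periods.include?(row[:period]) }
end

periods = %w[2024-01-04 2024-01 2024]
deltas  = [5, -2, 10]

in_order = build_stats
deltas.each { |delta| apply_delta(in_order, periods, delta) }

reversed = build_stats
deltas.reverse.each { |delta| apply_delta(reversed, periods, delta) }

# Relative updates commute, so both orderings end with identical rows.
puts in_order == reversed
```

An absolute write (`SET views = 5` followed by `SET views = 10`) would instead leave whichever value arrived last, so its result depends on ordering.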
Use the features of your database. Blobs can be used to store marshaled objects. Using Ruby’s built-in marshaling instead of Rails’ YAML serialization is faster (benchmarks provided!) and maps easily to database blobs.
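A small way to try the comparison yourself, using only the standard library. The payload, the iteration count, and the helper names `to_blob` and `from_blob` are assumptions for illustration, not the talk's benchmark, and timings will vary by Ruby version and data shape.

```ruby
require "benchmark"
require "yaml"

# Serialize to a binary String suitable for a BLOB column.
def to_blob(object)
  Marshal.dump(object)
end

# Rebuild the object from the stored bytes.
def from_blob(blob)
  Marshal.load(blob)
end

# Hypothetical denormalized payload: plain strings, integers and arrays only.
payload = {
  "post_id" => 42,
  "title" => "Modeling Denormalization",
  "tag_names" => ["rails", "activerecord", "performance"],
  "comment_counts" => (1..50).to_a
}

iterations = 1_000
Benchmark.bm(10) do |bm|
  bm.report("Marshal") { iterations.times { from_blob(to_blob(payload)) } }
  bm.report("YAML")    { iterations.times { YAML.load(YAML.dump(payload)) } }
end
```

`Marshal.dump` returns a binary-encoded String, which is why it maps onto a blob column without any text-encoding concerns.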
Other Benefits of Denormalization
Automatic snapshotting – Data that is denormalized and then not updated along with the canonical version provides a history.
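For instance, a price copied onto an order line at purchase time keeps its original value even after the product's price changes. The plain-Ruby sketch below uses hypothetical `Product` and `LineItem` structs to show the effect.

```ruby
# Hypothetical canonical record whose price changes over time.
Product = Struct.new(:name, :price_cents)

# Denormalized copy: the price is captured at purchase time and never resynced.
LineItem = Struct.new(:product_name, :price_cents_at_purchase) do
  def self.from_product(product)
    new(product.name, product.price_cents)
  end
end

ticket = Product.new("Conference ticket", 50_000)
early_bird = LineItem.from_product(ticket)

ticket.price_cents = 65_000 # the canonical record moves on
regular = LineItem.from_product(ticket)

puts early_bird.price_cents_at_purchase
puts regular.price_cents_at_purchase
```

Each line item records what was true when it was created, so the set of line items doubles as a price history.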