Recently there was a great post at DZone recounting how one “tech savvy startup” moved away from its NoSQL database management system to a relational one. The writer, Matt Butcher, lays out the reasons under these main points:
- Our data is relational
- We need better querying
- We have access to better resources
Summing up: “The bottom line: choose the right tool.” A subsequent post quotes this succinct statement from another site’s article on his post:
“We here on El Reg’s database cluster are sure that NoSQL technologies have some great benefits … but many startups risk being spiked by arcane problems that come about as they scale. We recommend they take a hard look at traditional systems and work out if they can have a relationship with them.”
I really liked this post, particularly for these two points:
- All data is relational, but NoSQL is useful because it sometimes isn’t practical to treat data as relational, for reasons of volume or complexity.
- In the comments, Jonathon Fisher remarked that NoSQL is really old technology, not new. (Of course you have to like any commenter who uses the word “defenestrated.”)
All Data is Relational
Forgetting about its implementations, the relational model is a way of understanding data. It provides the simplest complete organization of data in a given domain, in which all relationships derive from inherent functional dependencies.
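To make “functional dependency” concrete, here is a minimal sketch in Python. The function `holds` and the sample rows are hypothetical illustrations, not taken from the post. Note that sample data can only refute a dependency; whether one is truly inherent is a fact about the domain, not about any particular set of rows.

```python
def holds(rows, determinant, dependent):
    """Return True if, in these rows, equal determinant values
    always come with equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # The first value seen for a key is remembered; a later,
        # different value for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data: zip determines city, but name does not.
rows = [
    {"name": "Ann", "zip": "10001", "city": "New York"},
    {"name": "Bob", "zip": "10001", "city": "New York"},
    {"name": "Ann", "zip": "94105", "city": "San Francisco"},
]
```

With these rows, `holds(rows, ["zip"], ["city"])` is true, while `holds(rows, ["name"], ["city"])` is false because two different cities appear for the same name.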
In terms of efficiency, data in a relational model is optimized for the set of all possible operations. But any given application needs only some of them, so we often denormalize a relational database to favor those operations, or use a different tool altogether.
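The trade-off in denormalizing can be sketched with Python's built-in `sqlite3` module. The tables, column names, and helper functions below are hypothetical, chosen only for illustration: the normalized design answers the question with a join, while the denormalized `order_report` table answers it with a single-table read but holds a copy of the customer name that can go stale.

```python
import sqlite3


def build_database() -> sqlite3.Connection:
    """Create a small in-memory database with a normalized design
    plus one denormalized reporting table (all names hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            total REAL
        );
        INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);

        -- Denormalized copy: the customer name is repeated on every order.
        CREATE TABLE order_report AS
            SELECT o.id AS order_id, c.name AS customer_name, o.total
            FROM orders o JOIN customer c ON c.id = o.customer_id;
    """)
    return conn


def orders_via_join(conn):
    """Normalized access path: join at query time."""
    return conn.execute(
        "SELECT o.id, c.name, o.total FROM orders o "
        "JOIN customer c ON c.id = o.customer_id ORDER BY o.id"
    ).fetchall()


def orders_via_report(conn):
    """Denormalized access path: no join needed."""
    return conn.execute(
        "SELECT order_id, customer_name, total FROM order_report "
        "ORDER BY order_id"
    ).fetchall()


def rename_customer(conn, customer_id, new_name):
    """Update the single normalized row; the report copy is not touched."""
    conn.execute(
        "UPDATE customer SET name = ? WHERE id = ?", (new_name, customer_id)
    )
```

Both access paths return the same rows at first. After `rename_customer(conn, 1, "Ada L.")`, the join reflects the new name while `order_report` still shows the old one, which is the cost paid for the cheaper read.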
Volume and Velocity: Too big and too fast
Two things that drive towards non-relational solutions are very high volume and velocity of data. NoSQL caught on as the data for specific applications (notably clickstreams; unstructured data like text, images, sound, and video; system logs; and scientific data like weather or seismic observations) got too big or changed too quickly for established relational DBMSs. Those who attempted such applications in relational DBMSs quickly hit scalability limits.
Variety: Unknown or unpredictable structure
NoSQL solutions also make sense when you can’t know data structure in advance. In a conversation on this article, Ron DiFrango provided two examples:
1) “Audit Data – We created a generic service to store data that has an unknown set of key/value pairs and while we shoehorned it into a relational model it makes looking at the data and processing it harder.
2) “Reference Data – On our services tier we have this concept of reference data which is mainly used to map from one error to the other, but it’s also used to hold messages, etc. so the columns are generic and again we shoehorned it into a relational model but the processing side to ‘de-normalize’ it was less than efficient.”
In both of these cases it would have made more sense to apply the NoSQL philosophy of storing data as delivered and interpreting it at query time, rather than squeezing the unpredictable streams into a sort-of-correct relational model.
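As a rough illustration of “store as delivered, interpret at query time,” here is a minimal Python sketch. The `AuditStore` class, its methods, and the sample records are all hypothetical; a real NoSQL system would add persistence, indexing, and distribution on top of the same idea.

```python
import json


class AuditStore:
    """Keeps each record exactly as delivered (a JSON string) and
    only parses it when a query asks a question of it."""

    def __init__(self):
        self._raw = []  # stands in for an append-only log

    def store(self, payload: str) -> None:
        # No schema check: whatever key/value pairs arrive are kept.
        self._raw.append(payload)

    def query(self, field, value):
        """Interpret records at read time; records that lack the
        field simply do not match."""
        matches = []
        for payload in self._raw:
            record = json.loads(payload)
            if record.get(field) == value:
                matches.append(record)
        return matches


# Hypothetical audit events with differing, unpredictable keys.
store = AuditStore()
store.store('{"user": "ann", "action": "login"}')
store.store('{"user": "bob", "action": "export", "rows": 120}')
store.store('{"service": "billing", "error_code": "E42"}')
```

Here `store.query("user", "bob")` returns the one export record, and the billing record with entirely different keys is stored without any schema change. The price is that every query has to parse and inspect each record.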
Back to the Future?
The comment from Jonathon Fisher makes the point that NoSQL represents a return to pre-relational values, when DBMSs sought speed of storage and retrieval but were relatively weak in transaction management and in supporting alternate access paths. Todd Homa recounts one horror story that shows how NoSQL data modelers must be aware of the corners into which they paint themselves as they optimize one access path at the expense of others.
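The access-path problem can be sketched with a plain Python dictionary standing in for a key-value store. The data and function names are hypothetical. Lookup by the chosen key is direct, but any other question either scans every record or requires a second structure that the application must build and keep in sync itself.

```python
# Hypothetical store keyed by user id: the one optimized access path.
users_by_id = {
    1: {"id": 1, "email": "ann@example.com", "city": "Oslo"},
    2: {"id": 2, "email": "bob@example.com", "city": "Lima"},
    3: {"id": 3, "email": "cy@example.com", "city": "Oslo"},
}


def find_by_id(store, user_id):
    """Primary access path: a single hash lookup."""
    return store.get(user_id)


def find_by_city(store, city):
    """Alternate access path: nothing to use but a full scan."""
    return [user for user in store.values() if user["city"] == city]


def build_city_index(store):
    """A hand-rolled secondary index mapping city to user ids.
    The application is responsible for updating it on every write."""
    index = {}
    for user_id, user in store.items():
        index.setdefault(user["city"], []).append(user_id)
    return index
```

A relational DBMS would let you declare an index on `city` and maintain it for you; in this sketch that bookkeeping falls to the data modeler.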
To SQL or to NoSQL?
Database professionals fled from NoSQL-like DBMSs in the 1980s toward support for transactions and alternate access paths. Today we don’t have that option when data volume, complexity, and indeterminate structure drive us back to non-relational solutions.