As a relational database professional, I couldn’t help feeling that something would be lost with the emergence of the new Big Data/NoSQL database management systems (DBMS). After about two years of buzz around the topic, I’m really excited about the emerging possibilities. However, I’m pretty sure we’ll miss the relational model’s strengths in requirements definition and conceptual design.
Big Data is in a great place right now. The initial fuss has died down (though the NoSQL Wikipedia article isn’t quite done yet), and a cool thing about the emerging big database models is that there are so many of them. The explosion in data quantity has driven performance requirements beyond the capabilities of the current generation of relational database management systems. Businesses need to analyze clickstreams, mine customer reviews, monitor Twitter traffic, store and manage video, and search unstructured documents. A variety of Big Data DBMSs have emerged, leading some to question the future usefulness of the relational model.
A strange thing happened during the years that relational databases dominated the IT planet. Business-oriented developers began to realize the relational model was more than just a way to structure data. The structure of data itself is relational, and correctly (relationally) structured data reflects and encapsulates business rules. As a result, defining the relational database for an application became a cornerstone of the requirements process. Business stakeholders often participated in conceptual database design, and the physical design that followed made every effort to preserve the business rules while improving the design for efficiency.
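To make the idea of a schema that encapsulates business rules concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables, columns, and the three rules they encode (every customer has a unique email, every order belongs to an existing customer, every order quantity is positive) are hypothetical illustrations, not taken from any real application.

```python
import sqlite3


def build_schema(conn):
    """Create a tiny, hypothetical schema whose constraints state business rules."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            email       TEXT NOT NULL UNIQUE          -- rule: one account per email
        );
        CREATE TABLE customer_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                        REFERENCES customer(customer_id),  -- rule: orders need a customer
            quantity    INTEGER NOT NULL CHECK (quantity > 0)  -- rule: no empty orders
        );
        """
    )


def violates(conn, sql, params=()):
    """Return True if the statement is rejected by one of the schema's constraints."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.execute("INSERT INTO customer VALUES (1, 'ada@example.com')")
    # An order for a customer that does not exist is refused by the database itself.
    print(violates(conn, "INSERT INTO customer_order VALUES (10, 99, 1)"))
```

Each constraint here is a sentence a business stakeholder can read and dispute, which is the sort of shared vocabulary the relational design process tends to provide.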
That kind of joint business/developer modeling activity doesn’t seem to exist yet in the NoSQL world. Big data professional Ilya Katsov puts it this way:
“NoSQL data modeling often starts from the application-specific queries as opposed to relational modeling:
- Relational modeling is typically driven by the structure of available data. The main design theme is ‘What answers do I have?’
- NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is ‘What questions do I have?’” (his emphasis)
So rather than following the inherent structure of the data, which reflects business rules and is accessible to the detail-oriented business audience, current big data solutions hardwire the database to perform well for the problem at hand. In that sense, Big Data doesn’t seem post-relational at all but rather a return to the pre-relational days when lightweight DBMSs like Model 204 and Total enabled efficient applications on the hardware of the time.
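The contrast can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sample data, the document layout, and the function names `build_orders_by_customer` and `customers_who_bought` are all hypothetical, and in-memory dictionaries stand in for a real document store.

```python
# Structure-driven view: two flat collections that mirror the data itself.
customers = {1: "Ada", 2: "Grace"}
orders = [
    {"order_id": 10, "customer_id": 1, "product": "widget"},
    {"order_id": 11, "customer_id": 1, "product": "gadget"},
    {"order_id": 12, "customer_id": 2, "product": "widget"},
]


def build_orders_by_customer(customers, orders):
    """Query-driven view: one document per customer, shaped for a single
    question, 'what has this customer ordered?'."""
    docs = {cid: {"name": name, "orders": []} for cid, name in customers.items()}
    for order in orders:
        docs[order["customer_id"]]["orders"].append(
            {"order_id": order["order_id"], "product": order["product"]}
        )
    return docs


def customers_who_bought(docs, product):
    """A question the documents were not shaped for: it has to scan every
    document, or a second purpose-built structure has to be maintained."""
    return sorted(
        doc["name"]
        for doc in docs.values()
        if any(o["product"] == product for o in doc["orders"])
    )


docs = build_orders_by_customer(customers, orders)
planned = docs[1]["orders"]                        # the planned question: one lookup
unplanned = customers_who_bought(docs, "widget")   # the unplanned one: a full scan
```

The planned question is answered with a single key lookup, while the unplanned one walks the whole collection. That trade-off is what hardwiring the database to the problem at hand looks like in miniature.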
But that comparison really isn’t fair. The difference between then and now is not only scale but also the types of data involved. The term “Big Data” seems inadequate; “Big Information” might be better. NoSQL DBMSs process not only structured data but also unstructured text, emails, images, videos, and more. Even if unstructured information could be described in an Entity-Relationship Diagram with entities, attributes, and relationships, the sheer number of meta-objects involved would make the model too complex for human consumption.
To me, the biggest hurdle we face with Big Data isn’t that relational DBMSs can’t handle the volume (true, but a problem that could conceivably be solved). The challenge is that the relational model is inadequate to describe unstructured information. As a result, teams looking to deliver business results with large collections of unstructured objects need to hardwire the database to the solution at hand.
The requirements process for these new applications will lack the IT/business linkage that the relational model provides. The onus will be on analysts and developers to understand the business problem and solution design well enough to ensure shared understanding without the help of an underlying structure like the relational model. Maybe we can look forward to an information “theory of everything” that will provide that framework for Big Data, but until then it is time to sharpen up our pre-relational conceptual design skills again.