In data management and analytics, we often focus on correcting apparent inability and unwillingness on the part of business leaders to effectively gather and capitalize on data resources. With that perspective, we often see ethics as a side issue difficult to prioritize given the scale and persistence of our other challenges.
At least that was my perspective, and my initial response when confronted recently by a family member on this topic. Her view from outside the field was that ethics should be a primary concern. As I’ve reflected on this conversation, I’ve come around to her point.
In recent years we’ve seen many examples of data misuse due to ethical lapses. Here’s a post that gives five examples, including police officers looking up data on individuals not related to any police business, an employee passing personal data including SSNs to a text sharing site, and Uber’s “god view”, available at the corporate level, which an employee used in 2014 to track a journalist’s location. Continue reading →
What is Data Quality anyway? If you are a data professional, I’m sure someone from outside our field has asked you that question, and if you’re like me you’ve fallen into the trap of answering in data-speak.
To my listener, I’d guess that the experience was similar to having a customer service rep who has just turned down his simple request justify it by describing byzantine company policies.
There’s a ton of great writing available on data quality, and I in no way mean to disparage it or its value in the field. But in that writing I’ve yet to find a concise and compelling definition that’s useful to non-data professionals. I’ll review one or two prevailing definitions and then offer one that could help us unlock real data quality improvements. Continue reading →
Modern data architectures, by enabling data analytics insights, promise to drive order of magnitude value gains across many business sectors (here, here, and here). Not so long ago, big data presented a daunting challenge. Although tools were plentiful, we struggled to conceptualize the architecture and organization within which to capitalize on those tools. Now solid frameworks have emerged. This post reviews two promising models for modern data architecture, and discusses two key cultural values critical to their successful adoption: drive to solve business challenges and drive for universal data correctness. Continue reading →
Tableau desktop (10.2.2 on Windows 7 at work) was consistently locking up my computer or causing a BSOD when I tried to start it. After struggling for a while trying to solve the problem, I found out it was because it used all resources when opening the log file, which had over time grown to 24gig. Apparently my version of Tableau desktop doesn’t periodically clean up the log files.
However, if the …/Logs folder isn’t there at Tableau startup, it just builds a new one and starts fresh, so whenever Tableau isn’t running you can just delete it. So, to make that happen automatically, I’ve added a batch file with these commands to my startup folder: Continue reading →
A year ago I recounted proceedings from the 2017 EDW World conference, which included release of the Leader’s Data Manifesto (LDM). Last week’s EDW World 2018 served as a one-year status report on the Manifesto. The verdict: there’s still a long way to go, but speakers and attendees report dramatic progress and emergence of shared values supporting data management’s role in enabling success and reducing risk.
To me the most compelling example of progress was the story of the Lopez women, told by Tommie Lawrence, who leads patient data quality efforts at Sharp Healthcare, a major San Diego, Ca, healthcare network. Ms. Lawrence’s team is responsible for data quality related to about six million patient records in the 40 highest priority of Sharp’s ~400 systems containing Patient Health Information (PHI).
A few years ago, Sharp Healthcare had two patients named Maria Lopez*, with birthdays one day apart. One suffered from kidney disease, the other had cancer. After a long wait a kidney was found, and the hospital called the Maria with kidney disease and asked her to come to the hospital for a transplant immediately. During operation prep, an assistant noticed that Maria had cancer, and put a halt to proceedings – it didn’t make sense to give the kidney to someone with cancer. Continue reading →
In response to findings that many “safety-related events were caused by or related to incorrect patient identification”, the Office of the National Coordinator for Health Information Technology (ONC) worked with CMMI to develop the PDDQ Framework in order help organizations implement effective, sustainable data management practices around patient data management.
Groundbreaking? Yes. As a lean framework appropriate for small business the PDDQ shows how you can rightsize the Data Management Maturity Model to match your situation. That it is freely available demonstrates CMMI’s commitment to improving data quality in healthcare. Continue reading →
If you are a SQL developer or data analyst working with Teradata, it is likely you’ve gotten this error message: “Select Failed.  No more spool space”. Roughly speaking, Teradata “spool” is the space DBAs assign to each user account as work space for queries. So, for example, if your query needs to build an intermediate table behind the scenes to sort or otherwise process before it hands over your result set, that happens in spool space. It is limited, in part, to keep your potentially runaway query from using up too much space and clogging up the system.
After briefly setting the stage, this post presents the top three tactics I use to avoid or overcome spool space errors. For the second two tactics I’ll show working code. At the end of the post you’ll find volatile DDL that you can use to get the queries to run. Continue reading →
Even now the business case for a metadata tool seems unclear and difficult to quantify, but it isn’t impossible.
We in the data management business tend to devalue solutions that don’t clearly derive from a coherent top-level view. We seek applications defined from an enterprise architecture, database designs from an enterprise data model, and data elements consistent with the enterprise business glossary.
However, sometimes tactical gains make sense even when the big picture is missing, and tactical successes of metadata for analytics teams can raise consciousness that helps set the stage for evolving data management improvements. Continue reading →
I recently found myself in a series of conversations in which I needed to make a case for dimensional data modeling. The discussions involved a group of highly skilled data architects who were surely familiar with dimensional techniques but didn’t see them as the best solution in the case at hand.
I thought it would be easy to find a quick, jargon free summary of best reporting database design principles aimed at a technical audience. There were a number of good summaries (cited at the end of this post), but none pitched just right for this highly-technical-but-outside-the-data-warehouse-world crowd.
I wanted to raise the dimensional model because, for most business reporting scenarios, it not only delivers on reporting needs, but also helps report developers handle changes to those needs as a side effect of the design.
So these are the notes I prepared for the conversation. They helped us all get on the same page, hopefully they will be useful to others: Continue reading →
Data quality doesn’t have to be a train wreck. Increased regulatory scrutiny, NoSQL performance gains, and the needs of data scientists are quietly changing views and approaches toward data quality. The result: a pathway to optimism and data quality improvement.
Here’s how you can get on the new and improved data quality train in each of those three areas: Continue reading →