Category Archives: IT

Anonymize Data for Better Executive Analytics

Reading articles about data anonymization makes it clear that it is not an entirely effective security measure (here and here), but still part of a robust security capability, and required if your organization is affected by GDPR. (I use “anonymization” as a general term encompassing techniques that de-identify personal data within a given data set.)

But there’s a positive side of anonymized data that hasn’t received much press. Providing anonymous data to senior managers who don’t need access to personal data can encourage them to take a broader perspective, and thereby bring new energy to fact-based senior planning and analysis. Continue reading

Toward an Analytics Code of Ethics

In data management and analytics, we often focus on correcting apparent inability and unwillingness on the part of business leaders to effectively gather and capitalize on data resources. With that perspective, we often see ethics as a side issue difficult to prioritize given the scale and persistence of our other challenges.

At least that was my perspective, and my initial response when confronted recently by a family member on this topic. Her view from outside the field was that ethics should be a primary concern. As I’ve reflected on this conversation, I’ve come around to her point.

In recent years we’ve seen many examples of data misuse due to ethical lapses. Here’s a post that gives five examples, including police officers looking up data on individuals not related to any police business, an employee passing personal data including SSNs to a text sharing site, and Uber’s “god view”, available at the corporate level, which an employee used in 2014 to track a journalist’s location. Continue reading

Data Integration Benefits? They’re Obvious.

“At least 84 percent of consumers across all industries say their experiences using digital tools and services fall short of expectations.”* That quote headed a recent article by David Roe on the role of data integration in digital workplace apps. However, the opening quote reflects the pervasive dearth of integrated data among the companies most of us frequent.

We’ve all experienced the effects. Last week I was in a fender bender. Due to a mixup I didn’t have my insurance card with me, so I called the insurance company to get the info. They had no record of me associated with my car. It turned out that my car is insured under my wife’s name, hers under mine. Although I’ve been their customer for 25 years, and was driving my own car, they couldn’t give me insurance info. Sure, they were following good security practices. But I’m not letting them off the hook.  Continue reading

Meaningful Requirements Start Successful Data Projects

To me, development projects fail or succeed in the first few weeks. Once a project starts off in the wrong direction, momentum and expectations tend to prevent a return to the proper path. With today’s wealth of database options each addressing exciting new possibilities, the right choice for the application’s data foundation plays a large part in steering a project to success.

At this year’s Enterprise Data World conference, William Brooks showed the relations among different data modeling approaches, in effect detailing how to derive nine different model types from a detailed conceptual entity relationship model. Mr Brooks’ presentation hinted at a way to correctly frame up your data direction early on in a project, setting the stage for success.

According to his presentation, called “Symmetry in Modeling Approaches“, the different model types — relational, graph, dimensional, JSON, XML, and so on — all represent different perspectives on the same data relationships. Each suits a different application, like dimensional for reporting applications, data vault for data warehouses, graph databases for multi-layered search, and so on. However, if properly constructed they all map back in predictable and specific ways to a normalized entity-relationship model.

I and others write that ER modeling should be integral to requirements definition, but Mr. Brooks’ presentation implies that ER modeling can also serve as the basis for application architecture as well. Continue reading

Start Data Quality Improvements with a New Definition

What is Data Quality anyway? If you are a data professional, I’m sure someone from outside our field has asked you that question, and if you’re like me you’ve fallen into the trap of answering in data-speak.

To my listener, I’d guess that the experience was similar to having a customer service rep who has just turned down his simple request justify it by describing byzantine company policies.

There’s a ton of great writing available on data quality, and I in no way mean to disparage it or its value in the field. But in that writing I’ve yet to find a concise and compelling definition that’s useful to non-data professionals. I’ll review one or two prevailing definitions and then offer one that could help us unlock real data quality improvements. Continue reading

Sound Data Culture Enables Modern Data Architectures

Modern data architectures, by enabling data analytics insights, promise to drive order of magnitude value gains across many business sectors (here, here, and here). Not so long ago, big data presented a daunting challenge. Although tools were plentiful, we struggled to conceptualize the architecture and organization within which to capitalize on those tools. Now solid frameworks have emerged. This post reviews two promising models for modern data architecture, and discusses two key cultural values critical to their successful adoption: drive to solve business challenges and drive for universal data correctness. Continue reading

Fixing Tableau Desktop Blue Screen or Unresponsive

Tableau desktop (10.2.2 on Windows 7 at work) was consistently locking up my computer or causing a BSOD when I tried to start it. After struggling for a while trying to solve the problem, I found out it was because it used all resources when opening the log file, which had over time grown to 24gig. Apparently my version of Tableau desktop doesn’t periodically clean up the log files.

However, if the …/Logs folder isn’t there at Tableau startup, it just builds a new one and starts fresh, so whenever Tableau isn’t running you can just delete it. So, to make that happen automatically, I’ve added a batch file with these commands to my startup folder: Continue reading

The PDDQ Framework: Lean Data Quality for Patient Records

For most of us it may have slipped under the radar, but in December a groundbreaking Patient Demographic Data Quality framework was jointly released by a US government agency and the CMMI Institute.

In response to findings that many “safety-related events were caused by or related to incorrect patient identification”, the Office of the National Coordinator for Health Information Technology (ONC) worked with CMMI to develop the PDDQ Framework in order help organizations implement effective, sustainable data management practices around patient data management. 

Groundbreaking? Yes. As a lean framework appropriate for small business the PDDQ shows how you can rightsize the Data Management Maturity Model to match your situation. That it is freely available demonstrates CMMI’s commitment to improving data quality in healthcare. Continue reading

Values and Behaviors of the Successful Agilist

Of course, any discussion of Agile values starts with the Agile Manifesto. The first sentence declares that Agile development is about seeking better ways and helping others. Then, as if espousing self-evident truths, the founders present four relative value statements. Finally, they emphasize appropriate balance, saying that the relatively less valued items aren’t worthless: implying that they are to be maintained inasmuch as they support the relatively more valued items.

While there is value in the four relative value statements, I believe most successful Agilists value the first and last statements more. So to me, the core Agile values are continuous improvement, helping others, and balance.

There’s a lot written about Agile behaviors, but as I read most is geared toward scrummasters or managers, and most is about transitioning from the waterfall world. Starting from the premise that Agile methods are established, focusing on participants rather than managers, and based on the assumption that behaviors are grounded in values, this post details the values and behaviors I’ve observed of those who succeed as Agile team members.

Continue reading

Escaping Teradata Purgatory (Select Failed. [2646] No more spool space)

If you are a SQL developer or data analyst working with Teradata, it is likely you’ve gotten this error message: “Select Failed. [2646] No more spool space”. Roughly speaking, Teradata “spool” is the space DBAs assign to each user account as work space for queries. So, for example, if your query needs to build an intermediate table behind the scenes to sort or otherwise process before it hands over your result set, that happens in spool space. It is limited, in part, to keep your potentially runaway query from using up too much space and clogging up the system.

After briefly setting the stage, this post presents the top three tactics I use to avoid or overcome spool space errors. For the second two tactics I’ll show working code. At the end of the post you’ll find volatile DDL that you can use to get the queries to run. Continue reading