Tag Archives: Data Science

Leader’s Data Manifesto at #EDW19: Building a Foundation for Data Science

It’s been a truism that data is a resource, but to prove it you just have to follow the money. As the illustration shows, the vast majority of corporate market value draws from intangible assets. Just as money is an abstraction that represents wealth, data is an abstraction that represents these intangible assets.

It’s year three after initial rollout of the Leader’s Data Manifesto (LDM). Since then, many widely publicized events have highlighted the value of data and metadata, and the importance of sound data management (here, here, and here). Recently at Enterprise Data World, John Ladley, Danette McGilvray, James Price, and Tom Redman presented this year’s LDM update. They reintroduced the Manifesto, recounted events of the past year, discussed strategy for the coming year, and issued a call to action for data professionals. Continue reading

Enterprise Data Prep for Analytics: Two Principles

Data scientists spend most of their time doing data integration rather than gathering insights. In my interview with data scientist Yan Li, she said that data collection and prep takes at least 70% of her time. Obviously, there’s a lot of integration work to do on data that’s new to analytics efforts, but not every analysis uses brand new data. Organizations can improve analytics efficiency by staging commonly used data pre-integrated for data science.

For years, large organizations have supported data warehouses, but prevailing data warehousing practices often fail when faced with “big data” volume and velocity. Still, warehousing teams in large organizations can pre-prep frequently-used internal data. Examples include reference and master data, production and sales records, and so on.    Continue reading

Toward an Analytics Code of Ethics

In data management and analytics, we often focus on correcting apparent inability and unwillingness on the part of business leaders to effectively gather and capitalize on data resources. With that perspective, we often see ethics as a side issue difficult to prioritize given the scale and persistence of our other challenges.

At least that was my perspective, and my initial response when confronted recently by a family member on this topic. Her view from outside the field was that ethics should be a primary concern. As I’ve reflected on this conversation, I’ve come around to her point.

In recent years we’ve seen many examples of data misuse due to ethical lapses. Here’s a post that gives five examples, including police officers looking up data on individuals not related to any police business, an employee passing personal data including SSNs to a text sharing site, and Uber’s “god view”, available at the corporate level, which an employee used in 2014 to track a journalist’s location. Continue reading

Fixing Tableau Desktop Blue Screen or Unresponsive

Tableau desktop (10.2.2 on Windows 7 at work) was consistently locking up my computer or causing a BSOD when I tried to start it. After struggling for a while trying to solve the problem, I found out it was because it used all resources when opening the log file, which had over time grown to 24gig. Apparently my version of Tableau desktop doesn’t periodically clean up the log files.

However, if the …/Logs folder isn’t there at Tableau startup, it just builds a new one and starts fresh, so whenever Tableau isn’t running you can just delete it. So, to make that happen automatically, I’ve added a batch file with these commands to my startup folder: Continue reading

Leader’s Data Manifesto Annual Review: “It’s About the Lopez Women”

A year ago I recounted proceedings from the 2017 EDW World conference, which included release of the Leader’s Data Manifesto (LDM). Last week’s EDW World 2018 served as a one-year status report on the Manifesto. The verdict: there’s still a long way to go, but speakers and attendees report dramatic progress and emergence of shared values supporting data management’s role in enabling success and reducing risk.

To me the most compelling example of progress was the story of the Lopez women, told by Tommie Lawrence, who leads patient data quality efforts at Sharp Healthcare, a major San Diego, Ca, healthcare network. Ms. Lawrence’s team is responsible for data quality related to about six million patient records in the 40 highest priority of Sharp’s ~400 systems containing Patient Health Information (PHI).

A few years ago, Sharp Healthcare had two patients named Maria Lopez*, with birthdays one day apart. One suffered from kidney disease, the other had cancer. After a long wait a kidney was found, and the hospital called the Maria with kidney disease and asked her to come to the hospital for a transplant immediately. During operation prep, an assistant noticed that Maria had cancer, and put a halt to proceedings – it didn’t make sense to give the kidney to someone with cancer. Continue reading

Escaping Teradata Purgatory (Select Failed. [2646] No more spool space)

If you are a SQL developer or data analyst working with Teradata, it is likely you’ve gotten this error message: “Select Failed. [2646] No more spool space”. Roughly speaking, Teradata “spool” is the space DBAs assign to each user account as work space for queries. So, for example, if your query needs to build an intermediate table behind the scenes to sort or otherwise process before it hands over your result set, that happens in spool space. It is limited, in part, to keep your potentially runaway query from using up too much space and clogging up the system.

After briefly setting the stage, this post presents the top three tactics I use to avoid or overcome spool space errors. For the second two tactics I’ll show working code. At the end of the post you’ll find volatile DDL that you can use to get the queries to run. Continue reading

Data Quality, Evolved

Data quality doesn’t have to be a train wreck. Increased regulatory scrutiny, NoSQL performance gains, and the needs of data scientists are quietly changing views and approaches toward data quality. The result: a pathway to optimism and data quality improvement.

Here’s how you can get on the new and improved data quality train in each of those three areas: Continue reading

Tableau Rollout Across Five Dimensions

Standing up any new analytics tool in an organization is complex, and early on, new adopters of Tableau often struggle to include all the complexities in their plan. This post proposes a mental model in the form of a story of how Tableau might have rolled out in one hypothetical installation to uncover common challenges for new adopters.

Tableau’s marketing lends one to imagine that introducing Tableau is easy: “Fast Analytics”, “Ease of Use”, “Big Data, Any Data” and so on. (here, 3/31/2017). Tableau’s position in Gartner’s Magic Quadrant (referenced on the same page) attests to the huge upside for organizations that successfully deploy Tableau, which I’ve been lucky enough to witness firsthand. Continue reading

Tableau Startup: First Lessons Learned

XAs I mentioned in the February post, I’m new to Tableau, and as the tone of that post implied,enjoying it very much. Tableau is a robust and flexible solution for data delivery. Like Qlikview, which I worked with a while ago, it is supported by outstanding, and free, introductory training and a very active user community.

As I’ve made my first steps in Tableau I’ve been a frequent user community visitor, and generally have gotten the answers I’ve been looking for. However, like any tool there still have been a few surprises. I’ll run down the top few in this post:

  • Measures can have complex logic
  • Big extracts are tricky
  • Changing data sources is really tricky
  • Sorry, there are some things you just can’t do

Hopefully this post helps other novices negotiate those first few steps a bit more easily. Continue reading

No Silver BI Bullet: Tableau Edition (It’s a good thing!)

For complex work, a very simple app requires a very smart user. That point was driven home to me in Tableau Fundamentals class this week. I don’t see that as bad news at all.

Not so long ago I wrote a piece that attempted to inject a bit of reality into the claims then made by some data visualization tool vendors. I cited unexpected challenges that those adopting such tools for their obvious and compelling data presentation abilities might face. The challenges included unexpectedly complex data integration, establishing solid reporting standards and practices, scaling report distribution as demand for the visualizations expands, and the conversion work that can result from version upgrades.

Although a Fundamentals class, the experienced and enthusiastic instructor and the small, intelligent student group combined to make the two days immensely valuable, going far beyond the basics on the program (more on specific lessons learned will appear in an upcoming post). The instructor’s focus on principles rather than recipes drove home this point: in order to effectively use Tableau you have to understand not only how to operate Tableau itself but also the underlying data management, usability, and statistics principles.

Could it be that adopting easy-to-use Tableau in place of, say, SSRS, Cognos, or SAS requires an upgrade in staff knowledge and expertise? Continue reading