I’ve had the good fortune to have been involved in many successful application development and analytics efforts (here and here), and a few that were less so (here and here). Recently, I’ve thought about the differences between the successful and the unsuccessful.
As I see it, there are five general characteristics that the successful endeavors share. They:
Closely Align with the Business
Share an End State Vision
Are Reality Grounded
Have a Technically Competent Team
Feature Frequent and Open Communications
To level set, this article focuses on business-facing analytics, data acquisition, application development, and related efforts developing internal capabilities for larger organizations. Continue reading →
A while back I wrote the post A Field Guide to Overloaded Data, which publicized the work of Duane Hufford, who examined different types of overloaded data during the 1990s. Over the years his classifications of overloaded data effectively categorized data anomalies I encountered in the wild.
That is until recently, when a colleague encountered an array of file names in a single SQL Server column. This instance didn’t fit into the three categories detailed in the earlier post, so I’m presenting it here. I’ve also added it to the the original post.
Definition: Bundled data is a situation where a single column in a table contains an array of values. Continue reading →
I’ve written previously about development of Tableau analytics capability from single user to multiple teams across an organization. This article is intended for those who may have first installed Tableau Server to enable folks outside their own sphere to interact with their Tableau creations. For the way ahead, it presents a few guidelines for successful development and deployment that data analysts should internalize as their analytics product grows.
The theme is, from the very start, to develop dashboards as if they serve hundreds of users and access millions of data records. If you do that, then as your analytical tools grow in usefulness and popularity, you’ll avoid difficult conversions and retooling later. Continue reading →
Although Agile writers and thinkers agree that “there is no sign-off” in Agile methodology, the practice of requiring product owners and business customers to sign off on requirements and delivered work products persists in Agile settings. I’ve seen it most when an agile team faces delivery challenges and leaders perceive the problem is scope creep or failure of the UAT process before delivery. In those situations, adding a formal sign-off provides an illusion of a stronger process but does nothing to resolve the underlying issue.
Sure, sometimes sign-off is necessary, especially when two or more separate organizations work together on a project. For example, consulting contracts often require sign-off on interim and final work products. However, addition of a sign-off step is common within organizations in hopes of a remedy for delivery or quality challenges.
The commenter David on this post says that “the purpose of a sign-off (or whatever you wanna call it) is a confirmation from a product owner that artifact A is fine for the time being, and can be used as basis for work on artifact B.” That’s all well and good, but in a well-run Agile context sign-off is a meaningless formality that’s dispensed with because it’s unnecessary.
How could that be? Others have written, often emphatically, on why sign-off is unnecessary in an agile context, including here and here. This quick video explains how “definition of done” and a fully committed, reliable team work together together make sign-off irrelevant. Continue reading →
Frequently in my career I’ve selected or helped select ETL and reporting professionals who need SQL skills. For some of those opportunities, placement firms returned resumes with interminable, and nearly identical, lists of technical achievements with excruciating unnecessary detail (paraphrasing: “Wrote SELECT statements using GROUP BY”, “Applied both inner and outer joins”). Before interviewing we typically ranked candidates in order of preference based on resumes. Candidates’ interview success bore little relation to resume-based rankings.
With some candidates I’ve encountered consistent pauses after fact questions, sometimes accompanied by keyboard clicks. They obviously used “think time” to look up answers on the internet. As a result, “fact” questions didn’t distinguish one candidate from another.
On the other hand, open ended questions worked well. I’ve written before that interviews should ask opinion rather than fact questions. Open ended questions or thought exercise, as opposed to fact questions, assess SQL skill level, are hard to quickly look up on the web, and have the added benefit of demonstrating a candidate’s reasoning and communication skills. Here are two of my favorite examples: Continue reading →
Sometimes success seems like a data analytics team’s worst enemy. A few successful visualizations packaged up into a dashboard by a small skunkworks team can generate interest such that a year later the team has published scores of mission critical dashboards. As their use spreads throughout the organization, and as features expand to meet the needs of an expanding user base, the dashboards can slow down and data refreshes fail as they exceed database and analytics tool time and resource limits.
There are steps teams can take to deal with such slowdowns. Analytics tool vendors typically offer efficiency guides, like this one, that help resolve dashboard response time issues. A frequent recommendation is for the dashboard to use summary tables rather than full detail, reducing the amount of data that the dashboard has to parse as the user waits for a viz to render.*
Summary tables also help resolve data refresh timeouts, but their long term success for the team depends on the foundation on which they are built and how they are organized. The most obvious approach is to build custom summaries serving each dashboard. While report-specific tables stand out as a quick win, analysis shows they are a suboptimal solution because they tend to (1) reduce ability to respond to requirements evolution, and (2) make metrics in different dashboards less consistent. Continue reading →
It’s not unusual for talented teams of business analysts to find themselves maintaining significant inventories of Tableau dashboards. In addition to sound development practices, following two key principles in data source design help these teams spend less time in maintenance and focus more on building new visualizations: publishing Tableau data sources separately from workbooks and waiting until the last opportunity to join dimension and fact data.
Imagine a business team — let’s call it Marketing Analytics — with read-only access to a Hadoop store or an enterprise data warehouse. They gain approval for Tableau licenses and Tableau Server publication rights for five tech-savvy data analysts. After a few initial successes with some impactful visualizations, the team gathers steam. After a while the team finds itself supporting scores of published workbooks serving a few hundred managers and executives. In spite of generally sound practices, Marketing Analytics struggles to maintain consistency from one Tableau workbook to another.
Not too long ago I posted on how to avoid the dreaded “No more spool space” error in Teradata SQL. That post recounted approaches to restructuring SQL queries so that they would avoid being cancelled for using inordinate amounts of Teradata resources. Teradata is an immensely powerful, even if aging, database engine but it does little to help one not steeped in knowledge of its structure to use its resources efficiently.
But what if, as sometimes happens, your DB admin team further tightens the screws by reducing spool space, or imposing new execution time or CPU usage limits? Then, you’ll have to go further to make queries efficient, as happened on one team that I was a part of. Beyond the steps previously recommended, here’s what we did: Continue reading →
By now Agile has taken over waterfall as the dominant app dev project pattern*. In many large organizations, the traditional waterfall PMO also owns Agile projects. One aspect of PMO oversight that can work against Agile culture is the project audit. If the goal of an audit is to ensure the project reflects Agile values, it can help ensure working software and a satisfied customer. If not, an Agile project audit can reinforce process, documentation, and other values that don’t directly promote project success.
In this post I’ll briefly review the Agile Manifesto, recount some prominent advice for auditors of Agile projects, and offer suggestions for auditors who want to reinforce rather than suppress Agile values. Continue reading →
Data quality improvements follow specific, clear leadership from the top. Project leaders count data quality among project goals when senior management encourages them to do so with unequivocal incentives, a common business vocabulary, shared understanding of data quality principles, and general agreement on the objects of interest to the business and their key characteristics.
Poor data quality costs businesses about “$15 million per year in losses, according to Gartner.” AsTendü Yoğurtçuputs it, “artificial intelligence (AI) and machine learning algorithms are only as effective as the data they use.” Data scientistsunderstandthe difficulties well, as they spend over 70% of their time in data prep.
Recent studies report that data entry typos are the largest source of poor data quality (here and here). My experience says otherwise. From what I’ve seen, operational data is generally good, and data errors only appear when data changes context. In this post I’ll detail why data quality is management’s responsibility, and why data quality will remain poor until leadership makes it a priority.Continue reading →