Not too long ago I posted on how to avoid the dreaded “No more spool space” error in Teradata SQL. That post recounted approaches to restructuring SQL queries so that they would avoid being cancelled for using inordinate amounts of Teradata resources. Teradata is an immensely powerful, even if aging, database engine but it does little to help one not steeped in knowledge of its structure to use its resources efficiently.
But what if, as sometimes happens, your DB admin team further tightens the screws by reducing spool space, or imposing new execution time or CPU usage limits? Then, you’ll have to go further to make queries efficient, as happened on one team that I was a part of. Beyond the steps previously recommended, here’s what we did: Continue reading →
I recently listened to Brian O’Neill’s excellent interview with Tom Davenport, headlined “Why on a scale of 1-10, the field of analytics has only gone from a one to about a two in ten years time.”
The conversation covered a lot of ground as Mr O’Neill and Mr Davenport explored the reasons why. Highlights included general lack of technical literacy and lack of an organizational data driven culture. But to their credit, they took responsibility on behalf of analytics professionals, emphasizing how we in the field could change in order to make more analytics efforts successful. Rather than focusing on providing technology-centered solutions, they recommended that data and AI professionals seek first to understand and empathize with their clients or internal customers, enabling data and AI pros to develop more effective analytics capabilities in light of that understanding.
I agree that analytics professionals can improve their game. However, as a former consultant who’s switched over to the client side, I think there’s room for improvement all around. To me, clients who work proactively to prepare for an analytics project position themselves for better outcomes. Continue reading →
In my experience, data management is both a mission critical and an undervalued capability. Perhaps recent customer data losses and regulatory initiatives like GDPR tend to raise the stock of data maturity efforts, but it remains undervalued. For example, any Fortune 1000 firm building end-to-end processes finds that much of the cost goes to translating data from different systems that integrate into the process.
Today we have available stage models like CMMI’s Data Management Maturity Model (DMMM) which, as I’ve written, help organizations assess an organization’s maturity level. However, the DMM model aims to assess data maturity at a single agency. It lacks mechanisms to compare multiple agencies or business functions, and therefore can be difficult to translate to prioritized plans for improvement.
Of course no one would do that on purpose, but I as a consultant over many years I’ve often seen it. A vendor fulfills a contract to the letter, which unfortunately allows them to deliver required reports in various, sometimes changing, formats with suspect data quality. The customer company absorbs these costs, leaning on the data analyst to update PowerPoint decks on schedule before the next monthly management meeting in spite of the extra programming work.
These contracts have been for various goods and services, but almost every business contract today is also a contract for data. If a regional gas company hires a vendor to inspect residential lines, then I suspect it wants reports showing inspections conducted and results; a healthcare firm that sends nurses on house calls needs data detailing call schedules and results; and so on.
Companies that supply goods or provide services often don’t feature data management as a core competency, and the quality of their reporting often doesn’t match the quality of their goods or services. Someone in the customer organization has to code around every addition or omission of an expected Excel column, every “N/A” in a numeric field, and every unexpected change from imperial to metric units. Continue reading →
By now Agile has taken over waterfall as the dominant app dev project pattern*. In many large organizations, the traditional waterfall PMO also owns Agile projects. One aspect of PMO oversight that can work against Agile culture is the project audit. If the goal of an audit is to ensure the project reflects Agile values, it can help ensure working software and a satisfied customer. If not, an Agile project audit can reinforce process, documentation, and other values that don’t directly promote project success.
In this post I’ll briefly review the Agile Manifesto, recount some prominent advice for auditors of Agile projects, and offer suggestions for auditors who want to reinforce rather than suppress Agile values. Continue reading →
Data quality improvements follow specific, clear leadership from the top. Project leaders count data quality among project goals when senior management encourages them to do so with unequivocal incentives, a common business vocabulary, shared understanding of data quality principles, and general agreement on the objects of interest to the business and their key characteristics.
Poor data quality costs businesses about “$15 million per year in losses, according to Gartner.” AsTendü Yoğurtçuputs it, “artificial intelligence (AI) and machine learning algorithms are only as effective as the data they use.” Data scientistsunderstandthe difficulties well, as they spend over 70% of their time in data prep.
Recent studies report that data entry typos are the largest source of poor data quality (here and here). My experience says otherwise. From what I’ve seen, operational data is generally good, and data errors only appear when data changes context. In this post I’ll detail why data quality is management’s responsibility, and why data quality will remain poor until leadership makes it a priority.Continue reading →
It’s been a truism that data is a resource, but to prove it you just have to follow the money. As the illustration shows, the vast majority of corporate market value draws from intangible assets. Just as money is an abstraction that represents wealth, data is an abstraction that represents these intangible assets.
It’s year three after initial rollout of the Leader’s Data Manifesto (LDM). Since then, many widely publicized events have highlighted the value of data and metadata, and the importance of sound data management (here, here, and here). Recently at Enterprise Data World, John Ladley, Danette McGilvray, James Price, and Tom Redman presented this year’s LDM update. They reintroduced the Manifesto, recounted events of the past year, discussed strategy for the coming year, and issued a call to action for data professionals. Continue reading →
Data scientists spend most of their time doing data integration rather than gathering insights. In my interview with data scientist Yan Li, she said that data collection and prep takes at least 70% of her time. Obviously, there’s a lot of integration work to do on data that’s new to analytics efforts, but not every analysis uses brand new data. Organizations can improve analytics efficiency by staging commonly used data pre-integrated for data science.
For years, large organizations have supported data warehouses, but prevailing data warehousing practices often fail when faced with “big data” volume and velocity. Still, warehousing teams in large organizations can pre-prep frequently-used internal data. Examples include reference and master data, production and sales records, and so on. Continue reading →
Reading articles about data anonymization makes it clear that it is not an entirely effective security measure (here and here), but still part of a robust security capability, and required if your organization is affected by GDPR. (I use “anonymization” as a general term encompassing techniques that de-identify personal data within a given data set.)
But there’s a positive side of anonymized data that hasn’t received much press. Providing anonymous data to senior managers who don’t need access to personal data can encourage them to take a broader perspective, and thereby bring new energy to fact-based senior planning and analysis. Continue reading →
In data management and analytics, we often focus on correcting apparent inability and unwillingness on the part of business leaders to effectively gather and capitalize on data resources. With that perspective, we often see ethics as a side issue difficult to prioritize given the scale and persistence of our other challenges.
At least that was my perspective, and my initial response when confronted recently by a family member on this topic. Her view from outside the field was that ethics should be a primary concern. As I’ve reflected on this conversation, I’ve come around to her point.
In recent years we’ve seen many examples of data misuse due to ethical lapses. Here’s a post that gives five examples, including police officers looking up data on individuals not related to any police business, an employee passing personal data including SSNs to a text sharing site, and Uber’s “god view”, available at the corporate level, which an employee used in 2014 to track a journalist’s location. Continue reading →