Tag Archives: Application Development

Two key interview questions for SQL developer candidates

Frequently in my career I’ve selected or helped select ETL and reporting professionals who need SQL skills. For some of those opportunities, placement firms returned resumes with interminable, and nearly identical, lists of technical achievements with excruciating unnecessary detail (paraphrasing: “Wrote SELECT statements using GROUP BY”, “Applied both inner and outer joins”). Before interviewing we typically ranked candidates in order of preference based on resumes. Candidates’ interview success bore little relation to resume-based rankings.

With some candidates I’ve encountered consistent pauses after fact questions, sometimes accompanied by keyboard clicks. They obviously used “think time” to look up answers on the internet. As a result, “fact” questions didn’t distinguish one candidate from another.

On the other hand, open ended questions worked well. I’ve written before that interviews should ask opinion rather than fact questions. Open ended questions or thought exercise, as opposed to fact questions, assess SQL skill level, are hard to quickly look up on the web, and have the added benefit of demonstrating a candidate’s reasoning and communication skills. Here are two of my favorite examples: Continue reading

Data Architecture for Improved Dashboard Performance

Sometimes success seems like a data analytics team’s worst enemy. A few successful visualizations packaged up into a dashboard by a small skunkworks team can generate interest such that a year later the team has published scores of mission critical dashboards. As their use spreads throughout the organization, and as features expand to meet the needs of an expanding user base, the dashboards can slow down and data refreshes fail as they exceed database and  analytics tool time and resource limits.

There are steps teams can take to deal with such slowdowns. Analytics tool vendors typically offer efficiency guides, like this one, that help resolve dashboard response time issues. A frequent recommendation is for the dashboard to use summary tables rather than full detail, reducing the amount of data that the dashboard has to parse as the user waits for a viz to render.*

Summary tables also help resolve data refresh timeouts, but their long term success for the team depends on the foundation on which they are built and how they are organized. The most obvious approach is to build custom summaries serving each dashboard. While report-specific tables stand out as a quick win, analysis shows they are a suboptimal solution because they tend to (1) reduce ability to respond to requirements evolution, and (2) make metrics in different dashboards less consistent. Continue reading

Two Design Principles for Tableau Data Sources

It’s not unusual for talented teams of business analysts to find themselves maintaining significant inventories of Tableau dashboards. In addition to sound development practices, following two key principles in data source design help these teams spend less time in maintenance and focus more on building new visualizations: publishing Tableau data sources separately from workbooks and waiting until the last opportunity to join dimension and fact data.


Imagine a business team — let’s call it Marketing Analytics — with read-only access to a Hadoop store or an enterprise data warehouse. They gain approval for Tableau licenses and Tableau Server publication rights for five tech-savvy data analysts. After a few initial successes with some impactful visualizations, the team gathers steam. After a while the team finds itself supporting scores of published workbooks serving a few hundred managers and executives. In spite of generally sound practices, Marketing Analytics struggles to maintain consistency from one Tableau workbook to another.

Continue reading

More on “Select Failed. [2646] No more spool space”

Also see the previous related post Escaping Teradata Purgatory (Select Failed. [2646] No more spool space)

Not too long ago I posted on how to avoid the dreaded “No more spool space” error in Teradata SQL. That post recounted approaches to restructuring SQL queries so that they would avoid being cancelled for using inordinate amounts of Teradata resources. Teradata is an immensely powerful, even if aging, database engine but it does little to help one not steeped in knowledge of its structure to use its resources efficiently.

But what if, as sometimes happens, your DB admin team further tightens the screws by  reducing spool space, or imposing new execution time or CPU usage limits? Then, you’ll have to go further to make queries efficient, as happened on one team that I was a part of. Beyond the steps previously recommended, here’s what we did: Continue reading

Toward a Values-Based Approach to Auditing Agile Projects

By now Agile has taken over waterfall as the dominant app dev project pattern*. In many large organizations, the traditional waterfall PMO also owns Agile projects. One aspect of PMO oversight that can work against Agile culture is the project audit. If the goal of an audit is to ensure the project reflects Agile values, it can help ensure working software and a satisfied customer. If not, an Agile project audit can reinforce process, documentation, and other values that don’t directly promote project success.

In this post I’ll briefly review the Agile Manifesto, recount some prominent advice for auditors of Agile projects, and offer suggestions for auditors who want to reinforce rather than suppress Agile values. Continue reading

Leadership Must Prioritize Data Quality

Data quality improvements follow specific, clear leadership from the top. Project leaders count data quality among project goals when senior management encourages them to do so with unequivocal incentives, a common business vocabulary, shared understanding of data quality principles, and general agreement on the objects of interest to the business and their key characteristics.

Poor data quality costs businesses about “$15 million per year in losses, according to Gartner.” As Tendü Yoğurtçu puts it, “artificial intelligence (AI) and machine learning algorithms are only as effective as the data they use.” Data scientists understand the difficulties well, as they spend over 70% of their time in data prep.

Recent studies report that data entry typos are the largest source of poor data quality (here and here). My experience says otherwise. From what I’ve seen, operational data is generally good, and data errors only appear when data changes context. In this post I’ll detail why data quality is management’s responsibility, and why data quality will remain poor until leadership makes it a priority. Continue reading

Meaningful Requirements Start Successful Data Projects

To me, development projects fail or succeed in the first few weeks. Once a project starts off in the wrong direction, momentum and expectations tend to prevent a return to the proper path. With today’s wealth of database options each addressing exciting new possibilities, the right choice for the application’s data foundation plays a large part in steering a project to success.

At this year’s Enterprise Data World conference, William Brooks showed the relations among different data modeling approaches, in effect detailing how to derive nine different model types from a detailed conceptual entity relationship model. Mr Brooks’ presentation hinted at a way to correctly frame up your data direction early on in a project, setting the stage for success.

According to his presentation, called “Symmetry in Modeling Approaches“, the different model types — relational, graph, dimensional, JSON, XML, and so on — all represent different perspectives on the same data relationships. Each suits a different application, like dimensional for reporting applications, data vault for data warehouses, graph databases for multi-layered search, and so on. However, if properly constructed they all map back in predictable and specific ways to a normalized entity-relationship model.

I and others write that ER modeling should be integral to requirements definition, but Mr. Brooks’ presentation implies that ER modeling can also serve as the basis for application architecture as well. Continue reading

Values and Behaviors of the Successful Agilist

Of course, any discussion of Agile values starts with the Agile Manifesto. The first sentence declares that Agile development is about seeking better ways and helping others. Then, as if espousing self-evident truths, the founders present four relative value statements. Finally, they emphasize appropriate balance, saying that the relatively less valued items aren’t worthless: implying that they are to be maintained inasmuch as they support the relatively more valued items.

While there is value in the four relative value statements, I believe most successful Agilists value the first and last statements more. So to me, the core Agile values are continuous improvement, helping others, and balance.

There’s a lot written about Agile behaviors, but as I read most is geared toward scrummasters or managers, and most is about transitioning from the waterfall world. Starting from the premise that Agile methods are established, focusing on participants rather than managers, and based on the assumption that behaviors are grounded in values, this post details the values and behaviors I’ve observed of those who succeed as Agile team members.

Continue reading

Escaping Teradata Purgatory (Select Failed. [2646] No more spool space)

Also see the related post More on “Select Failed. [2646] No more spool space”

If you are a SQL developer or data analyst working with Teradata, it is likely you’ve gotten this error message: “Select Failed. [2646] No more spool space”. Roughly speaking, Teradata “spool” is the space DBAs assign to each user account as work space for queries. So, for example, if your query needs to build an intermediate table behind the scenes to sort or otherwise process before it hands over your result set, that happens in spool space. It is limited, in part, to keep your potentially runaway query from using up too much space and clogging up the system.

After briefly setting the stage, this post presents the top three tactics I use to avoid or overcome spool space errors. For the second two tactics I’ll show working code. At the end of the post you’ll find volatile DDL that you can use to get the queries to run. Continue reading

Reporting Database Design Guidelines: Dimensional Values and Strategies

I recently found myself in a series of conversations in which I needed to make a case for dimensional data modeling. The discussions involved a group of highly skilled data architects who were surely familiar with dimensional techniques but didn’t see them as the best solution in the case at hand.

I thought it would be easy to find a quick, jargon free summary of best reporting database design principles aimed at a technical audience. There were a number of good summaries (cited at the end of this post), but none pitched just right for this highly-technical-but-outside-the-data-warehouse-world crowd.

I wanted to raise the dimensional model because, for most business reporting scenarios, it not only delivers on reporting needs, but also helps report developers handle changes to those needs as a side effect of the design.

So these are the notes I prepared for the conversation. They helped us all get on the same page, hopefully they will be useful to others: Continue reading