More on “Select Failed. [2646] No more spool space”

Also see the previous related post Escaping Teradata Purgatory (Select Failed. [2646] No more spool space)

Not too long ago I posted on how to avoid the dreaded “No more spool space” error in Teradata SQL. That post recounted approaches to restructuring SQL queries so that they would avoid being cancelled for using inordinate amounts of Teradata resources. Teradata is an immensely powerful, even if aging, database engine but it does little to help one not steeped in knowledge of its structure to use its resources efficiently.

But what if, as sometimes happens, your DB admin team further tightens the screws by reducing spool space, or imposing new execution time or CPU usage limits? Then, you’ll have to go further to make queries efficient, as happened on one team that I was a part of. Beyond the steps previously recommended, here’s what we did:

Converted subqueries to volatile tables
Removed functions from join and where conditions
Identified problem views and extracted them into volatile tables

As in the earlier post, I emphasize I am by no means a Teradata technical expert and welcome comments and corrections from those who are.

Convert subqueries to volatile tables

In my earlier post I recommended replacing some joins with subqueries. Subqueries can improve efficiency by reducing the number of columns and/or rows that your query needs to work with. Under pressure from the new resource constraints, we found that it was more efficient to separate subqueries into separate volatile tables with a primary index.

The earlier post included this example showing a subquery that reduced columns retrieved from the ItemReturns table (see DDL included at the bottom of the earlier post):

SELECT 
 CustomerRegion AS Region
,SUM((OrderLines.Quantity*Orderlines.UnitPrice) 
   - (ItemReturns.QuantityReturned*OrderLines.UnitPrice)) AS TotalSales 
FROM OrderLines 
LEFT JOIN Orders ON Orders.OrderID = Orderlines.OrderID
LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
LEFT JOIN 
(
  SELECT OrderLineID, QuantityReturned FROM ItemReturns 
) AS ItemReturns 
ON OrderLines.OrderLineID = ItemReturns.OrderLineID 
GROUP BY CustomerRegion

First, set up a volatile table extracting data from the ItemReturns table, then refer to that volatile table in the query. Our DBAs advised us to include in the Primary Index any volatile table column that would later be used in a join:

CREATE VOLATILE TABLE ItemReturnsVT
AS
(
    SELECT OrderLineID, QuantityReturned FROM ItemReturns
) WITH DATA PRIMARY INDEX (OrderID, OrderLineID) ON COMMIT PRESERVE ROWS;

SELECT CustomerRegion, 
Sum((OrderLines.Quantity*Orderlines.UnitPrice)
   -(ItemReturnsVT.QuantityReturned*OrderLines.UnitPrice)) AS TotalSales
FROM OrderLines
LEFT JOIN Orders ON Orders.OrderID = Orderlines.OrderID
LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
LEFT JOIN  ItemReturnsVT ON OrderLines.OrderLineID = ItemReturnsVT.OrderLineID
GROUP BY CustomerRegion;

Remove functions from join and where conditions

At our site we had several examples of code like this, that used functions in where or join conditions:

CREATE VOLATILE TABLE ItemReturnsVT
AS
(
    SELECT OrderLineID, QuantityReturned FROM ItemReturns
) WITH DATA PRIMARY INDEX (OrderID, OrderLineID) ON COMMIT PRESERVE ROWS;

SELECT CustomerRegion, 
Sum((OrderLines.Quantity*Orderlines.UnitPrice)
   -(ItemReturnsVT.QuantityReturned*OrderLines.UnitPrice)) AS TotalSales
FROM OrderLines
LEFT JOIN Orders ON Orders.OrderID = Orderlines.OrderID
LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
LEFT JOIN  ItemReturnsVT ON OrderLines.OrderLineID = ItemReturnsVT.OrderLineID
WHERE SUBSTRING(Customers.CustomerName,5,4) = 'Inc.'
GROUP BY CustomerRegion;

In retrospect, it is relatively easy to find reliable sources, like this one, that advise not to use functions in row selection logic. If your poorly performing query includes such statements, you can correct them as follows, using Teradata’s nifty feature of allowing you to use column aliases in join and where conditions:

CREATE VOLATILE TABLE ItemReturnsVT
AS
(
    SELECT OrderLineID, QuantityReturned FROM ItemReturns
) WITH DATA PRIMARY INDEX (OrderID, OrderLineID) ON COMMIT PRESERVE ROWS;

SELECT CustomerRegion, SUBSTRING(Customers.CustomerName,5,4) as Inc_Indicator,
Sum((OrderLines.Quantity*Orderlines.UnitPrice)
   -(ItemReturnsVT.QuantityReturned*OrderLines.UnitPrice)) AS TotalSales
FROM OrderLines
LEFT JOIN Orders ON Orders.OrderID = Orderlines.OrderID
LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
LEFT JOIN  ItemReturnsVT ON OrderLines.OrderLineID = ItemReturnsVT.OrderLineID
WHERE Inc_Indicator = 'Inc.'
GROUP BY CustomerRegion;

Extract problem views into volatile tables

Many sites present Teradata views rather than tables to SQL developers and business users. Among other benefits, this enables DBAs to better control security by excluding sensitive columns or rows from specific user groups. Views don’t store any data, they are essentially queries stored in the database that execute when you use them in a SELECT statement.

The database perceives a view in your SQL as, in effect, additional lines of SQL in your query. So if you reference a view whose SQL definition isn’t as efficient as it could be, then the query suffers as if you had written the SQL that defines the view.

In my experience, most DBAs and database architects are outstanding SQL developers who write efficient code, but things slip sometimes. Maybe a view that’s a “quick and dirty” fix hangs around and becomes permanent, or one built for small volume is no longer efficient as tables grow.

Whatever the reason, it’s possible for a view to cause your performance challenge. When that happened to us we removed the join to the view from our query, added a volatile table with an extract of data from the view, and joined to that volatile table instead of directly to the view.

Sometimes it is tough to find a view that’s causing the problem. A seasoned Teradata expert would interpret the execution plan to identify the issue. If you use SQL Assistant, access the execution plan by hitting F6. I review execution plans, but I’m no expert at interpreting them, so they are useful adds to my troubleshooting process. After working through a few examples, you’ll start to recognize big numbers for your site for rows returned or “estimated time for this step”.

However, I’ve found that problem views won’t always stand out in the plan. In your troubleshooting, take time to review the SQL definitions of the views you reference using “SHOW VIEW [viewname]” to see if they are likely inefficient. If so, create a volatile table that only pulls needed columns and rows from the view, and use that volatile table instead of the view in your query.

Generally I’ve found good mileage from any change that breaks a query into smaller pieces or simplifies it. Please comment below and recount your experience.