Escaping Teradata Purgatory (Select Failed. [2646] No more spool space)

Also see the related post More on “Select Failed. [2646] No more spool space”

If you are a SQL developer or data analyst working with Teradata, it is likely you’ve gotten this error message: “Select Failed. [2646] No more spool space”. Roughly speaking, Teradata “spool” is the space DBAs assign to each user account as work space for queries. So, for example, if your query needs to build an intermediate table behind the scenes to sort or otherwise process your data before it hands over your result set, that happens in spool space. Spool is limited, in part, to keep a potentially runaway query from using up too much space and clogging up the system.

After briefly setting the stage, this post presents the top three tactics I use to avoid or overcome spool space errors. For the last two tactics I’ll show working code, and at the end of the post you’ll find volatile table DDL that you can use to get the queries to run.

[I’m writing from the perspective of a Tableau/SQL report developer; DBAs and data architects, please offer comments if I mischaracterize anything. Another note: a well-designed dimensional data mart goes a long way toward preventing poorly formed queries that overuse resources. This post is for the many who report from relational operational data.]

You Probably Have Enough Spool

Contrary to how you feel when you get a spool space error, your DBA has likely assigned you plenty. A typical enterprise-level Teradata environment assigns 1 to 1.5 terabytes of spool to power users. Generally, if the transaction tables you’re working with have ten to about a hundred million rows, good SQL practices will let you accomplish most of what you need to do within your assigned terabyte.
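By the way, you can usually check your own allocation, and how close you’ve come to it, by querying the data dictionary. Here’s a minimal sketch, assuming your site grants you SELECT access to the DBC.DiskSpaceV view:

-- Spool assigned to you, and your peak usage, summed across AMPs
SELECT
 DatabaseName
,SUM(MaxSpool)  AS SpoolLimitBytes
,SUM(PeakSpool) AS PeakSpoolUsedBytes
FROM DBC.DiskSpaceV
WHERE DatabaseName = USER
GROUP BY DatabaseName;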

One limitation in many Teradata environments is that SQL power users often don’t have DDL rights or rights to procedural logic, so the above-mentioned good practices can’t include building permanent intermediate tables or procedural processes. Given that, here are my three tactics:

1. Reduce the number of rows and columns you’re working with

Eliminate unneeded columns: Your first step is to carefully look through your select list and make sure you need all columns you’re retrieving. For example, are you selecting a couple of code/description pairs for which everyone knows the codes by heart? Getting rid of a couple of varchar(100) description columns can make a huge difference in spool usage over 100 million rows.
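As a trivial sketch using the hypothetical ItemReturns table from the DDL at the end of this post, the second query spools far less than the first over a large table simply because it drops the wide ReturnReason column:

-- Before: a wide varchar column carried along even if the report never shows it
SELECT OrderLineID, QuantityReturned, ReturnReason FROM ItemReturns;

-- After: same rows, much narrower spool footprint
SELECT OrderLineID, QuantityReturned FROM ItemReturns;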

Eliminate unneeded rows: To reduce the count of rows you’re working with, think about how the result set will be used. For example, if your report is for the central region, and you are pulling results for all regions, perhaps you can exclude east and west. Focus your query as much as possible on the problem at hand.

Summarize if it makes sense: If your query supports, for example, a Tableau report that graphs average sales by district, but the result includes each individual sales transaction, consider rolling your query up to averages by district. With this and the previous “Eliminate unneeded rows” step, be careful not to cut off future options. For example, maybe summarizing by store rather than district would get past the spool error while still enabling drill-down in the Tableau report.
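Putting row reduction and summarization together, here’s a sketch against this post’s hypothetical tables (the ‘Central’ filter value and the rollup level are illustrative assumptions):

-- One row per region instead of one per order line, for one region only
SELECT
 Customers.CustomerRegion
,SUM(OrderLines.Quantity * OrderLines.UnitPrice) AS TotalSales
FROM OrderLines
JOIN Orders ON Orders.OrderID = OrderLines.OrderID
JOIN Customers ON Customers.CustomerID = Orders.CustomerID
WHERE Customers.CustomerRegion = 'Central'
GROUP BY Customers.CustomerRegion;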

2. Use Subqueries to Reduce Joined Rows and Columns

For our discussion, a subquery is one that is defined on the fly in a FROM clause or join. Here’s an example:

SELECT
 CustomerRegion AS Region
,SUM((OrderLines.Quantity * OrderLines.UnitPrice)
   - (COALESCE(ItemReturns.QuantityReturned, 0) * OrderLines.UnitPrice)) AS TotalSales
FROM OrderLines
LEFT JOIN Orders ON Orders.OrderID = OrderLines.OrderID
LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
LEFT JOIN
(
  SELECT OrderLineID, QuantityReturned FROM ItemReturns
) AS ItemReturns
ON OrderLines.OrderLineID = ItemReturns.OrderLineID
GROUP BY CustomerRegion

In the query above, we’ve isolated only the two columns from the hypothetical ItemReturns table that we need to deduct returns from our sales amount. There are other columns in the returns table, so if we join to the entire table Teradata will also load those columns into our spool (return date, return reason, and so on). Limiting the join to only those two columns substantially reduces the amount of spool space used, and often subqueries like this get you back in business within your spool space limit. (Note the COALESCE around QuantityReturned: with a left join, order lines that have no returns would otherwise contribute NULL to the sum and silently drop out of the total.)

Joining to subqueries is also useful when you have cardinality problems, that is, when you are returning duplicate rows or double-counting in sums, averages, or other aggregates. You can remove the columns causing duplication and use Distinct or Group By in your subquery to get the right number of rows (in Teradata, Group By is usually preferred over Select Distinct for removing duplicate rows).
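For instance, if the hypothetical ItemReturns table could hold several rows per order line (say, multiple partial returns), summing in the subquery guarantees at most one joined row per order line, so the outer SUM doesn’t double-count:

SELECT
 CustomerRegion AS Region
,SUM((OrderLines.Quantity * OrderLines.UnitPrice)
   - (COALESCE(ItemReturns.QuantityReturned, 0) * OrderLines.UnitPrice)) AS TotalSales
FROM OrderLines
LEFT JOIN Orders ON Orders.OrderID = OrderLines.OrderID
LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
LEFT JOIN
(
  -- Group By rolls multiple return rows up to one row per order line
  SELECT OrderLineID, SUM(QuantityReturned) AS QuantityReturned
  FROM ItemReturns
  GROUP BY OrderLineID
) AS ItemReturns
ON OrderLines.OrderLineID = ItemReturns.OrderLineID
GROUP BY CustomerRegion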

3. Use Volatile Tables

Sometimes, though, making your joins more efficient with subqueries doesn’t solve spool errors. In those cases, the problem might be that intermediate steps in a complex query use unexpectedly large amounts of data.

For example, say we needed sales by Region, excluding products with more than 100 complaints. Such a query might look like this:

SELECT
 CustomerRegion
,SUM((OrderLines.Quantity * OrderLines.UnitPrice)
 - (COALESCE(ItemReturns.QuantityReturned, 0) * OrderLines.UnitPrice)) AS TotalSales
FROM OrderLines
LEFT JOIN Orders ON Orders.OrderID = OrderLines.OrderID
LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
LEFT JOIN
(
 SELECT OrderLineID, QuantityReturned FROM ItemReturns
) ItemReturns
ON OrderLines.OrderLineID = ItemReturns.OrderLineID
LEFT JOIN
--******
(
 SELECT Complaints.ProductID, SUM(ComplaintCount) AS ComplaintCount
 FROM Complaints
 JOIN
 (
  SELECT ComplaintID, COUNT(ComplaintLineID) AS ComplaintCount
  FROM ComplaintLines
  GROUP BY 1
 ) ComplaintLineCounts ON Complaints.ComplaintID = ComplaintLineCounts.ComplaintID
 GROUP BY 1
) ComplaintTotals
--******
ON ComplaintTotals.ProductID = OrderLines.ProductID
WHERE COALESCE(ComplaintTotals.ComplaintCount, 0) <= 100
GROUP BY CustomerRegion

There’s a lot going on in that second subquery, marked by the asterisks. Depending on the contents of the tables, gathering the complaints and complaint lines might involve parsing through many very wide rows, which might exceed your spool capacity. (Also note that the complaint-count test belongs in a WHERE clause rather than in the left join’s ON clause; in the ON clause it would only suppress the match, leaving high-complaint products in the result instead of excluding them.) In cases like this, taking that step out and putting it into its own “volatile” table build can make all the difference.

Although as a business person or BI developer you likely don’t have rights to create a persistent table, you may have sufficient privileges to create volatile tables, which persist only for the duration of your query session. One example of how this is used in practice is Tableau’s “initial SQL” capability, which runs statements like the table build below when the session opens.

Here’s our previous query redone, with the second subquery rendered as a volatile table build:

CREATE VOLATILE TABLE ComplaintTotals
AS
(
 SELECT Complaints.ProductID, SUM(ComplaintCount) AS ComplaintCount
 FROM Complaints
 JOIN
 (
  SELECT ComplaintID, COUNT(ComplaintLineID) AS ComplaintCount
  FROM ComplaintLines
  GROUP BY 1
 ) ComplaintLineCounts ON Complaints.ComplaintID = ComplaintLineCounts.ComplaintID
 GROUP BY 1
) WITH DATA PRIMARY INDEX (ProductID) ON COMMIT PRESERVE ROWS;
 
COLLECT STATS COLUMN (ProductID) ON ComplaintTotals;
 
SELECT
 CustomerRegion
,SUM((OrderLines.Quantity * OrderLines.UnitPrice)
 - (COALESCE(ItemReturns.QuantityReturned, 0) * OrderLines.UnitPrice)) AS TotalSales
FROM OrderLines
LEFT JOIN Orders ON Orders.OrderID = OrderLines.OrderID
LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
LEFT JOIN
(
 SELECT OrderLineID, QuantityReturned FROM ItemReturns
) ItemReturns
ON OrderLines.OrderLineID = ItemReturns.OrderLineID
LEFT JOIN ComplaintTotals ON ComplaintTotals.ProductID = OrderLines.ProductID
WHERE COALESCE(ComplaintTotals.ComplaintCount, 0) <= 100
GROUP BY CustomerRegion

This approach divides the work into two parts. Creating the volatile table rather than using the subquery removes a substantial amount of work from the main query, reducing the amount of data that query needs to process at once. Often dividing up a query this way can turn one large spool-consuming query into several small steps that each fit into your spool space.
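One housekeeping note: volatile tables are themselves materialized in your spool space, so once the downstream query is done with one, dropping it explicitly frees that space for the rest of your session rather than holding it until logoff. A minimal sketch:

-- List the volatile tables in your session, then release the one you no longer need
HELP VOLATILE TABLE;
DROP TABLE ComplaintTotals;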

Of course, sometimes data is just too big, so these steps won’t always work for you. However, if you frequently experience spool issues when querying operational tables, and your transaction row counts are in the ten to hundred million row range, there is a good chance that these measures will help.

--------------

Here’s DDL to set up volatile tables supporting the queries above:

/*DROP TABLE ComplaintLines              
;DROP TABLE ItemReturns                  
;DROP TABLE Orders                       
;DROP TABLE OrderLines                   
;DROP TABLE Customers                    
;DROP TABLE ProductCatalog               
;DROP TABLE Complaints         */          
 
CREATE VOLATILE TABLE Orders               
AS
(
  SELECT
  100000 AS OrderID,
  100000 AS CustomerID,
  100000 AS OrderAmount
) WITH DATA PRIMARY INDEX (OrderID) ON COMMIT PRESERVE ROWS;

DELETE FROM Orders;
INSERT INTO Orders VALUES (1001, 3001,5001);
COLLECT STATS  COLUMN (OrderID), COLUMN(CustomerID) ON Orders;
 
CREATE VOLATILE TABLE OrderLines       
AS
(
  SELECT
  100000 AS OrderID,
  100000 AS OrderLineID,
  100000 AS ProductID,
  100000 AS Quantity,
  100000 AS UnitPrice
) WITH DATA PRIMARY INDEX (OrderID, OrderLineID) ON COMMIT PRESERVE ROWS;
 
DELETE FROM OrderLines;
INSERT INTO OrderLines VALUES (1001, 10011, 2001, 10, 350);
COLLECT STATS  COLUMN (OrderID), COLUMN(OrderLineID), COLUMN(ProductID)  
   ON OrderLines;
 
CREATE VOLATILE TABLE ItemReturns
AS
(
  SELECT
  100000 AS OrderLineID,
  100000 AS QuantityReturned,
  'dummy row dummy row dummy row' AS ReturnReason
) WITH DATA PRIMARY INDEX (OrderLineID) ON COMMIT PRESERVE ROWS;
 
DELETE FROM ItemReturns;
INSERT INTO ItemReturns VALUES (10011, 5, 'Didn''t like the color');
COLLECT STATS  COLUMN(OrderLineID) ON ItemReturns;
 
CREATE VOLATILE TABLE Customers       
AS
(
  SELECT 100000 AS CustomerID,
  'dummy value' AS CustomerRegion,
  'dummy value dummy value dummy value dummy value' AS CustomerName
) WITH DATA PRIMARY INDEX (CustomerID) ON COMMIT PRESERVE ROWS;
 
DELETE FROM Customers;
INSERT INTO Customers VALUES (3001,'East','Amalgamated Widgets Inc.');
COLLECT STATS  COLUMN(CustomerID) ON Customers;
 
CREATE VOLATILE TABLE ProductCatalog              
AS
(
  SELECT 100000 AS ProductID,
  'dummy value dummy value dummy value dummy value'  AS ProductName
) WITH DATA PRIMARY INDEX (ProductID) ON COMMIT PRESERVE ROWS;
 
DELETE FROM ProductCatalog;
INSERT INTO ProductCatalog VALUES (2001,'Widget Body Housing.');
COLLECT STATS  COLUMN(ProductID) ON ProductCatalog;
 
CREATE VOLATILE TABLE Complaints
AS
(
  SELECT
  100000 AS ComplaintID,
  100000 AS OrderID,
  100000 AS OrderLineID,
  100000 AS ProductID,
  100000 AS CustomerID
) WITH DATA PRIMARY INDEX (ComplaintID) ON COMMIT PRESERVE ROWS;
 
DELETE FROM Complaints;
INSERT INTO Complaints VALUES (7001,1001, 10011, 2001, 3001);
COLLECT STATS  COLUMN (ComplaintID), COLUMN (OrderID), 
   COLUMN(OrderLineID), COLUMN(ProductID)  ON Complaints;
 
CREATE VOLATILE TABLE ComplaintLines              
AS
(
  SELECT
  100000 AS ComplaintID,
  100000 AS ComplaintLineID,
  CAST('2017-11-01' AS DATE) AS ComplaintDate,
  'dummy row dummy row dummy row dummy row dummy row dummy row dummy row dummy row dummy row dummy row dummy row dummy row dummy row dummy row ' AS ComplaintNote,
  'dummy row dummy row dummy row' AS ComplaintDisposition
) WITH DATA PRIMARY INDEX (ComplaintID, ComplaintLineID) ON COMMIT PRESERVE ROWS;
 
DELETE FROM ComplaintLines;
INSERT INTO ComplaintLines VALUES (7001, 70011, CAST('2017-11-01' AS DATE),
'I expected the widget body housing to be purple but it was green','Resolved');
COLLECT STATS  COLUMN (ComplaintID), COLUMN(ComplaintLineID), 
   COLUMN(ComplaintDisposition)  ON ComplaintLines;

13 thoughts on “Escaping Teradata Purgatory (Select Failed. [2646] No more spool space)”

  1. CarlosAL

    Hi.

    You don’t seem to understand what you’re talking about and the way Teradata works.

    The main cause of ‘no more spool space’ errors is usually the skew (the thing is, it doesn’t surprise me, since you seem to CREATE TABLES without an explicit PRIMARY INDEX definition).

    The #2 solution is nonsense, since Teradata only projects to spool the columns needed to resolve the query. (You can reduce spool with COMPRESS, btw.)

    Volatile Tables and “Subqueries” are both built in Spool, so the main difference is the distribution of the data (which is taken from the PRIMARY INDEX, which you neglected to define).

    Spool often comes from redistribution of the rows, which is usually because of badly written queries (not using PI joins), as the rows must be joined inside the same AMP. Skew is a killer here.

    Regards.

  2. Bob Lambert Post author

    CarlosAL – really appreciate your comment, and I admit my Teradata knowledge is certainly that of a business-side SQL developer. Solution #2 has worked many times for me, in terms of queries spooling before and not spooling after.
    I agree that not creating primary indexes is an oversight on my part and I’ve added them.
    Many thanks,
    Bob

  3. Saniya Khan

    Please provide more information on “Getting rid of a couple of varchar(100) description columns can make a huge difference in spool”. One of my queries works when I comment out a description column but gives a spool space error when I include it.

  4. Bob Lambert Post author

    Hi Saniya,
    If I understand your question correctly, your query spools out when it retrieves the description column, but not if you comment that line out. If you need the column and a partial description might be of use, try substringing to the first 30 or 50 characters of your varchar(100) column and see if that works. The other thing you might try is to build your volatile table with all the columns except the description, then update the volatile table to add the description column.
    Hope that helps!
    Bob

  5. Samart Jadhav

    Please suggest why there isn’t a spool issue with volatile tables when there is an issue with the actual table. I mean, how do volatile tables help in dealing with spool space?

  6. Bob Lambert Post author

    Samart,
    Good question. Using volatile tables makes it possible for Teradata to address the problem in smaller steps rather than attempting to solve the problem all at once. For example, a query that spools out because of a subquery that requires a sort operation might work if that sort operation is done separately in the volatile table build.
    In item 3 above, the second hypothetical query is more efficient and runs within spool space because it doesn’t have to incur the spool workspace required to generate ComplaintTotals.
    Thanks,
    Bob

  7. Burgandi Cruz

    This is by far the most helpful piece of information I have seen on avoiding spool space errors. I appreciate it from the POV of a business user and not a DBA, because although I have sufficient skills to write and code, I don’t know (nor do I want to know) all the DBA stuff. I love the explanation in plain English and the DDL. Thanks so much!


  8. KT

    Thank you Bob, the first and second tricks seem to help my spool space problem. I am coming from the same perspective as you: SQL developer, not DBA, but I try to optimize as much as possible. With your tricks, it works beautifully. My 3M rows need to join with a calendar table; now I just do a subquery and filter where the date is between the min and max from my dataset.

  9. Roland Wenzlofsky

    Hi Bob. I fear your understanding of how Teradata works is limited or wrong. Search Google for “Teradata out of spool space” and you will see on page one a blog post of mine that describes possible solutions.
