Thursday, 6 June 2013

SQL: Common Table Expressions (CTE)

This posting is the first around Common Table Expressions (CTE) that were introduced in SQL Server 2005. They can be thought of as temporary views or results that can be used within a query.

My first introduction to this to look at sub-queries and inline views and show how a Common Table Expression can be used instead.

We are going to use the Northwind database for this example. The problem that I want to solve is to calculate is the average number of orders that each customer will make (using only customers who have made orders before - just for simplicity).
SELECT COUNT(*) AS NumberOfOrders FROM Orders
SELECT COUNT(DISTINCT CustomerID) AS NumberOfCustomersWhoHaveMadeOrders 
    FROM Orders
with the results
NumberOfOrders
--------------
830

NumberOfCustomersWhoHaveMadeOrders
----------------------------------
89
The answer is 830/89 - which is 9.3258. The following SQL will do this for you
SELECT 
 COUNT(DISTINCT O.OrderID) / CAST(COUNT(DISTINCT O.CustomerID) AS float)
      AS AverageNumberOfOrdersPerCustomer
 FROM Orders O
with the results
AverageNumberOfOrdersPerCustomer
---------------------------------------
9.32584269662921
One small downside of this - if there are no orders you will get a divide by zero error. In this simple example you won't see this, but taking the problem further we might want to retrieve the average number of orders by month - and we might not have any orders in one month. But the point of this is how to do this with a Common Table Expression.
WITH CTEOrders
AS
(SELECT CustomerID, COUNT(*) AS NumberOfOrdersPerCustomer FROM Orders
GROUP BY CustomerID)
SELECT AVG(CAST(NumberOfOrdersPerCustomer AS float)) 
     AS AverageNumberOfOrdersPerCustomer
FROM CTEOrders
with the results
AverageNumberOfOrdersPerCustomer
--------------------------------
9.32584269662921
Let's look at the syntax of the CTE. We start with the WITH clause followed by the name of the Common Table Expression. Following this we can specify (in brackets) the names of the columns. This is optional if the columns are named within the SELECT statement within the CTE. The SELECT statement is enclosed in brackets.
WITH CTEName (column1, column2, ...)
AS (SELECT statement)
Immediately following the Common Table Expression definition you must use it. After the next statement the result of the CTE is gone.
Which is the best approach? I have always had problems reading Execution Plans, but we can have a go. The execution plan for the first query is this

Execution plan for standard query - click to enlarge
The query is performing a scan of the CustomersOrders Index (which isn't clustered) of 890 rows representing 89% of the overall query. The execution plan for the Common Table Expression is very much identical. It has the same scan of the same index - but this represents more of the query (implying that the rest of the query is very slightly less costly). But overall they are the same in cost.

Execution plan for Common Table Expression - click to enlarge
We can also look at the Client Statistics for each and the CTE comes out slightly on top.

Statistics for standard query
Statistics for Common Table Expression
So very similar in timing and the same results. But lets look at the name - it is called a Common Table Expression. One of the uses is to reuse the results from the CTE. So imagine that as well as the average I want the maximum number of orders per customer and minimum number of orders per customer. To do this with a CTE the query is
WITH CTEOrders
AS
(SELECT CustomerID, COUNT(*) AS NumberOfOrdersPerCustomer FROM Orders
GROUP BY CustomerID)
SELECT MAX(NumberOfOrdersPerCustomer) AS MaxOrders,
       MIN(NumberOfOrdersPerCustomer) AS MinOrders,
       AVG(CAST(NumberOfOrdersPerCustomer AS float))  
     AS AverageNumberOfOrdersPerCustomer
FROM CTEOrders
Giving the results
MaxOrders   MinOrders   AverageNumberOfOrdersPerCustomer
----------- ----------- --------------------------------
31          1           9.32584269662921
Without much additional cost - the extra aggregates will be working with the data returned by the CTE. Doing this with the standard query will require quite a bit more work.

No comments:

Post a Comment