Wednesday, 11 September 2013

SQL: CROSS APPLY and OUTER APPLY

I didn't even notice that something called CROSS APPLY was added to SQL Server (in 2005) until recently. But found it difficult to work out when you would use it - in fact occasionally still not sure as I think I could do most things with other queries. Also not sure I am creating a convoluted example of it's use. Generally I think if you are struggling to work out how to use it, it is possibly an indication of something you wouldn't use often.

Let's set a problem to solve. I know I can solve this with some other SQL. Let's work with the Northwind database and the [Orders] and [Order Details] tables. An order (in the [Orders] table) can consist of multiple parts - the [Order Details]). Note that I have put the table names here in [] as the Northwind database have table names with spaces in them!
SELECT OrderID, CustomerID, OrderDate FROM Orders
Here are some of the columns and rows from the Orders table.
OrderID     CustomerID OrderDate
----------- ---------- -----------------------
10248       VINET      1996-07-04 00:00:00.000
10249       TOMSP      1996-07-05 00:00:00.000
10250       HANAR      1996-07-08 00:00:00.000
10251       VICTE      1996-07-08 00:00:00.000
.....
11076       BONAP      1998-05-06 00:00:00.000
11077       RATTC      1998-05-06 00:00:00.000

(830 row(s) affected)
And the order details with this SQL
SELECT * FROM [Order Details]
giving
OrderID     ProductID   UnitPrice             Quantity Discount
----------- ----------- --------------------- -------- -------------
10248       11          14.00                 12       0
10248       42          9.80                  10       0
10248       72          34.80                 5        0
10249       14          18.60                 9        0
10249       51          42.40                 40       0
10250       41          7.70                  10       0
10250       51          42.40                 35       0.15
10250       65          16.80                 15       0.15
.....
11077       75          7.75                  4        0
11077       77          13.00                 2        0

(2155 row(s) affected)
Here is the first problem we want to solve. Return the highest order item for each order. With an ordinary correlated sub-query this can be done with this SQL
SELECT O.OrderID, O.Freight,
(SELECT (MAX((UnitPrice * Quantity) * (1.0-Discount)))
  FROM [Order Details] OD 
  WHERE OD.OrderID = O.OrderID ) AS MaxOrder
FROM Orders O
Gives us this output
OrderID     Freight               MaxOrder
----------- --------------------- -------------
10248       32.38                 174
10249       11.61                 1696
10250       65.83                 1261.4
.....
11076       38.28                 375
11077       8.53                  364.8

(830 row(s) affected)
We have the highest order detail (as calculated using the unit price, quantity and discount). But what we don't know is the product for the highest order. A while ago I posted about using a Common Table Expression (CTE) as well as OVER with PARTITION, so let's do this with a CTE.
;WITH CTEOrderDetails
AS (
SELECT 
OD.OrderID, OD.ProductID,  ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost, 
ROW_NUMBER() OVER (PARTITION BY OD.OrderID ORDER BY ((UnitPrice * Quantity) * (1.0-Discount)) DESC) As RowNum
FROM [Order Details] OD)

SELECT O.OrderId, O.Freight, OD.ProductID, OD.TotalCost
FROM Orders O
JOIN CTEOrderDetails OD ON (OD.OrderID = O.OrderID)
WHERE OD.RowNum = 1 
And now we can get the output we want
OrderId     Freight               ProductID   TotalCost
----------- --------------------- ----------- -------------
10248       32.38                 72          174
10249       11.61                 51          1696
10250       65.83                 51          1261.4
.....
11076       38.28                 6           375
11077       8.53                  2           364.8

(830 row(s) affected)
So a common table expression allows us to do this problem. Now let's look at the solution using CROSS APPLY. Before doing this I looked at this as some SQL - it is a correlated join (which you can't do!)
SELECT O.OrderId, O.Freight, OO.ProductID, OO.TotalCost
FROM Orders O
JOIN
(SELECT TOP 1 OD.OrderID, OD.ProductID, ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost
FROM [Order Details] OD 
WHERE OD.OrderID = O.OrderID
) OO ON (O.OrderDate = OO.OrderID)
The idea is to have an "inline view" (i.e. a query) but trying some sort of correlation. When I have a correlated sub-query, I think of it as like a "foreach" statement - for each row returned by the main query, run the sub-query. Which is what I was trying here. But if you change the JOIN to CROSS APPLY and remove the ON clause so you have this SQL
SELECT O.OrderId, O.Freight, OO.ProductID, OO.TotalCost
FROM Orders O
CROSS APPLY
(SELECT TOP 1 OD.OrderID, OD.ProductID, ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost
FROM [Order Details] OD 
WHERE OD.OrderID = O.OrderID
) OO 
You get the results we are looking for
OrderId     Freight               ProductID   TotalCost
----------- --------------------- ----------- -------------
10248       32.38                 11          168
10249       11.61                 14          167.4
10250       65.83                 41          77
.....
11077       8.53                  2           364.8

(830 row(s) affected)
A little description of what is happening might be useful. The CROSS APPLY is sort of like an INNER JOIN, in so much as the rows returned from the main query are all used with any matches from the query within the CROSS APPLY clause. So for each row returned by the main query find a row in the CROSS APPLY query (which has a reference to the outer query - here in the WHERE clause). Let's prove the fact that only rows in the main query are returned and insert a row into the [Orders] table
SET IDENTITY_INSERT Orders ON
INSERT INTO Orders 
 (OrderID, CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry)
VALUES
 (99999, 'VINET', 5, GETDATE(), GETDATE(), GETDATE(), 3, 40.10, 'Connor Macleod', 'Clan Maclead', 'Glenfinnan', 'Highlands', 'N/A', 'Scotland')
SET IDENTITY_INSERT Orders OFF
If you re-run the query you will see that order 99999 does not display.

You may have noticed in my description of CROSS APPLY I didn't use the terms inner or outer. Microsoft have't use it in it's naming, but if you read the articles on this it talks about the outer query (the query not in the CROSS APPLY). There is a further option we can use - OUTER APPLY. The OUTER APPLY will include the rows from the outer query that do not have matches in the inner query. So using this query
SELECT O.OrderId, O.Freight, OO.ProductID, OO.TotalCost
FROM Orders O
OUTER APPLY
(SELECT TOP 1 OD.OrderID, OD.ProductID, ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost
FROM [Order Details] OD 
WHERE OD.OrderID = O.OrderID
) OO
Will show the row that we require
OrderId     Freight               ProductID   TotalCost
----------- --------------------- ----------- -------------
10248       32.38                 11          168
10249       11.61                 14          167.4
10250       65.83                 41          77
.....
11076       38.28                 6           375
11077       8.53                  2           364.8
99999       40.10                 NULL        NULL

(831 row(s) affected)
If you want to do this with a Common Table Expression you need to firstly change the join (in our case to a LEFT OUTER JOIN) and secondly add a clause to the WHERE (remember we are only returning the top row - which would have RowNum = 1), we now need to include the rows that aren't included (so RowNum IS NULL).
;WITH CTEOrderDetails
AS (
SELECT 
OD.OrderID, OD.ProductID,  ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost, 
ROW_NUMBER() OVER (PARTITION BY OD.OrderID ORDER BY ((UnitPrice * Quantity) * (1.0-Discount)) DESC) As RowNum
FROM [Order Details] OD)

SELECT O.OrderId, O.Freight, OD.ProductID, OD.TotalCost
FROM Orders O
LEFT OUTER JOIN CTEOrderDetails OD ON (OD.OrderID = O.OrderID)
WHERE OD.RowNum = 1 OR OD.RowNum IS NULL
Now we have the correct results.

No comments:

Post a Comment