I didn't even notice that something called
CROSS APPLY was added to SQL Server (in 2005) until recently. But found it difficult to work out when you would use it - in fact occasionally still not sure as I think I could do most things with other queries. Also not sure I am creating a convoluted example of it's use. Generally I think if you are struggling to work out how to use it, it is possibly an indication of something you wouldn't use often.
Let's set a problem to solve. I know I can solve this with some other SQL. Let's work with the Northwind database and the
[Orders] and
[Order Details] tables. An order (in the
[Orders] table) can consist of multiple parts - the
[Order Details]). Note that I have put the table names here in
[] as the Northwind database have table names with spaces in them!
SELECT OrderID, CustomerID, OrderDate FROM Orders
Here are some of the columns and rows from the Orders table.
OrderID CustomerID OrderDate
----------- ---------- -----------------------
10248 VINET 1996-07-04 00:00:00.000
10249 TOMSP 1996-07-05 00:00:00.000
10250 HANAR 1996-07-08 00:00:00.000
10251 VICTE 1996-07-08 00:00:00.000
.....
11076 BONAP 1998-05-06 00:00:00.000
11077 RATTC 1998-05-06 00:00:00.000
(830 row(s) affected)
And the order details with this SQL
SELECT * FROM [Order Details]
giving
OrderID ProductID UnitPrice Quantity Discount
----------- ----------- --------------------- -------- -------------
10248 11 14.00 12 0
10248 42 9.80 10 0
10248 72 34.80 5 0
10249 14 18.60 9 0
10249 51 42.40 40 0
10250 41 7.70 10 0
10250 51 42.40 35 0.15
10250 65 16.80 15 0.15
.....
11077 75 7.75 4 0
11077 77 13.00 2 0
(2155 row(s) affected)
Here is the first problem we want to solve. Return the highest order item for each order. With an ordinary correlated sub-query this can be done with this SQL
SELECT O.OrderID, O.Freight,
(SELECT (MAX((UnitPrice * Quantity) * (1.0-Discount)))
FROM [Order Details] OD
WHERE OD.OrderID = O.OrderID ) AS MaxOrder
FROM Orders O
Gives us this output
OrderID Freight MaxOrder
----------- --------------------- -------------
10248 32.38 174
10249 11.61 1696
10250 65.83 1261.4
.....
11076 38.28 375
11077 8.53 364.8
(830 row(s) affected)
We have the highest order detail (as calculated using the unit price, quantity and discount). But what we don't know is the product for the highest order. A while ago I posted about using a
Common Table Expression (CTE) as well as
OVER with PARTITION, so let's do this with a CTE.
;WITH CTEOrderDetails
AS (
SELECT
OD.OrderID, OD.ProductID, ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost,
ROW_NUMBER() OVER (PARTITION BY OD.OrderID ORDER BY ((UnitPrice * Quantity) * (1.0-Discount)) DESC) As RowNum
FROM [Order Details] OD)
SELECT O.OrderId, O.Freight, OD.ProductID, OD.TotalCost
FROM Orders O
JOIN CTEOrderDetails OD ON (OD.OrderID = O.OrderID)
WHERE OD.RowNum = 1
And now we can get the output we want
OrderId Freight ProductID TotalCost
----------- --------------------- ----------- -------------
10248 32.38 72 174
10249 11.61 51 1696
10250 65.83 51 1261.4
.....
11076 38.28 6 375
11077 8.53 2 364.8
(830 row(s) affected)
So a common table expression allows us to do this problem. Now let's look at the solution using
CROSS APPLY. Before doing this I looked at this as some SQL - it is a correlated join (which you can't do!)
SELECT O.OrderId, O.Freight, OO.ProductID, OO.TotalCost
FROM Orders O
JOIN
(SELECT TOP 1 OD.OrderID, OD.ProductID, ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost
FROM [Order Details] OD
WHERE OD.OrderID = O.OrderID
) OO ON (O.OrderDate = OO.OrderID)
The idea is to have an "inline view" (i.e. a query) but trying some sort of correlation. When I have a correlated sub-query, I think of it as like a "foreach" statement - for each row returned by the main query, run the sub-query. Which is what I was trying here. But if you change the
JOIN to
CROSS APPLY and remove the
ON clause so you have this SQL
SELECT O.OrderId, O.Freight, OO.ProductID, OO.TotalCost
FROM Orders O
CROSS APPLY
(SELECT TOP 1 OD.OrderID, OD.ProductID, ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost
FROM [Order Details] OD
WHERE OD.OrderID = O.OrderID
) OO
You get the results we are looking for
OrderId Freight ProductID TotalCost
----------- --------------------- ----------- -------------
10248 32.38 11 168
10249 11.61 14 167.4
10250 65.83 41 77
.....
11077 8.53 2 364.8
(830 row(s) affected)
A little description of what is happening might be useful. The
CROSS APPLY is sort of like an
INNER JOIN, in so much as the rows returned from the main query are all used with any matches from the query within the
CROSS APPLY clause. So for each row returned by the main query find a row in the
CROSS APPLY query (which has a reference to the outer query - here in the
WHERE clause). Let's prove the fact that only rows in the main query are returned and insert a row into the
[Orders] table
SET IDENTITY_INSERT Orders ON
INSERT INTO Orders
(OrderID, CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry)
VALUES
(99999, 'VINET', 5, GETDATE(), GETDATE(), GETDATE(), 3, 40.10, 'Connor Macleod', 'Clan Maclead', 'Glenfinnan', 'Highlands', 'N/A', 'Scotland')
SET IDENTITY_INSERT Orders OFF
If you re-run the query you will see that order 99999 does not display.
You may have noticed in my description of
CROSS APPLY I didn't use the terms inner or outer. Microsoft have't use it in it's naming, but if you read the articles on this it talks about the outer query (the query not in the
CROSS APPLY). There is a further option we can use -
OUTER APPLY. The
OUTER APPLY will include the rows from the outer query that do not have matches in the inner query. So using this query
SELECT O.OrderId, O.Freight, OO.ProductID, OO.TotalCost
FROM Orders O
OUTER APPLY
(SELECT TOP 1 OD.OrderID, OD.ProductID, ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost
FROM [Order Details] OD
WHERE OD.OrderID = O.OrderID
) OO
Will show the row that we require
OrderId Freight ProductID TotalCost
----------- --------------------- ----------- -------------
10248 32.38 11 168
10249 11.61 14 167.4
10250 65.83 41 77
.....
11076 38.28 6 375
11077 8.53 2 364.8
99999 40.10 NULL NULL
(831 row(s) affected)
If you want to do this with a Common Table Expression you need to firstly change the join (in our case to a
LEFT OUTER JOIN) and secondly add a clause to the
WHERE (remember we are only returning the top row - which would have
RowNum = 1), we now need to include the rows that aren't included (so
RowNum IS NULL).
;WITH CTEOrderDetails
AS (
SELECT
OD.OrderID, OD.ProductID, ((UnitPrice * Quantity) * (1.0-Discount)) AS TotalCost,
ROW_NUMBER() OVER (PARTITION BY OD.OrderID ORDER BY ((UnitPrice * Quantity) * (1.0-Discount)) DESC) As RowNum
FROM [Order Details] OD)
SELECT O.OrderId, O.Freight, OD.ProductID, OD.TotalCost
FROM Orders O
LEFT OUTER JOIN CTEOrderDetails OD ON (OD.OrderID = O.OrderID)
WHERE OD.RowNum = 1 OR OD.RowNum IS NULL
Now we have the correct results.