Thursday, 25 July 2013

SQL: Using ROW_NUMBER()

This post follows on from post on the use of OVER() and PARTITION and shows the use of ROW_NUMBER().

It is sometimes (often) useful to return a row number for a row in a table that you can then use (e.g. display). Again, using the Northwind Orders table working with the orders for Customers ALFKI, ANATR and ANTON we have the following 17 rows (from the following SQL)
SELECT
   CustomerID, OrderID
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY OrderID
giving these orders for each customer
CustomerID OrderID
---------- -----------
ANATR      10308
ANTON      10365
ANTON      10507
ANTON      10535
ANTON      10573
ANATR      10625
ALFKI      10643
ANTON      10677
ANTON      10682
ALFKI      10692
ALFKI      10702
ANATR      10759
ALFKI      10835
ANTON      10856
ANATR      10926
ALFKI      10952
ALFKI      11011
Let's set a few problems to solve:
  • Return a row number corresponding to the order returned above (which is ordered by CustomerID)
  • Return a row number by customer ID and then order ID (so the first entry will be for Customer ALFKI and the lowest order number - which is 10643)
The first one can be solved by using the OVER clause after using the ROW_NUMBER() function. Within the OVER clause we will give it a ORDER BY.
SELECT 
   ROW_NUMBER() OVER (ORDER BY OrderID) AS RowNum,
   CustomerID, 
   OrderID 
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY OrderID
The results from this are
RowNum               CustomerID OrderID
-------------------- ---------- -----------
1                    ANATR      10308
2                    ANTON      10365
3                    ANTON      10507
4                    ANTON      10535
5                    ANTON      10573
6                    ANATR      10625
7                    ALFKI      10643
8                    ANTON      10677
9                    ANTON      10682
10                   ALFKI      10692
11                   ALFKI      10702
12                   ANATR      10759
13                   ALFKI      10835
14                   ANTON      10856
15                   ANATR      10926
16                   ALFKI      10952
17                   ALFKI      11011
And the second problem is just an extension of the first - change the ORDER BY to have what we want.
SELECT 
   ROW_NUMBER() OVER ( ORDER BY CustomerID,OrderID) AS RowNum,
   CustomerID, 
   OrderID 
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY OrderID
Producing the expected output
RowNum               CustomerID OrderID
-------------------- ---------- -----------
7                    ANATR      10308
11                   ANTON      10365
12                   ANTON      10507
13                   ANTON      10535
14                   ANTON      10573
8                    ANATR      10625
1                    ALFKI      10643
15                   ANTON      10677
16                   ANTON      10682
2                    ALFKI      10692
3                    ALFKI      10702
9                    ANATR      10759
4                    ALFKI      10835
17                   ANTON      10856
10                   ANATR      10926
5                    ALFKI      10952
6                    ALFKI      11011
Let's set another problem. We would like each order to have row numbers starting at 1 for each customer. i.e. the orders for ANTON are 10365, 10507, 10535, 10573 etc - they should have row numbers 1, 2, 3, 4 etc. Similarly the orders for Customer ANATR are 10308, 10625, 10759 and 10926 and they should have row numbers 1, 2, 3 4. For this problem we use both PARTITION and ORDER
SELECT 
   ROW_NUMBER() OVER ( PARTITION BY CustomerID ORDER BY OrderID) AS RowNum,
   CustomerID, 
   OrderID 
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY CustomerID, OrderID
giving the output
RowNum               CustomerID OrderID
-------------------- ---------- -----------
1                    ALFKI      10643
2                    ALFKI      10692
3                    ALFKI      10702
4                    ALFKI      10835
5                    ALFKI      10952
6                    ALFKI      11011
1                    ANATR      10308
2                    ANATR      10625
3                    ANATR      10759
4                    ANATR      10926
1                    ANTON      10365
2                    ANTON      10507
3                    ANTON      10535
4                    ANTON      10573
5                    ANTON      10677
6                    ANTON      10682
7                    ANTON      10856
Row Numbers are useful here if you are (say) displaying the orders on a website - rather than create/reset counters within your web language - have the SQL do it all for you. (I am beginning to see less use of facilities like this and more of this type of output performed by web coding). Another useful time to use this is within a Common Table Expression.

Let's set this problem. I'd like to return the two highest freight cost orders for each customer. We can use ROW_NUMBER and order them using this SQL
SELECT 
   ROW_NUMBER() OVER ( PARTITION BY CustomerID ORDER BY Freight DESC) AS RowNum,
   CustomerID, 
   OrderID,
   Freight
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY CustomerID, Freight DESC, OrderID
which produces
RowNum               CustomerID OrderID     Freight
-------------------- ---------- ----------- ---------------------
1                    ALFKI      10835       69.53
2                    ALFKI      10692       61.02
3                    ALFKI      10952       40.42
4                    ALFKI      10643       29.46
5                    ALFKI      10702       23.94
6                    ALFKI      11011       1.21
1                    ANATR      10625       43.90
2                    ANATR      10926       39.92
3                    ANATR      10759       11.99
4                    ANATR      10308       1.61
1                    ANTON      10573       84.84
2                    ANTON      10856       58.43
3                    ANTON      10507       47.45
4                    ANTON      10682       36.13
5                    ANTON      10365       22.00
6                    ANTON      10535       15.64
7                    ANTON      10677       4.03
So, we could loop through this only processing rows where RowNum is 1 or 2. However, using a Common Table Expression we could now do this with
WITH CTEOrders
AS
(SELECT 
   ROW_NUMBER() OVER ( PARTITION BY CustomerID ORDER BY Freight DESC) AS RowNum,
   CustomerID, 
   OrderID,
   Freight
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON'))
SELECT * FROM CTEOrders WHERE RowNum IN (1, 2)
Producing exactly what we need!
RowNum               CustomerID OrderID     Freight
-------------------- ---------- ----------- ---------------------
1                    ALFKI      10835       69.53
2                    ALFKI      10692       61.02
1                    ANATR      10625       43.90
2                    ANATR      10926       39.92
1                    ANTON      10573       84.84
2                    ANTON      10856       58.43

No comments:

Post a Comment