Thursday, 25 July 2013

SQL: Using ROW_NUMBER()

This post follows on from post on the use of OVER() and PARTITION and shows the use of ROW_NUMBER().

It is sometimes (often) useful to return a row number for a row in a table that you can then use (e.g. display). Again, using the Northwind Orders table working with the orders for Customers ALFKI, ANATR and ANTON we have the following 17 rows (from the following SQL)
SELECT
   CustomerID, OrderID
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY OrderID
giving these orders for each customer
CustomerID OrderID
---------- -----------
ANATR      10308
ANTON      10365
ANTON      10507
ANTON      10535
ANTON      10573
ANATR      10625
ALFKI      10643
ANTON      10677
ANTON      10682
ALFKI      10692
ALFKI      10702
ANATR      10759
ALFKI      10835
ANTON      10856
ANATR      10926
ALFKI      10952
ALFKI      11011
Let's set a few problems to solve:
  • Return a row number corresponding to the order returned above (which is ordered by CustomerID)
  • Return a row number by customer ID and then order ID (so the first entry will be for Customer ALFKI and the lowest order number - which is 10643)
The first one can be solved by using the OVER clause after using the ROW_NUMBER() function. Within the OVER clause we will give it a ORDER BY.
SELECT 
   ROW_NUMBER() OVER (ORDER BY OrderID) AS RowNum,
   CustomerID, 
   OrderID 
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY OrderID
The results from this are
RowNum               CustomerID OrderID
-------------------- ---------- -----------
1                    ANATR      10308
2                    ANTON      10365
3                    ANTON      10507
4                    ANTON      10535
5                    ANTON      10573
6                    ANATR      10625
7                    ALFKI      10643
8                    ANTON      10677
9                    ANTON      10682
10                   ALFKI      10692
11                   ALFKI      10702
12                   ANATR      10759
13                   ALFKI      10835
14                   ANTON      10856
15                   ANATR      10926
16                   ALFKI      10952
17                   ALFKI      11011
And the second problem is just an extension of the first - change the ORDER BY to have what we want.
SELECT 
   ROW_NUMBER() OVER ( ORDER BY CustomerID,OrderID) AS RowNum,
   CustomerID, 
   OrderID 
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY OrderID
Producing the expected output
RowNum               CustomerID OrderID
-------------------- ---------- -----------
7                    ANATR      10308
11                   ANTON      10365
12                   ANTON      10507
13                   ANTON      10535
14                   ANTON      10573
8                    ANATR      10625
1                    ALFKI      10643
15                   ANTON      10677
16                   ANTON      10682
2                    ALFKI      10692
3                    ALFKI      10702
9                    ANATR      10759
4                    ALFKI      10835
17                   ANTON      10856
10                   ANATR      10926
5                    ALFKI      10952
6                    ALFKI      11011
Let's set another problem. We would like each order to have row numbers starting at 1 for each customer. i.e. the orders for ANTON are 10365, 10507, 10535, 10573 etc - they should have row numbers 1, 2, 3, 4 etc. Similarly the orders for Customer ANATR are 10308, 10625, 10759 and 10926 and they should have row numbers 1, 2, 3 4. For this problem we use both PARTITION and ORDER
SELECT 
   ROW_NUMBER() OVER ( PARTITION BY CustomerID ORDER BY OrderID) AS RowNum,
   CustomerID, 
   OrderID 
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY CustomerID, OrderID
giving the output
RowNum               CustomerID OrderID
-------------------- ---------- -----------
1                    ALFKI      10643
2                    ALFKI      10692
3                    ALFKI      10702
4                    ALFKI      10835
5                    ALFKI      10952
6                    ALFKI      11011
1                    ANATR      10308
2                    ANATR      10625
3                    ANATR      10759
4                    ANATR      10926
1                    ANTON      10365
2                    ANTON      10507
3                    ANTON      10535
4                    ANTON      10573
5                    ANTON      10677
6                    ANTON      10682
7                    ANTON      10856
Row Numbers are useful here if you are (say) displaying the orders on a website - rather than create/reset counters within your web language - have the SQL do it all for you. (I am beginning to see less use of facilities like this and more of this type of output performed by web coding). Another useful time to use this is within a Common Table Expression.

Let's set this problem. I'd like to return the two highest freight cost orders for each customer. We can use ROW_NUMBER and order them using this SQL
SELECT 
   ROW_NUMBER() OVER ( PARTITION BY CustomerID ORDER BY Freight DESC) AS RowNum,
   CustomerID, 
   OrderID,
   Freight
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON')
ORDER BY CustomerID, Freight DESC, OrderID
which produces
RowNum               CustomerID OrderID     Freight
-------------------- ---------- ----------- ---------------------
1                    ALFKI      10835       69.53
2                    ALFKI      10692       61.02
3                    ALFKI      10952       40.42
4                    ALFKI      10643       29.46
5                    ALFKI      10702       23.94
6                    ALFKI      11011       1.21
1                    ANATR      10625       43.90
2                    ANATR      10926       39.92
3                    ANATR      10759       11.99
4                    ANATR      10308       1.61
1                    ANTON      10573       84.84
2                    ANTON      10856       58.43
3                    ANTON      10507       47.45
4                    ANTON      10682       36.13
5                    ANTON      10365       22.00
6                    ANTON      10535       15.64
7                    ANTON      10677       4.03
So, we could loop through this only processing rows where RowNum is 1 or 2. However, using a Common Table Expression we could now do this with
WITH CTEOrders
AS
(SELECT 
   ROW_NUMBER() OVER ( PARTITION BY CustomerID ORDER BY Freight DESC) AS RowNum,
   CustomerID, 
   OrderID,
   Freight
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR','ANTON'))
SELECT * FROM CTEOrders WHERE RowNum IN (1, 2)
Producing exactly what we need!
RowNum               CustomerID OrderID     Freight
-------------------- ---------- ----------- ---------------------
1                    ALFKI      10835       69.53
2                    ALFKI      10692       61.02
1                    ANATR      10625       43.90
2                    ANATR      10926       39.92
1                    ANTON      10573       84.84
2                    ANTON      10856       58.43

Tuesday, 16 July 2013

SQL: Using OVER and PARTITION

Using the OVER clause allows you to return aggregate information within a query - but without needing to use a group by. The clause (within the query) specifies what data you are grouping by.

Let's use an example - from the Northwind database the Orders table has a CustomerID and a Freight (cost). Looking at the first two customers
SELECT TOP 8 WITH TIES CustomerID, Freight
FROM Orders
ORDER BY CustomerID
And the results from this are
CustomerID Freight
---------- ---------------------
ALFKI      29.46
ALFKI      61.02
ALFKI      23.94
ALFKI      69.53
ALFKI      40.42
ALFKI      1.21
ANATR      1.61
ANATR      43.90
ANATR      11.99
ANATR      39.92
If we want to get the Freight cost for each Customer we can do the following
SELECT 
  CustomerID, SUM(Freight) AS FreightPerCustomer 
FROM Orders 
GROUP BY CustomerID
And the results from this are
CustomerID FreightPerCustomer
---------- ---------------------
ALFKI      225.58
ANATR      97.42
ANTON      268.52
AROUT      471.95
...
And to get the total freight cost (in this case for all orders)
SELECT SUM(Freight) AS TotalFreightCost FROM Orders
Which gives us the value
TotalFreightCost
---------------------
64942.69
We can use OVER within a query to produce an aggregate and some other data. Firstly, let's use OVER to tell us the total freight cost along with data for each order. This can be done with
SELECT 
   SUM(Freight) OVER () AS TotalFreightCostForAll, 
   CustomerID, 
   OrderID 
FROM Orders
ORDER BY CustomerID
Producing
TotalFreightCostForAll CustomerID OrderID
---------------------- ---------- -----------
64942.69               ALFKI      10643
64942.69               ALFKI      10692
64942.69               ALFKI      10702
64942.69               ALFKI      10835
64942.69               ALFKI      10952
64942.69               ALFKI      11011
64942.69               ANATR      10926
64942.69               ANATR      10759
64942.69               ANATR      10625
64942.69               ANATR      10308
64942.69               ANTON      10365
...
After the OVER clause we have () - which tells SQL to use all the results returned by the query. If we change the query to only return rows for customer ALFKI with the SQL
SELECT 
   SUM(Freight) OVER () AS TotalFreightCostForAll, 
   CustomerID, 
   OrderID 
FROM Orders
WHERE CustomerID IN ('ALFKI')
ORDER BY CustomerID
the result will be
TotalFreightCostForAll CustomerID OrderID
---------------------- ---------- -----------
225.58                 ALFKI      10643
225.58                 ALFKI      10692
225.58                 ALFKI      10702
225.58                 ALFKI      10835
225.58                 ALFKI      10952
225.58                 ALFKI      11011
But we now have a total. The () produces for the whole result set - we can use PARTITION to split the result set up. Let's examine this SQL
SELECT 
  SUM(Freight) OVER (PARTITION BY CustomerID) AS FreightForCustomer, 
  CustomerID,
  OrderID 
FROM Orders
ORDER BY CustomerID
Producing
FreightForCustomer    CustomerID OrderID
--------------------- ---------- -----------
225.58                ALFKI      10643
225.58                ALFKI      10692
225.58                ALFKI      10702
225.58                ALFKI      10835
225.58                ALFKI      10952
225.58                ALFKI      11011
97.42                 ANATR      10926
97.42                 ANATR      10759
97.42                 ANATR      10625
97.42                 ANATR      10308
268.52                ANTON      10365
...
We can now put these all together - in one query. This query will return the
  • total freight cost
  • the freight cost per customer
  • the percentage freight cost for each customer compared to the total cost
  • the percentage freight cost for an order compared to the customers total
  • the percentage freight cost for an order compared to the overall total
  • plus the CustomerID and OrderID
SELECT 
  SUM(Freight) OVER () AS TotalAll,
  SUM(Freight) OVER (PARTITION BY CustomerID) AS FreightCust,
  SUM(Freight) OVER (PARTITION BY CustomerID) / SUM(Freight) OVER () As PCCustToTotal,
  Freight / SUM(Freight) OVER (PARTITION BY CustomerID) AS PCOrderToCustFreightCost,
  Freight / SUM(Freight) OVER () AS PCOrderToOverallFreight,
  CustomerID,
  OrderID 
FROM Orders
ORDER BY CustomerID
Producing
TotalAll              FreightCust           PCCustToTotal         PCOrderToCustFreightCost PCOrderToOverallFreight CustomerID OrderID
--------------------- --------------------- --------------------- ------------------------ ----------------------- ---------- -----------
64942.69              225.58                0.0034                0.1305                   0.0004                  ALFKI      10643
64942.69              225.58                0.0034                0.2705                   0.0009                  ALFKI      10692
64942.69              225.58                0.0034                0.1061                   0.0003                  ALFKI      10702
64942.69              225.58                0.0034                0.3082                   0.001                   ALFKI      10835
64942.69              225.58                0.0034                0.1791                   0.0006                  ALFKI      10952
64942.69              225.58                0.0034                0.0053                   0.00                    ALFKI      11011
64942.69              97.42                 0.0015                0.4097                   0.0006                  ANATR      10926
64942.69              97.42                 0.0015                0.123                    0.0001                  ANATR      10759
64942.69              97.42                 0.0015                0.4506                   0.0006                  ANATR      10625
64942.69              97.42                 0.0015                0.0165                   0.00                    ANATR      10308
64942.69              268.52                0.0041                0.0819                   0.0003                  ANTON      10365