Wednesday, 14 August 2013

SQL: NTILE() function

You may have read the posts on ROW_NUMBER(), RANK() and DENSE_RANK() functions. There is one more ranking function - NTILE().

This function will return a value for each row that will tell you which "group" the row belongs to. You will specify a value to the NTILE function (e.g. 4) - this tells the function that the group numbers are 1, 2, 3 and 4 (starting at one up to and including the value specified). The rows will be ordered and assigned to the groups depending on the position within the group.

This SQL
SELECT 
   ROW_NUMBER() OVER ( ORDER BY MONTH(OrderDate)) AS RowNum,
   NTILE(4) OVER (ORDER BY MONTH(OrderDate)) AS TileNum,
   MONTH(OrderDate) AS MonthOfOrder,
   CustomerID, 
   OrderID
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR')
ORDER BY MONTH(OrderDate)
Producing this output
RowNum               TileNum              MonthOfOrder CustomerID OrderID
-------------------- -------------------- ------------ ---------- -----------
1                    1                    1            ALFKI      10835
2                    1                    3            ANATR      10926
3                    1                    3            ALFKI      10952
4                    2                    4            ALFKI      11011
5                    2                    8            ANATR      10625
6                    2                    8            ALFKI      10643
7                    3                    9            ANATR      10308
8                    3                    10           ALFKI      10692
9                    4                    10           ALFKI      10702
10                   4                    11           ANATR      10759
We have ten rows and 4 partitions - partitions 1, 2 and 3 have three rows assigned to that group and the partition 4 only has two rows.

You can also use PARTITION within the OVER clause (as you can do for the other ranking functions).

Saturday, 3 August 2013

SQL: RANK() and DENSE_RANK() functions

This post follows on from the posts on using ROW_NUMBER() - which followed on from the post on OVER and PARTITION. This one deals with the RANK() and DENSE_RANK() functions.

RANK() is very much like the ROW_NUMBER() function - it will rank the rows in a SQL, which will always be ordered (using an ORDER BY clause within the RANK function) and optionally partitioned.

Let's look at the example with ROW_NUMBER(). We will use the Orders table in Northwind - but this time we are going to list them in order of the Month of the order (using the MONTH() function). The SQL is
SELECT 
   ROW_NUMBER() OVER ( ORDER BY MONTH(OrderDate)) AS RowNum,
   MONTH(OrderDate) AS MonthOfOrder,
   CustomerID, 
   OrderID
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR')
ORDER BY MONTH(OrderDate)
The output is ordered by the Month of the order. This shows Orders 10926 and 10952 are both in month 3 (March).
RowNum               MonthOfOrder CustomerID OrderID
-------------------- ------------ ---------- -----------
1                    1            ALFKI      10835
2                    3            ANATR      10926
3                    3            ALFKI      10952
4                    4            ALFKI      11011
5                    8            ANATR      10625
6                    8            ALFKI      10643
7                    9            ANATR      10308
8                    10           ALFKI      10692
9                    10           ALFKI      10702
10                   11           ANATR      10759
Even though they are both in March - they have different row numbers. However, if we now put in SQL using the RANK() function
SELECT 
   ROW_NUMBER() OVER ( ORDER BY MONTH(OrderDate)) AS RowNum,
   RANK() OVER ( ORDER BY MONTH(OrderDate)) AS RankNum,
   MONTH(OrderDate) AS MonthOfOrder,
   CustomerID, 
   OrderID
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR')
ORDER BY MONTH(OrderDate)
Now has (in the second column) the output of RANK() - showing that the two orders above (10926 and 10952) now both have a rank of 2. And the rank after this is set to 4.
RowNum               RankNum              MonthOfOrder CustomerID OrderID
-------------------- -------------------- ------------ ---------- -----------
1                    1                    1            ALFKI      10835
2                    2                    3            ANATR      10926
3                    2                    3            ALFKI      10952
4                    4                    4            ALFKI      11011
5                    5                    8            ANATR      10625
6                    5                    8            ALFKI      10643
7                    7                    9            ANATR      10308
8                    8                    10           ALFKI      10692
9                    8                    10           ALFKI      10702
10                   10                   11           ANATR      10759
Another function DENSE_RANK() can be used to remove the "gaps". Above there is no rank 3. So this SQL
SELECT 
   ROW_NUMBER() OVER ( ORDER BY MONTH(OrderDate)) AS RowNum,
   RANK() OVER ( ORDER BY MONTH(OrderDate)) AS RankNum,
   DENSE_RANK() OVER ( ORDER BY MONTH(OrderDate)) AS DenseRankNum,
   MONTH(OrderDate) AS MonthOfOrder,
   CustomerID, 
   OrderID
FROM Orders
WHERE CustomerID IN ('ALFKI','ANATR')
ORDER BY MONTH(OrderDate)
Now removes those gaps
RowNum               RankNum              DenseRankNum         MonthOfOrder CustomerID OrderID
-------------------- -------------------- -------------------- ------------ ---------- -----------
1                    1                    1                    1            ALFKI      10835
2                    2                    2                    3            ANATR      10926
3                    2                    2                    3            ALFKI      10952
4                    4                    3                    4            ALFKI      11011
5                    5                    4                    8            ANATR      10625
6                    5                    4                    8            ALFKI      10643
7                    7                    5                    9            ANATR      10308
8                    8                    6                    10           ALFKI      10692
9                    8                    6                    10           ALFKI      10702
10                   10                   7                    11           ANATR      10759