Tuesday, 29 July 2014

SQL: Introduction to the PIVOT keyword

I haven't written in a while. Normally my articles have been some type of more advanced coding which I have directed some of the talented apprentices I worked with to look at. However, I have moved back to development so there wasn't the greatest impetus. But, I think I will keep going with some more advanced topics. This one is on the use of PIVOT and I have a few more in a series of articles on this.

As before I will be using the Northwind database. Let's set an initial target: we want the total Freight cost in each month. So in the database we can run this SQL to get the list of orders along with the order date and freight cost.
SELECT OrderId, OrderDate, Freight FROM Orders
Produces
OrderId     OrderDate               Freight
----------- ------------- ---------------------
10248       1996-07-04 00:00:00.000 32.38
10249       1996-07-05 00:00:00.000 11.61
10250       1996-07-08 00:00:00.000 65.83
...
11076       1998-05-06 00:00:00.000 38.28
11077       1998-05-06 00:00:00.000 8.53

(830 row(s) affected)
We can get the month using the MONTH function (or DATEPART), like this
SELECT OrderId, Month(OrderDate) AS MonthNumber, Freight FROM Orders
Producing
OrderId     MonthNumber Freight
----------- ----------- ---------------------
10248       7           32.38
10249       7           11.61
10250       7           65.83
...
11076       5           38.28
11077       5           8.53

(830 row(s) affected)
And this data could be summarised using this
SELECT MONTH(OrderDate) AS MonthNumber, SUM(Freight) AS TotalFreight FROM Orders
GROUP BY MONTH(OrderDate)
ORDER BY 1
giving us this output
MonthNumber TotalFreight
----------- ---------------------
1           7702.42
2           5874.39
3           7267.83
4           9332.67
5           4146.48
6           1852.65
7           3746.90
8           4475.44
9           4360.53
10          5466.12
11          4160.71
12          6556.55
The PIVOT function (at it's simplest) will allow us to "Pivot" the data - we will take the month number and make each one our column headings.
SELECT *
FROM 
  (SELECT MONTH(OrderDate) AS MonthNumber, Freight FROM Orders) AS Source
PIVOT
(SUM(Freight)
 FOR MonthNumber 
    IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])) AS PivotTable
And now we have our Month Numbers as columns
1        2        3        4        5        6        
-------- -------- -------- -------- -------- -------- 
7702.42  5874.39  7267.83  9332.67  4146.48  1852.65  

7        8        9        10       11       12
-------- -------- -------- -------- -------- --------
3746.90  4475.44  4360.53  5466.12  4160.71  6556.55
Let's look at the syntax of the PIVOT statement (or at least this simple example). We have what is called an "in-line" view for our source data. This is simply our MonthNumber and the Freight cost. If you run this as a separate statement you will get a list of orders (in this case without the OrderId) and the freight cost.

We then have the PIVOT keyword and after this (in brackets) the details of what we are pivoting. You are required (i.e. YOU NEED) to have an aggregation function. In this case we are using SUM (to get the total freight cost for each month). Then we have
FOR MonthNumber IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])
This defines where the data will come from (in this case the MonthNumber in our source data) and which column values will be shown (the values 1, 2, 3, etc up to 12). The values in these columns must be convertabile to a string (or more specifically nvarchar) - so can't be an image or text etc.

You need to have the table aliases - even though you don't refer to them. The PIVOT operation will perform the grouping - you do not specify a GROUP BY clause.

There will be a few more articles in this series, looking at more complex examples, a dynamic PIVOT and using UNPIVOT as well as a bit more detail. I like to start with an example rather than the theory. Before those articles here is a slightly more complex example
SELECT *
FROM 
   (SELECT YEAR(OrderDate) AS Year, MONTH(OrderDate) AS MonthNumber, Freight FROM Orders) 
       AS Source
PIVOT
(SUM(Freight)
 FOR MonthNumber 
    IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])) AS PivotTable
The only change is that an extra column has been added to our source - Year. And if you run this you get
Year   1         2         3         4         5         6         
------ --------- --------- --------- --------- --------- --------- 
1998   5463.44   4272.94   5379.02   6393.57   685.08    NULL      
1996   NULL      NULL      NULL      NULL      NULL      NULL      
1997   2238.98   1601.45   1888.81   2939.10   3461.40   1852.65   

7         8         9         10        11        12
--------- --------- --------- --------- --------- ---------
NULL      NULL      NULL      NULL      NULL      NULL
1288.18   1397.17   1123.48   1520.59   2151.86   2798.59
2458.72   3078.27   3237.05   3945.53   2008.85   3757.96
Suddenly we have a Year output as well - no extra GROUP BY (this has been done for us). We can order this (add an ORDER BY clause).

One thing to note is that we have NULL's in our output. When we use the SUM aggregate method, if there is nothing to SUM (i.e. for the Month and Year) the output will be NULL.

Let's go back to our source data and run this

SELECT 
     YEAR(OrderDate) AS Year, 
     MONTH(OrderDate) AS MonthNumber, 
     SUM(Freight) AS TotalFreight 
FROM Orders
GROUP BY YEAR(OrderDate), MONTH(OrderDate)
ORDER BY 1
You get output
Year        MonthNumber TotalFreight
----------- ----------- ---------------------
1996        7           1288.18
1996        8           1397.17
1996        9           1123.48
...
1998        4           6393.57
1998        5           685.08

(23 row(s) affected)
There is no entry for January in 1996. So in our source data we can't use some nifty ISNULL method to get our data. It is unfortunately the same for the PIVOT method (although we will look at some techniques for handling this in a later article).

However, if you use the COUNT aggregate function you get zero's
SELECT * 
FROM 
 (SELECT YEAR(OrderDate) AS Year, MONTH(OrderDate) AS MonthNumber, Freight FROM Orders) 
    AS Source
PIVOT
(COUNT(Freight) 
 FOR MonthNumber 
    IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])) AS PivotTable
ORDER BY 1
Will produce
Year        1           2           3           4           5           6           
----------- ----------- ----------- ----------- ----------- ----------- ----------- 
1996        0           0           0           0           0           0           
1997        33          29          30          31          32          30          
1998        55          54          73          74          14          0           

7           8           9           10          11          12
----------- ----------- ----------- ----------- ----------- -----------
22          25          23          26          25          31
33          33          37          38          34          48
0           0           0           0           0           0