Thursday, 14 August 2014

SQL: More PIVOT

As you have seen in the previous article the use of PIVOT with the months of the year. This is a common use of PIVOT.

Let's look at another example. Again, using Northwind. This time what I'd like to achieve is a list of employees as columns and a total of orders each has made in each month/year. Let's start by looking at the raw data

SELECT OrderDate, OrderId, CustomerId, EmployeeId FROM Orders
ORDER BY OrderDate, EmployeeId
Gives us some output that isn't quite the data we are looking for
OrderDate               OrderId     CustomerId EmployeeId
----------------------- ----------- ---------- -----------
1996-07-04 00:00:00.000 10248       VINET      5
1996-07-05 00:00:00.000 10249       TOMSP      6
1996-07-08 00:00:00.000 10251       VICTE      3
1996-07-08 00:00:00.000 10250       HANAR      4
1996-07-09 00:00:00.000 10252       SUPRD      4
...
1998-05-06 00:00:00.000 11074       SIMOB      7
1998-05-06 00:00:00.000 11075       RICSU      8

(830 row(s) affected)
So we will get this with the year and month of the OrderDate and then sort it
SELECT
    YEAR(OrderDate) AS YY, 
    MONTH(OrderDate) AS MM, 
    OrderId, 
    CustomerId, 
    EmployeeId FROM Orders
ORDER BY 
    YEAR(OrderDate), 
    MONTH(OrderDate),
    EmployeeId
Giving us something a little bit better
YY          MM          OrderId     CustomerId EmployeeId
----------- ----------- ----------- ---------- -----------
1996        7           10258       ERNSH      1
1996        7           10265       BLONP      2
1996        7           10266       WARTH      3
1996        7           10251       VICTE      3
1996        7           10253       HANAR      3
1996        7           10256       WELLI      3
1996        7           10257       HILAA      4
...
We can see that EmployeeId 1 sold one order in July 1996, EmployeeId 3 sold 4 etc. We can summarise this with
SELECT
    YEAR(OrderDate) AS YY, 
    MONTH(OrderDate) AS MM, 
    EmployeeId, 
    COUNT(*) AS NumberOfOrders 
FROM Orders
GROUP BY 
    YEAR(OrderDate), Month(OrderDate), EmployeeId
ORDER BY 
    YEAR(OrderDate), Month(OrderDate), EmployeeId
Producing
YY          MM          EmployeeId  NumberOfOrders
----------- ----------- ----------- --------------
1996        7           1           1
1996        7           2           1
1996        7           3           4
1996        7           4           7
...
Now to PIVOT this we can simply run this
SELECT * FROM
(
    SELECT 
        YEAR(OrderDate) As YY, 
        MONTH(OrderDate) AS MM, 
        CustomerId,
        EmployeeId from Orders) As Source
PIVOT
(
    COUNT(CustomerId) FOR EmployeeId IN ( [1], [2], [3], [4] )
) AS PivotTable
This gives us the output for Employees 1, 2, 3 and 4
YY          MM          1           2           3           4
----------- ----------- ----------- ----------- ----------- -----------
1997        1           3           4           7           8
1996        8           5           2           2           5
1997        9           8           7           4           5
...
1996        7           1           1           4           7
1998        4           8           18          10          10

(23 row(s) affected)
Not quite right. I'd rather it was in some sort of order
SELECT * FROM
(
    SELECT 
        YEAR(OrderDate) As YY,
        MONTH(OrderDate) AS MM,
        CustomerId,
        EmployeeId 
    FROM Orders) As Source
PIVOT
(
    COUNT(CustomerId) FOR EmployeeId IN ( [1], [2], [3], [4] )
) AS PivotTable
ORDER BY YY, MM
One thing to note - in the ORDER BY clause you can use the column names - you don't have to repeat the clause. The output will be
YY          MM          1           2           3           4
----------- ----------- ----------- ----------- ----------- -----------
1996        7           1           1           4           7
1996        8           5           2           2           5
1996        9           5           5           1           3
...
There are however nine employees. So to get all nine employees we need to modify our query for this.
SELECT * FROM
(
    SELECT 
        YEAR(OrderDate) As YY, 
        MONTH(OrderDate) AS MM, 
        CustomerId, 
        EmployeeId FROM Orders) As Source
PIVOT
(
    COUNT(CustomerId) FOR EmployeeId 
         IN ( [1], [2], [3], [4], [5], [6], [7], [8], [9] )
) AS PivotTable
ORDER BY YY, MM
Finally giving us
YY    MM    1     2     3     4     5     6     7     8     9
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1996  7     1     1     4     7     3     2     0     2     2
1996  8     5     2     2     5     0     4     1     6     0
1996  9     5     5     1     3     1     3     2     3     0
...
This creates an issue. What happens when a new Employee joins. We need to modify our query to cope with this. The next article will be about creating a Dynamic PIVOT. No real rocket science, but using generated SQL.

Tuesday, 29 July 2014

SQL: Introduction to the PIVOT keyword

I haven't written in a while. Normally my articles have been some type of more advanced coding which I have directed some of the talented apprentices I worked with to look at. However, I have moved back to development so there wasn't the greatest impetus. But, I think I will keep going with some more advanced topics. This one is on the use of PIVOT and I have a few more in a series of articles on this.

As before I will be using the Northwind database. Let's set an initial target: we want the total Freight cost in each month. So in the database we can run this SQL to get the list of orders along with the order date and freight cost.
SELECT OrderId, OrderDate, Freight FROM Orders
Produces
OrderId     OrderDate               Freight
----------- ------------- ---------------------
10248       1996-07-04 00:00:00.000 32.38
10249       1996-07-05 00:00:00.000 11.61
10250       1996-07-08 00:00:00.000 65.83
...
11076       1998-05-06 00:00:00.000 38.28
11077       1998-05-06 00:00:00.000 8.53

(830 row(s) affected)
We can get the month using the MONTH function (or DATEPART), like this
SELECT OrderId, Month(OrderDate) AS MonthNumber, Freight FROM Orders
Producing
OrderId     MonthNumber Freight
----------- ----------- ---------------------
10248       7           32.38
10249       7           11.61
10250       7           65.83
...
11076       5           38.28
11077       5           8.53

(830 row(s) affected)
And this data could be summarised using this
SELECT MONTH(OrderDate) AS MonthNumber, SUM(Freight) AS TotalFreight FROM Orders
GROUP BY MONTH(OrderDate)
ORDER BY 1
giving us this output
MonthNumber TotalFreight
----------- ---------------------
1           7702.42
2           5874.39
3           7267.83
4           9332.67
5           4146.48
6           1852.65
7           3746.90
8           4475.44
9           4360.53
10          5466.12
11          4160.71
12          6556.55
The PIVOT function (at it's simplest) will allow us to "Pivot" the data - we will take the month number and make each one our column headings.
SELECT *
FROM 
  (SELECT MONTH(OrderDate) AS MonthNumber, Freight FROM Orders) AS Source
PIVOT
(SUM(Freight)
 FOR MonthNumber 
    IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])) AS PivotTable
And now we have our Month Numbers as columns
1        2        3        4        5        6        
-------- -------- -------- -------- -------- -------- 
7702.42  5874.39  7267.83  9332.67  4146.48  1852.65  

7        8        9        10       11       12
-------- -------- -------- -------- -------- --------
3746.90  4475.44  4360.53  5466.12  4160.71  6556.55
Let's look at the syntax of the PIVOT statement (or at least this simple example). We have what is called an "in-line" view for our source data. This is simply our MonthNumber and the Freight cost. If you run this as a separate statement you will get a list of orders (in this case without the OrderId) and the freight cost.

We then have the PIVOT keyword and after this (in brackets) the details of what we are pivoting. You are required (i.e. YOU NEED) to have an aggregation function. In this case we are using SUM (to get the total freight cost for each month). Then we have
FOR MonthNumber IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])
This defines where the data will come from (in this case the MonthNumber in our source data) and which column values will be shown (the values 1, 2, 3, etc up to 12). The values in these columns must be convertabile to a string (or more specifically nvarchar) - so can't be an image or text etc.

You need to have the table aliases - even though you don't refer to them. The PIVOT operation will perform the grouping - you do not specify a GROUP BY clause.

There will be a few more articles in this series, looking at more complex examples, a dynamic PIVOT and using UNPIVOT as well as a bit more detail. I like to start with an example rather than the theory. Before those articles here is a slightly more complex example
SELECT *
FROM 
   (SELECT YEAR(OrderDate) AS Year, MONTH(OrderDate) AS MonthNumber, Freight FROM Orders) 
       AS Source
PIVOT
(SUM(Freight)
 FOR MonthNumber 
    IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])) AS PivotTable
The only change is that an extra column has been added to our source - Year. And if you run this you get
Year   1         2         3         4         5         6         
------ --------- --------- --------- --------- --------- --------- 
1998   5463.44   4272.94   5379.02   6393.57   685.08    NULL      
1996   NULL      NULL      NULL      NULL      NULL      NULL      
1997   2238.98   1601.45   1888.81   2939.10   3461.40   1852.65   

7         8         9         10        11        12
--------- --------- --------- --------- --------- ---------
NULL      NULL      NULL      NULL      NULL      NULL
1288.18   1397.17   1123.48   1520.59   2151.86   2798.59
2458.72   3078.27   3237.05   3945.53   2008.85   3757.96
Suddenly we have a Year output as well - no extra GROUP BY (this has been done for us). We can order this (add an ORDER BY clause).

One thing to note is that we have NULL's in our output. When we use the SUM aggregate method, if there is nothing to SUM (i.e. for the Month and Year) the output will be NULL.

Let's go back to our source data and run this

SELECT 
     YEAR(OrderDate) AS Year, 
     MONTH(OrderDate) AS MonthNumber, 
     SUM(Freight) AS TotalFreight 
FROM Orders
GROUP BY YEAR(OrderDate), MONTH(OrderDate)
ORDER BY 1
You get output
Year        MonthNumber TotalFreight
----------- ----------- ---------------------
1996        7           1288.18
1996        8           1397.17
1996        9           1123.48
...
1998        4           6393.57
1998        5           685.08

(23 row(s) affected)
There is no entry for January in 1996. So in our source data we can't use some nifty ISNULL method to get our data. It is unfortunately the same for the PIVOT method (although we will look at some techniques for handling this in a later article).

However, if you use the COUNT aggregate function you get zero's
SELECT * 
FROM 
 (SELECT YEAR(OrderDate) AS Year, MONTH(OrderDate) AS MonthNumber, Freight FROM Orders) 
    AS Source
PIVOT
(COUNT(Freight) 
 FOR MonthNumber 
    IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])) AS PivotTable
ORDER BY 1
Will produce
Year        1           2           3           4           5           6           
----------- ----------- ----------- ----------- ----------- ----------- ----------- 
1996        0           0           0           0           0           0           
1997        33          29          30          31          32          30          
1998        55          54          73          74          14          0           

7           8           9           10          11          12
----------- ----------- ----------- ----------- ----------- -----------
22          25          23          26          25          31
33          33          37          38          34          48
0           0           0           0           0           0

Monday, 24 February 2014

SQL: Using a Collation within a SELECT list or WHERE clause

Sometimes you when you do comparisons on string data within a table you do want it to be case sensitive. However for a lot of queries (or table structures to be more specific) they will come out as case insensitive. For example, the following query on the table Employees in Northwind
SELECT EmployeeID, LastName, FirstName FROM Employees
WHERE FirstName IN ('Andrew', 'robert')
Will return two rows:
EmployeeID  LastName             FirstName
----------- -------------------- ----------
2           Fuller               Andrew
7           King                 Robert
Even though I asked for FirstName of 'robert' - I got Robert (with an uppercase R).

This is due to the collation sequence. If you use sp_help Employees it will show you the structure of the table. I am using this query to return the structure (which is part of the query with in the system stored procedure sp_help).
SELECT
  'Column_name' = name,
  'Type' = type_name(user_type_id),
  'Computed' = CASE WHEN ColumnProperty(object_id, name, 'IsComputed') = 0 
     THEN 'No' ELSE 'Yes' END,
  'Length' = convert(int, max_length),
  'Nullable' = CASE WHEN is_nullable = 0 THEN 'No' ELSE 'Yes' END,
  'Collation' = collation_name
FROM 
  sys.all_columns
WHERE
  object_id = 245575913
AND 
  name IN ('EmployeeID', 'LastName', 'FirstName')
(If you are going to do this check your object_id is correct).

This is showing the structure as
Column_name  Type   Computed Length      Nullable Collation
------------ ------ -------- ----------- -------- --------------------
EmployeeID   int    No       4           No       NULL
LastName     nvarch No       40          No       Latin1_General_CI_AS
FirstName    nvarch No       20          No       Latin1_General_CI_AS
The collation here is Latin1_General_CI_AS - the CI tells us it is case insensitive (AS is accent insensitive and if you see WS/WI this is width sensitivity/insensitivity and KS/KI is kana sensitivity/insensitivity respectively).

But what if I want to perform queries that are case sensitive. After all when I display the data it is in the case as it is stored.

I can convert a column to a different collation - this is done by adding after the column name the COLLATION keyword and the name of collation I want. Thus
SELECT EmployeeID, LastName, FirstName,
FirstName COLLATE Latin1_General_CS_AS AS CS_FirstName
FROM Employees
WHERE FirstName IN ('Andrew', 'robert')
Adds an additional column (with an alias CS_FirstName):
EmployeeID  LastName             FirstName  CS_FirstName
----------- -------------------- ---------- ------------
2           Fuller               Andrew     Andrew
7           King                 Robert     Robert
But I can't use a column alias in a where clause, so using something like this
SELECT EmployeeID, LastName, FirstName,
FirstName COLLATE Latin1_General_CS_AS AS CS_FirstName
FROM Employees
WHERE FirstName COLLATE Latin1_General_CS_AS IN ('Andrew', 'robert')
will now give us the correct results.
EmployeeID  LastName             FirstName  CS_FirstName
----------- -------------------- ---------- ------------
2           Fuller               Andrew     Andrew
Of course you don't need the computed column in the final output.