Tuesday, 16 July 2013

SQL: Using OVER and PARTITION

Using the OVER clause allows you to return aggregate information within a query - but without needing to use a group by. The clause (within the query) specifies what data you are grouping by.

Let's use an example - from the Northwind database the Orders table has a CustomerID and a Freight (cost). Looking at the first two customers
SELECT TOP 8 WITH TIES CustomerID, Freight
FROM Orders
ORDER BY CustomerID
And the results from this are
CustomerID Freight
---------- ---------------------
ALFKI      29.46
ALFKI      61.02
ALFKI      23.94
ALFKI      69.53
ALFKI      40.42
ALFKI      1.21
ANATR      1.61
ANATR      43.90
ANATR      11.99
ANATR      39.92
If we want to get the Freight cost for each Customer we can do the following
SELECT 
  CustomerID, SUM(Freight) AS FreightPerCustomer 
FROM Orders 
GROUP BY CustomerID
And the results from this are
CustomerID FreightPerCustomer
---------- ---------------------
ALFKI      225.58
ANATR      97.42
ANTON      268.52
AROUT      471.95
...
And to get the total freight cost (in this case for all orders)
SELECT SUM(Freight) AS TotalFreightCost FROM Orders
Which gives us the value
TotalFreightCost
---------------------
64942.69
We can use OVER within a query to produce an aggregate and some other data. Firstly, let's use OVER to tell us the total freight cost along with data for each order. This can be done with
SELECT 
   SUM(Freight) OVER () AS TotalFreightCostForAll, 
   CustomerID, 
   OrderID 
FROM Orders
ORDER BY CustomerID
Producing
TotalFreightCostForAll CustomerID OrderID
---------------------- ---------- -----------
64942.69               ALFKI      10643
64942.69               ALFKI      10692
64942.69               ALFKI      10702
64942.69               ALFKI      10835
64942.69               ALFKI      10952
64942.69               ALFKI      11011
64942.69               ANATR      10926
64942.69               ANATR      10759
64942.69               ANATR      10625
64942.69               ANATR      10308
64942.69               ANTON      10365
...
After the OVER clause we have () - which tells SQL to use all the results returned by the query. If we change the query to only return rows for customer ALFKI with the SQL
SELECT 
   SUM(Freight) OVER () AS TotalFreightCostForAll, 
   CustomerID, 
   OrderID 
FROM Orders
WHERE CustomerID IN ('ALFKI')
ORDER BY CustomerID
the result will be
TotalFreightCostForAll CustomerID OrderID
---------------------- ---------- -----------
225.58                 ALFKI      10643
225.58                 ALFKI      10692
225.58                 ALFKI      10702
225.58                 ALFKI      10835
225.58                 ALFKI      10952
225.58                 ALFKI      11011
But we now have a total. The () produces for the whole result set - we can use PARTITION to split the result set up. Let's examine this SQL
SELECT 
  SUM(Freight) OVER (PARTITION BY CustomerID) AS FreightForCustomer, 
  CustomerID,
  OrderID 
FROM Orders
ORDER BY CustomerID
Producing
FreightForCustomer    CustomerID OrderID
--------------------- ---------- -----------
225.58                ALFKI      10643
225.58                ALFKI      10692
225.58                ALFKI      10702
225.58                ALFKI      10835
225.58                ALFKI      10952
225.58                ALFKI      11011
97.42                 ANATR      10926
97.42                 ANATR      10759
97.42                 ANATR      10625
97.42                 ANATR      10308
268.52                ANTON      10365
...
We can now put these all together - in one query. This query will return the
  • total freight cost
  • the freight cost per customer
  • the percentage freight cost for each customer compared to the total cost
  • the percentage freight cost for an order compared to the customers total
  • the percentage freight cost for an order compared to the overall total
  • plus the CustomerID and OrderID
SELECT 
  SUM(Freight) OVER () AS TotalAll,
  SUM(Freight) OVER (PARTITION BY CustomerID) AS FreightCust,
  SUM(Freight) OVER (PARTITION BY CustomerID) / SUM(Freight) OVER () As PCCustToTotal,
  Freight / SUM(Freight) OVER (PARTITION BY CustomerID) AS PCOrderToCustFreightCost,
  Freight / SUM(Freight) OVER () AS PCOrderToOverallFreight,
  CustomerID,
  OrderID 
FROM Orders
ORDER BY CustomerID
Producing
TotalAll              FreightCust           PCCustToTotal         PCOrderToCustFreightCost PCOrderToOverallFreight CustomerID OrderID
--------------------- --------------------- --------------------- ------------------------ ----------------------- ---------- -----------
64942.69              225.58                0.0034                0.1305                   0.0004                  ALFKI      10643
64942.69              225.58                0.0034                0.2705                   0.0009                  ALFKI      10692
64942.69              225.58                0.0034                0.1061                   0.0003                  ALFKI      10702
64942.69              225.58                0.0034                0.3082                   0.001                   ALFKI      10835
64942.69              225.58                0.0034                0.1791                   0.0006                  ALFKI      10952
64942.69              225.58                0.0034                0.0053                   0.00                    ALFKI      11011
64942.69              97.42                 0.0015                0.4097                   0.0006                  ANATR      10926
64942.69              97.42                 0.0015                0.123                    0.0001                  ANATR      10759
64942.69              97.42                 0.0015                0.4506                   0.0006                  ANATR      10625
64942.69              97.42                 0.0015                0.0165                   0.00                    ANATR      10308
64942.69              268.52                0.0041                0.0819                   0.0003                  ANTON      10365


No comments:

Post a Comment