Monday, 16 September 2013

SQL: User Defined Functions - Scalar - Round up

I might have posted my last article on User Defined Functions (UDF) in SQL a little quick. A few words of warning are probably required, or at least be aware. Plus a discussion of using a UDF in a computed column and a persisted computed column (using the PERSISTED keyword).

The example I gave is a simple function to work with some parameters. In this case we expect that one of the parameters might be NULL and handle this. Because this is some functionality we may want to reuse, we are utilising a UDF. All the good reasons for using a function.

The UDF I showed looks very much like a stored procedure. You can put other code within the function, perform some logic and access other tables. We had a little bit of logic in here (i.e. check if Region IS NULL).

Accessing Other tables

Lets show some use of accessing another table. I am going to make an example with a new table. But firstly, has anyone noticed that in the Northwind database Employee "Robert King" has a postcode RG1 9SP, but has the city as London - not Reading as per the postcode. Let's change his City from London to Reading.
UPDATE Employees SET City='Reading' WHERE EmployeeID=7
Now Reading is in Berkshire, but for the sake of argument (or internal Northwind politics!) we are not able to change the Employees table. We will create another table for Alternative Regions, with City and Region to use
CREATE TABLE AlternativeRegions
( City nvarchar(30) NOT NULL PRIMARY KEY,
  Region nvarchar(30) NOT NULL )

INSERT INTO AlternativeRegions VALUES
  ('Reading', 'Berkshire')
Now we can update our function to access this table
ALTER FUNCTION Location 
     (@City nvarchar(30), @Region nvarchar(30), @PostalCode nvarchar(20))
RETURNS nvarchar(80)
AS
BEGIN
 DECLARE @ret nvarchar(80)
 DECLARE @NewRegion nvarchar(30)
 IF @Region IS NULL
 BEGIN
  SELECT @NewRegion = Region
  FROM AlternativeRegions
  WHERE City = @City

  SELECT @ret = 
      CASE WHEN @NewRegion IS NULL THEN
          @City + ' ' + @PostalCode
        ELSE 
       @City + ', ' + @NewRegion + ' ' + @PostalCode
   END
 END
 ELSE
 BEGIN
  SET @ret = @City + ', ' + @Region + ' ' + @PostalCode
 END
RETURN @ret
END
And we can use this when Region is NULL
SELECT dbo.Location('Reading', NULL, 'RG1 9SP')
producing
Reading, Berkshire RG1 9SP
or within SQL
SELECT 
  EmployeeID, LastName,
  dbo.Location(City, Region, PostalCode) AS Location
FROM Employees

Performance

This example is just to show that you can access other tables.

But, I do need to ask when doing something like this why would you. Why not write SQL and structure to allow you to do something about this. If (as in the last piece of SQL) you were accessing all the rows in the Employee table would you then want to do a look-up of another table? And on a per-row basis (i.e. the table lookup will be done for each row returned).

Doing this type of modification, rather than changing the database structure to allow you the functionality you want seems like a maintenance nightmare. But on a performance point you are accessing each row. If you were doing it with only one row then maybe.

An example to really show this, is let's imagine we don't pass as parameters the City, Region and Postcode, but just have a function that takes the Employee ID.
CREATE FUNCTION EmployeeLocation
   (@EmployeeID int)
RETURNS nvarchar(80)
AS
BEGIN
 DECLARE @ret nvarchar(80)

 SELECT @ret = dbo.Location(City, Region, PostalCode)
 FROM Employees
 WHERE EmployeeID = @EmployeeID

 RETURN @ret
END
GO
And we would use this with
SELECT dbo.EmployeeLocation(9)
For accessing one employee, then we get the output
London WG2 7LT
But we can also do this
SELECT 
 EmployeeID, LastName, dbo.EmployeeLocation(EmployeeID)
FROM 
 Employees
And get our expected output. However, we are now accessing the Employees table once for the main query and each function will then access the Employees table for each row.

My advice is to be careful. SQL is very good with queries and how to optimise them. But when functions are involved it is more difficult.

One thing you cannot do in a is change a table. Anything you do within a scalar function is read-only

Using a computed column

Rather than calling the function you could add a computed column to your table. So in this example
ALTER TABLE Employees
   ADD Location AS dbo.Location(City, Region, PostalCode)
And then we can retrieve a column as if it is part of the table
SELECT EmployeeID, LastName, Location FROM Employees
By default computed columns are not physically stored in the database - but calculated each time they are referenced. There is an option that we can use - PERSISTED computed columns. Let's first try and use the Location function as PERSISTED, with this SQL
ALTER TABLE Employees
   ADD LocationPersisted AS dbo.Location(City, Region, PostalCode) PERSISTED
Gives us an error
Msg 4936, Level 16, State 1, Line 1
Computed column 'LocationPersisted' in table 'Employees' 
   cannot be persisted because the column is non-deterministic.
So what is a non-deterministic column. A non-deterministic function returns different results each time they are called with a specific set of parameters, where as a deterministic function will always return the same result any time they are called. Our current Location function accesses our AlternativeRegions table - so this immediately makes it non deterministic. So let's revert to our previous version and create a function LocationNonDeterministic
ALTER FUNCTION LocationNonDeterministic 
     (@City nvarchar(30), @Region nvarchar(30), @PostalCode nvarchar(20))
RETURNS nvarchar(80)
WITH SCHEMABINDING
AS
BEGIN
 DECLARE @ret varchar(30)
 SELECT @ret=
   CASE
  WHEN @Region IS NULL THEN @City + ' ' + @PostalCode
  ELSE @City + ', ' + @Region + ' ' + @PostalCode
   END
 
RETURN @ret
END
Now if we try and add this computed column as PERSISTED we will get the same error.

Another requirement of a non-deterministic is it is "schema bound", here we add WITH SCHEMABINDING. This essentially means that the since the function is used elsewhere, we need to remove the dependencies to it before changing it. To create the function, we can do this
ALTER FUNCTION LocationNonDeterministic 
     (@City nvarchar(30), @Region nvarchar(30), @PostalCode nvarchar(20))
RETURNS nvarchar(80)
WITH SCHEMABINDING
AS
BEGIN
 DECLARE @ret varchar(30)
 SELECT @ret=
   CASE
  WHEN @Region IS NULL THEN @City + ' ' + @PostalCode
  ELSE @City + ', ' + @Region + ' ' + @PostalCode
   END
 
RETURN @ret
END
GO
And now to add the persisted computed column
ALTER TABLE Employees
   ADD LocationPersisted AS dbo.LocationNonDeterministic(City, Region, PostalCode) PERSISTED 
Now we have a computed column which is persisted in the database. Remember our table has now got bigger. Also, we cannot change this function without dropping the persisted column, changing the function and then adding the persisted computed column back.

No comments:

Post a Comment