Friday 28 January 2011

T-SQL: Inline TVF vs Standard TVF

There is sometimes confusion about when you should use a standard TVF and when an inline TVF. I have often found myself ignoring the merits of an ITVF and just ploughing ahead with an STVF, possibly because the syntax pops out of my fingers and it's also the one I use most often.

However, recently I came across an issue whereby a TVF was taking a particularly long time to complete and causing performance issues. My first instinct was to check for suitable indexes and possible fragmentation, so I looked at the function definition to see the query construction and the tables referenced. On glancing back at the function I noticed that although it was returning the results of a single query, it was written as a standard TVF. It was then that I had a moment of clarity: rewriting the query as an inline TVF had immediate performance benefits.

An inline TVF is essentially just a stored, parameterised SQL statement, similar to a view. A standard TVF is much more than that and is intended to run a batch of SQL statements, with the return value being a table of results. At its simplest, a standard TVF runs a query, inserts the results into a table variable and returns that table variable. Even as I type that it sounds crazy that in that scenario I would choose a standard over an inline, but that's what I have seen time and again (and been guilty of myself!). There is an obvious overhead in writing the data to a table variable (hitting tempdb) and then just reading that data straight back out. It is much more sensible to read the data straight from the underlying tables.

I've created a simple example which illustrates the point. I'm running SQL2008 SP1.

Here are the functions:

USE AdventureWorks
GO
-- the standard TVF
CREATE FUNCTION dbo.stvf_GetEmployees(@ManagerID INT)
RETURNS @Employee TABLE (
[EmployeeID] [int], [NationalIDNumber] [nvarchar](15) NOT NULL,
[ContactID] [int] NOT NULL, [LoginID] [nvarchar](256) NOT NULL,
[ManagerID] [int] NULL, [Title] [nvarchar](50) NOT NULL,
[BirthDate] [datetime] NOT NULL, [MaritalStatus] [nchar](1) NOT NULL,
[Gender] [nchar](1) NOT NULL, [HireDate] [datetime] NOT NULL,
[SalariedFlag] [dbo].[Flag] NOT NULL, [VacationHours] [smallint] NOT NULL,
[SickLeaveHours] [smallint] NOT NULL, [CurrentFlag] [dbo].[Flag] NOT NULL,
[ModifiedDate] [datetime] NOT NULL
)
AS
BEGIN

INSERT INTO
@Employee
SELECT EmployeeID, NationalIDNumber, ContactID, LoginID, ManagerID, Title, BirthDate, MaritalStatus, Gender, HireDate, SalariedFlag, VacationHours, SickLeaveHours, CurrentFlag, ModifiedDate
FROM HumanResources.Employee
WHERE ManagerID = @ManagerID

RETURN

END
GO
-- the Inline TVF
CREATE FUNCTION dbo.itvf_GetEmployees(@ManagerID INT)
RETURNS TABLE
AS
RETURN
SELECT
EmployeeID, NationalIDNumber, ContactID, LoginID, ManagerID, Title, BirthDate, MaritalStatus, Gender, HireDate, SalariedFlag, VacationHours, SickLeaveHours, CurrentFlag, ModifiedDate
FROM HumanResources.Employee
WHERE ManagerID = @ManagerID

GO

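To compare the two, I simply ran each function with the same parameter and looked at the estimated execution plans (the ManagerID value here is just an illustrative one):

-- ManagerID 3 is purely an illustrative value; substitute any ManagerID
-- that exists in HumanResources.Employee
SELECT * FROM dbo.itvf_GetEmployees(3)
SELECT * FROM dbo.stvf_GetEmployees(3)
GO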

Running the inline TVF generates a pretty simple estimated execution plan:



Whereas running the standard TVF shows the extra work that is involved - inserting into the table variable and THEN reading it back:



Making this change took the execution time for the function down from 10 minutes to nearer 2 minutes and had significant benefits further down the chain (as the TVF was used in a CROSS APPLY).
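For illustration, the downstream usage looked something like this (the column choices here are hypothetical, not the actual production query):

-- Hypothetical sketch of the downstream usage; with the inline version the
-- optimiser can expand the function into the outer query's plan
SELECT mgr.EmployeeID AS ManagerID, rpt.EmployeeID, rpt.LoginID
FROM HumanResources.Employee AS mgr
CROSS APPLY dbo.itvf_GetEmployees(mgr.EmployeeID) AS rpt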

So if you're like me and tend to just stick with the same style of TVF, be aware of the potential performance implications. Microsoft has provided these two types of functions (probably for this reason!) so make sure you use the appropriate one to ensure optimum performance.

Monday 24 January 2011

PowerShell: use it or lose it

PowerShell has been around a good few years now and I've heard so much good stuff about it, but I'm still finding it difficult to "get into". As I've said before, my favoured method of learning something is to use it in anger, but I find that I go through patches of using it and then just naturally drift away. Use it or lose it is the phrase that springs to mind.

Perhaps this is because, at the moment, I don't have a huge server farm to deal with, so maintenance tasks aren't onerous, and it's quite tough to force yourself to use a technology if you don't need to.

That said, I recognise that it's a powerful language and could make many tasks (e.g. reading error logs) much more straightforward IF you need to do those tasks on multiple servers frequently, i.e. PowerShell helps you benefit from economies of scale.

This year, I'm going to make a concerted effort to "learn" PowerShell and add this string to my bow, as I believe it's going to be a must-have for any database professional in the near future.

Friday 21 January 2011

T-SQL: Nonclustered Primary Keys

In the majority of SQL database applications I have worked with, surrogate keys are prevalent. Essentially, IDENTITY fields are used to define unique rows rather than working out which fields define the PRIMARY KEY for a table. I have no problem with this in principle and use the technique often myself, for example in logging/audit tables, where I follow design principles that say every table should have a PK.

By default, when you define a primary key it is created as a CLUSTERED index. However, with surrogate keys the field is rarely used in queries, so you have essentially "wasted" your clustered index - in my view, your most prized index possession, due to the fact there can be only one per table. Therefore, I strongly recommend creating primary keys as NONCLUSTERED, which allows you to make judicious use of your clustered index on columns which may benefit more. Of course, there are caveats to this, one example being if you intend to create XML indexes, which require the table to have a clustered primary key.

Further to this, a good habit I have got into when creating tables is to define the table without constraints in the initial DDL and then add the constraints as separate DDL statements. This reminds you to be a lot more explicit in your naming conventions and constraint/index definitions rather than relying on SQL Server's defaults.

An example below:

CREATE TABLE [dbo].[tblAudit](
[Audit_ID] [int] IDENTITY(1,1) NOT NULL,
[Audit_Type_ID] [int] NOT NULL,
[Date_Created] [datetime] NOT NULL,
[Created_By] [nvarchar](255) NULL,
[Audit_Description] [nvarchar](255) NULL
)
GO
ALTER TABLE [dbo].[tblAudit]
ADD CONSTRAINT [PK_tblAudit_AuditID] PRIMARY KEY NONCLUSTERED ([Audit_ID])
GO
ALTER TABLE [dbo].[tblAudit] WITH CHECK ADD CONSTRAINT [FK_tblAudit_AuditTypeID] FOREIGN KEY([Audit_Type_ID])
REFERENCES [dbo].[tblAuditType] ([Audit_Type_ID])
GO
ALTER TABLE [dbo].[tblAudit] ADD CONSTRAINT [DF_tblAudit_DateCreated] DEFAULT (GETDATE()) FOR [Date_Created]
GO
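With the primary key created as NONCLUSTERED, the clustered index is then free to go on a column that benefits more from it - for an audit table, the creation date is a plausible (hypothetical) candidate:

-- Hypothetical choice: audit data tends to be read in date ranges,
-- so the clustered index is better spent there than on the surrogate key
CREATE CLUSTERED INDEX [IX_tblAudit_DateCreated] ON [dbo].[tblAudit] ([Date_Created])
GO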

Friday 14 January 2011

Denali SSIS: Execute Packages with T-SQL

It's always been a bother to me that SSIS (DTS) and the Database Engine are essentially separate products. Although they are part of the SQL Server stack, they don't really integrate (ouch!) with each other, illustrated by the fact they use different IDEs.

In a few roles I've had, I've needed the application to call an SSIS package from a T-SQL stored procedure, and although there have been a couple of workarounds I've never been entirely satisfied with them.
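To give a flavour of the kind of workaround I mean, shelling out to dtexec via xp_cmdshell is a common one (the package path below is obviously hypothetical, and xp_cmdshell needs to be enabled):

-- One of the classic workarounds: shell out to dtexec from T-SQL.
-- Requires xp_cmdshell to be enabled; the package path is just an example.
DECLARE @cmd VARCHAR(1000)
SET @cmd = 'dtexec /F "C:\SSIS\MyPackage.dtsx"'
EXEC master.dbo.xp_cmdshell @cmd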

At long last this appears to have been rectified and in the first CTP of the next version of SQL Server (Denali), Integration Services has been "promoted" to the Database Engine tree in Object Explorer.

But much more exciting than that is that there is now the ability to execute SSIS packages with a T-SQL command. Jamie Thomson has written a blog post on this exact thing, so rather than me rehashing what he has done, here's the link!

Wednesday 12 January 2011

The SQL Server Stack - too tall to see the top?

I was chatting with my old boss the other day and the conversation moved on to what versions of software they were running, and I asked if they were test-driving the next version of SQL Server, Denali.

Now, the company in question are definitely a forward-looking firm who look to stay as current as possible - I remember when I was there we were one of the very early adopters of SQL Server 2005. Back then, the move from 2000 to 2005 was a huge leap, the first major release in some time from MS, and a serious upgrade with many changes. We benefited from major performance and reliability improvements yet we didn't change our codebase significantly. And as it turns out, even to this day the codebase is very similar. When I asked, "do you think (a scaled down) version of your app would run on SQL 2000?" the answer was "probably". Indeed, when pushed, the only features that seemed to have been embraced were CTEs and a prototype using the spatial data types. Not much in 10 years!

Now we have the first CTP of Denali and yet more features are being bundled into the SQL Server stack - SSIS improvements, T-SQL enhancements etc. I'm not saying these aren't necessary or what the market demands, and it's evidence that MS are not resting on their laurels, but I do wonder how many of these features are ever used in production environments. SSIS has been a huge success, but that was (takes deep breath) essentially just an overhaul of DTS. Not a new feature, just an upgrade of existing technology. We've seen Extended Events, Resource Governor, Change Data Capture and Policy Management, which are undoubtedly impressive features, but how many people make the transition from demo/prototype to using them in a production environment? It seems to me that it's the softer, less revolutionary features that have been the most successful - T-SQL enhancements such as CTEs and TRY/CATCH have had the most impact.

Perhaps all this is just a reflection of my recent roles, which have tended to be much more focused on developer activities rather than DBA work? The SQL Server product is now so impressive and vast, being much more than just a database engine, with SSRS/SSIS/SSAS all justifying specialist skills and challenging the database professional ever more. I find that I'm able to learn and understand a feature if I get to use it in anger, and this means more than just running through a set of tutorials.

An example is Extended Events. Jonathan Kehayias has put together a brilliant set of blog posts on them, which I've been working my way through, and I've been impressed. But I'm in need of some real-life scenarios where I can apply this knowledge, otherwise it'll just be consigned to the "stuff I know, but don't know" bin.

With the new year, I'm really keen to get my head round many of the new features of Denali. The difficulty is striking the right balance and not being spread too thinly. In an effort to cover the entire SQL Server stack you risk diluting your knowledge by not knowing enough about anything; in focusing too much on one area, the risk is that you miss out on exciting (and, more importantly, valuable) features.

Tuesday 11 January 2011

New Year Targets 2011

Fully aware that it's now the second week of January, I thought I'd make a belated attempt to jot down a few targets for 2011. I should stress that these are purely professional and actually very SQL specific - I do have other things I want to achieve this year, but that's for a different blog!!

There isn't anything particularly exciting about the targets themselves, but I'm conscious that I need to start protecting my GOLD time in order to achieve them.

1) Write 52 blog posts
2) Complete MCITP certification
3) Stay "current" and keep on top of the Denali CTPs
4) Continue to be an active participant in the SQL forums

So nothing too radical in there, but it will mean a bit of self-discipline and some determination to keep them. It's a new year, so let's start with the best of intentions.

Friday 7 January 2011

SSIS: Can't link Tasks in Container?

I've just come across an issue whereby I have an existing For Each Loop container with a single dataflow task within it. To develop the package further, I wanted to add an Execute T-SQL task within the container before the dataflow task, so I dragged over the task icon and configured it. However, when I tried to join the tasks with a precedence constraint, an error was thrown:

Cannot create connector. Cannot connect the executables from different containers.

This seemed a bit confusing as the tasks were definitely both within the container. However, a simple restart of Visual Studio 2008 solved the issue, so this looks like a bug in the GUI.