SQL Stuff: 2011

Thursday, 8 December 2011

Admin: Windows Group Login Ambiguity

It's a typical scenario to grant login access to SQL Server via Windows groups, as this eases the administrative burden of setting up multiple logins all with the same access. However, it's not uncommon for a user to belong to multiple Windows groups, and it's possible that each of these groups has its own SQL login. So what's the problem? Well, consider this example:

User - DOMAIN\PeterPan
Groups - DOMAIN\ReadOnlyGroup, DOMAIN\WriteOnlyGroup

We have 2 Windows groups, both of which have access to SQL Server, albeit for different purposes. Peter is a member of both groups as he needs access to do both actions. My question is, when Peter logs in, under which Windows group has he been granted authentication?

Why does it matter, I hear you ask?

From a pure permissions point of view, there is no issue as the usual hierarchy applies. Permissions are built on top of each other so SQL Server will combine the permissions from both groups (DENY overriding GRANT etc) and that gives us the behaviour we want. You can validate this by simply adding a differing set of permissions to the different groups and then adding or removing the member from it:

EXEC sp_addrolemember N'db_datareader', N'DOMAIN\ReadOnlyGroup'

EXEC sp_addrolemember N'db_datawriter', N'DOMAIN\WriteOnlyGroup'

You can view the permission set in the following way:

SELECT *, DB_NAME(), SUSER_SNAME() FROM fn_my_permissions(NULL, 'DATABASE')

However, this works because there is a defined hierarchy in place which allows SQL Server to make the correct choice based upon permission settings.

But what about something where no such hierarchy exists, like a default language? If the 2 groups have different default languages, how do we determine which default language the login connection should take? This is particularly important when dealing with dates. us_english and British English interpret dates as MDY and DMY respectively, so the date 30/11/2011 will throw an error in us_english but work correctly in British English. Worse still, you may not get an error at all: a date such as 01/02/2011 is valid under both settings but may not be what the user wants. Of course, best practice would see us using ISO formatted dates (YYYYMMDD) but I dare say not every application adheres to this.
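To make the date problem concrete, here is a minimal sketch that switches the session language by hand and casts the same literal each time. It only demonstrates the interpretation difference; it says nothing about which group's default language a login inherits.

```sql
-- Same literal, two session languages
SET LANGUAGE us_english;   -- date format becomes mdy
SELECT CAST('01/02/2011' AS DATETIME) AS UsInterpretation;       -- 2 January 2011

SET LANGUAGE British;      -- date format becomes dmy
SELECT CAST('01/02/2011' AS DATETIME) AS BritishInterpretation;  -- 1 February 2011

-- The unseparated ISO format is read the same way under either language
SELECT CAST('20111130' AS DATETIME) AS IsoInterpretation;        -- 30 November 2011

-- Show the language and date format the current session ended up with
SELECT @@LANGUAGE AS SessionLanguage;
DBCC USEROPTIONS;
```

Casting '30/11/2011' under us_english fails with an out-of-range conversion error, which is the "lucky" case; the silent misread of 01/02/2011 is the one to worry about.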

I understand that this is a difficult (if not impossible) issue to rectify. I mean, there is no logical hierarchy or set of rules that can be put in place when this situation occurs. However, I'd like to understand how SQL Server determines which language it uses in this scenario, as I couldn't find a method of doing so. Is it the Windows group which was last granted access to SQL Server? Does anyone know?
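This doesn't answer the question, but the following sketch at least gathers the evidence: which groups give the account a route into the instance, what default language each group login carries, and what the session actually ended up with. DOMAIN\PeterPan is the example account from above.

```sql
-- Which Windows groups give this account a permission path into the instance?
EXEC master.dbo.xp_logininfo @acctname = N'DOMAIN\PeterPan', @option = 'all';

-- Default language configured on each Windows user and Windows group login
SELECT name, type_desc, default_language_name
FROM sys.server_principals
WHERE type IN ('U', 'G')
ORDER BY name;

-- Run this from Peter's own connection to see what he was actually given
SELECT SUSER_SNAME() AS LoginName, @@LANGUAGE AS SessionLanguage;
```

Comparing the last result against the group defaults, after changing one group's default language at a time, would be one way to find out empirically which group wins.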

Thursday, 1 December 2011

T-SQL: Audit Data Changes

I was recently asked about how to implement auditing of data changes for a database. I know SQL Server contains the Change Data Capture feature but (as is often the case with things I want!) it's an Enterprise-only feature, meaning it's out of reach for the application in question.

As it happened, it turned out that there was already an application in-house (that I hadn't developed) which implemented its own auditing of changes based around a simple Audit table and Triggers. I've still to find out who wrote it, but it certainly does the trick. The database it runs on is relatively static so I do have reservations about how scalable it is, but for a few tables you want control over, it may be suitable.

CREATE TABLE [dbo].[Audit](
   [AuditID] [int] IDENTITY(1,1) NOT NULL,
   [Type] [char](1) NULL,
   [TableName] [varchar](128) NULL,
   [PrimaryKeyField] [varchar](1000) NULL,
   [PrimaryKeyValue] [varchar](1000) NULL,
   [FieldName] [varchar](128) NULL,
   [OldValue] [varchar](1000) NULL,
   [NewValue] [varchar](1000) NULL,
   [UpdateDate] [datetime] NULL,
   [UserName] [varchar](128) NULL
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[Audit]
ADD CONSTRAINT [DF_Audit_UpdateDate] DEFAULT (GETDATE()) FOR [UpdateDate]

ALTER TABLE [dbo].[Audit]
ADD CONSTRAINT [DF_Audit_UserName] DEFAULT (SUSER_SNAME()) FOR [UserName]
GO

CREATE TRIGGER dbo.MyTableChangeTracking
ON dbo.MyTable

FOR INSERT, UPDATE, DELETE
AS

DECLARE @bit INT
DECLARE @field INT
DECLARE @maxfield INT
DECLARE @char INT
DECLARE @fieldname VARCHAR(128)

DECLARE @TableName VARCHAR(128) = 'MyTable';
DECLARE @PKCols VARCHAR(1000)
DECLARE @sql VARCHAR(2000)
DECLARE @UserName VARCHAR(128) = SUSER_SNAME()
DECLARE @Type CHAR(1)
DECLARE @PKFieldSelect VARCHAR(1000)
DECLARE @PKValueSelect VARCHAR(1000)

-- Action
IF EXISTS (SELECT TOP 1 * FROM inserted)

 BEGIN
   IF EXISTS (SELECT TOP 1 * FROM deleted)
       SET @Type = 'U' -- update
   ELSE
       SET @Type = 'I' -- insert

END
ELSE
   SET @Type = 'D' -- delete

-- get all the changed values (need both tables for update)

SELECT * INTO #ins FROM inserted
SELECT * INTO #del FROM deleted

-- Get primary key columns for full outer join
SELECT @PKCols = COALESCE(@PKCols + ' AND', ' ON') + ' i.' + c.COLUMN_NAME + ' = d.' + c.COLUMN_NAME

FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk
   INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE c
       ON c.TABLE_NAME = pk.TABLE_NAME
           AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME

WHERE pk.TABLE_NAME = @TableName
   AND CONSTRAINT_TYPE = 'PRIMARY KEY'

-- Get primary key fields select for insert
SELECT @PKFieldSelect = COALESCE(@PKFieldSelect + '+', '') + '''' + COLUMN_NAME + ''''

FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk
   INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE c
       ON c.TABLE_NAME = pk.TABLE_NAME
           AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME

WHERE pk.TABLE_NAME = @TableName
   AND CONSTRAINT_TYPE = 'PRIMARY KEY'

SELECT @PKValueSelect = COALESCE(@PKValueSelect + '+', '') + 'convert(varchar(100), coalesce(i.' + COLUMN_NAME + ',d.' + COLUMN_NAME + '))'

FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk
   INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE c
       ON c.TABLE_NAME = pk.TABLE_NAME
           AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME

WHERE pk.TABLE_NAME = @TableName
   AND CONSTRAINT_TYPE = 'PRIMARY KEY'

-- raise an error if there is no Primary Key
IF @PKCols IS NULL

BEGIN
   RAISERROR('no PK on table %s', 16, -1, @TableName)
   RETURN
END

SELECT @field = 0, 
   @maxfield = MAX(ORDINAL_POSITION)

FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TableName

-- now loop through the fields to log the values from each field
WHILE @field < @maxfield

BEGIN
   SELECT @field = MIN(ORDINAL_POSITION)
   FROM INFORMATION_SCHEMA.COLUMNS 
   WHERE TABLE_NAME = @TableName AND ORDINAL_POSITION > @field
   
   SELECT @bit = (@field - 1 )% 8 + 1
   SELECT @bit = POWER(2,@bit - 1)
   SELECT @char = ((@field - 1) / 8) + 1
   
   IF SUBSTRING(COLUMNS_UPDATED(),@char, 1) & @bit > 0 OR @Type IN ('I','D')
   BEGIN
       SELECT @fieldname = COLUMN_NAME 
       FROM INFORMATION_SCHEMA.COLUMNS
       WHERE TABLE_NAME = @TableName 
           AND ORDINAL_POSITION = @field
       
       -- dynamically build and execute the audit insert statement
       SELECT @sql = 'INSERT INTO dbo.Audit (Type, TableName, PrimaryKeyField, PrimaryKeyValue, FieldName, OldValue, NewValue)'
       SELECT @sql = @sql + ' SELECT ''' + @Type + ''''
       SELECT @sql = @sql + ',''' + @TableName + ''''
       SELECT @sql = @sql + ',' + @PKFieldSelect
       SELECT @sql = @sql + ',' + @PKValueSelect
       SELECT @sql = @sql + ',''' + @fieldname + ''''
       SELECT @sql = @sql + ',CONVERT(VARCHAR(1000),d.' + @fieldname + ')'
       SELECT @sql = @sql + ',CONVERT(VARCHAR(1000),i.'  + @fieldname + ')'
       SELECT @sql = @sql + ' FROM #ins i FULL OUTER JOIN #del d'
       SELECT @sql = @sql + @PKCols
       SELECT @sql = @sql + ' WHERE i.' + @fieldname + ' <> d.' + @fieldname
       SELECT @sql = @sql + ' OR (i.' + @fieldname + ' IS NULL AND d.' + @fieldname + ' IS NOT NULL)'
       SELECT @sql = @sql + ' OR (i.' + @fieldname + ' IS NOT NULL AND d.' + @fieldname + ' IS NULL)'

       EXEC (@sql)
   END
END

GO

It's quite a complicated trigger and took some time to get my head round, but it certainly does the trick and you just need to implement it for each table you wish to track changes on. The data is readable too.
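As a rough illustration of reading the results back, a query along these lines lists the audit trail for one table, one row per changed column. It only uses the columns defined on dbo.Audit above.

```sql
-- Audit trail for a single table, oldest change first
SELECT  AuditID,
        [Type],            -- I = insert, U = update, D = delete
        PrimaryKeyField,
        PrimaryKeyValue,
        FieldName,
        OldValue,
        NewValue,
        UpdateDate,
        UserName
FROM dbo.Audit
WHERE TableName = 'MyTable'
ORDER BY AuditID;
```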

But I wasn't comfortable with the overhead of the trigger, the dynamic SQL, or the fact that I hadn't written it myself! So I considered a second approach, still using triggers but this time just dumping the changed records to an audit table in XML format. This would have the advantage of a much simpler trigger, albeit at the cost of readability of results and potentially higher storage costs.

CREATE TABLE dbo.tblAudit(
   AuditID INT NOT NULL IDENTITY(1,1),
   TableName VARCHAR(255),
   DataInserted XML,
   DataDeleted XML,
   DateCreated DATETIME,
   CreatedBy VARCHAR(2000)
)

GO

CREATE TRIGGER dbo.MyTable_ChangeTracking
ON dbo.MyTable

FOR INSERT, UPDATE, DELETE
AS

SET NOCOUNT ON

-- set the tablename
DECLARE @TableName VARCHAR(255) = 'dbo.MyTable'

-- grab the changed data in XML format
DECLARE @InsertXML XML = (SELECT * FROM inserted FOR XML AUTO);
DECLARE @DeleteXML XML = (SELECT * FROM deleted FOR XML AUTO);

-- log it
INSERT INTO dbo.tblAudit (TableName, DataInserted, DataDeleted, DateCreated, CreatedBy)
SELECT @TableName, @InsertXML, @DeleteXML, GETDATE(), SUSER_SNAME()
GO

To interrogate the results, you'd need to be comfortable working with XML, but the simplicity of the trigger is something I really like. Hopefully, you wouldn't need to be looking at this data too often, so the ease of logging is the main factor when implementing something like this. You could easily deploy this across multiple tables/servers too with a simple Powershell script.
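For what it's worth, the XML can be shredded back into name/value pairs without knowing the table's columns in advance. The sketch below assumes the attribute-centric shape that FOR XML AUTO produces (one element per row, one attribute per column) and reads only the DataInserted side; note that NULL columns are simply absent from that XML, and delete-only audit rows have no inserted data to return.

```sql
-- One output row per column value captured in DataInserted
SELECT  a.AuditID,
        a.TableName,
        a.DateCreated,
        a.CreatedBy,
        c.attr.value('local-name(.)', 'NVARCHAR(128)') AS ColumnName,
        c.attr.value('.', 'NVARCHAR(MAX)')             AS NewValue
FROM dbo.tblAudit AS a
    CROSS APPLY a.DataInserted.nodes('/*') AS r(row_node)  -- each audited row
    CROSS APPLY r.row_node.nodes('@*') AS c(attr)          -- each column on that row
WHERE a.TableName = 'dbo.MyTable'
ORDER BY a.AuditID;
```

Swapping DataInserted for DataDeleted gives the "before" values in the same shape.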

Now, I wouldn't recommend either of these in a highly transactional database, and you would want to think twice about implementing them if the performance of your DML statements is critical or your storage is tight. I'm going to experiment a bit with CDC to see just what benefits it gives over a bespoke implementation (I suspect it's much more lightweight) while also hoping that the feature makes its way down to the lower editions of SQL Server.

Saturday, 26 November 2011

Denali: Now gets a proper name!

Since I last blogged about Denali, Microsoft has announced the official name for this version of SQL Server to be... (drum roll) SQL Server 2012. OK, OK. I'm pretty underwhelmed too but hey ho. Project Crescent now has the rather grand title of Power View and Juneau has the slightly less impressive SQL Server Data Tools.

More exciting and interesting though are the changes to the Editions and also the licensing costs, in particular the introduction of a BI edition and the move to per core licensing. You can read more about this in this blog from Geoff Hiten.

I also came across this blog from the SQL Express guys which talks of the new LocalDB edition aimed at Developers. A lightweight database server with less management overhead than a full SQL Express edition. Nice.

Friday, 18 November 2011

Admin: Execute Folder of Scripts

Here's a simple Powershell script to loop through a directory and execute the SQL files contained within:

# Load the SQL Server cmdlets (SQL Server 2008 snap-in)
Add-PSSnapin SqlServerCmdletSnapin100

$ServerName = "MySERVER"
$Db = "MyDB"
$scriptDir = "c:\FileDir\"

# Collect the .sql files in the directory
$files = Get-ChildItem "$scriptDir*" -Include *.sql

# Run each script against the target database
foreach ($file in $files)
{
    Invoke-Sqlcmd -ServerInstance $ServerName -Database $Db -InputFile $file.FullName
}

Tuesday, 8 November 2011

SSRS: Use Stored Procedures in Datasets

This is my contribution to T-SQL Tuesday #24 hosted by Brad Schulz (blog) on the subject of Prox ‘n’ Funx (Stored Procedures and Functions to you and me :-) ).

I'm a big believer in using Stored Procedures (or at the very least, UDFs) for your Reporting Services datasets, separating your presentation layer from your data layer and moving the SQL code away from the RDL.

The benefit of this is that as long as the metadata of your Stored Procedure stays the same, you are able to modify and enhance your SQL code without having to touch the RDL. You essentially abstract away the source code from the report.

Perhaps you're improving performance by moving to JOINs from cursors, extending the business logic to only return rows that meet new criteria or simply refactoring SQL code to standardise your table names. None of these affect the presentation layer, and having the code reside in Stored Procedures on the database gives huge maintenance benefits.

Other advantages include having all your T-SQL held in one place, so you can see the impact of any change without having to worry about dependencies elsewhere. Also, you will often be re-using code (e.g. for parameter datasets) and using a single stored procedure helps reduce duplication of effort (and probably brings performance benefits too).

Of course, there are downsides to this approach. If you need to introduce a new parameter to a report (which is passed to your dataset) then you have to change both the RDL and the stored procedure. I can see this being a slight irritation as you now have 2 deployments whereas holding the SQL code "inline" means a simple upload of the new report.

For me though, the former approach still wins and I advocate using Stored Procedures for Reporting Services datasets. I've been experimenting recently with putting Stored Procedures used for Reports into their own schema (acting as a namespace) although I can't categorically say whether this has been a success or not (Jamie Thomson (Blog | Twitter) has an interesting blog post which touches on Schema usage here).
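As a sketch of what the schema-as-namespace idea might look like: the schema name Reports, the procedure, the dbo.Orders table and the ReportReaders role are all hypothetical names made up for illustration.

```sql
-- A dedicated schema to hold everything the reports call
CREATE SCHEMA Reports AUTHORIZATION dbo;
GO

-- The report dataset calls this; the RDL never sees the underlying query
CREATE PROCEDURE Reports.GetOrdersByCustomer
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT o.OrderID, o.OrderDate, o.TotalDue
    FROM dbo.Orders AS o            -- hypothetical table
    WHERE o.CustomerID = @CustomerID;
END
GO

-- One grant covers every current and future report procedure in the schema
GRANT EXECUTE ON SCHEMA::Reports TO ReportReaders;   -- hypothetical role
GO
```

As long as the procedure keeps returning the same three columns, the body can be rewritten freely without redeploying the report.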

Thursday, 3 November 2011

T-SQL: Why have NULL bit fields?

I was asked the other day why SQL Server allows NULLs for the BIT datatype. What you need to bear in mind is what NULL represents - a value which is unknown - so there are 3 states which you need to cater for: True, False or Unknown.

As is my wont, I like to explain things with analogies and this is no different. Consider a table holding job application details for a job board website.

CREATE TABLE dbo.tblJobApplications(
   CandidateID INT,
   JobID INT,
   DateApplied DATE,
   Successful BIT)

-- Application in progress
INSERT INTO dbo.tblJobApplications
SELECT 1, 1, GETDATE(), NULL

-- Application unsuccessful
INSERT INTO dbo.tblJobApplications
SELECT 2, 1, GETDATE(), 0

-- Application successful
INSERT INTO dbo.tblJobApplications
SELECT 3, 1, GETDATE(), 1

The key field to note here is of course the BIT field which indicates the success or failure of the application. Obviously, when a candidate applies for a job, the success of the application isn't known - the candidate has been neither accepted nor rejected. It's only at the end of the lifecycle of the application that this field can take on a meaningful value.

Hopefully, this contrived example helps explain just when you might require a NULL bit field.
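The three states show up as soon as you query the table. The sketch below assumes only the three rows inserted above; the comments describe which candidate each predicate returns.

```sql
-- Successful applications: candidate 3
SELECT CandidateID FROM dbo.tblJobApplications WHERE Successful = 1;

-- Unsuccessful applications: candidate 2
SELECT CandidateID FROM dbo.tblJobApplications WHERE Successful = 0;

-- Still in progress: candidate 1 (NULL has to be tested with IS NULL)
SELECT CandidateID FROM dbo.tblJobApplications WHERE Successful IS NULL;

-- "Not successful" does NOT include the in-progress row: only candidate 2
SELECT CandidateID FROM dbo.tblJobApplications WHERE Successful <> 1;

-- Spelling the three states out for a report
SELECT  CandidateID,
        CASE
            WHEN Successful = 1 THEN 'Successful'
            WHEN Successful = 0 THEN 'Unsuccessful'
            ELSE 'In progress'
        END AS ApplicationStatus
FROM dbo.tblJobApplications;
```

The fourth query is the one that tends to catch people out: a comparison against NULL evaluates to unknown, so the row is filtered out rather than treated as "not 1".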

Thursday, 27 October 2011

Admin: A day of "a day" series...

There really are some incredible bloggers out there. Despite my recent 5 day series on Continuous Integration, I typically struggle to blog more frequently than once a week. Yet these guys have committed to a whole month of blogging on a single topic and the quality of the blogs isn't diluted at all giving a real "Deep Dive" into a topic.

So here are some of the "A Day" series that I've dipped into recently - I would bet that there will be more to be added. Credit to the guys and I doff my hat to you.

-- A DMV a day
http://sqlserverperformance.wordpress.com/2010/04/

-- XE a day
http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/12/01/a-xevent-a-day-31-days-of-extended-events.aspx

-- A SQL Myth a day
http://www.sqlskills.com/BLOGS/PAUL/post/A-SQL-Server-DBA-myth-a-day-%28130%29-in-flight-transactions-continue-after-a-failover.aspx

-- SSIS a day
http://www.jasonstrate.com/2011/01/31-days-of-ssis-raw-files-are-awesome-131/

Friday, 21 October 2011

Admin: Generate Restore Scripts

In order to speed up the recovery process of a database (and to give the unintentional DBAs in my organisation a helping hand), I recently started experimenting with generating restore scripts for our production databases.

As is my wont, my technology preference was again Powershell (although it did make use of T-SQL too) as looping through databases on a server and writing to file are all pretty trivial.

I wrote a simple TVF to generate the restore script for a particular database, based upon the system tables in MSDB and based closely on the script found on MSSQLTIPS.

CREATE FUNCTION [dbo].[tvf_GetRestoreCommands](@DatabaseName SYSNAME)
RETURNS @AllCommands TABLE (
        [backup_set_id] INT NULL,
        [Command] NVARCHAR (MAX) NULL
        )
AS
BEGIN

   DECLARE @backupStartDate DATETIME
   DECLARE @backup_set_id_start INT
   DECLARE @backup_set_id_end INT
   DECLARE @IncludeMoveClause BIT = 1
   DECLARE @MoveClause NVARCHAR(MAX) = ''   -- MAX so that several long file paths are not truncated

   -- get the most recent full backup
   SELECT @backup_set_id_start = MAX(backup_set_id) 
   FROM  msdb.dbo.backupset 
   WHERE database_name = @databaseName 
       AND TYPE = 'D'

   SELECT @backup_set_id_end = MIN(backup_set_id) 
   FROM  msdb.dbo.backupset 
   WHERE database_name = @databaseName AND TYPE = 'D'
   AND backup_set_id > @backup_set_id_start

   IF @backup_set_id_end IS NULL SET @backup_set_id_end = 999999999

   -- do you want to include the move clause in case of having to go to a different server
   IF @IncludeMoveClause = 1
       SELECT @MoveClause = COALESCE(@MoveClause + ',','') + ' MOVE ' + QUOTENAME(name,'''') + ' TO ' + QUOTENAME(physical_name, '''')
       FROM sys.master_files
       WHERE database_id = DB_ID(@databaseName);

   INSERT INTO @AllCommands
       SELECT backup_set_id, 'RESTORE DATABASE ' + @databaseName + ' FROM DISK = ''' 
              + mf.physical_device_name + ''' WITH NORECOVERY, STATS = 5 ' + @MoveClause
       FROM    msdb.dbo.backupset b
           INNER JOIN msdb.dbo.backupmediafamily mf
               ON b.media_set_id = mf.media_set_id
       WHERE b.database_name = @databaseName
                 AND b.backup_set_id = @backup_set_id_start
       UNION
       SELECT backup_set_id, 'RESTORE LOG ' + @databaseName + ' FROM DISK = ''' 
              + mf.physical_device_name + ''' WITH NORECOVERY'
       FROM    msdb.dbo.backupset b,
                  msdb.dbo.backupmediafamily mf
       WHERE    b.media_set_id = mf.media_set_id
               AND b.database_name = @databaseName
               AND b.backup_set_id >= @backup_set_id_start 
               AND b.backup_set_id < @backup_set_id_end
               AND b.TYPE = 'L'
       UNION
       SELECT 999999999 AS backup_set_id, 'RESTORE DATABASE ' + @databaseName + ' WITH RECOVERY'
       ORDER BY backup_set_id
       
   RETURN
END

This was the sort of thing I needed and then all I was left to do was write a powershell script to execute this command for each database and persist the results to a file:

# Load the SQL Management Objects assembly (piping to Out-Null suppresses output)
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") | Out-Null
Add-PSSnapin SqlServerCmdletSnapin100

$server = hostname
$backupdir = "E:\SQLBackup\Scripts\Restore\"
$sql = New-Object "Microsoft.SqlServer.Management.SMO.Server" $server

# Get databases on our server
$databases = $sql.Databases | Where-Object {$_.IsSystemObject -eq $false}

# generate the restore command
foreach ($db in $databases)
{
   $filePath = $backupdir + 'Restore_' + $db.name + '.sql'
   $dbname = $db.Name

   $sqlresults = Invoke-Sqlcmd -Query "SELECT [Command] FROM GlobalDB.dbo.tvf_GetRestoreCommands('$dbname')" -ServerInstance $server

   # nasty hack here to force the output to fit in. Hope my restore command isn't > 3000 chars!
   $sqlresults | out-file -filepath $filepath -width 3000
}

The key thing to note here is that I've created the TVF in a central database GlobalDB which I use for admin/logging tasks on the server. This saves me from having to keep a copy of the script in each database (although I could achieve the same behaviour by installing it into the Master database or MSDB database).
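Before wiring the function into a script, it can be tried directly from Management Studio. A minimal sketch, assuming the function lives in GlobalDB as described and using MyDB as a stand-in database name:

```sql
-- Restore commands for one database, in the order they should be run
SELECT [Command]
FROM GlobalDB.dbo.tvf_GetRestoreCommands(N'MyDB')
ORDER BY backup_set_id;
```

The explicit ORDER BY matters here: the full backup has the lowest backup_set_id, the log restores follow, and the final WITH RECOVERY row carries the sentinel value 999999999 so it sorts last.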

Also, there is reference to a hack which I talked about in a previous post.

The final part of the jigsaw was being able to schedule this to run every time there was a backup so it would pick up the latest backup sets. SQL Agent in SQL Server 2008 has a Powershell step but this will actually execute code held within the SQL job. For me, I want to be able to manage my Powershell scripts independently (keeping them under source control etc.) so really, I just want to point SQL Agent at a script and execute it. It turns out this can be achieved using the CmdExec step:

powershell "& D:\SQLAdmin\CreateDBRestoreCommands.ps1"

The main downside to this is the permissions as running this Job Step type will mean the job runs under the context of the SQL Agent and, unless I set up a proxy, I will need to grant some database permissions (SELECT on the TVF) to the SQL Agent login account. As a quick fix, I granted SELECT permissions on the TVF to the Public role and off we went.
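The quick fix amounts to a single grant. The sketch below shows it alongside a narrower alternative; the account name DOMAIN\SqlAgentService is a hypothetical stand-in for whatever account SQL Agent runs under.

```sql
USE GlobalDB;
GO

-- The quick fix described above: everyone with access to GlobalDB can call the TVF
GRANT SELECT ON OBJECT::dbo.tvf_GetRestoreCommands TO public;
GO

-- A narrower alternative: grant only to the account the job runs as
-- (hypothetical account name; it needs a user in GlobalDB first)
CREATE USER [DOMAIN\SqlAgentService] FOR LOGIN [DOMAIN\SqlAgentService];
GRANT SELECT ON OBJECT::dbo.tvf_GetRestoreCommands TO [DOMAIN\SqlAgentService];
GO
```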

Thursday, 13 October 2011

Denali SSIS: Loop through a list of servers

It's been tough to get any time recently to play with Denali, so I thought I'd put together a quick tutorial on SSIS using Denali to give people a tiny flavour of what it looks like. I intend (time permitting) to use this tutorial in the future to extend the package to show off a few of the new features of Denali.

Note: this task can be achieved in previous versions of SSIS with minimal changes.

The aim of this package is a simple one: to iterate over a list of Servers and execute a SQL task against each one. I've also thrown in a Script task too for good measure.

Setup

First up, we'll set up the test data in Management Studio (look at that cool Denali syntax colouring!):
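The original screenshot isn't reproduced here, so the following is only a guess at the kind of test data involved: a one-column list of servers. The table name dbo.Servers and the server names are hypothetical; the ServerName column matches what the loop maps later on.

```sql
-- A simple list of servers for the package to iterate over
CREATE TABLE dbo.Servers (
    ServerName SYSNAME NOT NULL PRIMARY KEY
);

INSERT INTO dbo.Servers (ServerName)
VALUES (N'SERVER01'), (N'SERVER02');   -- hypothetical server names

-- The query the first T-SQL task would run; ServerName is column index 0
SELECT ServerName FROM dbo.Servers;
```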

Then we move into Visual Studio to start our package:

Although I've not configured the Tasks yet, you can immediately get a feel for what the purpose of this package is going to be. As you might expect, there is improved "kerb appeal" from MS in this release with smoother graphics and features such as the magnifier which allow you to make your design more readable.

Configuration

1) T-SQL - Get Servers:

The Execute T-SQL dialog box is straightforward enough to configure. The main thing to note on the General tab is that you need to set the ResultSet appropriately to Full result set.

On the Result Set tab, you need to set your Result to a variable of type Object. If you haven't already created your variable, you can create one from this dialog box.

2) For Each Loop - Servers

We just need to choose the Foreach ADO Enumerator and set the object source variable to the one you populated in the previous task. Simple.

Click onto the Variable Mappings and here is where you'll pull out the relevant details from your object. In our case, we just need to grab the ServerName and populate a simple string variable to use in the tasks within the container. You need to map these variables by Zero based Index and we're only interested in the first column, hence Index 0.

3) T-SQL - Get Version

Another T-SQL task here, but the clever part is that we need the connection to be dynamic. In other words, for each server in our collection, we need to connect to that server and get its version.

First, we add an extra connection using the Connection Manager (note, I also change my connection name to show that it's dynamic - I've called it DynamicSQLConn).

We can make the connection dynamic by changing the ConnectionString property on each iteration of the loop. To do this, we dive into the Properties window and click on Expressions:

We just then set the ConnectionString property to something like the following:

"Data Source=" + @[User::ServerName] + ";Initial Catalog=master;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"

Now we can just use this connection in our T-SQL task. I've chosen to return the results of @@VERSION. It's a single column, single row result set so I've chosen the Single Row result set.

Now we just need to configure the Result Set by sending the output to another string variable

4) Script Task - Show Version

Now typically, you'd want to do something more appropriate than just showing your results via a Message Box but this is exactly what I'm going to do. If nothing else, it illustrates the use of the Script Task.

We just open up the task and pass in the variables we wish to use. You can type them in, or just use the select dialog box that's provided. As we're only displaying them, they just need to be ReadOnlyVariables.

When you click Edit Script, you get a new instance of Visual Studio open up to add your code. We only need to modify the Main method:

And that's it!!

Execution

You can test and execute the package through Visual Studio and see even more of that "Kerb Appeal" that I talked about earlier. Gone are the garish colours associated with previous versions of SSIS and they've been replaced by more subtle and sexy icons:

Hopefully this has been helpful in giving a quick glance at the look and feel of Denali SSIS while also showing how you can use SSIS to loop over a dataset.

Specifically the main points to take away are:
1) For Each Loop Container with an ADO Enumerator, using the Object variable and accessing properties of the object
2) T-SQL Task - using a dynamic connection with Expressions, using different Result Sets and populating variables from result sets
3) Script Task - passing in variables and writing a simple task.

Tuesday, 4 October 2011

T-SQL: Helping the optimiser can hinder

I've been meaning to blog about this for a while and noticing the (earlier than usual) T-SQL Tuesday #23 hosted by Stuart Ainsworth (blog | twitter) on the topic of Joins, I thought I'd cobble something together.

Essentially, this post is about reading the results of SHOWPLAN_TEXT, being aware of the different physical join operators and also how the optimiser can be influenced (poorly) by a bad WHERE clause.

I have the following query on a view which abstracts the 4 tables joined:

SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwRateData

The tables behind the view aren't really that important, as I just want to concentrate on the plans generated but basically, there is a lookup table for RateID and then a maindata table which joins to this table (via other intermediate tables). The key thing to note in the query though is the CAST - sometimes the column Val (from maindata) has a value which can't be converted - indeed running this query as is gives the error:

Msg 8115, Level 16, State 8, Line 2
Arithmetic overflow error converting numeric to data type numeric.

I know the data for RateID = 1 does have Vals which are CASTable to the numeric precision/scale, and changing the query to only return those records where RateID = 1 returns a result set without errors:

SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwRateData WHERE RateId = 1

However, if I "improve" the query to filter out these records further to only return data from the last month using a non-sargable clause, I get that pesky overflow error again:

SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwRateData WHERE RateId = 1 AND DATEDIFF(D,GETDATE(),[Date])< 365

So what's different? At first glance something seems wrong, because if the whole dataset for RateId=1 returns without error, then choosing a further subset of this data should also work.

I delved into the query plans to find out and the clue was there:
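For anyone wanting to reproduce the comparison, this is roughly how the two plans can be captured with SHOWPLAN_TEXT. While the setting is on, the queries are compiled but not executed, so the overflow error never fires and both plans can be read side by side.

```sql
-- SET SHOWPLAN_TEXT has to be the only statement in its batch
SET SHOWPLAN_TEXT ON;
GO

-- The "good" query
SELECT *, CAST(Val AS NUMERIC(22,16))
FROM DDL.vwRateData
WHERE RateId = 1;
GO

-- The "bad" query
SELECT *, CAST(Val AS NUMERIC(22,16))
FROM DDL.vwRateData
WHERE RateId = 1 AND DATEDIFF(D,GETDATE(),[Date]) < 365;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

The thing to look for in the output is the physical join operator (Nested Loops versus Hash Match) and where the Compute Scalar that performs the CAST sits relative to the RateId filter.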

The Good query generated a plan that was using a Nested Loops inner join. In a nutshell, the first thing this query does is filter out the correct rows from the lookup table (RateID = 1) and then iterates through the main table where there is a match on that RateId. In other words, it doesn't matter whether or not the Val column for other RateIDs is CASTable as this plan only touches RateID = 1.

On the other hand, the Bad query generated a plan that was using a Hash Match join. This differs in that it will read all the rows in both tables, thus attempting to run the CAST function over every record. Only later in the query plan does it do the filtering on the RateID.

The different types of JOINs (both logical and physical) are explained really well in this article but the "takeaway" from this post is that you need to be careful with your query construction as the optimiser may choose a plan that can influence both performance and even robustness. Logic may tell you that adding extra filtering criteria will help the optimiser choose a more efficient plan, but in this example it has chosen a plan that has caused the query to fail.
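One defensive option, offered as a sketch rather than what I actually ended up doing, is to guard the conversion itself so that it no longer matters where in the plan the CAST is evaluated. It assumes the view exposes RateId as a column, which the WHERE clause already relies on.

```sql
-- Only attempt the conversion for rows known to be convertible;
-- rows with any other RateId yield NULL instead of raising an overflow
SELECT  *,
        CASE WHEN RateId = 1
             THEN CAST(Val AS NUMERIC(22,16))
        END AS ValAsNumeric
FROM DDL.vwRateData
WHERE RateId = 1
    AND DATEDIFF(D, GETDATE(), [Date]) < 365;
```

For plain scalar expressions like this one, SQL Server evaluates the WHEN condition before the THEN branch, so the CAST is only attempted on the rows that pass the test, whichever join operator the optimiser picks.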

Friday, 30 September 2011

Powershell: Out-File truncate text

I was looking to use Powershell to write out some data from a SQL table using a combination of Invoke-Sqlcmd with out-file. Everything was going nicely until my output file suddenly started having data truncated. Something like:

Invoke-Sqlcmd -Query "SELECT LongText FROM MyTable" -ServerInstance $sql | out-file -filepath $filepath

Produced a file containing:

LongText
----------------
This was my really really long and boring piece of text that I had writ...

The thing is, it happened quite randomly. Or so I thought. It turns out that the out-file cmdlet formats its output exactly as it would appear in the console on that machine, and my console was too narrow for the text. (This is all in the documentation, to be fair.) So changing the size of your console window (by editing the properties) results in different output lengths.

However, each Powershell install on a machine can be different, so I really need my script to be able to handle this limitation. At the time of writing, I couldn't find anything foolproof and my best hack was to just explicitly set a width that I "knew" was big enough to cater for my text.

Invoke-Sqlcmd -Query "SELECT LongText FROM MyTable" -ServerInstance $sql | out-file -filepath $filepath -width 2000

Friday, 23 September 2011

T-SQL: A Week Of Continuous Integration Day 5

This is the final post of a 5 part series outlining my journey to getting SQL Server databases under source control and into a continuous integration environment.

The other posts are:
Day 1 - Powershell & Scripting Objects
Day 2 - Making Use of a Build Server
Day 3 - Database Developer Toolkits
Day 4 - Adding Value to Continuous Integration

Putting it all together

My journey is at an end and I finally have a working solution which has the following components:

1) A SQL Server dedicated to Continuous Integration
2) A build server/agent with Team City & Final Builder
3) Red-Gate Developer Toolbelt installed on the build agent and the developer machine
4) SVN as my version control choice and databases under Red-Gate source control
5) Tests written with the NUnit framework

Below I have outlined the full process with a few supporting screenshots to help show how things work. This isn't intended as an explanation of all the technologies used, but merely how I have bolted the pieces together to serve my purpose. Further reading can be found by visiting the websites of the software vendors.

SQL Under Source Control

Here we have the database SubversionTest under source control using Red Gate's SQL Source Control. Nothing too surprising there, just a set of directories with scripts.

With the database linked to SVN, adding the object dbo.NewTable to the database gives a visual aid to let you know changes need to be committed.

On browsing to the SQL Source Control tab, you get to view the DDL for the new object and, if the object has changed, the DDL that exists within SVN.

Here is the generic FinalBuilder script I use to deploy the database to the continuous integration server. It's very simple: it just drops the database if it exists, creates a new one and then uses the Red Gate compare tools to deploy the database objects. I've wrapped these up in a batch file for maintenance but they don't do anything clever; they just use a combination of the switches which the sqlcompare executables support.
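The drop-and-recreate part of that step boils down to a few lines of T-SQL. This is a sketch of the idea rather than the actual FinalBuilder action, and it assumes the CI copy of the database keeps the name SubversionTest.

```sql
-- Start every build from an empty database on the CI server
IF DB_ID(N'SubversionTest') IS NOT NULL
BEGIN
    -- Kick out any lingering connections so the drop cannot be blocked
    ALTER DATABASE SubversionTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE SubversionTest;
END
GO

CREATE DATABASE SubversionTest;
GO
```

The compare tool then deploys the scripted objects from source control into the empty database.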

When a "build" is run, it also runs some tests. Again, I won't delve into exactly which tests are being run but this just demonstrates the sort of screen you may see subsequent to the deploy of your database. Of course, these tests are run against the Continuous Integration server and are written using the NUnit framework - something which TeamCity supports. In this instance we only have a couple of quick tests to sanity check the two objects but on more complex apps these could take a significant amount of time.

Finally, we have the artifacts generated by the process in the form of some sql scripts which will bring a production server (which the sqlcompare has run against) to the latest build we have in source control. There is also a supporting HTML file which is just a graphical representation of the changes that will be made by executing the sql script.

To ensure that the development is as bug free as possible, I've hooked the CI to occur whenever a database change is committed to SVN. This will give you confidence that the changes made have not caused any issues and at the very least haven't led to an invalid database build. It may be that a full run of an application's tests is too long to be suitable for "on check-in", so it's possible to schedule some builds/tests to run during quiet time.

Taking it even further

If you ship a database enabled product, one of the real benefits of a solution such as this is that you can quickly and easily target different versions/editions of SQL Server. It may be that a client runs a particular hotfix of SQL Server and wants to know whether this can be supported. If you can quickly provision an instance to mimic that of a client, you can easily point your TeamCity projects at the new instance and see if your database builds and runs successfully.

I've talked before about using extended properties to version your database objects and you can use this automated process to help that. SVN has built-in revision numbers which are assigned when changes get checked in, so you could easily set up a build step within TeamCity to run a T-SQL script to update extended properties with this number. I haven't done this but will definitely be experimenting with it.
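A minimal sketch of what such a build step might run, stamping the revision at database level. The property name SvnRevision and the literal revision number are made up for illustration; in practice the build server would substitute the real revision into the script.

```sql
-- Hypothetical value; the build step would inject the real SVN revision here
DECLARE @Revision SQL_VARIANT = N'1234';

-- class = 0 identifies extended properties set on the database itself
IF EXISTS (SELECT 1
           FROM sys.extended_properties
           WHERE class = 0 AND name = N'SvnRevision')
    EXEC sys.sp_updateextendedproperty @name = N'SvnRevision', @value = @Revision;
ELSE
    EXEC sys.sp_addextendedproperty @name = N'SvnRevision', @value = @Revision;

-- Check what the database is currently stamped with
SELECT name, value
FROM sys.extended_properties
WHERE class = 0 AND name = N'SvnRevision';
```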

The final word

Having worked with databases out of source control for some time, I'm really excited about how this is going to improve the quality of our database development going forward. The tools that Red Gate have provided are easy to use and the simplicity of them is what's so impressive. I have hardly had to change my working practices at all and know that I'm already in a better place. Time will tell whether the versioning and targeting of different database editions takes off but I know that it's not going to be a big challenge to implement.

I hope this has been useful to you and would be interested in hearing from people who may have implemented similar solutions or have any ideas on how to make this even better.

References

Red Gate SQL Source Control
Red Gate SQL Toolbelt
FinalBuilder
Team City
Microsoft Visual Studio
Microsoft SQL Server
Powershell
NUnit