
Tuesday, 17 July 2012

T-SQL Tuesday #32 - A Day In The Life

T-SQL Tuesday has been a little off my radar for the last few months (as has all blogging activity!), so I was delighted when Twitter reminded me that it was that time of the month. Even better, the topic chosen by Erin Stellato (Blog | Twitter) was a great one that I felt I could really engage with.

So here we go. For the record, my job title is Database Administrator.

The task as laid out in the invitation was to talk through your day on the 11th or 12th of July, which just so happened to be the day I resigned from my current role. Writing down a handover list was an interesting task in itself, as it gave me the opportunity to put into words what the key parts of my job were and what skills gap would be left. I have to say, I was surprised (and pleased) that the number of tasks on the list requiring significant handover to another colleague was minimal, as I have been very diligent in ensuring that I was not a Single Point of Failure in my organisation. I have written plenty of documentation and made extensive use of Source Control and Task Tracking systems to ensure that even if people do not know what I've done, there is a paper trail to explain it.

What was apparent was the variety of skills I employ day to day, many of which are not directly related to SQL Server. Much of my day is spent in project meetings, planning or analysing roadmaps, and although the applications typically have a database backend, it's not the focus of my attention. I am fortunate to be able to dedicate some time each day to reading blogs and technical articles to stay up to date, although I also spend my fair share of time in firefighting mode. The traditional DBA element probably takes up less than a day a month, partly due to the small size of the estate but also because I've been effective (backslap for myself here!) at automating many of the tasks.

Is my job really a Database Administrator role? I don't think so, and in truth it wasn't from the outset, but this hasn't been a bad thing. I enjoy the variety of being involved in the different aspects of SQL Server and feel it makes me a much more effective professional, and I imagine this is the case for many other fellow SQL Server pros. However, I've encouraged my company not to advertise the role under this job description, as I believe it may attract candidates with unrealistic expectations.

As for me, I'm not sure what my next role has in store but I hope it affords me the time to participate in future T-SQL Tuesdays!!

Tuesday, 8 November 2011

SSRS: Use Stored Procedures in Datasets


This is my contribution to T-SQL Tuesday #24 hosted by Brad Schulz (blog) on the subject of Prox ‘n’ Funx (Stored Procedures and Functions to you and me :-) ).

I'm a big believer in using Stored Procedures (or at the very least, UDFs) for your Reporting Services datasets: separating your presentation layer from your data layer and moving the SQL code away from the RDL.

The benefit of this is that as long as the metadata of your Stored Procedure stays the same, you are able to modify and enhance your SQL code without having to touch the RDL. You essentially abstract the source code away from the report.
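
As a rough sketch of the pattern (the schema, table and column names here are invented purely for illustration), a dataset procedure might look like this:

CREATE PROCEDURE dbo.rpt_GetSalesByRegion
    @RegionId INT
AS
BEGIN
    SET NOCOUNT ON;

    -- The report dataset binds to the column names and types returned
    -- here; as long as this shape stays the same, the internals can
    -- change freely without redeploying the RDL.
    SELECT s.SaleDate, s.Amount, r.RegionName
    FROM dbo.Sales AS s
    INNER JOIN dbo.Region AS r ON r.RegionId = s.RegionId
    WHERE s.RegionId = @RegionId;
END
GO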

Perhaps you're improving performance by moving from cursors to JOINs, extending the business logic to only return rows that meet new criteria, or simply refactoring SQL code to standardise your table names. None of these changes affect the presentation layer, and having them reside as Stored Procedures on the database gives huge maintenance benefits.

Other advantages include having all your T-SQL held in one place and knowing the impact of any changes without having to worry about dependencies elsewhere. Also, you will often be re-using code (e.g. for parameter datasets), and using a single stored procedure helps reduce duplication of effort (with probable performance benefits too).

Of course, there are downsides to this approach. If you need to introduce a new parameter to a report (which is passed to your dataset), then you have to change both the RDL and the stored procedure. I can see this being a slight irritation, as you now have two deployments, whereas holding the SQL code "inline" means a simple upload of the new report.
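
One way to soften the double deployment (a sketch only, reusing the hypothetical procedure above) is to give the new parameter a default, so the stored procedure can go out first and the existing RDL keeps working until the report itself is updated:

ALTER PROCEDURE dbo.rpt_GetSalesByRegion
    @RegionId INT,
    @StatusId INT = NULL  -- new and optional; old reports don't pass it
AS
BEGIN
    SET NOCOUNT ON;

    SELECT s.SaleDate, s.Amount, r.RegionName
    FROM dbo.Sales AS s
    INNER JOIN dbo.Region AS r ON r.RegionId = s.RegionId
    WHERE s.RegionId = @RegionId
      AND (@StatusId IS NULL OR s.StatusId = @StatusId);
END
GO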

For me though, the former approach still wins and I advocate using Stored Procedures for Reporting Services datasets. I've been experimenting recently with putting Stored Procedures used for Reports into their own schema (acting as a namespace) although I can't categorically say whether this has been a success or not (Jamie Thomson (Blog | Twitter) has an interesting blog post which touches on Schema usage here).
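
For what it's worth, the schema-as-namespace idea boils down to something like this (the rpt schema name and the ReportUser account are hypothetical):

-- A schema to act as a namespace for report procedures
CREATE SCHEMA rpt AUTHORIZATION dbo;
GO

-- Report procedures then live under rpt., visually separate from
-- general application code
CREATE PROCEDURE rpt.GetSalesByRegion
    @RegionId INT
AS
    SELECT SaleDate, Amount
    FROM dbo.Sales
    WHERE RegionId = @RegionId;
GO

-- A side benefit: permissions can be granted on the schema as a
-- whole, e.g. to the account Reporting Services connects with
GRANT EXECUTE ON SCHEMA::rpt TO ReportUser;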

Tuesday, 4 October 2011

T-SQL: Helping the optimiser can hinder

I've been meaning to blog about this for a while, and noticing the (earlier than usual) T-SQL Tuesday #23 hosted by Stuart Ainsworth (blog | twitter) on the topic of Joins, I thought I'd cobble something together.

Essentially, this post is about reading the results of SHOWPLAN_TEXT, being aware of the different physical join operators and also how the optimiser can be influenced (poorly) by a bad WHERE clause.
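
As a quick refresher for anyone following along, the textual plan is toggled per batch, and the SET statement must be the only statement in its batch:

SET SHOWPLAN_TEXT ON;
GO

-- The query under investigation (introduced below) is not executed;
-- SQL Server returns the estimated plan as rows of text instead
SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwRateData;
GO

SET SHOWPLAN_TEXT OFF;
GO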

I have the following query on a view which abstracts the 4 tables joined:
 
SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwRateData

The tables behind the view aren't really that important, as I just want to concentrate on the plans generated, but basically there is a lookup table for RateID and a maindata table which joins to it (via other intermediate tables). The key thing to note in the query is the CAST: sometimes the column Val (from maindata) has a value which can't be converted, and indeed running this query as-is gives the error:

Msg 8115, Level 16, State 8, Line 2
Arithmetic overflow error converting numeric to data type numeric.

I know that for RateID = 1 the Vals are all CASTable to that decimal precision/scale, and changing the query to return only those records gives a result set without errors:

SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwRateData WHERE RateId = 1

However, if I "improve" the query to filter these records further, returning only data from the last year using a non-sargable clause, I get that pesky overflow error again:

SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwRateData WHERE RateId = 1 AND DATEDIFF(D,GETDATE(),[Date])< 365

So what's different? First glance suggests something is wrong, because if the whole dataset for RateId = 1 returns without error, then a further subset of that data should also work.

I delved into the query plans to find out and the clue was there:

The Good query generated a plan using a Nested Loops inner join. In a nutshell, this plan first filters the lookup table down to the correct rows (RateID = 1) and then iterates through the main table where there is a match on that RateId. In other words, it doesn't matter whether the Val column for other RateIDs is CASTable, as this plan only ever touches RateID = 1.



On the other hand, the Bad query generated a plan using a Hash Match join. This differs in that it reads all the rows in both tables, attempting to run the CAST over every record; only later in the plan does it filter on RateID.
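
As a quick diagnostic (not something I'd leave in production), a join hint can confirm the operator is the culprit: forcing nested loops should make the failing query run, while forcing a hash join should bring the overflow error back.

-- Diagnostic only: join hints constrain the optimiser across the
-- whole query, so use them for investigation rather than as a fix
SELECT *, CAST(Val AS NUMERIC(22,16))
FROM DDL.vwRateData
WHERE RateId = 1 AND DATEDIFF(D,GETDATE(),[Date]) < 365
OPTION (LOOP JOIN);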



The different types of JOINs (both logical and physical) are explained really well in this article, but the takeaway from this post is that you need to be careful with your query construction, as the optimiser may choose a plan that affects both performance and even robustness. Logic may tell you that adding extra filtering criteria will help the optimiser choose a more efficient plan, but in this example it has chosen a plan that caused the query to fail.
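
As an aside, a sargable version of the date filter (assuming the intent was "rows from roughly the last year") avoids wrapping the column in a function, which gives the optimiser a better chance of an index seek and, in this case, a safer plan, though that is never guaranteed:

-- Compare the column against a precomputed boundary instead of
-- applying DATEDIFF to the column itself
SELECT *, CAST(Val AS NUMERIC(22,16))
FROM DDL.vwRateData
WHERE RateId = 1
  AND [Date] >= DATEADD(DAY, -365, GETDATE());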

Wednesday, 10 August 2011

T-SQL: Beware the hardcoded switch

This month's T-SQL Tuesday is brought to us by Adam Machanic (Blog | Twitter) who, as is his right, has chosen Wednesday rather than Tuesday just to keep everyone on their toes. The subject is c**p code, something which we're all guilty of in some form or another, either as a result of laziness or lack of knowledge.

My contribution is fairly short and sweet today and something I'm often guilty of: the hardcoded lookup value embedded within a Stored Procedure or Function to control logic. You'll have seen this before (certainly if you've ever worked with me!).

IF @TheIDOfThisPerson = 15
    EXEC dbo.UseThisStoredProcedure;
ELSE
    EXEC dbo.UseThisOtherOne;

I often do this as a "temporary" hack to test some code or demonstrate a proof of concept, but it shouldn't come as a surprise to me when it finds its way into production. Fortunately, it's been rare for the issue to catch me out in any serious way, but I often find myself fixing things in a test environment after promoting code through the Development tiers, because the ID values don't match across all servers. There is almost always a better way to achieve this, and it's often only a few extra keystrokes, but somehow it still creeps into my game and that of other developers I work with.
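
That "better way" is usually just to resolve the ID from a stable natural key at runtime instead of baking in a number that differs between environments. A sketch (the PersonCode column and its value are invented for illustration):

DECLARE @SpecialPersonId INT;

-- The numeric ID differs per environment, but the code is stable
SELECT @SpecialPersonId = PersonId
FROM dbo.Person
WHERE PersonCode = 'DUTY_MANAGER';

IF @TheIDOfThisPerson = @SpecialPersonId
    EXEC dbo.UseThisStoredProcedure;
ELSE
    EXEC dbo.UseThisOtherOne;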

I don't think anyone consciously CHOOSES to write something poorly; as we're constantly learning, it's an inevitable byproduct. But let's embrace it. As long as we don't beat ourselves (or each other) up about it, we can use it in a positive way as a learning point. While a developer's goal is often to write less code, I prefer to focus on writing less C**P code, and I don't think you have to sacrifice one for the other.

Tuesday, 12 July 2011

T-SQL: ORDER BY on SELECT not INSERT



This post is contributing to T-SQL Tuesday #20 hosted by Amit Banerjee (blog | @banerjeeamit) on the topic of Best Practices.

I was recently called in to help one of the developers with an inconsistency they were having with an SSRS report. Essentially, the report was "randomly" producing incorrect results, yet it was impossible to faithfully recreate the issue. As is a developer's way, he was on the attack against SQL Server, suggesting it was a bug in the software. I must admit, I was a bit confused, and my mind was running way ahead of itself, checking for Service Packs or Hotfixes that might address the issue.

And then I remembered to go back to first principles and check that the basics were all covered, so I opened up the function used in the report.

CREATE FUNCTION [dbo].[tvf_GetData] (@Id INT)
RETURNS @Results TABLE (
    [ID] INT NULL,
    [Output_ID] INT NULL,
    [Value] DECIMAL (30, 15) NULL
)
AS
BEGIN

    INSERT INTO @Results
    SELECT Id, OutputId, Val
    FROM dbo.tbl1
    WHERE Id = @Id
    ORDER BY Id -- ORDERING HERE ON THE INSERT

    INSERT INTO @Results
    SELECT Id, OutputId, Val
    FROM dbo.tbl2
    WHERE Id = @Id
    ORDER BY Id -- ORDERING HERE ON THE INSERT

    RETURN

END
GO


The report was highly dependent on the results of the above query coming back in a particular order. As you can see from my comments in the function body, the developer had attempted to force the order of the data by inserting it into the table variable in a specific order. The mistaken belief that the data would come out in the same order it went in was at the crux of this issue. The query calling this function did not have an explicit ORDER BY clause, so it was left to the optimiser to make the call on how the data was returned. In this instance, the ORDER BY clauses on the insert statements are more of a hindrance than a help, as they may adversely affect performance (ordering can be an expensive operation to perform) without adding any benefit.

The simple way of resolving the issue was just to add an explicit ORDER BY clause when querying the function:

SELECT *
FROM [dbo].[tvf_GetData](@Id)
ORDER BY Id ASC



Ordering Best Practice

I don't care how many times I hear things along the lines of "you don't need an ORDER BY clause if you have a clustered index". I'm of the opinion that if you don't explicitly specify an ORDER BY clause, you are taking your chances if you rely on data coming out in a particular order. Sure, there may be empirical evidence to suggest that omitting one makes no difference in certain scenarios, but I like to err on the side of caution.

Not using one essentially puts you at the mercy of the query optimiser, and although it's a very mature and effective engine, its behaviour is out of your control. Should MS release a patch which affects how it interprets queries, then your whole system could be compromised. If you have an explicit ORDER BY clause, then your chances of being affected are negligible.