Tuesday 12 July 2011

T-SQL: ORDER BY on SELECT not INSERT



This post is contributing to T-SQL Tuesday #20 hosted by Amit Banerjee blog | @banerjeeamit on the topic of Best Practices.

I recently was called in to help one of the developers with an inconsistency they were having with an SSRS report. Essentially, thre report was "randomly" producing incorrect results but it yet it was impossible to faithfully recreate the issue. As is a developers way, he was on the attack against SQL Server suggesting it was a bug with the software. I must admit, I was a bit confused and my mind was wandering way ahead of itself checking for Service Pack or Hotfixes that may address the issue.

And then I remembered to go back to first principals and check that the basics were all covered so I opened up the function used in the report.

CREATE FUNCTION [dbo].[tvf_GetData](@Id INT)
RETURNS
@Results TABLE (
[ID] INT NULL,
[Output_ID] INT NULL,
[Value] DECIMAL (30, 15) NULL
)
AS
BEGIN

INSERT INTO
@Results
SELECT Id, OutputId, Val
FROM dbo.tbl1
WHERE Id = @Id
ORDER BY Id -- ORDERING HERE ON THE INSERT

INSERT INTO @Results
SELECT Id, OutputId, Val
FROM dbo.tbl2
WHERE Id = @Id
ORDER BY Id -- ORDERING HERE ON THE INSERT

RETURN

END
GO


The report was highly dependent on the results of the above query coming back in a particular order. As you can see from my comments in the function body, the developer had attempted to force the order of the data by inserting it into the temporary table in a specific order. The mistaken belief that the data would come out in the same order as it went in was at the crux of this issue. The query calling this function did not have an explicit order statement and so it was left to the optimiser to make the call on how the data was returned. In this instance, the ORDER BY clauses on the insert statements are more of a hinderance than a help as they may adversely affect performance (ordering can be an expensive operation to perform) without adding any benefit.

The simple way of resolving the issue was just to add an explicit ORDER BY clause when querying the function:

SELECT *
FROM [dbo].[tvf_GetData]()
ORDER BY Id ASC



Ordering Best Practice

I don't care how many times I hear things along the lines of "you don't need an ORDER BY clause if you have a clustered index". I'm of the opinion that if you don't explicitly specify an ORDER BY clause then you are taking your chances if you rely on data coming out in a particular order. Sure, there may be emipirical evidence to suggest that not using one doesn't make a difference in certain scenarios but I like to err on the side of caution.

Not using one essentially puts you at the mercy of the query optimiser and although its a very mature and effective engine, its behaviour is out of your control. Should MS release a patch which effects how it interprets queries then your whole system would be compromised. If you have an explicit ORDER BY clause then your chances of being affected are negligible.

No comments:

Post a Comment

/* add this crazy stuff in so i can use syntax highlighter