Thursday 9 February 2012

T-SQL: INSERT with XML shredding is very slow

Back in 2010, I came across a performance bug with SQL2008 SP1. When shredding XML using the nodes query, the performance is fine with a SELECT but when you want to INSERT the data to a table/temptable/tablevariable the performance becomes very poor.

The query I was having problems with was something like this (Thanks for the sample data wBob!):

DECLARE @xml XML;
-- Spin up some test data 
WITH cte AS 
( 
SELECT 65 x 
UNION ALL 
SELECT x + 1 
FROM cte 
WHERE x < 90 ) 

SELECT @xml = (
    
SELECT CHAR( a.x ) +
' & co.' AS "name"
    a.x
AS "value"
    
'FIELD' AS "paramdata/field"

    FROM cte a 
      CROSS JOIN cte b 
      CROSS JOIN cte c
    
FOR XML PATH('root'), TYPE
    
) 


DECLARE @ConfigData TABLE (ID INT, Name NVARCHAR(255), Value NVARCHAR(255), ParamData XML)

INSERT INTO @ConfigData (ID, Name , Value, ParamData) 
SELECT 1,
    
tbl.cols.value('name[1]', 'varchar(1000)'),
    
tbl.cols.value('value[1]', 'varchar(1000)'),
    
tbl.cols.query('./paramdata[1]')

FROM @XML.nodes('//root') AS tbl(cols) 
GO

I raised this with MS via Connect (#562092) and went on to implement my own workaround using the following CLR shredding function:

public partial class UserDefinedFunctions
{
    [SqlFunction(FillRowMethodName = "tvf_clr_FillParameterData",
        TableDefinition = "Name nvarchar(255), Value nvarchar(255), ParamData nvarchar(255)")]
    
public static IEnumerable tvf_clr_ParameterDataShredder(SqlXml parameterData)
    
{
        XmlDocument document
= new XmlDocument();
        
document.LoadXml(parameterData.Value);
        
return document.SelectNodes("//parameter");
    
}

    
public static void tvf_clr_FillParameterData(object row, out string outName, out string outValue, out string outParamData)
    
{
        XmlNode document
= (XmlNode)row;

        
outName = document.SelectSingleNode("name").InnerXml;
        
outValue = document.SelectSingleNode("value").InnerXml;

        
outParamData = null;
        
if (document.SelectSingleNode("paramdata") != null)
            
outParamData = document.SelectSingleNode("paramdata").OuterXml;
    
}
}
;

 
This served me well and solved the performance issue and all was fine. Until now.

The thing is, this method doesn't handle decoding of escape characters in XML, so any &s that exist in the XML string being shredded will be returned as &:amp. Not ideal. My first thought was to just implement a decoding function within the CLR routine but the main candidates use System.Web which is not available within SQL Server. I eventually stumbled upon this post from Jonathan Keyahias which provided a sql safe Encoding implementation and I was going to use this as a basis for writing my own decoding function. While waiting the 20mins for Visual Studio 2008 to open, it occured to me to revisit the Connect case and fortunately, there were a number of workarounds there, and a combination of the solutions the most effective.

INSERT INTO @ConfigData (ID, Name , Value, ParamData)
SELECT     1,
    
tbl.cols.value('(name/text())[1]', 'nvarchar(255)'),
    
tbl.cols.value('(value/text())[1]', 'nvarchar(255)'),
    
tbl.cols.query('./paramdata[1]') 

FROM @xml.nodes('//root') AS tbl(cols) 
OPTION ( OPTIMIZE FOR ( @xml = NULL ) )

The above simple modification to my SQL INSERT statement and performance was acceptable and more importantly, the decoding of the escaped characters was fine. Saved me a lot of work.

1 comment:

  1. This is where is all starts Powered by shredding, liveSite https www camelback net
    shredding,

    ReplyDelete

/* add this crazy stuff in so i can use syntax highlighter