Friday 26 November 2010

T-SQL: Working with FTS - Stopwords & Filestream

I last worked with Full Text Search back on SQL2000 in a company which used the feature extensively. However since then, i've not had cause to use it in anger and I thought it high time that I refreshed my knowledge. Many things have changed since back then, notably the new FILESTREAM storage and also the reworking of noise words - now called Stopwords - and its these 2 features that i'm going to look at here.

First up, i configured a sandbox environment (I already have a basic C# windows console app that can add the files to the database which i may make available in another post).

CREATE DATABASE Sandbox
GO
/* set up the database/table to store the BLOBs */
ALTER DATABASE Sandbox ADD FILEGROUP Sandbox_FS CONTAINS FILESTREAM
GO
ALTER DATABASE Sandbox ADD FILE (NAME= 'Sandbox_FS', FILENAME = 'C:\SQLData\Sandbox_FS')
TO FILEGROUP Sandbox_FS
GO
CREATE TABLE dbo.SearchTable
(ID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE, SearchBlob VARBINARY(MAX) FILESTREAM NULL, FileExtension VARCHAR(10))
GO
CREATE FULLTEXT CATALOG Sandbox_FS_FTCat;
GO
CREATE FULLTEXT INDEX ON dbo.SearchTable (SearchBlob TYPE COLUMN FileExtension Language 1033)
KEY INDEX UQ_SearchTable_ID ON Sandbox_FS_FTCat
WITH CHANGE_TRACKING AUTO;
GO

-- add some BLOB data to my table using C# app

SELECT * FROM dbo.SearchTable WHERE CONTAINS (SearchBlob,'"munson"')


So the first issue I came across was with regard to my sandbox SQL installation. I hadn't changed any of the default startup accounts for the SQL Full-text Filter Daemon Launcher so it was running under the LOCAL SERVICE account. When i ran a full text query, i got the following error:

Msg 30053, Level 16, State 102, Line 2
Word breaking timed out for the full-text query string. This can happen if the wordbreaker took a long time to process the full-text query string, or if a large number of queries are running on the server. Try running the query again under a lighter load.

A quick google search showed that this is a fairly common error with a very simple fix as highlighted in this forum post. Essentially, the LOCAL SERVICE account needs to be part of the SQLServerFDHostUser$$MSSQLSERVER group so I added that, restarted the services and hey presto, we were in business. So rerunning my query:

SELECT * FROM dbo.SearchTable WHERE CONTAINS (SearchBlob,'"munson"')
-- 1 row returned

Next up, its time to configure the stoplist:

-- create the stop list
CREATE FULLTEXT STOPLIST BadWords;
GO
-- add the "bad word" to the stoplist
ALTER FULLTEXT STOPLIST BadWords ADD 'munson' LANGUAGE 'English';
GO
-- associate the stoplist with the fulltext index
ALTER FULLTEXT INDEX ON dbo.SearchTable SET STOPLIST BadWords

SELECT * FROM dbo.SearchTable WHERE CONTAINS (SearchBlob,'"munson"')
-- no rows returned

I've got to say, that i'm impressed with the revamp of the fulltext feature. In the "good ol'" days managing noise words/stop words was quite finicky, editing text files and restarting services whereas now it can all be managed using intuitive T-SQL commands. Good work MS.

Friday 19 November 2010

T-SQL: Can't Convert Dates in Indexed Views

Just thought i'd drop here an example of a view and how using CONVERT on a date in a view can invalidate it as a candidate to be indexed. Some CONVERT styles are classified as non-deterministic and it just so happens that the default style is one of these so if you use the CONVERT function in your views, you'll need to ensure you specify a valid style.

This is run on SQL2008.

CREATE DATABASE TEST
USE Test
GO
-- set up the sample DDL
CREATE TABLE dbo.a (Id INT NOT NULL PRIMARY KEY, dt DATETIME)
CREATE TABLE dbo.b (Id INT NOT NULL, sdt VARCHAR(100))
GO
-- and add some sample DML
INSERT INTO dbo.a SELECT 1, GETDATE()
INSERT INTO dbo.a SELECT 2, GETDATE()
INSERT INTO dbo.a SELECT 3, GETDATE()
INSERT INTO dbo.b SELECT Id, CAST(dt AS VARCHAR) FROM dbo.a
GO
-- create a basic view which is able to be indexed
CREATE VIEW dbo.vwAB
WITH SCHEMABINDING
AS
SELECT
a.Id, a.dt, b.sdt
FROM dbo.a
INNER JOIN dbo.b
ON a.Id = b.Id
GO
-- this works
CREATE UNIQUE CLUSTERED INDEX CIDX_vwAB_Id
ON dbo.vwAB (Id);
GO
-- change the view to add a convert
ALTER VIEW dbo.vwAB
WITH SCHEMABINDING
AS
SELECT
a.Id, a.dt, CONVERT(DATE, b.sdt) AS sdt
FROM dbo.a
INNER JOIN dbo.b
ON a.Id = b.Id
GO
-- now this doesn't work as the CONVERT is non deterministic!!
CREATE UNIQUE CLUSTERED INDEX CIDX_vwAB_Id
ON dbo.vwAB (Id);
GO
-- change the view to use a deterministic CONVERT style
ALTER VIEW dbo.vwAB
WITH SCHEMABINDING
AS
SELECT
a.Id, a.dt, CONVERT(DATE, b.sdt,103) AS sdt
FROM dbo.a
INNER JOIN dbo.b
ON a.Id = b.Id
GO
-- now this works as we're using a deterministic CONVERT style!!
CREATE UNIQUE CLUSTERED INDEX CIDX_vwAB_Id
ON dbo.vwAB (Id);
GO



BOL has an article which explains what the requirements are for using Indexed Views here.

And you can view more on the CONVERT function and which styles are deterministic here.

Friday 12 November 2010

Microsoft Community Contributor Award 2011!

On firing up my inbox today, i was intrigued to see an email from MS informing me, "Your online community contributions have been recognized by Microsoft". I assume that this must be in recognition of my activity on the MSDN forums and its nice to receive acknowledgment for it.

However, my involvment in the forums stems from an enjoyment of knowledge sharing and helping others with problems even if you don't have any form of relationship with them. In my jobs, i love working in teams and helping people develop their skills and its a great feeling when you all pull together, share information, bounce ideas and solve a problem as a team. Its just something that floats my boat, pure and simple. Jonathan Kehayis has written a great article on the subject of mentoring and its something i can certainly relate to.

http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/09/03/the-importance-of-a-mentor.aspx

Of course, there is a second side to the coin when working the forums and that is strengthening my own knowledge. Seeing some of the great/creative ways of solving problems that many of my peers frequently demonstrate is both inspiring and educational. It also helps to revisit some of the concepts that I know but haven't put in practice in a while (SQL Server moves so fast its tough even to tread water!!).

I'd recommend anybody to spend time in the SQL Server forums. They are a fantastic source of information and I think a superb way of getting to know the product and how it is used in real life scenarios. You just can't get that information from a book.

Friday 5 November 2010

SSRS: SQL2008 RDLC

I thought i'd experiment with writing a WPF app which can expose some SQL2008 reports that i've developed. Notwithstanding the workaround required to use ReportViewer in WPF (see here) it appears that due to the timeline of when VS2008 and SQL2008 were developed/released, you can't use ReportViewer 2008 with SQL2008 reports!! See the quote below from this forum thread:

"Visual Studio 2008 was released much earlier than SQL Server 2008, so ReportViewer 2008 is based on the 2005 version of RDL. A SQL Server Reporting Services 2008 server report is based on the 2008 version of RDL, so, it can't be degrade to 2005 RDLC".

Bit of a pain, but it sounds like this should all be fixed up in Visual Studio 2010.
/* add this crazy stuff in so i can use syntax highlighter