Friday 26 November 2010

T-SQL: Working with FTS - Stopwords & Filestream

I last worked with Full Text Search back on SQL2000 in a company which used the feature extensively. However since then, i've not had cause to use it in anger and I thought it high time that I refreshed my knowledge. Many things have changed since back then, notably the new FILESTREAM storage and also the reworking of noise words - now called Stopwords - and its these 2 features that i'm going to look at here.

First up, i configured a sandbox environment (I already have a basic C# windows console app that can add the files to the database which i may make available in another post).

CREATE DATABASE Sandbox
GO
/* set up the database/table to store the BLOBs */
ALTER DATABASE Sandbox ADD FILEGROUP Sandbox_FS CONTAINS FILESTREAM
GO
ALTER DATABASE Sandbox ADD FILE (NAME= 'Sandbox_FS', FILENAME = 'C:\SQLData\Sandbox_FS')
TO FILEGROUP Sandbox_FS
GO
CREATE TABLE dbo.SearchTable
(ID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE, SearchBlob VARBINARY(MAX) FILESTREAM NULL, FileExtension VARCHAR(10))
GO
CREATE FULLTEXT CATALOG Sandbox_FS_FTCat;
GO
CREATE FULLTEXT INDEX ON dbo.SearchTable (SearchBlob TYPE COLUMN FileExtension Language 1033)
KEY INDEX UQ_SearchTable_ID ON Sandbox_FS_FTCat
WITH CHANGE_TRACKING AUTO;
GO

-- add some BLOB data to my table using C# app

SELECT * FROM dbo.SearchTable WHERE CONTAINS (SearchBlob,'"munson"')


So the first issue I came across was with regard to my sandbox SQL installation. I hadn't changed any of the default startup accounts for the SQL Full-text Filter Daemon Launcher so it was running under the LOCAL SERVICE account. When i ran a full text query, i got the following error:

Msg 30053, Level 16, State 102, Line 2
Word breaking timed out for the full-text query string. This can happen if the wordbreaker took a long time to process the full-text query string, or if a large number of queries are running on the server. Try running the query again under a lighter load.

A quick google search showed that this is a fairly common error with a very simple fix as highlighted in this forum post. Essentially, the LOCAL SERVICE account needs to be part of the SQLServerFDHostUser$$MSSQLSERVER group so I added that, restarted the services and hey presto, we were in business. So rerunning my query:

SELECT * FROM dbo.SearchTable WHERE CONTAINS (SearchBlob,'"munson"')
-- 1 row returned

Next up, its time to configure the stoplist:

-- create the stop list
CREATE FULLTEXT STOPLIST BadWords;
GO
-- add the "bad word" to the stoplist
ALTER FULLTEXT STOPLIST BadWords ADD 'munson' LANGUAGE 'English';
GO
-- associate the stoplist with the fulltext index
ALTER FULLTEXT INDEX ON dbo.SearchTable SET STOPLIST BadWords

SELECT * FROM dbo.SearchTable WHERE CONTAINS (SearchBlob,'"munson"')
-- no rows returned

I've got to say, that i'm impressed with the revamp of the fulltext feature. In the "good ol'" days managing noise words/stop words was quite finicky, editing text files and restarting services whereas now it can all be managed using intuitive T-SQL commands. Good work MS.

No comments:

Post a Comment

/* add this crazy stuff in so i can use syntax highlighter