SQL Saturday 173 Washington DC

Saturday December 9th found me in Washington DC for SQL Saturday 173.  As we are planning an event in Richmond, I opted to volunteer more than I have in the past so I could be closer to the logistics and see what the organizers had done. #SQLSat173 was organized by Chris Bell, a first time organizer, with a little assistance from his wife.  :)

I knew putting on a SQL Saturday took a lot of work and the DC team put it all together nicely.  While the content of the sessions would make most attendees happy, the DC team provided a few extras to give the attendees lots of reasons to enjoy themselves–there was food all day long, a Microsoft Kinect in the lobby, a hot lunch from Maggiano’s, and they had a Kindle Fire raffle for those that printed their speedpass.  The DC team had wonderful sponsors that helped most of that possible and I hope to the RVA team can replicate some of that–with our own twist of course.

The DC team provided vests as a thank you to the speakers and volunteers.  Here I am sporting mine.  :) VestOh course, I did make it to several sessions and I had a great time.  My dad actually tagged along and enjoyed himself.

The one other treat was meeting with Karla Kay and chatting with her about some of the ideas we had for our event.  As always, the SQL community is wonderful and I hope to see you out on the trail!

Coalesce instead of Max

I was asked to tune a query which was used in an export of registration data. A person registers for an activity and they answer questions about this. This query pulls from 19 tables–one of which is a one to many. The export creates a column for each row in the many table. It is on this subject I would like to focus.  For simplicity, I have reduced my examples to two tables. My setup looks something like this.

QuestionAnswerSetup

The tsql below will create this. Our end goal is to match the questions answered by each registrant to the registration data.

CREATE TABLE Registration
( RegID    INT
, RegistrantFirstName    VARCHAR(25)
)
INSERT INTO Registration (RegID, RegistrantFirstName)
VALUES (1, 'Carlos'),
(2, 'Wayne'),
(3, 'Jeff')

CREATE TABLE RegistrationAnswer
( RegID    INT
, RegQuestion    VARCHAR(25)
, RegAnswer    VARCHAR(25)
)

INSERT INTO RegistrationAnswer (RegID, RegQuestion, RegAnswer)
VALUES (1,'Favorite Color','Red'),
(2,'Favorite Color','Blue'),
(3,'Favorite Color','Red'),
(1,'Favorite Food','Pizza'),
(2,'Favorite Food','Chimichangas'),
(3,'Favorite Food','Hamburgers')

The previous logic used the Max() Function to break the rows into column. Seemed to work well.

--Old Way
SELECT Registration.RegID, RegistrantFirstName, Answers.*
FROM Registration
LEFT OUTER JOIN (SELECT MAX(CASE WHEN RegQuestion='Favorite Color' THEN RegAnswer ELSE null END) AS [Favorite Color]
, MAX(CASE WHEN RegQuestion='Favorite Food' THEN RegAnswer ELSE null END) AS [Favorite Food]
, reg.RegID
FROM [dbo].[Registration] reg
INNER JOIN [dbo].[RegistrationAnswer] ans ON reg.regid = ans.regid
GROUP BY reg.RegID) Answers ON Answers.regID = registration.regid;

I found; however, that if I commented out the left outer join portion of the query (Remember, the real query has 19 tables) the export will suddenly become very fast. With that in mind, I began to look for ways of moving rows to columns and came across an example using the coalesce function. The result is the following. I use a CTE to get the records from the many table and then I use the coalesce function with my group by, replacing the max function.

--New Way
WITH Answer_CTE (RegID, RegQuestion, RegAnswer) AS(
SELECT RegID, RegQuestion, RegAnswer
FROM RegistrationAnswer
)

SELECT Registration.RegID, Registration.RegistrantFirstName, Answers.*
FROM Registration
LEFT OUTER JOIN (SELECT RegID
, coalesce(min(case RegQuestion when 'Favorite Color' then RegAnswer else null end),'') as [Favorite Color]
, coalesce(min(case RegQuestion when 'Favorite Food' then RegAnswer else null end),'') as [Favorite Food]
FROM Answer_CTE cte
GROUP BY RegID) answers ON Registration.RegID = answers.RegID;

What was the impact? I turned on Statistics IO and Statistics TIME ON and got the following results. I ran each of them several time so I could get everything set (load into memory, cache plan, etc). These results are from the query I was tuning.  For readability purposes, I removed the read-ahead reads, lob logical reads, lob physical reads, and the lob read-ahead reads as each were zero for both queries.

Old Query

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

(1357 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘two’. Scan count 1, logical reads 6896, physical reads 0
Table ‘three’. Scan count 1, logical reads 56069, physical reads 0
Table ‘four’. Scan count 3, logical reads 4628, physical reads 0
Table ‘five’. Scan count 2, logical reads 10090, physical reads 0
Table ‘six’. Scan count 0, logical reads 0, physical reads 0
Table ‘seven’. Scan count 0, logical reads 4165, physical reads 0
Table ‘eight’. Scan count 0, logical reads 4520, physical reads 0
Table ‘nine’. Scan count 1489, logical reads 9299, physical reads 0
Table ‘ten’. Scan count 0, logical reads 4569, physical reads 0
Table ‘eleven’. Scan count 0, logical reads 3, physical reads 0
Table ‘twelve’. Scan count 0, logical reads 2, physical reads 0
Table ‘thirteen’. Scan count 1, logical reads 35, physical reads 0
Table ‘fourteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘fifteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘sixteen’. Scan count 335, logical reads 720, physical reads 0
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘seventeen’. Scan count 335, logical reads 721, physical reads 0
Table ‘eighteen’. Scan count 1, logical reads 22, physical reads 0
Table ‘nineteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘twenty’. Scan count 1, logical reads 100039, physical reads 0

SQL Server Execution Times:    CPU time = 13093 ms,  elapsed time = 20205 ms.

The query ran in almost 20 seconds. The only change I made was to the CTE and the coalesce function instead of Max SQL Server parse and compile time:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

(1357 row(s) affected)
Table ‘Worktable’. Scan count 2714, logical reads 10947, physical reads 0
Table ‘two’. Scan count 0, logical reads 13422, physical reads 0
Table ‘three’. Scan count 1357, logical reads 17500, physical reads 0
Table ‘four’. Scan count 2, logical reads 8680, physical reads 0
Table ‘five’. Scan count 2, logical reads 10090, physical reads 0
Table ‘six’. Scan count 0, logical reads 0, physical reads 0
Table ‘seven’. Scan count 0, logical reads 4165, physical reads 0
Table ‘eight’. Scan count 0, logical reads 4520, physical reads 0
Table ‘nine’. Scan count 1489, logical reads 9299, physical reads 0
Table ‘ten’. Scan count 0, logical reads 4569, physical reads 0
Table ‘eleven’. Scan count 0, logical reads 3, physical reads 0
Table ‘twelve’. Scan count 0, logical reads 2, physical reads 0
Table ‘thirteen’. Scan count 1, logical reads 35, physical reads 0
Table ‘fourteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘fifteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘sixteen’. Scan count 335, logical reads 720, physical reads 0
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘seventeen’. Scan count 335, logical reads 721, physical reads 0
Table ‘eighteen’. Scan count 1, logical reads 22, physical reads 0
Table ‘nineteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘twenty’. Scan count 1, logical reads 100045, physical reads 0

SQL Server Execution Times:    CPU time = 5797 ms,  elapsed time = 7196 ms.

This query ran in just over 7 seconds. Because these are exports and are not always in memory, I have found that there is larger discrepancy in the elapsed time the first time each query is run. For the purposes of comparison I ran them each multiple times before taking the stats.