I was asked to tune a query which was used in an export of registration data. A person registers for an activity and they answer questions about this. This query pulls from 19 tables–one of which is a one to many. The export creates a column for each row in the many table. It is on this subject I would like to focus.
For simplicity, I have reduced my examples to two tables. My setup looks something like this.
The tsql below will create this. Our end goal is to match the questions answered by each registrant to the registration data.
[sourcecode language=”sql”]
CREATE TABLE Registration
( RegID INT
, RegistrantFirstName VARCHAR(25)
)
INSERT INTO Registration (RegID, RegistrantFirstName)
VALUES (1, ‘Carlos’),
(2, ‘Wayne’),
(3, ‘Jeff’)
CREATE TABLE RegistrationAnswer
( RegID INT
, RegQuestion VARCHAR(25)
, RegAnswer VARCHAR(25)
)
INSERT INTO RegistrationAnswer (RegID, RegQuestion, RegAnswer)
VALUES (1,’Favorite Color’,’Red’),
(2,’Favorite Color’,’Blue’),
(3,’Favorite Color’,’Red’),
(1,’Favorite Food’,’Pizza’),
(2,’Favorite Food’,’Chimichangas’),
(3,’Favorite Food’,’Hamburgers’)
[/sourcecode]
The previous logic used the Max() Function to break the rows into column. Seemed to work well.
[sourcecode language=”sql”]
–Old Way
SELECT Registration.RegID, RegistrantFirstName, Answers.*
FROM Registration
LEFT OUTER JOIN (SELECT MAX(CASE WHEN RegQuestion=’Favorite Color’ THEN RegAnswer ELSE null END) AS [Favorite Color]
, MAX(CASE WHEN RegQuestion=’Favorite Food’ THEN RegAnswer ELSE null END) AS [Favorite Food]
, reg.RegID
FROM [dbo].[Registration] reg
INNER JOIN [dbo].[RegistrationAnswer] ans ON reg.regid = ans.regid
GROUP BY reg.RegID) Answers ON Answers.regID = registration.regid;
[/sourcecode]
I found; however, that if I commented out the left outer join portion of the query (Remember, the real query has 19 tables) the export will suddenly become very fast. With that in mind, I began to look for ways of moving rows to columns and came across an example using the coalesce function. The result is the following. I use a CTE to get the records from the many table and then I use the coalesce function with my group by, replacing the max function.
[sourcecode language=”sql”]
–New Way
WITH Answer_CTE (RegID, RegQuestion, RegAnswer) AS(
SELECT RegID, RegQuestion, RegAnswer
FROM RegistrationAnswer
)
SELECT Registration.RegID, Registration.RegistrantFirstName, Answers.*
FROM Registration
LEFT OUTER JOIN (SELECT RegID
, coalesce(min(case RegQuestion when ‘Favorite Color’ then RegAnswer else null end),”) as [Favorite Color]
, coalesce(min(case RegQuestion when ‘Favorite Food’ then RegAnswer else null end),”) as [Favorite Food]
FROM Answer_CTE cte
GROUP BY RegID) answers ON Registration.RegID = answers.RegID;
[/sourcecode]
What was the impact? I turned on Statistics IO and Statistics TIME ON and got the following results. I ran each of them several time so I could get everything set (load into memory, cache plan, etc). These results are from the query I was tuning. For readability purposes, I removed the read-ahead reads, lob logical reads, lob physical reads, and the lob read-ahead reads as each were zero for both queries.
Old Query
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(1357 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘two’. Scan count 1, logical reads 6896, physical reads 0
Table ‘three’. Scan count 1, logical reads 56069, physical reads 0
Table ‘four’. Scan count 3, logical reads 4628, physical reads 0
Table ‘five’. Scan count 2, logical reads 10090, physical reads 0
Table ‘six’. Scan count 0, logical reads 0, physical reads 0
Table ‘seven’. Scan count 0, logical reads 4165, physical reads 0
Table ‘eight’. Scan count 0, logical reads 4520, physical reads 0
Table ‘nine’. Scan count 1489, logical reads 9299, physical reads 0
Table ‘ten’. Scan count 0, logical reads 4569, physical reads 0
Table ‘eleven’. Scan count 0, logical reads 3, physical reads 0
Table ‘twelve’. Scan count 0, logical reads 2, physical reads 0
Table ‘thirteen’. Scan count 1, logical reads 35, physical reads 0
Table ‘fourteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘fifteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘sixteen’. Scan count 335, logical reads 720, physical reads 0
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘seventeen’. Scan count 335, logical reads 721, physical reads 0
Table ‘eighteen’. Scan count 1, logical reads 22, physical reads 0
Table ‘nineteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘twenty’. Scan count 1, logical reads 100039, physical reads 0
SQL Server Execution Times: CPU time = 13093 ms, elapsed time = 20205 ms.
The query ran in almost 20 seconds. The only change I made was to the CTE and the coalesce function instead of Max SQL Server parse and compile time:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(1357 row(s) affected)
Table ‘Worktable’. Scan count 2714, logical reads 10947, physical reads 0
Table ‘two’. Scan count 0, logical reads 13422, physical reads 0
Table ‘three’. Scan count 1357, logical reads 17500, physical reads 0
Table ‘four’. Scan count 2, logical reads 8680, physical reads 0
Table ‘five’. Scan count 2, logical reads 10090, physical reads 0
Table ‘six’. Scan count 0, logical reads 0, physical reads 0
Table ‘seven’. Scan count 0, logical reads 4165, physical reads 0
Table ‘eight’. Scan count 0, logical reads 4520, physical reads 0
Table ‘nine’. Scan count 1489, logical reads 9299, physical reads 0
Table ‘ten’. Scan count 0, logical reads 4569, physical reads 0
Table ‘eleven’. Scan count 0, logical reads 3, physical reads 0
Table ‘twelve’. Scan count 0, logical reads 2, physical reads 0
Table ‘thirteen’. Scan count 1, logical reads 35, physical reads 0
Table ‘fourteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘fifteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘sixteen’. Scan count 335, logical reads 720, physical reads 0
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘seventeen’. Scan count 335, logical reads 721, physical reads 0
Table ‘eighteen’. Scan count 1, logical reads 22, physical reads 0
Table ‘nineteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘twenty’. Scan count 1, logical reads 100045, physical reads 0
SQL Server Execution Times: CPU time = 5797 ms, elapsed time = 7196 ms.
This query ran in just over 7 seconds. Because these are exports and are not always in memory, I have found that there is larger discrepancy in the elapsed time the first time each query is run. For the purposes of comparison I ran them each multiple times before taking the stats.
Leave a Reply
You must belogged in to post a comment.