We Can Pivot If You Want To

Screen Shot 08-01-14 at 10.59 PMSo I get that I am a little late to the party here; however, I find myself having to pivot from time to time and am always looking back for code as a refresher.  While pivot is pretty cool, it always takes me a second to connect all the dots so I hope this post help do that.

So, with Pivot, we want to take data that looks like the data on the right–there are multiple rows for each ProductId and would like to get each product ID on one line with the quantities from each Bin.

When we are finished, the data will look like this.

Screen Shot 08-01-14 at 10.59 PM 001

One row for each product id

In my example, I am using the [Production].[ProductInventory] table in the adventureworks2012 database.  To make a row become columns, we must know ALL possible values we want to make as a column.  We specify these in the FOR line of the pivot syntax.  This query will create the columns for each value I specified in the Shelf column

SELECT [ProductID],[A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L]
	FROM [Production].[ProductInventory]
	 PIVOT
	 (
		SUM (Quantity)
		FOR SHELF IN ([A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L])
	 ) AS PVT 

Screen Shot 08-01-14 at 11.13 PMI should see something like this.  We are getting closer–we have our columns, but we still a line for each record.  We only want to see one line per product id.  This is where we put the columns we want from the table and the pivot syntax together in an almost sub-query like format and select the columns as we want them to display.

The syntax is something like this.  The first two rows are the columns I want, the second two are the columns I need from the table, and then add the pivot syntax followed by an order by and presto, you have your data the way you want it.

SELECT [ProductID],[A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L]
	FROM
	(SELECT [ProductID], Shelf, Quantity
	FROM [Production].[ProductInventory]) p
	 PIVOT
	 (
		SUM (Quantity)
		FOR SHELF IN ([A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L])
	 ) AS PVT
	 ORDER BY [ProductID]

Finding Active Heap Tables

As part of a review of a system, I look for tables without clustered indexes; however, invariably I will provide the list of tables to a development team and they will say “These are old tables we don’t use anymore”, so I am forced to provide additional statistics about the number of times a heap table is used in the database.  Based on a check found in Brent Ozar’s sp_blitz script, I pull usage information about the heap tables.  If there is no usage, they are not reported.

EXEC dbo.sp_MSforeachdb 'USE [?]; 
      SELECT DB_NAME(), t.name as TableName, ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates, ius.last_user_scan
      FROM [?].sys.indexes i 
		INNER JOIN [?].sys.objects o ON i.object_id = o.object_id 
		INNER JOIN [?].sys.tables t ON o.object_id = t.object_id
        INNER JOIN [?].sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id 
        INNER JOIN sys.databases sd ON sd.name = ''?'' 
        LEFT OUTER JOIN [?].sys.dm_db_index_usage_stats ius ON i.object_id = ius.object_id AND i.index_id = ius.index_id AND ius.database_id = sd.database_id 
      WHERE i.type_desc = ''HEAP'' AND COALESCE(ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates) IS NOT NULL 
      AND sd.name <> ''tempdb'' AND o.is_ms_shipped = 0 AND o.type <> ''S''';  

Happy Hunting

Dropping a publisher when the distributor is down

Working on a new server, I came across a replication process that had been forgotten about and the pieces were disconnected.  The distributor server/process had been installed on another server and when I went to remove the publisher from the database, I got the below error.

DECLARE @publication AS sysname;
SET @publication = N'SmokeTest';

-- Remove a transactional publication.
USE [ReportServerTempDB]
EXEC sp_droppublication @publication = @publication;

OLE DB provider “SQLNCLI10″ for linked server “repl_distributor” returned message “Login timeout expired”.OLE DB provider “SQLNCLI10″ for linked server “repl_distributor” returned message “A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.”.
Msg 11001, Level 16, State 1, Line 0
TCP Provider: No such host is known.

The linked server was pointing to a server that no longer existed and I thought I was stuck.  I tried lots of options, but ultimately came across this blog that demonstrated how to forcefully remove the publisher when I couldn’t connect to the distributor.

Spoiler: The trick is to use the @ignore_distributor=1 parameter and you should be good to go.

DECLARE @publication AS sysname;
SET @publication = N'SmokeTest';

-- Remove a transactional publication.
USE [ReportServerTempDB]
EXEC sp_droppublication @publication = @publication<code>
,@ignore_distributor=1</code>;

Coalesce instead of Max

I was asked to tune a query which was used in an export of registration data. A person registers for an activity and they answer questions about this. This query pulls from 19 tables–one of which is a one to many. The export creates a column for each row in the many table. It is on this subject I would like to focus.  For simplicity, I have reduced my examples to two tables. My setup looks something like this.

QuestionAnswerSetup

The tsql below will create this. Our end goal is to match the questions answered by each registrant to the registration data.

CREATE TABLE Registration
( RegID    INT
, RegistrantFirstName    VARCHAR(25)
)
INSERT INTO Registration (RegID, RegistrantFirstName)
VALUES (1, 'Carlos'),
(2, 'Wayne'),
(3, 'Jeff')

CREATE TABLE RegistrationAnswer
( RegID    INT
, RegQuestion    VARCHAR(25)
, RegAnswer    VARCHAR(25)
)

INSERT INTO RegistrationAnswer (RegID, RegQuestion, RegAnswer)
VALUES (1,'Favorite Color','Red'),
(2,'Favorite Color','Blue'),
(3,'Favorite Color','Red'),
(1,'Favorite Food','Pizza'),
(2,'Favorite Food','Chimichangas'),
(3,'Favorite Food','Hamburgers')

The previous logic used the Max() Function to break the rows into column. Seemed to work well.

--Old Way
SELECT Registration.RegID, RegistrantFirstName, Answers.*
FROM Registration
LEFT OUTER JOIN (SELECT MAX(CASE WHEN RegQuestion='Favorite Color' THEN RegAnswer ELSE null END) AS [Favorite Color]
, MAX(CASE WHEN RegQuestion='Favorite Food' THEN RegAnswer ELSE null END) AS [Favorite Food]
, reg.RegID
FROM [dbo].[Registration] reg
INNER JOIN [dbo].[RegistrationAnswer] ans ON reg.regid = ans.regid
GROUP BY reg.RegID) Answers ON Answers.regID = registration.regid;

I found; however, that if I commented out the left outer join portion of the query (Remember, the real query has 19 tables) the export will suddenly become very fast. With that in mind, I began to look for ways of moving rows to columns and came across an example using the coalesce function. The result is the following. I use a CTE to get the records from the many table and then I use the coalesce function with my group by, replacing the max function.

--New Way
WITH Answer_CTE (RegID, RegQuestion, RegAnswer) AS(
SELECT RegID, RegQuestion, RegAnswer
FROM RegistrationAnswer
)

SELECT Registration.RegID, Registration.RegistrantFirstName, Answers.*
FROM Registration
LEFT OUTER JOIN (SELECT RegID
, coalesce(min(case RegQuestion when 'Favorite Color' then RegAnswer else null end),'') as [Favorite Color]
, coalesce(min(case RegQuestion when 'Favorite Food' then RegAnswer else null end),'') as [Favorite Food]
FROM Answer_CTE cte
GROUP BY RegID) answers ON Registration.RegID = answers.RegID;

What was the impact? I turned on Statistics IO and Statistics TIME ON and got the following results. I ran each of them several time so I could get everything set (load into memory, cache plan, etc). These results are from the query I was tuning.  For readability purposes, I removed the read-ahead reads, lob logical reads, lob physical reads, and the lob read-ahead reads as each were zero for both queries.

Old Query

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

(1357 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘two’. Scan count 1, logical reads 6896, physical reads 0
Table ‘three’. Scan count 1, logical reads 56069, physical reads 0
Table ‘four’. Scan count 3, logical reads 4628, physical reads 0
Table ‘five’. Scan count 2, logical reads 10090, physical reads 0
Table ‘six’. Scan count 0, logical reads 0, physical reads 0
Table ‘seven’. Scan count 0, logical reads 4165, physical reads 0
Table ‘eight’. Scan count 0, logical reads 4520, physical reads 0
Table ‘nine’. Scan count 1489, logical reads 9299, physical reads 0
Table ‘ten’. Scan count 0, logical reads 4569, physical reads 0
Table ‘eleven’. Scan count 0, logical reads 3, physical reads 0
Table ‘twelve’. Scan count 0, logical reads 2, physical reads 0
Table ‘thirteen’. Scan count 1, logical reads 35, physical reads 0
Table ‘fourteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘fifteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘sixteen’. Scan count 335, logical reads 720, physical reads 0
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘seventeen’. Scan count 335, logical reads 721, physical reads 0
Table ‘eighteen’. Scan count 1, logical reads 22, physical reads 0
Table ‘nineteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘twenty’. Scan count 1, logical reads 100039, physical reads 0

SQL Server Execution Times:    CPU time = 13093 ms,  elapsed time = 20205 ms.

The query ran in almost 20 seconds. The only change I made was to the CTE and the coalesce function instead of Max SQL Server parse and compile time:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

(1357 row(s) affected)
Table ‘Worktable’. Scan count 2714, logical reads 10947, physical reads 0
Table ‘two’. Scan count 0, logical reads 13422, physical reads 0
Table ‘three’. Scan count 1357, logical reads 17500, physical reads 0
Table ‘four’. Scan count 2, logical reads 8680, physical reads 0
Table ‘five’. Scan count 2, logical reads 10090, physical reads 0
Table ‘six’. Scan count 0, logical reads 0, physical reads 0
Table ‘seven’. Scan count 0, logical reads 4165, physical reads 0
Table ‘eight’. Scan count 0, logical reads 4520, physical reads 0
Table ‘nine’. Scan count 1489, logical reads 9299, physical reads 0
Table ‘ten’. Scan count 0, logical reads 4569, physical reads 0
Table ‘eleven’. Scan count 0, logical reads 3, physical reads 0
Table ‘twelve’. Scan count 0, logical reads 2, physical reads 0
Table ‘thirteen’. Scan count 1, logical reads 35, physical reads 0
Table ‘fourteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘fifteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘sixteen’. Scan count 335, logical reads 720, physical reads 0
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘seventeen’. Scan count 335, logical reads 721, physical reads 0
Table ‘eighteen’. Scan count 1, logical reads 22, physical reads 0
Table ‘nineteen’. Scan count 1, logical reads 2, physical reads 0
Table ‘twenty’. Scan count 1, logical reads 100045, physical reads 0

SQL Server Execution Times:    CPU time = 5797 ms,  elapsed time = 7196 ms.

This query ran in just over 7 seconds. Because these are exports and are not always in memory, I have found that there is larger discrepancy in the elapsed time the first time each query is run. For the purposes of comparison I ran them each multiple times before taking the stats.

Recursive CTE Query

A client came to me and had the similar to data set below.

Create
Table product (
productid        INT,
typeid              INT,
typeDesc        VARCHAR(15) )

INSERT INTO product VALUES (1,2,‘Male’),Product Set
(1,2,‘Female’),
(1,3,‘Red’),
(1,3,‘Green’),
(1,3,‘Blue’),
(1,4,‘Small’),
(1,4,‘Medium’),
(1,4,‘Large’),
(2,5,‘Small’),
(2,5,‘Medium’),
(2,5,‘Large’)

They wanted to make a single list of all possible choices for a given product.  For example, the first several rows for product 1 would start like this:

Large, Blue, Female
Large, Blue, Male
Large, Green, Female
Large, Green, Male
Large, Red, Female
Large, Red, Male

You would see every type combination for each product.  Product 2 would only have 3 lines.  They had tried a couple of other  joins and temp tables and they asked me to take a look because the query would have to be dynamic as the number of combinations would vary from product to product.  I was sure that I could use a CTE to resolve this.  I came across this Simple Talk article that helped me get the logic straight in my head.  Although I had been using CTE in my queries, this was the first time I really used it recursively.

The logic is to declare the CTE and the query you want and the inside the CTE use the UNION statement to then call the CTE again.  It will loop through and create the string.

WITH product_CTE ( productid, typeid, typeDesc, long_answer,length)
AS
(SELECT productid, typeid,CAST( typeDesc ASVARCHAR(8000)),CAST(ASVARCHAR(8000)), 0
FROM
product
WHERE
productid = 1

UNION ALL

        SELECT x.productid, x.typeid,CAST(x.typeDesc +‘, ‘+ cte.typeDesc ASVARCHAR(8000)),
CAST
( cte.typeDesc ASVARCHAR(8000)),length+ 1
FROM
product_CTE cte
INNER
JOIN (Select productid, typeid, typeDesc
                        FROM
product) X ON cte.productid = x.productid and cte.typeid < x.typeid)

 Select *
FROM product_CTE              
WHERE length= 2
Order BY typeDesc

Now there is a catch.  The recursion will show you all the possible combinations IE (LARGE, Blue) which is incomplete.  This is where the length comes in.  They way it makes sense to me is N-1 or the number of combinations, in this case 3 for product 1, minus 1.  This example will only give the rows for a specific productid and then you have to know that number of combinations for that product.

Anyway, I thought it was cool the way it came together.  On a side note, does anyone know of a widget I can use to post code?  It is pretty tedious in the editor and still doesn’t look good.

Powershell in SQL Server

I have yet to really scratch the surface in using Powershell to manage my SQL Server; however, from time to time I have used a powershell script to delete files from the server on an ongoing process.  Because I am still a Powershell novice, I get most of my scripts from the Scripting Guy.  He does a good job explaining what the variables do and mean and I invite you to check that post out.

This example is in SQL Server 2008.  Putting into SQL Server Agent is fairly straightforward.  I choose a new step and choose a Type of Powershell.  This script will delete all the files in my D:\Backups folder that are older that 5 days.