Episode 85: Speedbumps Upgrading to 2016

Have you done your migration to SQL Server 2016 yet? Don’t worry—you aren’t alone. With all the glory of new features and improved performance, not all upgrades are equal. In this episode we talk with Javier Villegas, who returns from Episode 2 to share some of the issues he ran into on the upgrade to 2016. He stumbled on a process that normally ran for 7 minutes before the upgrade and took 26 hours after it. While it all worked out, he shares the issues he had and gives us a reminder that testing has not been deprecated.

Listen to Learn

  • Javier’s experience migrating to SQL Server 2016
  • Why using different production environments during migration is a good thing
  • Trace flags and the new cardinality estimator: features and hurdles
  • Database settings and configurations in the new version

Javier on Twitter
Javier on LinkedIn
Javier’s Blog
Javier’s MVP Profile

About Javier Villegas

Javier is a Microsoft Data Platform MVP and has been a member of MSDN since July 15, 2008. He has also been a PASS member since 2008. He works at Mediterranean Shipping Company as a DBA Manager. Javier has been dealing with SQL Server since version 6.5. He is from General Pacheco, Argentina. Besides being a SQL Server expert, Javier is a soccer fan cheering for his team, Vélez Sarsfield.

Transcription: 2016 Speedbumps

Carlos: Javier, welcome to the program.

Javier: Thank you! Thank you, Carlos. Thank you, Steve for hosting me one more time.

Carlos: Yes, that’s right. Special thanks to you as being one of the first, the guinea pigs if you will. When I had this idea way back in May of 2015, maybe even before April, and went down to Buenos Aires and I said, “Hey, anybody want to record a podcast with me?” And Javier raised his hands so we’re glad we could have you back.

Javier: I remember that time, yeah.

Steve: And I know since that point in time we’ve done quite a bit to improve the process and how it all works to get the show out, so we should have some good quality here. Also, didn’t you re-release remastered versions of some of those old episodes, Carlos?

Carlos: We did. We’ve remastered them, talking with Julien, our editor, and you can take a peek at them. We took out all the “uhms” and whatnot that I’m so consistently doing and helped balance some of the sound, particularly in the beginning, where the sound balance wasn’t all that great. Well so, Javier, our ultimate topic today is the migration to 2016, some of your experiences with it, and some of the road bumps that you ran into. But first, before we get into that, tell us a little bit of background about your organization and what you’re doing with SQL Server?

Javier: Ok. I am the DBA manager of a group of 4 DBAs at Mediterranean Shipping Company. My team’s responsibilities are basically to have the OLTP environment running fast, running always, and running in good shape all around the world. Recently we are also taking care of some Analysis Services work, in particular the tabular model for Power BI, which is something that we are just beginning, so those are the tasks that I’m doing these days.

Carlos: Very nice, and yeah, from that Episode 2 I remember the story you told me that you had a data center in Chile. I don’t remember where you had to fail over to, but there was an earthquake there in the country and you ended up failing over. It was like 2 minutes, something like that, that you were able to fail over, right?

Javier: Yeah, in a matter of a couple of minutes we made the failover from one location to another. Same in New Zealand for another earthquake, and at that time we were using database mirroring. Now the product has evolved to Always On availability groups and all the beauties that we know today.

Carlos: Sure. You must be quite the failover artist then to have done it again.

Javier: Yeah, you know, as every DBA we just try to avoid that to happen.

Carlos: Yeah, that’s right. To have done it twice in two different countries, I don’t think there are too many people that can say that.

Javier: I don’t know but the key is to once in a while do a failover test. Have a bunch of power users who help you on a Saturday morning so you can do the failover and recheck that the whole failover process works smoothly, that’s the ultimate goal. And you know, on every exercise that we do of this testing failover process we found stuff to improve basically the speed of doing this switch.      

Carlos: So ultimately our topic is on that conversion to 2016, and you made the change but it wasn’t always a smooth transition. You ran into a few bumps. So why don’t you tell us about that and some of the hurdles you had.

Javier: Yeah. Well, our main OLTP platform around the world was a mixture of SQL 2012 and 2014, right. And failover cluster instances with Always On and all these things for high availability and disaster recovery. My company was part of the TAP program, so we got involved with the bits very early in the process of the SQL 2016 release. So we were going back and forth with the product team on all these features that they planned to release. At some point, we had to validate our business core application against the releases that Microsoft was giving us. We found some unexpected behavior. I mean, just to give you a quick example, one of our most critical SQL jobs for the business runs every three hours and generates some sort of a cache table to get some critical reports, and the tables are used by other processes, right. As I said, that runs every three hours and normally takes 7 minutes to complete, right. When we were in this testing process, suddenly those 7 minutes turned into one hour.

Carlos: Wow, so what’s the problem there?

Javier: You know, the story gets even more interesting, because I said, “Ok, maybe I need to do an update statistics because this is a new environment; I just restored my databases.” So I took some time and I ran update statistics on all the tables, and instead of one hour the end result was 26 hours.
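For reference, the step Javier describes is a routine statistics refresh after a restore. A minimal sketch, with a placeholder table name that is not from Javier’s system:

```sql
-- Refresh statistics on a specific table after restoring into a new environment.
-- dbo.CacheTable is a placeholder name.
UPDATE STATISTICS dbo.CacheTable WITH FULLSCAN;

-- Or refresh all out-of-date statistics in the current database in one pass.
EXEC sys.sp_updatestats;
```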

Steve: Wow. I’ve heard SQL Server 2016 just runs faster but apparently there are some edge cases that we need to work around. What did you end up doing with that?

Javier: Yeah. Well, you know, it was something that I discussed with the product managers at Microsoft, and we raised our hand as soon as we found this because it’s unacceptable for our business, right. One day, or even more than one day, just to get this table once, this cache table once. So we sent them all the code, we sent them a copy of our database, and we started working together very thoroughly on this issue. Well, it turned out that our coding technique basically uses a lot of table functions, and we use table functions, not the inline table functions, the regular multi-statement ones. Of course we don’t use scalar functions at all; that’s a different discussion that we can have another time, but they’re kind of forbidden. We don’t use them at all, so we use table functions. And we use them like in a chain, like several layers of table functions: one calling another and another, maybe a CROSS APPLY and joins with another table function, something like that, alright. The new cardinality estimator in 2016 was treating these calls to the table functions as if they returned immediately, so the estimation was unreal. Actually it wasn’t really doing an estimation at all. So the end result at that point was not to use the new cardinality estimator: just turn on a trace flag to skip the new CE. I mean, use all the other features of SQL 2016 that are great, but temporarily turn off the new CE.
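To make the distinction concrete, here is a hypothetical sketch of the two function styles being contrasted; dbo.Orders and its columns are invented for illustration and are not from Javier’s system.

```sql
-- Multi-statement table-valued function ("the regular one"): the optimizer
-- gets little or no cardinality information for its result, and nesting
-- several of these compounds the estimation problem.
CREATE FUNCTION dbo.GetOrdersMulti (@CustomerID int)
RETURNS @result TABLE (OrderID int, OrderDate date)
AS
BEGIN
    INSERT INTO @result (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
    RETURN;
END;
GO

-- Inline table-valued function: effectively a parameterized view, so the
-- optimizer can expand it and estimate against the underlying tables.
CREATE FUNCTION dbo.GetOrdersInline (@CustomerID int)
RETURNS TABLE
AS
RETURN
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
GO
```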

Carlos: CE being cardinality estimator.

Javier: Yeah, definitely.

Carlos: So I guess the problem was that the optimizer, right, the cardinality estimator, was going in and looking at those table functions, and because they were kind of nested, if you will, it didn’t pick the best plan. Or it was being too aggressive there, perhaps.

Javier: Yeah. It’s just that when the CE reaches a certain level into these layers of table functions, it just considers that they return immediately, and that was not the reality. That is why we went from 7 minutes to one hour, right.

Steve: So then when you switched the trace flag to go back to the old cardinality estimator, did it go back to that 7-minute runtime right away, or did you have to do anything else?

Javier: Yeah, no, no, it’s just that. Also, another possibility is to just keep the compatibility level for the user databases pinned, because actually this is not a new problem in 2016; this is something that is also happening in 2014. So you keep the compatibility level for your user databases at 110. It, you know, forces it to use the old or legacy CE, and we got back our 7 minutes.
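For listeners who want to try the workarounds discussed here, this is a minimal sketch; the database, table, and query names are placeholders, and each option should be tested outside production first.

```sql
-- Option 1: keep the database at the SQL Server 2012 compatibility level,
-- which keeps the legacy cardinality estimator for queries in that database.
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 110;

-- Option 2: stay on the 2016 compatibility level but opt an individual
-- query out of the new CE with trace flag 9481.
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
OPTION (QUERYTRACEON 9481);

-- Option 3: enable the trace flag instance-wide until the next restart
-- (or add -T9481 as a startup parameter to make it persistent).
DBCC TRACEON (9481, -1);
```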

Carlos: Ok. Yeah, interesting. I knew they had made some changes there, but I guess just like everything else you have to test it, because obviously they potentially have a use case for that, but nested table functions kind of make me go “whoosh”, give me the heebie-jeebies a little bit. You need to go through your code and test it in a new version.

Javier: I remember a long time ago, like in 2008 or something, my boss and I were invited to the Microsoft campus to discuss new releases of SQL, etcetera. They gave us a presentation on table functions. Well, we fell in love with them and we decided to switch all our T-SQL coding to that way of writing because, as I said, we loved it. It’s quite clean, clear, and we just love it. We were just fine for years until we found this kind of hiccup. But you know, Microsoft is working quite hard on this case and others to work through it now. I know that in vNext they are planning to do some other changes to the CE.

Carlos: Now are you using, are you doing anything with Linux, or SQL Server on Linux? Or are you guys pure Windows?

Javier: In my production and QA environments we are using all Windows Server, now switching to Windows Server 2016 because there are some particular features that we need. Just to tell you in one sentence, we love the possibility of having Windows 2016 and SQL 2016 because we can do Always On on machines that are not in the same Windows domain, right. That’s the main one.

Carlos: Right, another feature.

Javier: Yeah. This is something that we were asking about for years. Now we have it and now we are using it. But in terms of Linux, yes, I have a bunch of labs using, I believe, almost all the distributions: Ubuntu, Red Hat, and now SUSE Linux with CTP 1.2, which was released a few days ago. I’m validating stuff against Linux and actually I love it.

Carlos: Very cool.

Steve: With SQL Server 2016, one of the other big features that was added was the Query Store. Now, is that something that you’re making use of?

Javier: Wow, man. This is my favorite new feature in SQL 2016. Before, in terms of being a DBA, troubleshooting the case where you have something that for months, a report, always runs in, let’s say, 5 minutes, and suddenly one day it starts running in 2-3 hours, and nobody made a change, nobody touched the code, etcetera. Those cases were extremely challenging to troubleshoot. But now with Query Store it’s just a simple click to get back your previous or your optimal execution plan for that particular query, so yes, I’m using it. As I said, this is my favorite feature in 2016, Query Store.

Steve: So for people who are new to 2016 and haven’t tried that out yet, any tips or recommendations that you would give or hints for people?

Javier: Yes. This is something that, as I said, I’m using on a daily basis, and this is like the first version of Query Store, right; SQL 2016 was the first time that we saw it. There are some things to improve, like with every feature. In this case the only thing, I mean, on the Connect site I made this recommendation, as have many other SQL professionals. Right now Query Store is per database, so you have to enable it, maintain it, and query it per database. And you have all this performance information within the same database. The recommendation, or the comment for Microsoft, and I know that they are probably working on this, is to have the Query Store per instance, right, being able to store this performance information in a separate user database. This is something that I would love to see in future SQL versions.
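As a quick reference for the per-database point above, here is a minimal sketch of enabling Query Store and forcing a previously captured plan; the database name and the query/plan IDs are placeholders.

```sql
-- Query Store is configured per database, as discussed above.
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;
ALTER DATABASE [YourDatabase] SET QUERY_STORE
    (OPERATION_MODE = READ_WRITE, QUERY_CAPTURE_MODE = AUTO);

-- Review the captured queries, plans, and runtime statistics...
SELECT qsq.query_id, qsp.plan_id, qsrs.avg_duration
FROM sys.query_store_query AS qsq
JOIN sys.query_store_plan AS qsp
    ON qsp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats AS qsrs
    ON qsrs.plan_id = qsp.plan_id;

-- ...then pin the plan that used to perform well ("the old one").
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;
```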

Carlos: Right, to be able to get it out there.

Javier: Yeah, I mean, besides that, the current implementation is great because, as I said, before it was like a blind task, and it used to take me a long time to do this kind of troubleshooting when performance just changed without any code changes, etcetera. Now it’s just, you know, go to the Query Store panel, check the query plans for a particular query, and force it to use the old one, right. It’s good.

Carlos: I know when they first, of course, I was talking with Borko before it was actually released; 2016 was still kind of being baked, if you will. They were working on some of the reporting, and I’ll admit that I haven’t gotten into it as much as maybe I should have. But the reporting and some of the charting that they were going to make available was still kind of coming together. Have they continued to make improvements? And I know the big focus of SP1, or Service Pack 1, was on the new features being made available in the lower editions. Did they continue to make changes to that reporting, the reporting features in the Query Store?

Javier: Well, yes. Honestly, I haven’t looked at that very deeply. I’ve also been focusing on Service Pack 1’s other kinds of new reports, like the ones for Always On. Those are new reports where you can do better troubleshooting, for example of latency. Because you know, as I said, I believe Query Store was probably the first feature that was completed before SQL 2016 RTM. That was already a long time ago, so even if it’s a v1 it’s quite mature. Of course, as I said, you give this to the DBAs and we ask for more, right. But in Service Pack 1 the new reporting for Always On, I mean the dashboard for Always On availability groups, is really improved. So those reports now provide you a better way of troubleshooting Always On. Before, it was kind of a blind experience.

Carlos: Very cool, Borko will be very happy to hear that. He’s the PM; we keep in touch from time to time. He likes getting feedback, right, hearing that people are enjoying the feature.

Javier: Yeah, yeah. No, I mean, Service Pack 1 was a hit, something that I’m still trying to understand. It’s the first time that they have brought Enterprise features down to other editions, so, wow. And they have these new features, the new DMVs for troubleshooting; the improvement in Always On is really great.

Steve: So another thing that I know we talked about before the show was the new database settings, the settings that are scoped per database in 2016. What are you doing with those?

Javier: Yes. Well, database scoped configurations. This is one of the great features for DBAs because now you can set a bunch of configurations or settings that before you had only for the whole instance or per session. Now you can do it per database, like for example specifying the max DOP, the max degree of parallelism, per database. This is something that is really powerful. You don’t have to use the MAXDOP hint or start playing with Resource Governor to specify a different max DOP. Also, there is another setting which is great, which is the legacy cardinality estimation. This is connected with the problem that I mentioned before. You can now specify per database to use the legacy cardinality estimator without specifying any trace flag or anything. You take your user database, you go to these scoped database settings, you change the legacy cardinality estimator to “ON”, and all the queries under that user database will use the old CE. Same for other settings; for example, you can clear the procedure cache per user database now, which is also good.
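A minimal sketch of the settings just described; the database name and the MAXDOP value are placeholders, and as Javier notes below, these should be tested before going anywhere near production.

```sql
USE [YourDatabase];  -- placeholder: run in the database you want to tune

-- Cap parallelism for this database only (no MAXDOP hints, no Resource Governor).
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;

-- Use the legacy cardinality estimator for this database only,
-- without a trace flag and without lowering the compatibility level.
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

-- Clear the plan cache for this database only.
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
```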

Steve: Ok.

Carlos: So those are all settings like inside the database properties if you will.

Javier: Yeah, from Management Studio you go to your user database, right-click “Properties”, and you’re going to see them there. You have the defaults, which are the values configured by default for all user databases, and which of course are off so as not to modify the default behavior. Another one, which is also connected with the legacy cardinality estimation, is the query optimizer hotfixes. You know that with every service pack and every cumulative update Microsoft introduces changes to the cardinality estimator, which by default they keep disabled just so they don’t change the behavior of something that is running within the same SQL version. Before this, or actually in parallel, you could turn on a trace flag and enable all those hotfixes for the CE. Now with database scoped configuration you can enable those CE hotfixes per database, which is great.
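Again as a sketch (the database name is a placeholder), the per-database equivalent of enabling optimizer hotfixes, plus a query to review what is currently configured.

```sql
USE [YourDatabase];  -- placeholder

-- Enable query optimizer hotfixes for this database only
-- (the per-database equivalent of the old instance-wide trace flag approach).
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;

-- Review the current database scoped configuration values.
SELECT name, value, value_for_secondary
FROM sys.database_scoped_configurations;
```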

Steve: And so if you have an older database there or something that is not kind of going to work well with those changes you can then turn it off for that one and then turn it on for others and be in good shape.

Carlos: Yup, fine tuning.

Javier: Correct, yeah. Of course, I mean, all of these have to be carefully tested, and not in production; that’s the advice. Don’t play with this in production, but actually this is something that is easy to turn on and off.

Steve: Ok. Well, shall we move on to the SQL Family questions?

Javier: Yeah, I’m ready.

Steve: Ok, so how did you first get started using SQL Server?                       

Javier: Well, I mentioned this story in my previous episode, Episode #2. Like 20 years ago I started with the Microsoft courses for Windows NT Server 3.51 or NT 4, and there were elective courses as well. I mean, there were like 3 courses: one was Exchange, one was Active Directory, and the final one was SQL Server 6.5 at that time. So everybody had to choose the regular training plus one of these three. My first choice was Exchange and it was full; there was no place for me. So then I said, “Ok, Active Directory seems interesting.” It was also full. So they said the SQL course had plenty of space, so that was the one that was there for me. I took it, and after a couple of days I said, “Oh man, this is so boring.” But then my professional career changed and I got the opportunity to focus on database engines at some point in a previous job that I had. It was with multiple vendors in terms of databases and also operating systems. I remember using Sun with Oracle, Linux with MySQL, and Windows with SQL Server. It was like a data center with that whole zoo of environments. At some point I put my career onto the Microsoft rail and focused everything on Microsoft, especially SQL Server, so that is how I got involved with that.

Steve: Do you ever wonder how different your career may have been had that Exchange class not been full?

Javier: I haven’t thought about that too much.

Steve: Ok.

Javier: I actually, you know, I think that being a DBA is something that you have to love and you have to feel it because we all are kind of particular sort of people.

Steve: Yup.

Javier: So you have to love it and at some point long time ago I said, “Ok, this is what I love.” So let’s go ahead and try to do this and do it well.

Carlos: If you could change one thing about SQL Server, what would it be?

Javier: Wow. Well, I mentioned a couple of things regarding Query Store. But something that I would love, I mean, first of all, if you had asked me this question like 6 months ago I would have said multi-platform. I have that already, so now I will say: make the failover process for availability groups even easier in terms of being able to replicate to the secondary the logins, the SQL Agent jobs, and all the things that are not part of the user database in terms of settings, everything. Because right now, when you do an availability group failover for disaster recovery purposes, even if it is a test or if it is real, you have to do a lot of manual tasks to fix the logins, you know. In all the production environments that I have around the world I have state-of-the-art PowerShell scripts that try to take care of synchronizing the logins plus the SQL Agent jobs plus the SSIS packages, etcetera. I would love to hit the failover and just forget about it.

Carlos: Alright. What is the best piece of career advice that you’ve received?

Javier: Well, a few years back I received this piece of advice, which I always remember and have used in my everyday tasks since I received it. And it is: “Try to be focused.” Just be focused. You know, it could be a bunch of things, but being focused on what you’re doing, I mean, ultimately that’s the thing that will allow you to accomplish a task in a good way, especially when you’re under pressure. So being focused on what you’re doing is a great piece of advice that I got.

Steve: Very good advice.

Javier: Yeah.

Carlos: Javier, our last question for you today. If you could have one superhero power what would it be and why do you want it?

Javier: Wow, so this is something that I’ve been thinking about a lot, and probably it’s not going to be related, or not totally related, to technical things. And maybe it’s not a real power, but this is something that I would really love to have. First of all, an additional arm and hand. And let me complete it: also not having the necessity of sleep, I mean, not needing to sleep and not feeling it. This is not because I have to work at night, etcetera. These two things, the additional arm and hand plus not having the necessity to sleep, are to be able to handle my two little boys. Yeah, one is 1 year old and the other one is almost five, so that is why I need these superpowers.

Steve: Yes, having two sons myself, my wife and I can definitely understand that. Although mine are a little bit older, and it does get a little bit easier as they get older.

Javier: Really. That is something good because now I know that if I wait it’s going to be easier.

Steve: Yup. But you’ll come across different things like the first time they dent your car.

Javier: Yeah. I’m far away from that.

Steve: Yes.                       

Carlos: Well, Javier, thank you so much for joining us again. Oh, sorry.

Javier: No, no, I just wanted to say that it’s not totally technically what I just said but it’s real.

Carlos: That’s right. Well, Javier, thank you so much for joining us on the program today.

Javier: It’s always a pleasure to speak with you, Carlos and Steve, so thank you for having me. And I look forward to being with you again in the future.

Episode 84: MVP

The Microsoft Most Valuable Professional, or MVP, award: it takes hard work and dedication to be called an MVP, but what are the benefits? Kathi Kellenberger is a great member of the SQL community, traveling to many SQL Saturdays and speaking at conferences about SQL Server, and this time she is here with us to tell us what it really means to be an MVP and what a great journey it has been becoming one.

Do you know the ratio of men to women MVPs in the world and in the US? How many posts do MVPs make on forums, helping others? How do they keep track of all the SQL Saturdays, user groups, and conferences they have been to? Kathi shares her story, the process she went through to become an MVP, how you get nominated, and how to get renewed!

Episode Quote

“Do these things because you love to do them. Don’t do them because you want to be an MVP.”

“Women, you’ve got to get over your reluctance to promote yourself.”

“It’s not a job, nothing paid […] We’re kind of a free support for people in a way. A lot of what I like about it is that I can share my enthusiasm about SQL Server.”

Listen to Learn

  • What it means to be an MVP
  • Benefits of being an MVP.
  • MVP categories
  • The process and criteria on how to be awarded as an MVP.

Kathi’s Blog
Kathi’s Kerberos and SSRS course
Kathi on Twitter
Kathi on Pluralsight

About Kathi Kellenberger

Kathi is an independent SQL Server MVP and Teammate with Linchpin People. She is the author of “Beginning T-SQL 2008” and co-author of “Professional SQL Server 2005 Integration Services,” “Beginning T-SQL 2012,” and “SQL Server MVP Deep Dives.” Kathi enjoys speaking and writing about SQL Server and teaches the occasional SQL Server class.

Transcription: MVP

Carlos: This is Carlos Chacon.

Steve: This is Steve Stedman.

Kathi: This is Kathi Kellenberger.

Carlos: So Kathi, welcome to the show.

Kathi: Thank you and thanks so much for inviting me. I’m really excited about it.

Carlos: Yes, I’m glad that we finally were able to put this together, and of course, you know, the rest, as they say, is history. This is something that we’ve been trying to get together since October of 2015. It’s been a long time coming, and we’re glad to finally have you on the show.

Kathi: Yeah.

Carlos: And as your Twitter handle suggests @Kathi, right. You’re a family member to all and a great member of our community and so we’re excited to chat with you.

Kathi: Thank you.

Steve: We have an interesting, oh, go ahead Carlos.

Carlos: No, no, go please.

Steve: And we have an interesting topic to cover today and that topic is?

Carlos: Yeah, we want to talk a little bit about the MVP, the Microsoft MVP: what it means to be an MVP and what it’s meant to you. So I guess let’s go ahead and jump into that, and give us a little bit of your story there. How long have you been an MVP, first, I guess?

Kathi: Ok. I was first awarded the MVP in 2008, and then in 2010 I took a detour in my career and ended up at Microsoft for about a year and a half. So I lost it during that time and so then I got it back again in 2013 and have had it every year since then. Kind of a weird back and forth but once I was eligible again I got it again. That’s how long.

Carlos: Now, ultimately, looking at the MVP website, Microsoft is looking for professionals to provide support and guidance, if you will, to the community, kind of taking their time to help others along their way, particularly with the Microsoft technology stack. I guess what do you see as kind of the qualities of being an MVP?

Kathi: Yeah, again, I probably should clarify that. It’s not a job, nothing paid. It’s an award that Microsoft gives, so I feel like in some ways we’re an extension of Microsoft sales, because the things that we do I’m sure help companies use SQL Server more effectively. We’re kind of free support for people in a way. A lot of what I like about it is that I can share my enthusiasm about SQL Server. I have this problem where when I learn something, the first thing that I want to do is go tell somebody else about it. And this is a great way to be recognized for that, because really it’s like a 20-hour-per-week unpaid job in some respects; I feel like that’s about how much time I spend a week on MVP-type activities. Some of it you get paid for, because if you write a book you get paid a little bit, but for the most part it’s like a part-time unpaid job.

Carlos: Interesting. I had never heard that number before, 20 hours a week, that’s quite a bit. But when you think about some of those other things, I mean, you mentioned about writing a book, blog posts, speaking in front of people, preparing a presentation.

Kathi: Right, yeah, you know, Pluralsight courses. I volunteer at a non-profit to teach people about SQL Server, user groups, SQL Saturdays. Yeah, you know, some weeks are more than others, but being an independent consultant right now, if I am only billing 15 or 20 hours that week I am going to fill in the rest of the week with the MVP-type stuff.

Steve: So then when you’re doing that and you’re putting in your 15 or 20 hours focused on that, are you keeping track of it and somehow having to report that to Microsoft, or is that just what you do to be out there and be recognized as an MVP?

Kathi: Right, right. You don’t have to keep track of it on an hourly basis. I did keep track of it for about two months just because I was interested in how much time I was spending. But then that time keeping, you know, gets to be exorbitant in itself, so I just wanted an idea, so I kept track for a while. What you have to do as a current Microsoft MVP is there is a website where you upload your activities. So if I speak at a SQL Saturday, I fill out a form saying that I spoke at SQL Saturday, there were 30 people in the room, you know, a link to it. So you have to record your activities, and that’s for renewal. So let’s say you are awarded the MVP; you don’t have to do anything for the next year. You’re an MVP, you could drop off the face of the earth as far as community stuff is concerned. There are no requirements, but to get it renewed you have to do things. You do report, but it’s not an hourly type thing.

Steve: Ok, so then the title MVP stands for Microsoft Most Valuable Professional, is that right?

Kathi: Something like that, yeah. Most Valued Professional I think. 

Steve: Yup. And then, so you’ve done all this work and you need to keep doing that work in order to keep getting it year after year. I mean, that’s a lot of things you need to do there, and what are the benefits that you gain for doing that?

Kathi: My favorite is the free MSDN subscription, which includes Azure credits. That’s probably my favorite because I can really use and have available to me very easily any of the developer tools, SQL, Windows operating systems, things like that for training, or experimenting, that type of thing. That’s really my favorite, and probably my second favorite, which I don’t always take advantage of, is the MVP Summit up in Redmond. The last couple of years that’s been either overlapping, one year it was overlapping the PASS Summit, and the last couple of years it has been like a week after, a couple of weeks after. But that’s just really awesome because you are on the Microsoft campus. You’re surrounded by, you know, in my case lots of my friends who are MVPs. I’m pretty lucky. I get to hang out with friends a lot more and learn about the new things that are coming up. That’s probably the third thing that’s a really great benefit: you get a lot of insider information about what’s coming up, what SQL version is next, all of the things that are not public knowledge at this point. Obviously you’ve got to keep those things to yourself, but that’s pretty cool. They also have webinars from time to time about things that are coming up. In fact, the last year or so they’ve really increased the frequency of those webinars, so if there is a topic you’re interested in, just watch, and chances are there will be information about what’s coming up, or maybe what’s in the next service pack, or what the new features are in the next version of SQL. There are really a lot of benefits that way.

Steve: Ok. Now, I’ve seen that there’s like different categories of MVPs in each country like you might have a Data Platform MVP or an Azure MVP. Am I understanding that correctly?

Kathi: Yes. In fact, last year they revamped all their categories. We were SQL Server MVPs and now we are Data Platform MVPs. Another really interesting thing is that you can now be awarded MVP in multiple areas so for example if somebody is really focused on SQL Server but they’re also really focused on Windows Server they might get an award in that area.

Carlos: Got you. Yeah, PowerShell is probably another category that overlaps quite a bit as well.

Kathi: Yeah, yeah, and I don’t know all the categories. I just look for what I do, so at this point everything has been SQL Server. However, I’ve overlapped a tiny bit into the security area because I did a course for Pluralsight and I’ve spoken at a few places on Kerberos, so I went ahead and put those under security. Whenever you add your activities you have to put them in a category, so there are people that have different areas. Probably one of the big, I talked about the tangible benefits, but probably another thing is if you want to be a consultant. This is probably not necessarily that great if you are just a database administrator at one company. I was, when I became an MVP originally, and my company did not care. They couldn’t have cared less. But as a consultant or somebody with your own company, it is gold. It’s going to get you noticed and really be beneficial.

Carlos: One of the things, or criteria, they talk about as well, or that is included if you will in contributions, is forum replies and answers. You know, you’ve got somebody who searches forums and whatnot, or puts questions out on forums. I am always amazed at the people who can respond; the #sqlhelp hashtag is probably another good example, right? They’re just like, they’re waiting, right. I mean, it almost seems like it, and what’s also interesting is that they have really good answers. So a couple of people that jump to mind: Aaron Bertrand on Stack Overflow, I don’t know how many points he’s got. Robert Davis is another one; he’s probably on the SQL Help hashtag. It seems like he’s always chiming in every so often. I have never been able to get into that because I guess maybe I’m a knuckle dragger; I guess I don’t know all that much.

Kathi: Yeah, I’m sure you do. 15 years ago I was on the SQL Server Central forums all the time answering lots of questions. In the 90s, I was on whatever existed back then for Visual Basic, constantly answering questions, and I don’t know, as I’ve gotten older I can’t multitask as well. And not only forums: the same people are on Twitter and on Facebook and all these things all day long, and I’m like, “Man, I’ve got work to do,” so that is one area where I do not contribute much. I’m very grateful for the people that do contribute, but I do not. I’m more likely to be writing, recording, or presenting. Those are the things that I’m going to do, but as far as answering questions, man. And the thing is, I really liked doing the T-SQL questions at one time, and actually at Pragmatic Works when I was there, they really encouraged us to answer questions when we were on the bench. The thing is, I wasn’t really on the bench that much, luckily. But the problem for me is someone puts a question out there. Maybe they have the schema and an example, and maybe they don’t, so you have to ask them about it, and it becomes such a commitment to the question. You know, getting the information out of them, and maybe you’ve got a solution. Well, then you’ve got to, you know, create a little database, not a temporary database, a little database, and populate it and try to work it out. I just feel like it’s too much of a commitment, so I don’t do it. But yeah, I pretty much stick to what I do, and obviously Microsoft does look at that. That’s one of the things they look at, but luckily I do enough writing, recording, presenting, and volunteering that it doesn’t hurt me that I don’t answer questions.

Carlos: Yeah, the metric they give on the MVP site is this guy from Brazil. He had 728 helpful posts.

Kathi: That’s awesome.

Carlos: That must be total, over the year, 185 answers and 407 posts. Like, holy cow.

Kathi: I guess you can do that. You know, I do remember at one time, at jobs where I was getting paid for it, I was able to do some of that, but, you know, I can’t now.

Carlos: Sure, that’s true.

Kathi: I can’t, so I don’t know maybe I just can’t multitask anymore.

Steve: You know, I feel that same way too. I think that if I’m going to go and answer questions on Stack Overflow or whatever is out there, I’ve got to focus on that. And that’s something that I can’t really do when I’m doing work that I’m getting paid to do; there’s got to be that separation there. That bench time, as you mentioned it, if someone has a lot of bench time, well, they might have more opportunity there, but I think it’s a balance figuring that out.

Kathi: Yeah, absolutely.

Carlos: Part of putting together those presentations or even the courses, I mean, you mentioned your book, and having just put a book out myself I know the time and energy that it takes to do that. But the side benefit there is that you become more familiar with that topic and can talk to it more, rather than the specific T-SQL problem somebody is having, or the loop they’re trying to write, or, you know, whatever the specific implementation is.

Kathi: Right, right, yeah. I just like what I do to scale. Occasionally people will send me a question personally. I will try to help them if I can.

Carlos: Right. And that’s different, if they send it to you personally that’s different.

Kathi: Right, yeah, and sometimes. I just had a gentleman, who I don’t know if he was not familiar with SSIS, but he wrote a post on my blog about SSIS and had a couple of questions, and I was able to help him very easily with that question and I felt good about that. But sometimes, and I haven’t had anything like this for quite a while, but sometimes I feel like they’re trying to get free consulting out of you and it is just too much. There are times when I’m asked personally and I say, you know, this is bigger than we can do over email; maybe you want to put this question out on a forum for people. You know, if somebody sends me a question about replication or something, I mean, I’ve used it, but I’m not an expert on replication. That would be better to go ask on a replication forum.

Carlos: Yeah, there’s a fine line there. You know, particularly as an MVP, you mentioned the time you have to put in, and then chasing some of these specific problems, with all the hundreds of variations that could take place, can be difficult.

Kathi: Yeah, and that time just to clarify, that’s just an estimate of what I do. I don’t know what other people do and there is no requirement. That’s just pretty much what I do on average.

Steve: Ok, now, you’ve mentioned that you’ve been an MVP twice, that you originally had it and then you went to Microsoft and you became an MVP again after that.

Kathi: Right.

Steve: Now, how did you go about getting that? Is that something that somebody nominated you for or what is that process and how does that happen?

Kathi: Right, so what usually happens, and this is kind of funny, is you will be at PASS Summit or a SQL Saturday and people start asking you, “Hey, are you going to the MVP Summit this year?” And then you will say, “Well, no, because I’m not an MVP.” And then they’ll say, “Well, you should be.” And then they will nominate you, so that happens a lot. I guess I can say who nominated me originally because this happened in, I guess, around 2007; people started saying this to me. Andy Leonard, and there were like three or four people who said, “I’m going to nominate you.” Well, one person did, and that was Andy Leonard. And I know you just had him on the podcast the other day because I’ve listened to part of it. He’s such a sweetheart, but he nominated me. Of course I didn’t get it the first time; it took a couple of times for it to go through. But that’s usually it: someone fills out this form. You can nominate yourself, but that’s kind of weird, so somebody who thinks that you deserve it will nominate you, and then you may or may not get contacted by Microsoft. And if you do, then they’re going to ask you to fill out a form listing all your activities for the last year, and then you may or may not hear anything back again. It is kind of weird, because if you don’t hear anything, if you’re not having a conversation back and forth, don’t expect to get the email on the MVP award day. I would think you probably have some conversation back and forth with them, but even if you do, that is no guarantee that you’re going to get it.

Steve: Yup, so you mentioned the MVP award day, and I saw recently, right after the first of the year, I think there were a bunch of MVP renewals and new MVPs out there. Is that an annual thing or a quarterly thing? How does that work?

Kathi: It’s quarterly. So it’s January 1st, April 1st, July 1st and October 1st. The really funny one is, not October, but April 1st, because it’s April Fools’ Day, you know; that’s the really funny one.

Steve: We could all be MVPs for a day on April 1st.

Kathi: We could be, we could be, yeah, so it’s a pretty exciting day for you if you’re up for renewal or you’ve been nominated. You’re going to be checking your email every 10 minutes. But the frustrating thing is it comes out late morning or early afternoon, because they’re coming out of Redmond, probably, so that’s Pacific time, so, you know, you’re waiting. It is an exciting day if you get renewed; so far so good. I need to tell you something really cool about me in particular with the MVP. I first received the MVP in 2008, and at that time I asked the MVP lead, kind of the liaison at Microsoft that deals with it, “How many women are SQL Server MVPs?” He gave me these numbers. At that time, and this is 2008, there were 4,000 MVPs in the world in any area.

Carlos: Total?

Kathi: Total. At that time I think there were around 250 or 260 SQL Server MVPs, and I don’t know what it is now. And there were 60 SQL Server MVPs in the U.S., and 6 of those were women. So I was like, “Wow!” The numbers are just fluctuating constantly, partly because Microsoft keeps hiring MVPs. They’ve hired a bunch of them recently.

Carlos: That’s true. They have been on the hiring spree.

Kathi: Right, but I think women have, you’ve got to get over your reluctance to promote yourself. I’ve had that. I feel like women or girls are taught from early on, you know, work really hard and be really smart but don’t let people know about it. I feel like we’ve been taught that from childhood and so I had to get over that and start learning how to really promote myself.

Carlos: Yeah, that’s interesting. As the father of four girls, that’s probably something I need to key in on because, yeah.

Kathi: Yeah, we were really taught to be exceptional but hide it. You know, I saw that in my own daughter when she was a teenager. Especially if boys were around, all of a sudden she was not quite as smart. Yes, you know.

Carlos: Interesting.

Kathi: Yes, because she doesn’t want to be intimidating. But I think at one of the luncheons at PASS Summit one year they talked about becoming an MVP, and they started out saying, “Hey, you need to let people know what you’re doing.” And then there were quite a few of us that became MVPs around that time. I don’t remember what year it was. I think the number is still pretty small compared to the guys, but I think we’re represented at probably about the same percentage as women actually working in the field, so I guess that’s not bad.

Steve: When you talk about letting people know what you’re doing or promoting yourself, what type of things did you start to do differently at that point in time?

Kathi: Yeah, I think it was a gradual process. But what I would do is, say for example, I have a book coming out. I would contact Steve Jones and say, “Hey, I want to write an article that’s going to promote my book.” So I started to think of ways to promote myself. I’m not much on Twitter, but if I have a new blog post, or a new Pluralsight course, or whatever, I’m going to get out there and promote that. I think part of it is speaking. Trust me, I got a C in Speech in college and I was very shy, terribly shy. Speaking in front of people is not something I would do, but I made myself do it. Those are the ways, I think, at least, that I put myself out there. I look for opportunities that can help me in multiple ways; that’s really what I do.

Steve: Ok, so how many times in a year would you say that you generally speak at a SQL Saturday or other big events?

Kathi: Let’s see, darn.

Carlos: Put you on the spot there.

Kathi: Yeah, yeah, I’m going to say at least 15. You know, I’d have to go back to my MVP submission because I have everything tracked there. But I try to go to at least 6 SQL Saturdays a year. I’ve spoken at PASS Summit every year for the last four. I don’t know how long my luck is going to hold out there, but somehow I’ve managed to do that. I speak a lot for the local user group, plus I do a lot of remote sessions, so I’m going to say between 15 and 20 presentations a year.

Steve: Oh, that’s really good out there. I mean, that’s more than one a month and that’s a big commitment.

Kathi: Oh yeah, but you know what, to me once I started speaking it became a little bit addictive. It’s thrilling, you know.

Steve: I think I have that same feeling on the addictiveness of speaking. It’s just the travel side of it that slows me down.

Carlos & Kathi: Yeah.

Steve: Ok, so I guess at this point then is there anything else that you want to share with us?

Kathi: Yeah, I think that making MVP, there’s a pretty high bar to make MVP, and I don’t want people to be discouraged if they don’t make it. Keep doing the things that you’re doing. When I was at Microsoft, even though I wasn’t eligible to be an MVP, I continued to do the same kinds of things I was doing because I love it. Do these things because you love to do them. Don’t do them because you want to be an MVP. I think the MVP award has got to be a side effect of doing the things you love. I don’t love posting on forums, but I’ll do it. If for some reason I was not going to be an MVP anymore, you know, that’s always a possibility, would I stop writing books? You know, maybe. But I’d still be writing blogs. I’d still be doing Pluralsight courses and speaking, just because I love to do those things.

Steve: Ok, that’s sounds great. Shall we move on to the SQL Family questions then?

Kathi: Sure.

Steve: Can you share with us how you first got started using SQL Server?

Kathi: Yeah, it was kind of crazy. I had just become a developer. I was a mid-life career changer, and this was 1997, and I was working at a really tiny company; there were only four of us at this little startup. And they had bought a package of training courses for this other guy that worked there. He wanted to take, now they call it System Center, but one of those type classes, Operations Manager, and the course just kept getting canceled, and the expiration date was coming up, and they said, “Kathi, how would you like to take one of these courses at this training company, because it doesn’t look like this guy is going to use it?” So I said, “Sure.” And I saw there was this SQL Server class, and I went ahead and took that class, so that was early 1998. So if it wasn’t for the fact that that guy couldn’t get his Operations Manager class in, I don’t know how my life would have changed. I’m sure I would have gotten into it eventually, you know. So that was a SQL Server 6.5 class; that was my first time actually getting to do anything or seeing it. And then shortly after that I started actually using it, implementing it at customers.

Carlos: Sure, interesting. Now, in all that time when you were on SQL Server, if there was one thing you could change about SQL Server what would it be?

Kathi: Well, right now, since Service Pack 1 came out and all of these cool features have been brought down, I don’t know what you want to call it, into Standard. There is one feature I’d love to see back-ported down to Standard Edition, and that’s read-only replicas in AGs, so that you can do availability groups in Standard; you can actually do that in 2016 even before Service Pack 1, but they’re only for failover. You know, I would just love to have a read-only replica in Standard. That’s what I would love to see.

Steve: Well, perhaps we’ll see that in the next Service Pack or the next version, who knows.

Kathi: Yeah, I’m not going to hold my breath though.

Steve: Ok, what is the best piece of career advice that you’ve received?

Kathi: Yeah, so like I said, I was a mid-life career changer. I had automated an auction at my kids’ grade school. I was a pharmacist at that time, but I automated it with Microsoft Access. And this other parent saw what I was doing, and he worked for a company that produces training materials for technology. And he saw what I was doing and he said, “You could be a developer.” That was really the first time anybody, this was the early 90s, said something like that to me: “You could be a developer.” And he, you know, got me some training materials and encouraged me to take a certification test. So I would say that was probably the best advice, just someone saying, “Hey, you could actually do this.” That really got me started down this path.

Steve: And that actually triggered the change from pharmacist to the IT side of things, developer, SQL Server?

Kathi: Yeah, there are a lot of things that happened along the way. Probably the first thing that happened was I was in my last year of pharmacy college, working on my Bachelor of Pharmacy back then. And there was a computer course where they had TRS-80s in the room, and we had to type little programs in and record them on a tape recorder. The first time I actually typed a little program into that TRS-80 I was hooked from that moment on. It was like a light switch went off in my brain. I had programming and things like that as a hobby, just writing some things for home, up until I started doing some things for my kids’ school. And then really, automating that auction was the project that got me started down the path; you know, before that I felt like it was something that was out of reach.

Steve: Very interesting.

Kathi: Yeah.

Carlos: Our last question for you today Kathi. If you could have on superhero power what would it be and why would you want it?

Kathi: So I’m going to change this up a little bit. I’m not really into superheroes that much, but I love Star Trek. I love Star Trek, and I watched the entire Star Trek: The Next Generation while I was on the treadmill last year. And this year I’m double dipping: I’m watching Voyager and Deep Space Nine both at the same time. I always loved Star Trek, but I didn’t get a chance to watch it for a while when my kids were little, so busy, so now I’m actually getting to do it, seeing an episode here and there. But my favorite technology on Star Trek is the replicator. If I could have a replicator, it’s going to look like a little microwave that I can just say to, “Computer, you know, tea, Earl Grey, hot.” You know, it can get you any food that you want or any piece of clothing that you want, just like that. I just think that would be amazing. That’s what I want.

Carlos:  There you go. I think Amazon would pay a lot of money to have that not be built.

Kathi: You’re right. But we have the 3D printers now that are not quite the same as replicator technology, but we’re heading down that road. People are printing all kinds of things, you know, so someday we will be there. In fact, Apress has a book, I don’t have the title in front of me, but Apress has a book about Star Trek technology.

Carlos: Really?

Kathi:  Yes. And how they came up with the ideas, where the Science is today, where it’s going. There’s a chapter on each technology.

Steve: Oh yeah, it’s really quite amazing how much technology has come out of things that were surely science fiction. Like the communicator, and then we have the flip phone and now flip phones aren’t even new technology anymore.

Kathi: Oh yeah, these little computers in our hands that we walk around with are better than the servers I was using in the late 90s. You know, they’re amazing. They’re better than what they had on Star Trek, you know. Everything you could ever want to know is right in there. And you can talk to it if you want to.

Carlos: Well Kathi, thank you so much for joining us. It’s been a pleasure to have you.

Kathi: It’s been a pleasure for me. It was a lot of fun.

Steve: Yeah, thanks Kathi, this has been fun.

Kathi:  Yup. Thank you so much.

Episode 83: Data Visualizations

We’ve all been there: you have some data and you need to make it pretty for the report. With all the new reporting options you have probably been tempted to find a new way to visually present information. In this episode of the podcast, we are talking about data visualizations, and our guest, Jonathan Stewart, has some issues with the way we currently go about it. Jonathan has been on the show as a panelist and I have really enjoyed his perspective on things, so we invited him on the program to discuss some ways you can go about improving the data visualization of your reports without adding additional burden on the process.

Psychology plays a big role in data visualization. To make a good report, before even starting your tool, you should be able to summarize what the report presents. One of the major aspects of psychology is color and how we perceive it, and Jonathan explains how we can use this while visualizing data. We also discuss color blindness; I haven’t heard this discussed much elsewhere, and I was curious to know whether we can solve this problem with an easy button.

Episode Quotes

“One of the major aspects of psychology is just color, right? What colors you’re using? And this goes back to my storyboard and you see everything comes back to storyboard, right because that’s like. Actually I have everything done before I ever open a tool. Even the colors picked out should be done before I ever picked a tool.“

“Don’t think of it as, you know, I have to do this extra work. Think of it as you’re doing it to prevent extra work. If you develop a framework, you develop your own storyboard and you get your stuff there.”

“Out of 30 people in the room, there’s 3 people at least in that room that are colorblind. And then too, there’s not just one type of colorblindness. There are multiple spectrums.“

“We’re datafying everything. We datafy everything from our footsteps, to our calories, to our heart rate, to traffic on the road, we datafy everything because we like to be able to see and analyze things.“

“Are we using the numbers to prove our bias or we’re using the numbers to disprove our bias and to show the proper thing?“

Listen to Learn

  • How to do storyboarding before even opening a tool to create a report
  • How to include storyboarding in data visualization process without investing too much time
  • How you can prevent additional time spent on report modifications
  • Color psychology behind data visualizations
  • How to create a report that colorblind person can understand
  • How the quality of the source data affects the end result
  • Predictions on how reports and data will be used in the future

Jonathan on Twitter
SQL Locks LLC
Edward Tufte and data visualization
Colour Blind Awareness
See your charts as a color blind person would
Building a data visualization storyboard

About Jonathan Stewart

Jonathan is a SQL Server professional of 18 years. His experience ranges from BI architecture, data modeling, data warehousing, database architecture, and database administration to strategic IT planning for Fortune 100 companies. He sees data in everything and is constantly looking for ways to organize, read, and analyze whatever data set he can get his hands on. He is passionate about data and making sense of it. He is passionate about using data to effect positive change in his local community of Columbus, Ohio. When not doing data things, you can find him mentoring youth in his community.

Transcription: Episode 83

Carlos: This is Carlos Chacon.

Steve: I’m Steve Stedman.

Jonathan: I’m Jonathan Stewart.

Carlos: Jonathan, welcome to the program.

Jonathan: Thank you for having me. I appreciate you having me on. It’s a great honor.

Carlos: Yes, always interested in chatting with you. And ultimately our topic of discussion today is data visualizations. Now, this isn’t necessarily a reporting discussion, but ultimately, I guess, it’s about the way we present and potentially consume reports, or, as we say, data visualizations. So I guess let’s jump into that. You have an interesting story about why this is an important topic for you.

Jonathan: Well, first off, one of the issues that I’ve always seen is that whenever we look at data visualizations the first thing we think about is a report, a chart, the tool, whether it’s Excel or PowerBI, Tableau, Qlik, it doesn’t matter.

Carlos: Sure, SSRS, a lot of our people using SSRS.

Jonathan: Yeah, Reporting Services. And that’s how we see it, but one of the problems, you know, we end up with problems of was the report right and stuff like that. All those problems we end up having with it can be hedged upfront if we look at it from a different view. I began to look at things from the focal point, the first point. Look at it from the problem and then begin to look at it as a story versus just a report. And how can we get the story better, how can we tell the story better? So I began to look at, I’ve always been a psychology junkie, so I always love stuff like how these colors make us feel, how this, you know, things like that. All these things began to build in me about visualization, so I began to study it more and that became almost my focus. I was already a business intelligence consultant so I was already building reports and stuff like that, so I got to see how people receive certain reports. You know, why pie charts suck. I got to see all these things as I was building out my career and building my focus. It became my interest. It’s a huge interest to me. Everything from how we perceive the world, color blindness, all kinds of stuff like that began to take shape and I got to see it. And as I learn more, and this is just me in general, as I learn more I want to share more with people. I was like, “Oh, this is amazing. Let me share this with them. This is amazing.” So then it became, you know, somebody was like, “You should just speak about it”, so that’s actually one of the things.

Carlos: There you go. Help us here, because we’ve had discussions on this show about many of the different reporting options that are available to us, right? And ultimately, while there are different needs, that’s why there are so many different tools. Why are we fumbling, or in what ways are we fumbling, when we actually talk about the presentation layer if you will?

Jonathan: One of the major issues that I see in the problem of visualization is that the current tools are sold as self service BI, self service this, self service pumping your gas, and stuff like that, self servicing. The pumping your gas joke is something that Edward Tufte made a comment about in one of his blogs. He has a huge issue with tools like Qlikview, Tableau, even PowerBI, you know, “you have our tool, you can make this.” He’s like, it’s not like pumping gas. You can’t just take data and just show it, because if it has no context, if you don’t know what it means, then it still is a bad visualization.

Carlos: Got you, so we need to be able to better represent what the data means when we present it.

Jonathan: Right, so one of my focuses is that, like when I go to a client and they’ve never seen things like this before, I actually storyboard before I even open up the tool. I’ll go and I‘ll interview. If I get a request, I need this, so I’ll go and I’ll sit and talk with them. And it will be a legit interview, almost like I’m building a full blown data warehouse. I’m interviewing the business and stuff like that. One of the first questions that I ask is, “What problem are you trying to solve with this report?” So before I even open up the tool, well, one of the things too that we do. When we open up a tool, as soon as we start coding something or writing something, whether it’s five minutes, or an hour, or 5 days a week, whatever, if it’s wrong we really don’t want to change it because it’s our baby, right? Like we don’t want to change it. We write a certain procedure, we really don’t want to just throw it away. We want to modify it. So even before I get to opening any tool, it doesn’t matter what the tool is, I want to make sure that I understand the problem.

Steve: So one of the things you mentioned there was Tufte, and I know, I’ve looked at some of his books and I’ve seen some truly amazing visualizations. And I know one of the disconnects that I’ve had is going from here are the tools we’re given, SSRS or whatever, or maybe that we’re using, to, how do we get from that to some of these amazing things that you’re seeing there? And that’s where, I mean even with PowerBI, there are a lot more features in there. But to really go to some of the stuff that he’s shown, it’s a big jump.

Jonathan: Right, right. And I think that’s one of the things, as I can get more people to think in terms of a story, storyboarding, you know, you’ll get to that. So you better see, what’s the problem we’re trying to solve? You know, who’s the audience? What action are we trying to take? And then what type of mechanism? Are we doing PowerPoint or PowerBI? Then, understanding your tone, stuff like that. Are we showing success or failure? Once you have all these questions answered then you can say, “Ok, how can I display that properly?” And then you can start getting to those amazing visualizations that Tufte and all those people show, then you can start showing all those things. But only once you have all that stuff answered upfront. I think one of the problems is that we try to jump straight to that; we say, “Oh ok, I think I can show that in this amazing histogram.”

Carlos: Yeah, exactly. I’ve got some data now let me just throw it in this visualization tool.

Jonathan: Right, and try to make it look pretty, and then it ends up not being what we wanted. It may not be exactly what we want and we try to tweak it. It can still end up being wrong. One of the examples that I use is you have somebody panning for gold out in California. And over the course of a week they have 10,000 tries, and over the 10,000 tries let’s say they find 147 pieces of gold. And so now somebody says, “I want a report of that”. And you say, “Ok, well what do you want in a report? Do you want me to show all 10,000 tries?” You may not want to see all 10,000 tries. You may only want to see the 147 successes. But you may be somebody who’s developing the tools that did the tries, so yeah, you do want to see all 10,000 tries. So which is the right answer? And a lot of times too, when somebody gives us a report requirement it may be three reports that we have to develop, or two reports. But we don’t know that until we actually go back and actually interview them and figure out who the audience is and stuff like that, because at the end of the day it’s not just “show us the data.”

Carlos: Right, and we’ve talked about this before on the show, and kind of maybe the new wave or the new technologist, another differentiator, is that I think a lot of times, like you say, we’re technologists: I have the data, I’m going to toss it into the report, I’ve done my job kind of a thing. The new wave, or I think what executives are looking for, we’ve had CEOs on this show talking about what they’re looking for in data people. And they want people that understand the data. They want people who understand the business, right? Why are we collecting data this certain way? And I think this could then be a major differentiator, to be able to go in and say, “Well you know what, I am the guy who developed, to use your Gold Rush example, I am the guy who understands how to develop this gold panning software, process, whatever, therefore I can give you better insights into those 10,000 tries, or know how to display that,” kind of a thing.

Jonathan: Right, and then too, know who your audience is, because if you’re just doing a report for another person who’s going to be out there panning for gold, that’s a whole different report than one for your CEO. Understanding all those things upfront before you ever open up the tool will help ensure that your visualization becomes useful, because at the end of the day that’s the whole point of it.

Carlos: Right, that’s not the only problem.

Steve: So one of the things that I see working with clients, specifically around reports, is that oftentimes you’ll get a request that comes in from one of the customers of the client, or an internal customer of the client, that says, “Take an existing report that’s like a simple grid type report, and add to it or modify it or do these things to it.” And oftentimes what we run into there is that people know what they’ve seen and they know how to ask for alterations to what they’ve seen, but they don’t necessarily know to ask for what they’re really looking for. Do you run into something similar?

Jonathan: I do, and I have a couple of tricks that I will share. One of the things I do is, like you said, you see that a lot. We have Report A, we want to extend Report A with this and this. Once I have all of that, can I make a one sentence summary of it? Kind of make a three minute story of that. If I take what I have there and I give it back to them and they say, “No, no, wait. That’s not really what I want.” That’s what you will get when you start engaging the user by putting stuff into a story form, because we understand stories, because we like stories. So once you start explaining that to them, they say, “Oh, wait, wait, that’s not it.”

Jonathan: One of the things that we always run across is that we think we know something, and we assume the other person knows it too. And it happens with all of us. We say, “Do such and such.” And we think that everybody may know all of the answers, and they don’t. Putting it into a story form and presenting it back, then we say, “Ok is this, and this…” And they’ll say, “Oh, wait, I forgot to tell you such and such.” So you’re prompting them to give you more information as well before you ever modify that chart.

Carlos: You’re now assuming that they know something that they may not know, or you know you’re working on different assumptions.

Jonathan: Right, because at the end of the day, regardless of what kind of visualization you’re doing, whether you’re doing an infographic, simple chart, histogram, doesn’t matter, you should be able to summarize that in words real quickly to whoever asked for it and have them say, “Yeah, that’s exactly what I want.” Whether it’s the grain or anything else, everything should be summarized quickly back to your user so they can say, “Ok, yeah, that’s exactly what I want.” And then at the end you can open your tool.

Steve: Sounds like some great advice there. You mentioned psychology earlier, can you tell us a little bit about how psychology plays into what you do with reporting and visualizations?

Jonathan: Yes, definitely. One of the major aspects of psychology is just color, right? What colors you’re using? And this goes back to my storyboard, and you see everything comes back to the storyboard. Actually I have everything done before I ever open a tool. Even the colors picked out should be done before I ever pick a tool. So you go back to the tone of your visualization. Are you showing, you know, did your company lose $10 million last quarter? Do you really want to show that with happy colors and stuff like that? Maybe, right, you know. Right, maybe? But you probably don’t want the whole report to be red either. The color connotations of red, you know, how red makes us feel? How certain things we have innate biases to. So like the color red, the color red has a connotation of power, energy, war, strength and stuff like that. So we use that in that aspect and that’s great. But it can also be violent, and brutal, overbearing, anger. It’s all about understanding the color that we use when we’re using it. And you see a lot of this, the visualization, you see the psychology of it, in things like company logos, right? Like there’s a reason why the majority of company logos have red, blue or white in them, if you’ve ever thought about that. The majority of company logos have red, blue and white because they want to show, you know, with the red they want to show power, passion, strength, desire. The companies that have white want to show purity, and innocence, and simplicity. And the blue, like Microsoft obviously, right, they want to show confidence, and intelligence, and trust, stuff like this. So they’re already doing these things, you know, showing you these types of things. Even in movies there are certain actors, directors that use color to show feeling, because we automatically have these innate feelings about a color when we see it. So, not going too far from the subject of visualization, we want to go back and make sure that we’re using the right color. We are not using green because we like the color green. We’re using green because we want to show nature, or down to earth, growth, vitality. But we’ve got to be careful with it too, because it can also be devious, and selfish and greedy as well. So we want to make sure we’re cautious in how we use colors at certain times. You can take a perfect visualization that shows exactly what the user wants and change the colors, and it changes the whole meaning of the report.

Steve: It makes me think, I worked for a company years ago where one of the marketing directors had basically a rule that nowhere on the company website would green ever be used. I never understood why, but thinking about some of the things you’re talking about here, maybe they had some kind of psychology behind not using green, and they failed to share that with the development team.

Jonathan: Well, what was the industry they are in?

Steve: It was in sort of self service websites, back in around 2000-2005, sort of early on.

Jonathan: Yeah, because I was trying to think of what kind of negative meaning green would have, or maybe they had it tied to something else that they were thinking about. Yeah, there are all kinds of reasons why people want to do that, and then you get the stuff that’s, well, one of the funny things is the color orange. Orange is historically a color that you either love or hate. But kids are drawn to it, that’s why Nickelodeon’s logo is orange, right? It signifies happiness and stuff like that, so you get things like Amazon’s logo and stuff too.

Carlos: Well, so it’s funny you mentioned that because people would ask me growing up like what my favorite color was, and I could never really pick one. But I always enjoyed like orange candies and flavor so I chose orange.

Jonathan: Because kids love orange, and this is one of those things people don’t think about until you’re like, “Oh yeah, you know what, that’s true.”

Carlos: I think there’s one other area that I want to jump into, but I guess I want to maybe stop and say, well, look, all of this sounds well and good and I think, yes, it makes sense. Like we want to present the best data, because at the end of the day we want to make sure that our users are happy, that people are understanding the information well. But all of a sudden this sounds like a lot of work. An additional task that I now have to take on, right, in addition to everything else. Where’s my easy button, right?

Jonathan: Alright, so here’s your easy button for you.

Carlos: Ok, here we go.

Jonathan: Preventing rework, number one, because at the end of the day if your visualization is wrong and you have to do it again, then you just wasted everything, right? Don’t think of it as, you know, I have to do this extra work. Think of it as you’re doing it to prevent extra work. If you develop a framework, you develop your own storyboard and you get your stuff there. You develop your go-to 5 questions. Because for me, like, I have probably maybe 8-10 charts and graphs that I use all the time. I already have those in my head. I can throw stuff into them and just go with them. I don’t have to think about those because I already have them. And anybody who has done reporting, I’m sure you guys both have your favorite charts and graphs that you like. So you already have those, you don’t really have to develop them because you already have them and know them. Now if you develop your own…

Carlos: I’d say, you know, showing the lazy man that I am, right? I mean, 99% of the time I’m taking the default, the default theme, whether that’s in Word, right, whether that’s in, like, you know, whatever the chart is. I mean, Excel might give me a couple of options but I was just like, “Ok, I want to take that one and then if they don’t like it then I’ll change it, right?” I admit that, and I think maybe to your point, that idea of developing a process around how you’re going to create them and then coming up with some of your favorites. Once you’ve kind of gotten a couple of wins there you can go back and make that process a little bit faster.

Jonathan: And you know what, thinking of that too, here’s what I’ll do, and I’m committing to this now right on air so I have to do it. I’ll write a blog post of, let’s say, 5 questions. A quick process that a professional can go through, and it may take them 20 minutes to gather these answers, that will help frame a quick storyboard to build into their platform. It would be basic enough that you can extend it yourself, but I’ll create a couple of questions that you can have and take away to help you get to that point, so that you can eliminate a lot of the rework that we typically have. I’ll do that and put it on my blog.

Steve: Yeah, that would be great. If you give us the link to that we can include that on the show notes which will be at sqldatapartners.com/visualization.

Jonathan: Ok, yeah, definitely.

Steve: Ok, so now that we’ve talked about colors. What about people who are color blind? How do you handle that?

Jonathan: Ok, that’s actually a big, big thing. And it’s actually dear to me too, because I grew up with people in my family who suffer from Color Vision Deficiency. And one of the things, I even remember as a kid, are things that you would never think about unless you actually live with them or actually experience them in day to day life. Like one of the things are cakes, birthday cakes. I remember a birthday cake. We were doing a birthday cake and we had to actually stop and think about the color difference between the cake and the letters on it, because the person we were creating the cake for wouldn’t be able to see the difference. I was probably like 8 or 9 years old and I was like, “Wow, that’s…” So at an early age it was impactful for me to be able to see that.

Carlos: What are the stats on colorblindness?

Jonathan: I believe it’s something like 1 out of every 8 men are colorblind. Actually, 1 out of every 12 men, 8%, so yeah. 1 out of every 12 men and 1 out of every 200 women in general are colorblind. So if you take a SQL Saturday with 30 people in the room, there’s 3 people at least in that room that are colorblind. And then too, there’s not just one type of colorblindness. There are multiple spectrums. There’s Color Vision Deficiency, and there are different types like Protanopia; those are the major ones. But there are others, like Green Cone Deficiency. One of the things that we hear too when we think of colorblindness, we think that the person just can’t see the color red, but they lose the whole spectrum. So if somebody has a green deficiency they lose the whole green spectrum. They could lose the whole red spectrum. And one of the things that I show in one of my talks, and I do this purposely because like I said it’s dear to me, is I show people what other people see, because until we actually see how other people see things we can’t really empathize with them. So I’ll take a picture and I’ll show it in the different types of spectrum. I’ll say, “Ok, this is how we see the world.” Then I’ll show, “This is how they see the world. This is how this person sees the world.” And they’re like, “Oh my God, I didn’t think about that.” There are actually some cool things that we can do though. Right off the bat we can immediately use shading and gradients, which are tools that are already available to us, to be able to show differences. We can avoid using reds and greens together when possible. But there are other tools too, like one is the ColorLab. And I will give you guys the links so that you can put them with the notes, the ColorLab. The ColorLab is actually pretty cool because you can go through and you take your company’s color palette, because that’s a big thing too. A lot of companies have color palettes, and they have to put their reports in those palettes.

Carlos: Right, they have to meet the criteria, sure.

Jonathan: Right, so they’d probably better use shading and gradients. But you can put the colors in there and you can go through and change the color spectrum, the CVD spectrum, and see how people truly see those colors. And there’s another site, color-blindness.com, it’s called COBLIS, the Colorblind Simulator. And so what I do when I get done with my visualization, whether it’s Reporting Services, PowerBI, Tableau, doesn’t matter, I’ll screenshot it and I’ll upload it there and I’ll look through their simulator to make sure that I can still see what I want to show. Does my visualization still show the story that I want to tell, or am I losing tone because I lose the color? And if I am, then I can go back and change it. Those are quick things that we can do to make sure that we include people on the CVD spectrum, because one of the things is that as time goes on we have more and more data and everything is becoming “datafied”. You’re going to have more and more people depending on data. And the one thing that you don’t want to do, right now it’s not written into the ADA, the Americans with Disabilities Act, but I can see it being used later on, because somebody says they can’t do their job because you didn’t show them a CVD friendly report. And that’s something that is as simple as changing the colors. You could be potentially liable in the future for stuff like that. I see those types of cases coming because I see lawyers chasing stuff like that. That’s a whole other avenue to chase and make money, so it’s worth hedging against. What you don’t want to do is to be the person who developed that report that got the company sued for $10 million. Right, I mean.

Carlos: No, that wasn’t me, that was the last guy, right?

Jonathan: You know, that’s stretching it, that’s an extreme outlier, but the possibility exists. So just to prevent it, let’s include them anyway. Because then too, you never know who you’re affecting. If you can make somebody’s day by including them without actively going out and searching for them, that makes somebody feel good. At the end of the day you can help somebody out. That’s what this is all about anyway, right? So 5 more minutes of your time to help include somebody who has CVD, that’s cool.

Steve: Who knows, that person you’re helping out by making it look good for the colorblind is maybe the decision maker on whether you come back for that next project as a consultant.

Jonathan: Definitely, yeah.

Steve: That could make a big difference to your future, as well as my future.

Jonathan: Right.

Carlos: So what are some of the tools doing to help us with some of that, right? Again, I’m a lazy knuckle dragging Neanderthal. Alright, I want it to kind of be that way out of the box. What are the tools doing to help me there?

Jonathan: Well, I’m interested to see how PowerBI is going to directly address that. One of the things that you can do right now, in Excel and PowerPoint, and PowerBI as well, is change gradients and stuff like that. I mean, right now, you’ve got to be actively thinking about it. I could see in the future having a switch where, if you have CVD, you could hit the switch and it would change for you. Actually that would probably make a lot of sense to do something like that, because that’s your easy button, right? I don’t know if they’re going to do that, but I could see them doing it because that’s simple. They could say, for this RGB color we could show this other color, because they have the budget to be able to delve into that and figure out what works for whom and what doesn’t. I could see them doing something like that in the future.

Carlos: Right. Yeah, just translating it so that where they do have the different colors, like you mentioned the reds and greens, it just translates to gradients or shades of blue or whatever.

Jonathan: Right, because the nature of the PowerBIs and the Tableaus and stuff like that, together with HTML 5, enables you to do stuff on the fly, because it would then be just a quick button that lets you change the styling or shading, something like that, change the colors on the fly. Obviously a Reporting Services report is static; it wouldn’t be able to do that. But the new HTML 5 reports will be able to do that. You know what I mean? I could see that coming in the future. We don’t know yet, but they’re actively working on it, so we know that they know what the problem is and we will see something soon.

Carlos: Right. Now, will you talk a little bit about, again, kind of data everywhere, right? And you know, SQL Server 2016, the integration with PowerBI and Mobile Reports. We may already feel that there are enough reports out there, but it seems like that’s only going to expand, right? Will you talk a little bit about, you mentioned the dangers of datafication. I mean, do you feel like the role of reporting and the need to go through this is just expanding? Are we now just starting to kind of get into some of this?

Jonathan: Oh yeah, this is the forefront of it. Like, we’re datafying everything. We datafy everything from our footsteps, to our calories, to our heart rate, to traffic on the road. We datafy everything because we like to be able to see and analyze things. We want a quick answer. We want to be able to say, “Oh, I get this.” And it’s good though. It lets us make changes and decisions, we can plan. They’re using it to be able to track diseases. They can track the spread of the Zika virus using data. That’s amazing stuff you can do, but there are dangers in that too. There are extreme dangers in that. We need to be careful, especially with us being the people that are actually showing the data, because it falls back on us. I’ll give you two quick examples. One is a mechanical issue and the other one is a human issue. At the end of the day, as data professionals we love the term garbage in, garbage out. We live by that. So thinking along those lines, I’m actually writing a blog post about it now, so the story will be in there as well. I had a client that was a retail client and they wanted to track store attendance with a mechanical device. Every time someone came in it would track and count, so they would know how many people came in and out of the store. So they would base stuff on that, store bonuses and things like that based on that number. And they would even base their finances on that attendance in the store, because they could do store attendance divided by sales and all that kind of stuff. So as I was working on one of the analytical projects there, I was interviewing the business people, and I began to hear them call it the random number generator. And I was like, what? It was known that the numbers were wrong and it was a joke. I’m like, wait a minute. You’re basing your business on something you know is wrong. They’re like, “Yeah, we know it’s wrong.” I’m like, that’s not a problem? Everything they were doing in the future was based on these incorrect assumptions, so the analytics were right, the visualization was right, but the source was wrong. So you have garbage at the end because your source was wrong. That’s one of the dangers, having an incorrect source, having low quality input. The other one is the human factor, and this is a little touchy because of how it transpired, but we’ll use the election. One of the things that we saw, up until a week before the election, was most major polls showing Hillary Clinton winning by between 3-6 points. You know, ok, that was great. And she had based things on that; she didn’t have a concession speech because the numbers said she could not lose. Obviously we know how that turned out. So the first thing that they went back to, they were like, “The polls are wrong, the polls are garbage.” So that reflects back on people like us writing the report, so that would be the analog of we wrote the report wrong. No, we didn’t write the report wrong, the data came in wrong. What ended up happening was something called social desirability bias. People say what they think you want to hear, not what they’re really going to do. So people were saying that they were going to vote for her when obviously they didn’t. But like I said, we won’t go into that, that’s also psychology as well. I told you I’m a psychology junkie. But see, even with that, people knew how to collect the data, they did the full random sampling.
You have places like the Gallup poll and the USA Today poll, so they’re doing all the stuff that’s correct. They’re doing the proper distribution of people, the proper distribution of the population, to be able to get exactly what’s needed to give accurate results and stuff like that. They go through, do all of this, and have the proper margin of error, you know, everything, right? And they showed, “Ok, 3-6%, she should win. Ok, great, let’s work on that.” And then that doesn’t end up happening, because the source was wrong. One of the things that I want to caution is, as data professionals we need to make sure that we’re comfortable with our sources. Is our source correct? Was there a bias in collecting it? Were there errors in the machine that collected it? Was there a possibility of errors coming in, of the data being corrupted on the way in, all kinds of stuff like that, because at the end of the day…

Carlos: Are we looking for the number that will help us prove our point, rather than looking at the numbers a little more objectively to help us answer the question?

Jonathan: Yeah, I think so, and that’s a great way to put it. In essence that’s another bias. Are we using the numbers to prove our bias, or are we using the numbers to disprove our bias and to show the proper thing? Because that’s the end of it. Like you said, they were using the numbers to prove their bias because they wanted her to win, and the numbers proved that, so that works, let’s go with it. What nobody ever asked was, are people truthfully telling the story that’s actually happening? Are they lying to us? Are they truthfully going to vote for her?

Carlos: Right. Man, well, some interesting stuff, and I think this is not going away, right? The reporting options that are available to us are not decreasing.

Jonathan: Not at all.

Carlos: And so I think it will be kind of interesting to see how some of this plays out. I’m interested to start playing with even some of the colorblindness components. Because when I heard those numbers I thought, “Wow, that is much higher than I would have guessed.”

Jonathan: One of the things too is the influx of data and how it’s going to grow. I could see these principles being used by everybody from the Administrative Assistant to the Executive Director and CEO and so on, because at some point it’s going to become natural for everybody to be creating reports. We need to make sure we’re comfortable with the source, because it’s not just going to be us creating them in the future. As more and more self service tools come out, you know, the iPhone 15 is going to have a way to show reports of your tweets and stuff in real time. You know, Facebook is going to be like, “Hey, show my data.” I mean, I can see that. That’s coming because it is just the next natural evolution of it, because as more and more people understand how valuable data is, the more and more people are going to show their data. More and more people are going to show what they’re doing with their data. I mean, think about it. Even your FitBit, you know, the FitBit, you get your own reports so that you can share with your friends. You know, “Look, I had 10,000 steps today. I’m averaging 9,200 steps a week.” You know, stuff like that. So even with that small stuff, even now we’re beginning to see people wanting to show their own data. So it’s just going to get more and more in the future as we get further along.

Carlos: So that’s an interesting point. And I think ultimately as data professionals we can continue to provide value there if we’re willing to educate people on how to do that and on the best practices, rather than being the sole creator of the reports; instead, here’s some guidance on how to do it.

Jonathan: And I think that’s the big thing at the end of the day, being willing to cede control for the greater good. Because at the end of the day you’re more valuable when you provide true benefit to the enterprise than just, you know, “I created this”. If you can show more people how to do it, that’s better than just having it be you, because you’re a bottleneck; no matter how good you and I are, we are bottlenecks. But if ten of us can do it then we’re better. That’s how I see that.

Carlos: The value there again is in understanding the data and kind of where it comes from, and then the presentation, that’s like the final piece. You can push that down a little bit and just have that be taken off your plate as the tools become easier and what not. Yeah, good stuff.

Steve: I think what is interesting on that too is that oftentimes the visualizations or the reports that are shown will drive behavior. And when you’re asking someone to build the report that’s going to drive behavior, oftentimes they don’t have a complete understanding of that behavior. And I think that goes back to what Jonathan said in the beginning around the story, building the story and the storyboard. But if that is pushed out to everyone who’s able to get to the data and build their own visualizations, then people really have to build their own stories in that case so that they’re showing the right data. Otherwise, you’re going to end up with data that’s driving business decisions that’s all over the spectrum. One person’s report may contradict another’s based off of their biases or approach to creating it.

Jonathan: Correct. And another thing too with reports, and this is something that I learned a long time ago, is that a report does not have to answer your question. Sometimes a report raises more questions, which eventually lead you to the right answer. That’s something a lot of us do: we want the report to solve a problem. But it doesn’t always have to solve a problem. If it leads to more questions, that’s progress. That’s what I was thinking about.

Steve: Yeah, that’s one of the things I love with reports, when somebody looks at it and says, “Well, why is that number that way?” And you have to do the research and figure out, “Oh, well it’s because sales are off in that region” or whatever it may be. And then you’re able to fix a business problem because of those questions that came up.

Jonathan: As a consultant, when I’m talking to clients that’s what I tell them too. Don’t be afraid to have your results pose more questions, because that’s good. You’re getting further into your data. You’re learning more about yourself. Everything doesn’t have to have a quick answer. You know, that’s our society, we want quick answers. But everything doesn’t have a quick answer.

Carlos: Well, awesome Jonathan. Thank you for the discussion.

Jonathan: Yes, thank you. I enjoyed it. I love talking about stuff like this. You can geek out with me for hours on psychology and stuff.

Steve: This has been a lot of fun.

Carlos: Shall we go ahead and do SQL Family?

Jonathan: Sure.

Steve: So Jonathan how did you first get started with SQL Server?

Jonathan: I have a funny story with that. I was working on a help desk and I was trying to figure out what I wanted; I knew I wanted a career in IT, because I wrote my first program when I was 6 years old. I was one of those weird kids. I had that Radio Shack TRS-80. I’m taking it back, right, showing my age. So at 6 years old I wrote my first program, so I had been doing IT stuff all through school and stuff like that, then got a job at a help desk and I knew that wasn’t really what I wanted to do. So I was doing research and research and ended up finding, you know, I was like, “I think this database thing is kind of cool.” So I went and sought out the DBA at the company, and I worked 2nd shift. He worked the regular 8 to 5. So I would come in every day at 8 o’clock and sit with him until I had to work at 2, to learn SQL Server from him. And he was willing to, actually, his name was Carlos too, which is funny.

Carlos: Great name.

Jonathan: Yeah. And he was willing to sit with me and teach me SQL Server, and that was SQL Server 6, 6.5, that’s taking it back too. They saw my drive and determination with that and they created a Junior DBA role for me. And that was my first job as a true SQL Server professional, I was a Junior DBA at that company, and that’s how I got started. That was my first exposure to SQL Server, and ever since then I just love SQL Server. I’ve been doing it for, well, if I tell you how many years ago you’ll know how old I am, so 17 years later this is where we are today.

Steve: That’s impressive, going out of your way to work the extra hours just so you can learn something new. If more people did that I think we’d be in a whole different world.

Carlos: That’s right. Not the typical path, right, of it kind of falling into your lap, you know, like with me. It’s more of a, hey, that’s pretty cool, let me go check that out. Now, in all that time with SQL Server, if there is one thing that you could change about SQL Server what would it be?

Jonathan: The one thing I would change is something they’re actually changing: the staticness between versions and fixing things. We went through that whole dead zone, the desert between 2008 R2 and 2012, and that was like, “Oh my god”, you know, trying to figure things out. And there were some great things professionals found, ways around, solutions around problems, but that was one of the biggest things. And then 2012 to 2014, ok, that’s cool, but to see how much faster they change stuff now, like vNext is in CTP 1 now, is that what it is? They are at a rate where they could possibly even release vNext this year. That’s crazy, and what I think is letting them make changes faster on the fly is the community. And you know, users will say, “Hey, do this. Can it do that?” You know, we’re getting an update for PowerBI seems like every week. It seems like every time I open up Management Studio there is another update for that. I mean, they’re doing the things that I would like to change, and changing them faster. So that’s really cool to see and exciting. I can’t wait to see the future. I mean, imagine SQL Server, well, with the changes they’re making, in just 2-3 years it’s going to be amazing what’s coming.

Steve: Alright, so what is the best piece of career advice that you’ve received?

Jonathan: So for me the best piece of career advice is based on, for me, I had struggled personally with just being comfortable with who I am. And I remember one of my managers, his name was Keith. And me and Keith are still really good friends. I still talk to him all the time. You know, Keith told me, “Just embrace who you are. Be who you are. Don’t try to be somebody else.” And what’s funny is that’s actually the origin of my logo. You know, you guys see my logo, it’s my hair. That’s me literally embracing myself. And so that was the best piece of advice that I got, because once I truly embraced who I am, my strengths and my weaknesses, and stopped trying to fit into a box that I thought somebody else wanted me to be in, that’s when I began to flourish. That’s when I began to be truly happy. I’m truly happy because I’m being who I am. I love to share, I love to talk, I love to teach, and I’m just embracing who I am. The best piece of advice I could give back to somebody is to tell them, “It doesn’t matter who you are. Whether you’re short, tall, good at this or good at that. Be who you are, embrace who you are.” And that will be your gift. That will be what makes people gravitate towards you.

Steve: Sounds like great advice.

Carlos: You know, that’s right, that whole idea about networking, kind of finding your own people. I think all too often when we come out of college we’re just like, “Oh, I need a job, right, so I’ll go anywhere.” But we can identify what it is we want to do or the place we want to go after. I mean, even this idea of colorblindness and making reports better to help that segment, that kind of speaks to the work that you want to do, and it’s just a matter of finding where that’s important to other people as well.

Jonathan: Right, definitely. I think that’s even, actually, even people taking that project and running with it. Someone selling a service to a business saying, “Hey, you know, we can convert your reports to…” You could sell that service.

Carlos: Our last question for you today Jonathan. If you could have one superhero power, what would it be and why do you want it?

Jonathan: So my favorite superhero doesn’t even have any superpowers. Can you guess who it is?

Carlos: Batman?

Jonathan: It’s Batman, yeah. And what I always loved about him is the detective comics, because he uses his mind and that was his superpower. He just outsmarted people. Obviously he was physically fit, right. I mean, the ability to get to the depth of stuff. I love that investigative side, and that’s kind of like what I want to do anyway. Get to the problem, get to the root problem. Figure out what’s really going on. If anything I would want to be just Batman, that’s mine. I don’t need super speed and super strength. Nah, I just like the super mind and we can call it a day.

Steve: You know the other cool thing with Batman is he had pretty much unlimited budget, and he had all the toys, and tools, and gadgets that he wanted to do whatever he needed to.   

Jonathan: That is true.

Carlos: That will be a lot of fun.

Jonathan: Yes, I guess you could say that’s also a superpower too, right, the unlimited money.

Steve: Yup, right.

Carlos: Having a keen business sense, or at least having a dad maybe who had one.

Jonathan: Right, right. That would definitely help.

Carlos: Well, Jonathan, thanks so much for being with us today. We do appreciate it.

Jonathan: Yeah, once again I thank both of you guys. This has been an honor, to be able to do this. I’d like to thank the community too for having you guys available to do this, so I look forward to what’s coming in the future. I look forward to seeing you guys again at SQL Saturday, being able to be on a panel and stuff. And I thank you for the opportunity and the time. You guys have a great day!

Carlos: Yup.

Steve: Thanks Jonathan, it’s been great.

Episode 82: Indexes Part 2

In the previous episode we discussed the basics of using indexes, what they are, how to use them, along with some common problems and solutions. One of the problems you will certainly face as time passes is index fragmentation. While indexes will certainly help your database perform well, they do require care and feeding–meaning there is some administrative overhead.

In this episode, we continue our conversation with Randolph West and discuss what we can do to take care of indexes and keep them well maintained over time. There is an option to use the default maintenance plans to handle fragmentation and update statistics, but should you use them? What are the alternatives? We didn’t forget to consider strategies for deciding which indexes to keep and which can be dropped.

 Episode Quote

“That’s purely Microsoft’s fault because generally, people using the maintenance wizard are using it because they don’t know how to do it with [other] tools.“

“You still need to update the statistics after you’ve done a reorganization which is something that people tend to forget”

Listen to Learn

  • General maintenance needed to take care of indexes and keep them well structured on an ongoing basis
  • Good strategies for deciding which indexes get included and which ones we need to ignore
  • Should you use the index maintenance wizard
  • What column store indexes are

About Randolph West

Randolph West solves technology problems with a focus on SQL Server and C#. He is a Microsoft Data Platform MVP who has worked with SQL Server since the late 1990s. When not consulting, he can be seen acting on the stage and screen or doing voices for independent video games.

Transcription: Episode 82

Steve: So then let’s jump a little bit more into the care and feeding side of things, and I know we touched a little bit on rebuilds and a little bit on statistics, but if we look at it, what are the general things that we need to do to take care of indexes and keep them well structured on an ongoing basis?

Randolph: I’d be interested to hear your side because I’ve been speaking a lot. I’ll chime in if I have any feedback.

Steve: Ok, I mean, so really I think on that there’s the concept of reorganizing versus rebuilding as you get fragmentation over time. And then there are statistics. Let me start with a very bad way of doing it that I see quite often, and that is if you go and use one of the default maintenance plans. In the maintenance plan you just go and click let’s rebuild my indexes, let’s reorganize my indexes, then let’s rebuild statistics. One of the things you end up doing is rebuilding, which basically rebuilds the entire index from scratch; if you’re on Enterprise Edition you can do that in online mode, but if you’re not on Enterprise Edition that’s an offline operation. What the default maintenance plan people sometimes check is, “Let’s rebuild them.” And that gives you the best structured index you’re probably going to end up with, and it also rebuilds your statistics. And then they reorganize them, and then they rebuild statistics with like a 5% sampling or something like that. And in that example you end up with a whole lot of work that’s actually giving you a negative impact. Whereas if you had just rebuilt them to begin with, and then not done the reorg and not done the statistics rebuild afterwards, you would be in a lot better shape.

Randolph: So, you know, those defaults that you’re talking about, the reason that happens is because the maintenance plan wizard has got rebuild, then reorg, then statistics, in that order in the list.

Steve: Absolutely, yeah, people just click the checkboxes or select them and that’s what you get.

Randolph: So when I see that I blame Microsoft for that. That’s purely Microsoft’s fault, because generally the people using the maintenance wizard are using it because they don’t know how to do it with, say, Ola Hallengren’s or MinionWare’s tools, just to name two; there are many. So that’s Microsoft’s fault, and every time I see that I do not blame the DBA, or the accidental DBA, or whoever is managing it. It’s not their fault. But it is causing a whole bunch of work that is unnecessary.

Steve: Yes, it may not be their fault but it’s their pain they have to deal with. When I see that I generally do a little bit of education to explain why that’s bad and turn those off. And like you said, replace it with the MinionWare or Ola Hallengren scripts. Both of which are far better solutions for doing that than the default maintenance plans.

Randolph: So the one thing that I wanted to mention, Steve, is that an index rebuild is, as I mentioned before, a size-of-data operation. So if your table is 10GB and you need to rebuild it, if you’re doing a clustered index rebuild it’s going to require 10GB of space in the data file and then it’s going to require at least 10GB in the transaction log. And if you’re rebuilding a clustered index it is going to, by default, rebuild all of the non-clustered indexes on that table that refer to that clustered index, and that means all of them. So if you’ve got a 10GB table and you’ve got 30GB of non-clustered indexes on that table, it’s going to require at least 40GB in the data file and 40GB in the transaction log because it’s going to rebuild everything. So if you’ve got a maintenance plan that says let’s do a clustered index rebuild, and before that you had the non-clustered indexes rebuilding, it’s going to do them again anyway, so it’s a nightmare. This is why I say that more regular statistics updates are going to be a lot more effective than index rebuilds over the long term. You still do need index maintenance, but let’s not do it every night.
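
[Editor’s note] To get a rough feel for how much space a rebuild of a given table might chew through, a query along these lines sums up index sizes from the DMVs. This is only a sketch; dbo.YourLargeTable is a placeholder name, and the numbers are approximations based on 8 KB pages.

-- Approximate size, in MB, of each index on one table.
SELECT  i.name       AS index_name,
        i.type_desc,
        SUM(ps.used_page_count) * 8 / 1024.0 AS size_mb   -- 8 KB per page
FROM    sys.indexes AS i
JOIN    sys.dm_db_partition_stats AS ps
        ON  ps.[object_id] = i.[object_id]
        AND ps.index_id    = i.index_id
WHERE   i.[object_id] = OBJECT_ID('dbo.YourLargeTable')   -- placeholder table name
GROUP BY i.name, i.type_desc
ORDER BY size_mb DESC;

Adding the clustered index row to the non-clustered rows gives a ballpark for the data file and transaction log space a full rebuild could need, along the lines of Randolph’s 10GB plus 30GB example.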

Steve: And that’s where I like the third party solutions that we mentioned, for the ability to rebuild those where you have options to say, given a certain level of fragmentation I want to reorganize this, or if it’s even more fragmented then we’re going to rebuild it. And some of them even have the option of only doing this for a certain amount of time.
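
[Editor’s note] As an illustration of that threshold-based approach, here is roughly what a nightly call to Ola Hallengren’s IndexOptimize procedure might look like. The threshold and time-limit values are example choices, not recommendations; check the current documentation for the exact parameter names and defaults before relying on this.

EXECUTE dbo.IndexOptimize
    @Databases            = 'USER_DATABASES',
    @FragmentationLow     = NULL,                 -- below ~5% fragmentation: leave the index alone
    @FragmentationMedium  = 'INDEX_REORGANIZE',   -- between the two thresholds: reorganize
    @FragmentationHigh    = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',  -- above the upper threshold: rebuild
    @FragmentationLevel1  = 5,                    -- example lower threshold (%)
    @FragmentationLevel2  = 30,                   -- example upper threshold (%)
    @UpdateStatistics     = 'ALL',                -- also refresh statistics...
    @OnlyModifiedStatistics = 'Y',                -- ...but only the ones with changes
    @TimeLimit            = 3600;                 -- stop starting new work after about an hour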

Randolph: Can you go into detail about what the difference between a rebuild and a reorg is?

Steve: Ok, so the rebuild is basically, technically, the equivalent of dropping the index and recreating it. It’s going to be completely rebuilt from scratch and it will take up the log and disk I/O like you talked about a moment ago. But with the reorg, what that’s going to do is go into the index and basically restructure or compact the index pages as needed. It will go through and shuffle them around to compact them down into a smaller amount of space, and it will do that page by page as it goes through there, and that can be done while things are online in Standard Edition. But it’s slow, and the end outcome is not as good as a complete rebuild. It might get you real close, but it’s not going to be as good as a complete rebuild; the benefit is you can keep it online.

Randolph: And you still need to update the statistics after you’ve done a reorganization, which is something that people tend to forget as well. The statistics need to be up to date so that the query optimizer knows where the data is, how much of it is in there, and what the distribution of the data in that index is, so that it can make the right choice about what type of plan to create, because a lot of query performance comes down to SORTs and JOINs, and that’s where you find your biggest performance bottlenecks. A SORT is a blocking operation, so everything in the query has to stop until the sort is finished. So if your index is up to date and it’s sorted correctly, then it doesn’t have to do that operation; it’s already done. There’s a whole bunch of things about loops we could talk about here as well, which I’m not going to. An up to date statistic is much more valuable to the query optimizer than a defragmented index.
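
[Editor’s note] In T-SQL terms, the three operations being compared look roughly like this. The index and table names are made up for illustration.

-- Full rebuild: recreates the index from scratch and also refreshes its statistics with a full scan.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;
-- On Enterprise Edition the rebuild can stay online:
-- ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD WITH (ONLINE = ON);

-- Reorganize: compacts pages in place, always online, but does NOT touch statistics...
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REORGANIZE;

-- ...so follow a reorganize with a statistics update, as Randolph points out.
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerID;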

Carlos: There’s another point there, even: we’ve talked about care and feeding of the indexes, right, rebuilds vs. reorgs. However, we can also have statistics that are outside of the indexes, right, on the table. Those are also providing feedback to SQL Server, to the optimizer. And if we neglect those we’re still in kind of the same peril, because the optimizer may be using those statistics but we’re not giving them that care and feeding.

Randolph: Correct, so it’s valuable to have a, I hate this word, holistic approach to your care and feeding: you need to look at not just indexes, and not just fragmentation, and not just statistics, but all of them, and in a way that is effective and efficient for your maintenance window. You don’t want to go and rebuild indexes every night because it doesn’t make any sense. But you also don’t want to stop doing statistics updates.

Steve: Yup. One thing I commonly see with that is rebuilding of the statistics, or updating the statistics, where somebody will have a job that goes through and blindly updates every single statistic in the entire database every night. And that can have a pretty big performance impact on things depending on the size of your database. A better approach is to look at which tables or indexes actually have changes and only update those that have a fair amount of changes to them on that frequent basis. And then maybe on a regular basis of once a month or once a week you do an update across all of them.

Randolph: Yeah, that’s a very good point. The amount of data that changes in a table matters if you’ve got automatic statistics enabled; we’re going to go on a little bit of a segue now. By default, automatic statistics updates are enabled. There are other settings that you can choose in 2016; there’s a whole bunch of new stuff per database which you can choose. Let’s talk about everything before 2016 because that’s what most people are still using. Statistics will only update themselves if a certain percentage of the table has changed. The value has changed across versions recently, but the fact is that statistics will only update themselves if a certain percentage has been changed. So if you’ve got a really large table with tens of millions of records or rows and you have only one million that have changed recently, statistics will not be updated automatically. So it does pay you to use one of the third party tools we’ve mentioned, like MinionWare or Ola Hallengren’s maintenance solution, where it will go and check whether the number of rows has changed, and then you can update the statistics there, and that will help your query optimizer as well.
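
[Editor’s note] One way to do that check yourself, on builds that include sys.dm_db_stats_properties (SQL Server 2008 R2 SP2 / 2012 SP1 and later), is to look at the modification counter per statistic. This is a sketch of the idea; the 1,000-row threshold is an arbitrary example, not a recommendation.

SELECT  OBJECT_NAME(s.[object_id]) AS table_name,
        s.name                     AS stats_name,
        sp.rows,
        sp.modification_counter,   -- rows changed since the statistic was last updated
        sp.last_updated
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE   OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
  AND   sp.modification_counter > 1000             -- arbitrary example threshold
ORDER BY sp.modification_counter DESC;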

Steve: The other side effect you can get with that is that if you do hit that threshold where it decides it’s time to update those statistics, that might be in the middle of your peak load during the day. And a statistics update kicking in right when you’ve got a lot of traffic coming to your database can have an impact on things too.

Randolph: That in fact is one of the recommendations for SharePoint-style databases. There are a lot of things I don’t like about SharePoint, but the fact is a lot of people use it, so one of their recommendations is to turn off automatic statistics updates on any SharePoint-style database. That includes CRM, it includes Great Plains, all of those. What’s the one that I used? Whatever it is, there’s a whole bunch of them where you should turn off automatic statistics updates and then include statistics rebuilds in your maintenance plans. So just keep in mind each database is different.

Steve: It will probably apply to Navision and Dynamics as well.                     

Randolph: Yes, that’s the one. Yeah, Dynamics is the one I was trying to remember. Thank you! So for any SharePoint-style database they do recommend turning off automatic statistics updates and doing them in a separate maintenance window. So it pays to do your research to make sure that you’re doing the right kind of maintenance plans for your database and your instance.
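
[Editor’s note] For reference, the database-level switch being described is below. The database name is just an example of a SharePoint content database; whether to flip these options, and which of them, should follow the vendor’s own guidance for that product.

-- Turn off automatic statistics maintenance for a vendor database and
-- handle statistics in a scheduled maintenance window instead.
ALTER DATABASE [WSS_Content] SET AUTO_UPDATE_STATISTICS OFF;
ALTER DATABASE [WSS_Content] SET AUTO_CREATE_STATISTICS OFF;  -- often disabled together, per vendor guidance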

Steve: Yup, very good point.

Carlos: So I guess we’ve talked about a couple of different things, all of these components. And I guess let’s now talk about some of the benefits, right? All of these things we have to go and do, right, kind of the holistic approach, kind of having to know our data, getting more familiar with it. Ultimately, to what end are we going to do that? I think we’ve talked a little bit about performance. I guess we should probably talk about how that performance gets into the system, if you will, or the mechanisms that cause the improvements.

Randolph: Ok, so when I do my performance session, what I say is, “You’ve got expensive items like your CPU. You’ve got less expensive items like your RAM, and you’ve got even less expensive items like your hard drives.” So your CPU is the most valuable thing to you because the SQL license is per CPU core, and you want to make sure you’re using your CPU and memory as efficiently as you can. What an index does is, we spoke about this before, it is a copy of the data, so you want to keep your indexes as efficient as possible; if you’ve got a large table you don’t want your index to be large as well. You want it to be smaller so that less of it is in memory, because that’s what this game is about. SQL Server is all about having as much data as possible in memory. So for Standard Edition, up to 2016 even, you’ve only got a certain amount of memory that you can access. In 2016 Service Pack 1 the limit is still there, but that limit is on your buffer pool as opposed to everything. But the fact is that there are limits to the amount of memory you can use for SQL Server, specifically with Standard Edition, because not everybody can afford Enterprise. So you have to manage the amount of data that’s going into memory as much as you can, and that is most effectively done by having the right kind of indexes for your queries. And that’s also why you don’t want to have duplicate indexes, because they will also be in memory. Also, you don’t want to have wide indexes because they will take up more memory than they need to. And that’s why included columns are a very handy way to keep your index keys narrow, and why we have filtered indexes. All of these things try to reduce the amount of data that is in memory so that we can do a lot more with what’s in the buffer pool.
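
[Editor’s note] As a hypothetical illustration of keeping an index small, the table and column names below are invented; the point is the narrow key, the INCLUDE list carried only at the leaf level, and the WHERE filter limiting the index to the rows the queries actually touch.

-- Narrow key, extra columns included at the leaf only, filtered to a subset of rows.
CREATE NONCLUSTERED INDEX IX_Orders_Open_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue)
WHERE Status = 'Open';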

Carlos: So then how do we go about it, or what are the best ways to determine? We talked about looking at the execution plan, which kind of gives the recommended index there. So what are some good strategies for deciding which indexes get added and which ones we need to ignore?

Steve: One approach I take on that is if I'm looking at an execution plan and it suggests an index, and it's a development environment, one of the things I'll try initially is just to create the index, see how it helps, and then drop it, and then go look and see if there's a similar index. I mean, if the index that it suggested was beneficial, I'll go and see if there's a similar index that could be modified or added to that would give a similar effect to what the suggested index was doing. Sometimes that means taking an existing index and just adding an include to it, or taking an existing index and adding another column to it.

Carlos: Ok, so that’s kind of a dev environment, one query at a time.

Steve: Yup, and the other approach that I see there is I can even go and look at some of the missing index DMVs to figure that out. Actually, you look at DMVs for a handful of things. One is to figure out what the missing indexes are and the overall cost associated with those missing indexes, and then come up with some good ones to add that are not going to be duplicates of other indexes, and then see how that improves performance. You can also, using DMVs, figure out what your duplicate indexes are. If you have duplicate indexes, oftentimes you'll see that one of them might not be used at all while the other one is being used a lot, and you can go in and drop one of those. However, you need to be careful when you're looking at unique indexes; whether it's a clustered or non-clustered index, you want to look closely and make sure you drop the right one there. Another thing to look at is big clustered indexes: track down which clustered indexes are really wide, because a really wide clustered index ripples through all the non-clustered indexes with that extra size. You can also track down unused indexes, the indexes on your system that are not being used. Now, when you look at that you need to be very careful, because that only shows usage since the SQL Server instance restarted. Also, if you rebuild an index, I believe it clears those index usage stats.
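
As one sketch of the kind of DMV check Steve is describing, this query lists how often each index in the current database has been used since the last restart; it is only a starting point, and any index you consider dropping should be watched over time as discussed below.

    -- Index usage since the last SQL Server restart
    SELECT  OBJECT_NAME(s.object_id) AS table_name,
            i.name                   AS index_name,
            s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM    sys.dm_db_index_usage_stats AS s
    JOIN    sys.indexes AS i
            ON  i.object_id = s.object_id
            AND i.index_id  = s.index_id
    WHERE   s.database_id = DB_ID()
    ORDER BY s.user_updates DESC;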

Randolph: Sometimes.

Steve: Sometimes, exactly. So if you see there's an index that is not being used, you might want to track it over a few weeks to a month, confirm that it really isn't being used, and then go and take a look at dropping it.

Randolph: That goes back to one of the best practices, which is to have a baseline. Know what your system is doing and track that over time, and then you can refer back to that baseline and say, "Well, this is exactly what's happening. This is different," and go from there. Yeah, that's a good point.

Steve: Yup, and that reminds me of a project, Carlos, that you and I worked on, where we set up a monitoring component that ran for a month and kept track of unused index details, logging them every few hours. Then we went back at the end of the month, reviewed the ones that had no utilization over the entire month, and suggested those as possible candidates to be dropped.

Randolph: That reminded me of Pinal's comment as well, that he'll only look at a database that's been running for at least a week. The one thing I wanted to mention about the DMVs you're referring to is that there's one in particular. There's a group of diagnostic queries that Glenn Berry, Glenn Alan Berry, from SQLskills, uses and maintains. He's got a couple of them, and so does Brent Ozar and a whole bunch of other people. Brent Ozar's is called sp_BlitzIndex, and Glenn Berry's is included in his diagnostic scripts. It's a very similar query, and what it does is weight your missing indexes according to number of scans, number of seeks, and something called average user impact. Now, that average user impact number can be, I think it's unitless, I'm not quite sure how that number is calculated, but if it's a really high impact, like tens and hundreds of thousands, then usually I will go and look at that first and say, "Well, how many times would this index have been used in terms of number of seeks and scans?" And if it's a high usage index that is missing, or a high impact one, then I will usually create it without too many modifications, as long as it's not too wide or doesn't have too many included columns.

Carlos: I think it’s the number of times requested multiplied by the number of hits.

Steve: Yup, and just to jump in there, I think I'd feel a little bit left out if I didn't have the chance to mention Database Health Monitor on that. In Database Health Monitor there are many indexing reports similar to what Randolph has described there.

Randolph: Certainly. Steve Stedman's Solutions has a magnificent and free tool called Database Health Monitor, which I have used. I don't even get paid for this statement. It's a good product. It's free, which is even more amazing. This is the great thing about the SQL Server community: there are a lot of free tools out there that are adding value all the time, and all that people ask is that you recognize them, and I recognize Steve as a valuable member of our community. That ends the sponsorship message.

Steve: Thank you for the shameless plug.

Carlos: So it’s interesting, I guess, and maybe I’m being a little bit paranoid but using, so when I’m looking at that impact and I will take that and also using the ones that I’m interested in adding. Of course I want to do what Steve mentioned looking for duplicates or kind of what’s there, right? Then am I going to get over my threshold of ten or whatever just kind of taking into consideration what additional load am I putting on this table by creating the index. And while you want to test that out mostly the environments that I worked on they just don’t have a good mechanism for creating the similar load as I have in production. So when I go to implement that index one of the first things I’ll start to look at is those usage stats, right. Because I want the usage stats in the index that I just created to be going up because I know that, “Ok well, wait a second what did I do? Is this still a really good candidate?”

Randolph: Yeah, that’s a very good point.

Carlos: So I guess, other that’s how are going to go out and creating them, couple of other things that we didn’t really get into some of the other new indexes like column store or even XML indexes. But I guess other thoughts about when to start looking at these other, we even try to approach that now. Maybe I feel like we should punt on some of those.    

Randolph: I can briefly mention XML indexes. The way they work is they're going to be larger than your table or your XML column. The reason being is that it creates an internal table according to the structure of the XML document or XML column that you're indexing. So it actually expands the XML data out into an internal table and then indexes that, so you could have a column that is, say, 100KB maximum, which is a small one, and end up with an index that is 500KB, or a gigabyte, or tens of gigabytes, because it's creating an internal table under the covers, a system table that it is then indexing. So be wary of XML columns; if you're going to index them, then make sure that they're sensible indexes that only index certain nodes inside that XML document, and be aware of that. And also I'd like to add here that if you use the XML data type, the data going into that XML row or column is not going to be the same as what you get out. It does modify the XML data going in for whatever reasons.
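
To make the "only index certain nodes" advice concrete, here is a hedged sketch on a hypothetical dbo.AuditLog table with an XML column named Payload. Option 1 shreds and indexes the whole document; option 2, a selective XML index (SQL Server 2012 and later, and the table needs a clustered primary key), promotes only the paths you actually query.

    -- Option 1: indexes the entire shredded document; can be several times the column size
    CREATE PRIMARY XML INDEX PXML_AuditLog_Payload
        ON dbo.AuditLog (Payload);

    -- Option 2: a selective XML index limited to specific paths
    CREATE SELECTIVE XML INDEX SXI_AuditLog_Payload
        ON dbo.AuditLog (Payload)
        FOR
        (
            pathTotal  = '/Order/Total'  AS XQUERY 'xs:double',
            pathStatus = '/Order/Status' AS SQL NVARCHAR(100)
        );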

Carlos: Say that one more time.

Randolph: If you have XML data that you put in for auditing reasons, for example, when you pull it out it's going to have been modified somehow. Either the tags will be slightly different or the spacing will be slightly different, so do not use an XML data type to store audited records of XML documents. If you want to keep an absolute record of what your XML looked like as it came out of a web service, for whatever reason, store it in a varchar or nvarchar column instead, because then it is identical. If you put it into an XML data type it will modify the data. It is still the same data, but it will be slightly modified; the tags will be slightly different or whatever.

Steve: So then as far as column store indexes I think that’s something we could probably dedicate an entire episode just to talk about.

Randolph: We definitely can.

Carlos: I think you’re right.

Randolph: A columnstore index is neither an index nor clustered in the traditional sense, so it's complicated. In fact, there is a series of posts, I think it's Niko who has done over one hundred posts, on how columnstore indexes work. We could talk for days on that. They are fascinating and completely subvert what you think you know about indexes, and data, and row level storage and all that kind of stuff. It's fascinating stuff.

Steve: Yeah, and just a few notes on that I think that it’s one of those things that is great for data warehousing or OLAP type things. And may not always be the best option for your OLTP side.

Randolph: At the same time, you could have reporting style queries in your database, and with 2016's optimizations for columnstore, which can now be updated and can be clustered and all sorts of stuff, you could have some OLAP style indexes in your OLTP environment. Yeah, there's so much to think about there.

Carlos: We saw that in Episode 78 with Brian Carrig. They talked about using their column store index in a transactional environment.

Steve: Yup, and then one of the things that I've seen that is one of those misunderstandings around columnstore, because columnstore does the compression on each column, is that people think, "I don't have to have any non-clustered indexes if I have a columnstore index." And that's not true at all. If you're looking at a bigger table and you're using columnstore, you may still need to have some non-clustered indexes on it as well.
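
Since SQL Server 2016 you can mix the two approaches Steve mentions; a sketch on a hypothetical fact table:

    -- Clustered columnstore for the big analytic scans
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
        ON dbo.FactSales;

    -- A regular non-clustered rowstore index still helps single-row lookups
    -- (supported on columnstore tables from SQL Server 2016 onward)
    CREATE NONCLUSTERED INDEX IX_FactSales_OrderID
        ON dbo.FactSales (OrderID);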

Randolph: Oh yeah, so do your research. Read all one hundred and whatever posts. Become an expert, then implement them.

Carlos: So we’ve talked a little bit about, so implications, how do we know what to use. So space considerations we talked a little bit about more in the rebuild process, right? That we’re going to need additional space in our data file and our log file things like that. I think we’ve had a pretty good conversation there. I guess one last idea I’ll throw out that we can bat around a little bit. We talked a lot about having multiple TempDB files and so one of the thoughts that I’ve seen out there is if you’re potentially, I won’t say, so what’s the value there is that if you’re looking to spread out or move your indexes from where your data might reside. One way to do that is to create another data file potentially on a separate storage and then rebuild those indexes.

Randolph: Ok, there are two things that I can talk about there. The first thing I wanted to mention is if you do need to move your table into a different file or filegroup, all you have to do is a clustered index rebuild and target that filegroup as the new location for the clustered index, because remember the clustered index is your data. So that's a handy way, if you need to move your table into a different filegroup, that's how you would do it. The other thing, by virtue of that definition, is that you could have your non-clustered indexes in a different filegroup than your data, and you might want that for improved I/O performance, or if you've got your read-only stuff in a different filegroup, or all sorts of reasons. It's very handy for splitting your load at the I/O level. It's less of a problem these days, but it's a nice thing for large tables to split your non-clustered indexes from your clustered index so that you're not reading from the same portion of the disk, or the I/O subsystem, or both.
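
A sketch of both ideas, assuming hypothetical filegroups FG_Archive and FG_Indexes that already contain files, and an existing clustered index whose definition matches the one being recreated:

    -- Move the table: recreate the clustered index on the new filegroup
    CREATE CLUSTERED INDEX CIX_Orders
        ON dbo.Orders (OrderID)
        WITH (DROP_EXISTING = ON)
        ON [FG_Archive];

    -- Split I/O: place a non-clustered index on a different filegroup than the data
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        ON [FG_Indexes];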

Carlos: Right.

Steve: Just to note on that index rebuild for a clustered index to move it to a different filegroup: although that will move the table and the entire index, one of the things that could be left behind when you do that is any of the varchar(max) or nvarchar(max) or other large items that are stored outside of the data pages for that table.

Randolph: Yeah, off-row data will be affected. Steve, I don't know, because I've never tried: what happens if you tell it to do the index rebuild with the LOB compaction enabled?

Steve: You know, that’s a good question. I have not tried that for a long time.

Randolph: I smell a blog post.

Steve: Yup.

Carlos: Ok, very good. Well, awesome. I think great conversation and obviously there’s a lot more that we could talk about indexes but I think this was a noble attempt at covering some of the basics and getting into some of the nitty gritty as well.

Randolph: Yeah, the fact is that indexes don't stand by themselves. They are a very important part of everything, including statistics and everything else, so don't think that once you become an index expert you become a performance tuning expert, because that's not true. You have to have a very broad knowledge of how things work in a number of different fields to draw upon to get the best performance out of your system. And there's nothing wrong with good enough. You don't have to have 100% defragmented indexes. You don't have to have indexes rebuilt; you can have them reorganized, or not even reorganized at all if your statistics are up to date, in certain contexts. There are a lot of tradeoffs that you have to think about when doing your maintenance plans, and indexes form just a small part of that.

Steve: Yup.  

Carlos: Great.

Steve: Very good point.

Carlos: So shall we do SQL Family?

Steve: So how did you first get started with SQL Server, Randolph?

Randolph: Steve, that’s a good question because when I originally got this question I had to think very hard about it. I have been fascinated from a personal level about organizing data. Back in my youth I would create lists and lists of things CDs, books all that kind of stuff and I would do them in the paper equivalent of Microsoft Excel. And then when Lotus 123 came along I started playing with that because my dad worked at the bank and he was a 123 expert. If I’d know about VisiCalc I probably would have used it. And then I got into Microsoft Access because it was there. It was a very old version of Access. I think it was version 2 that I started with and then I started working at a PeopleSoft implementation partner in South Africa where I am from. And that’s where I first work with SQL Server and Oracle at the same time, and I was not a fan of SQL Server. I have to be honest. At that time it was version 6.5 and version 7 that’s around the time I joined so there were issues because as some of your listeners may know SQL Server’s engine was rebuilt around 7 time, 6.5, 7 time. In fact, if you look on MSDN you can download 6.5 and you can download 2000 but you cannot download version 7. It is very difficult to come by. There are maybe good reasons for that. I got exposed to SQL Server that would have been in 1997, around there. So yeah, that’s where I first got exposed but then I didn’t really play with it until I got to the bank. I was more of an Oracle guy. Got to the bank in 2006 so there was a large gap of not playing in SQL Server and then I couldn’t figure out why DESCRIBE, the keyword DESCRIBE wasn’t working in SQL Server. For all of you Oracle office you notice there I prefer SQL Server now because as I said before a lot has changed. It is a much better product than it was in 1997. In fact, I think it has surpassed Oracle. I think it’s better than anything else as well and that’s because I’m trying to keep abreast of all the new stuff. I don’t want to be stuck in the past and have assumptions about the product. I want to play around with the new stuff so. That was a long way of saying 20 years or so.

Steve: Ok. Well I know that in the late 90’s Oracle was a much better product than SQL Server.

Randolph: It really was.

Steve: Yup, and that has changed. That has changed significantly in the last 17 years.

Randolph: Yeah, it has.

Carlos: If you could change one thing about SQL Server what would it be?

Randolph: I had a pat answer for this, but I'm going to change my mind. The pat answer is in-memory processing in SQL Server, in-memory OLTP, which was code-named Hekaton. There is a transaction log operation that happens even for stuff that you don't want kept. It's still being backed by disk somewhere; it may not be in the transaction log itself, but it is still being backed by disk somewhere. I would like to have in-memory objects that do not have anything on disk backing them at all. They must just be there, and if the server goes down, oh well, tough luck.

Carlos: Alright. Yes, I remember seeing you in the Bob Ward presentation at SQL PASS talking about the in-memory OLTP.

Randolph: There’s some very very cool stuff with in-memory OLTP that I’ve been playing with. I’ve been blogging about it as well on my blog bornsql.ca. That speaks about some in-memory ops and I found. Interestingly I found another issue that I’ve spoken to Jonathan Kehayias about where Temp tables are slower than Temp variables in certain cases. So I’m investigating that at the moment and it had to do with my in-memory investigations.

Steve: Oh interesting, I’d like to see what you find there.

Randolph: So that was my pat answer about what I would change in SQL Server. What I would actually change is having a TempDB per user database as a configurable extra.

Steve: That would be nice.

Randolph: In other words it doesn’t have to be default but if I need a TempDB for a particular database I would like to have that as a separate entity so that I could manage it separately and keep track of the objects that are happening for that particular database. The architecture for that is going to be really difficult to do so I don’t know if Microsoft is going to do that but that’s what I would like.

Steve: Alright, so what’s the best piece of career advice that you’ve received.

Randolph: Oh, it’s going to be along answer as well, sorry. “Let your work speak for itself” is the first and foremost one. So it doesn’t matter what people say about you if your work can speak for itself then you don’t have to say anything. That’s the best piece of advice that I had. But the other is “Everything you say and everything you do is being recorded somewhere” so please treat people with respect. Treat people as you would have them treat you and don’t say things on voicemails that can be played back to you at meetings with the CEO and saying Randolph don’t do that.      

Steve: I like that too.

Carlos: We won’t ask for the personal experience that led you to that advice.

Randolph: At the end of the year, or every now and then when it's a quiet time on my blog, I write about career limiting moves, and that was one of them. So there are a bunch of career limiting moves that I can recommend you don't do, not because it's abstract good advice but because I've done them myself, so don't do them because I've done them, and that voicemail will be one of them.

Steve: Alright.

Carlos: Our last question for you today. If you could have one superhero power what would it be and why do you want it?

Randolph: I thought about this and I honestly don't know. I liked Pinal's, where he could touch a server and know what's wrong with it. The problem is whenever I touch a server, if it's going to break it's going to break then and there. Maybe I want the reverse of that superpower.

Steve: To just know without touching the server.

Randolph: Yeah, you know, the cost of maintenance is one thing. But if I'm in a room and it is going to break, it is going to break while I'm there, which is good, I guess, in a way, because then you know we can figure it out then and there. But I'd like the reverse of that; that would be cool.

Steve: Ok.

Carlos: Well, awesome. Randolph thanks so much for being with us today.

Randolph: Well, thanks for having me. It's a topic near and dear to my heart, indexing and maintenance and stuff like that, so I'm glad I got an opportunity to speak to you guys about it.

Episode 81 Indexes Part I

Part of what we like to do on the podcast is to present topics and discuss different ways you might use features. In this episode, we go back to basics and discuss indexes. Like most topics, we made sure to explain all the important aspects. Because this is such a broad topic, we decided to carry it over into our next episode as well. Using the phone book as an example, we chat with our guest Randolph West and explain heaps, non-clustered indexes, and clustered indexes. We also explain best practices for using them when creating new tables.

We didn't forget to cover the problems you might run into. For example, you might not think about indexes until your data grows very large. Sure, you can attempt to resolve the problem by moving to SSDs and increasing memory, but this only pushes the issue further into the future. In this episode, we discuss considerations for indexes and why database architecture is so important.

We would like to hear your opinions. What are your strategies when using indexes? Use the hashtag #sqlpodcast and let us know!

 Episode Quote

“The clustered index should be chosen very carefully because your entire table is going to be sorted according to that column or set of columns and what you’re looking for is a unique value for every row.”

“All that an SSD is doing is making your problems go away until a later time. The problem is still there. It’s just being hidden by the fact that you’ve got faster drives. In fact, the same goes for adding more memory. It’s just that happens to be cheaper than trying to fix your indexes.”

Listen to Learn

  • Heaps and indexes types explained
  • Randolph’s thoughts on clustered indexes vs. non-clustered indexes
  • Similarity between clustered indexes and primary keys
  • The results of using Management Studio Designer for creating new tables
  • How to create covering indexes
  • The negatives of having too many indexes and what you can do about it
  • How do default fill factors affect indexes? Is faster storage a good reason to avoid fixing indexes?

Randolph on IMDB
SQL Skills article on BTree
Randolph’s Blog
Randolph on Twitter
Clustered and Non-clustered indexes described
Heaps
Creating Indexes with Included Columns

About Randolph West

Randolph West solves technology problems with a focus on SQL Server and C#. He is a Microsoft Data Platform MVP who has worked with SQL Server since the late 1990s. When not consulting, he can be seen acting on the stage and screen or doing voices for independent video games.

 

Transcription: Indexes Part 1

Carlos: Ok, so let’s, I guess we’ll go ahead and kick it off. So Randolph, welcome to the show.

Randolph: Thank you very much for having me.

Carlos: Yeah, it’s great. I know that you and Steve have done a little bit of work together and some of your adventures have come up from time to time as we’ve talked about it, and so it’s great to have you here on the show with us today.

Randolph: Well, thank you! It’s been great fun working with Steve. He is a consummate professional, and I would hire him. I’m just saying.

Carlos: There you go.

Steve: And technically I would say I would hire you too because we’ve done that on projects.

Randolph: Yes we have.

Steve: I think we first met on the Database Corruption Challenge in 2015.

Randolph: That's where it would have been, yes.

Steve: Oh yeah, yes, and then, yeah you did an amazing job on some of the corruption we worked on there. And then you and I have worked on fixing a couple of corruption issues for clients since then, and then we also worked together on another project sort of a longer term disaster recovery moving to Azure type project for our client.

Carlos: So today our topic is indexes, and this can be a pretty broad topic. In some of our pre-planning we thought, "Gosh, there's no way we're going to cover it all." Some of the components are just a little complex to discuss fully in an audio format, so some of that will be pointing to our show notes and pointing people to additional documentation there. But let's go ahead and start with that conversation, maybe even by defining an index and how that works. So first, when we talk about indexes, we're talking about copies of data and the way our data is structured, right. The very first example we have of that is a heap. Basically, a heap is a table with no indexes on it, so there's no structure or order to the way that the data is saved. Obviously, you have columns, right; you can have data types and all those kinds of constraints, but the way the data is actually saved is not part of the definition, if you will, of the table. That makes it a heap.

Steve: Yup, so then really from there we take a look at how we get to the data faster, because with a heap, to find any specific piece of data you've really got to search through the entire list, or the entire table, and look at every single row and say, "Is this what I'm looking for? Is that what I'm looking for?" Eventually, after you've looked through the entire pile, a heap in this case, you come back with the result. But another way to do this is by reordering some of the data. The way I like to think of it is sort of like a phonebook or a phone list. And I know growing up I was used to seeing the yellow pages and the white pages. The yellow pages were the business listings and the white pages were the listing of every person and business with a phone number, organized by name. Now, if you were just to take everyone on that phone list and put them in a random order, perhaps in the order that they signed up or purchased their phone service, that's sort of the equivalent of a heap. There's no specific order to it that's useful for searching, unless of course you're searching on the order that they got their phone service installed. It would be really impossible for anyone using that phonebook, if the entries were just in that random order, to find anybody's phone number. So what they've done with the white pages is they've organized all of the people ordered by last name; then, when there are multiple people with the same last name, you have the first name order, and when there are multiple people with the same first name, you have the middle initial order. And that's really the equivalent of creating a clustered index on last name, first name and middle initial. What that effectively does is order the entire table, or phonebook in this case, by those items, so that when you need to do a lookup and you're looking for someone whose name is Smith, you don't have to start at the As and go all the way through to the Zs to confirm you found all the Smiths. You can just jump to the S section and then track down where Smith is from there really quickly. So then we look at the yellow pages, and the yellow pages are the business listings where the businesses are organized by business type. So if you're looking for an accountant you can jump to the A section, find the section where all the accountants are, and then look up all the accountants that are listed in the yellow pages. But if you're looking for an exterminator, you need to go to the E section for exterminators and then you can find the list there. Now, the yellow pages are sort of like a filtered, non-clustered index, in that it's a copy of the data that exists in the white pages, but it's ordered differently, and the reason it's filtered is that it only includes businesses. So we're kind of throwing two things in there: one, it's a non-clustered index, a copy of the data, and two, it's filtered, because the people have been thrown out and only businesses are shown.

Carlos: One thing that’s kind of important there is that that filter won’t necessarily change, right. That category of business, you’re not going to take that out of your system. While you can potentially filter on dates, it is a specific date. You’re not going to be like keeping an average for example of 90 days in an index. That wouldn’t be a good use of an index. But if you only wanted to see like from example, the beginning of 2017 forward then that might be an option, something like that.

Steve: Right, yup. So then next, if you had what's often referred to as a reverse phone directory, typically not included with your white pages or yellow pages, it would be ordered by phone number so you can look up a name. That would be the equivalent of a non-clustered index, a copy of some of the data that's been ordered in a different way to make it quicker to find something. So if you want to find out who has the phone number 555-1212, well, you can look it up by the phone number and then find out their name. And that would be just a plain old non-clustered index with no filter in that case, because it would have the entire list. I'm not sure if the yellow pages and white pages are international or available in all countries, but hopefully everyone has sort of understood where we're going with that as an example of how those indexes work.
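
Translating the phone book analogy into T-SQL, a minimal sketch on a hypothetical dbo.PhoneBook table might look like this: the white pages, the reverse directory, and the yellow pages respectively.

    -- White pages: the table itself, sorted by name
    CREATE CLUSTERED INDEX CIX_PhoneBook_Name
        ON dbo.PhoneBook (LastName, FirstName, MiddleInitial);

    -- Reverse directory: a copy of some of the data, ordered by phone number
    CREATE NONCLUSTERED INDEX IX_PhoneBook_PhoneNumber
        ON dbo.PhoneBook (PhoneNumber)
        INCLUDE (LastName, FirstName);

    -- Yellow pages: filtered to businesses only, ordered by category
    CREATE NONCLUSTERED INDEX IX_PhoneBook_Business
        ON dbo.PhoneBook (BusinessCategory, BusinessName)
        WHERE IsBusiness = 1;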

Randolph: Yeah, that makes sense.

Carlos: So let’s see, Randolph, what about thoughts on clustered indexes vs. non-clustered indexes?

Randolph: Well, the clustered index should be chosen very carefully because your entire table is going to be sorted according to that column or set of columns, and what you're looking for is a unique value for every row. And you want to make sure that the clustering key is narrow so that it doesn't take up that much space, because that clustering key is reused in the non-clustered indexes. You have to have a way for that copy of the data to match back to the original table, so the way that the database works is it stores the clustering key, the clustered index, inside the non-clustered index so that it can match back. So your clustering key has to be narrow, it preferably has to be unique, and it should be ever increasing. What you don't want is a random value, for a number of reasons which we can get into later. A random value causes something called page splits, and it kills performance and hurts the storage mechanism, which just makes things ugly. So you want a number that is ever increasing, like an integer, or a big integer if you're going to have more than 2 or 4 billion rows. You just have to choose carefully.

Carlos: One of the things I like there about that idea, and I guess we can kind of jump in there, is that you mentioned page splits. Steve used the analogy of the phone book, which is a real, physical example. If you think about taking pieces of paper and basically writing on them and filling them up, that's kind of the way that SQL Server works, and these pages, at least in SQL Server, are 8K chunks, right, so it has to sort them. When you talk about that sorting and the clustered index, it's going to store the rows in that order, and I think to your point, when you're inserting in kind of a random order it has to resort that every time, and those pages can get out of whack, because all of a sudden I've got to fit in data that wasn't there originally.

Randolph: Exactly, so what happens there is you have to wait for the index to be resorted, and that can take time. It can kill your performance badly, so that's why we try and steer clear of wide, random columns for a clustering key or a clustered index. For example, GUID columns are not a good choice for a clustering key. That doesn't mean they make a bad index, but they would make a bad clustered index, because remember the entire table is sorted according to that index. The clustering key is how your data is sorted, and that is your actual data; it's not a copy of the data.
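
A short sketch of the contrast being drawn, using a hypothetical dbo.Customers table: the GUID is fine as a unique value, it just should not be the clustering key.

    CREATE TABLE dbo.Customers
    (
        CustomerID   int IDENTITY(1,1) NOT NULL,                      -- narrow, unique, ever increasing
        CustomerGuid uniqueidentifier NOT NULL DEFAULT NEWID(),       -- unique, but random
        CustomerName nvarchar(200) NOT NULL,
        CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED (CustomerID)    -- cluster on the narrow key
    );

    -- Clustering on CustomerGuid instead would scatter inserts across the table
    -- and cause the page splits Randolph warns about.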

Carlos: So it’s worth asking at this point, right, to maybe like a small little poll here. So we talk about non-clustered indexes and then which are separate and different from but very, they kind of go hand in hand with primary keys most of the time, right?

Randolph: There’s an interesting conversation here. Microsoft for I think a good reason, it wasn’t the right choice but a good reason, decided that if you create a primary key inside the Management Studio Designer and in fact even before there in previous versions of the tool Enterprise Manager I think was called in the old days. If you create a primary key on a table it will automatically make that primary key a clustered index. I think the reason for that default was because they wanted you to have a clustered index at least. And because the primary key is unique by definition it made sense from a designer point of view and for people who are not intimately familiar with internals to have at least a unique value as your primary key and also as your clustering key. It makes sense from that point of view but it can run into trouble because let’s face it a GUID is unique. It’s a very good candidate for a primary key but it may not be a good candidate for a clustered index. In fact, it isn’t. There are reasons why you would not want to use the designer on particular tables. In fact, I prefer not to use the designer anymore when I’m creating tables because it has a bunch of defaults that may not be appropriate for the design that I’m implementing. It doesn’t mean I don’t use the designer. In fact, I use it a lot for creating relationships and stuff because it’s easier. But in terms of clustered indexes and primary keys the designer will make that the same set of columns so it’s something to watch out for.

Carlos: Well, now, having said that, kind of from a poll perspective. What percentage of time are your clustered indexes different from your primary key?

Randolph: Well in my case, 1%.

Steve: Yeah, I think I would put that really into two camps. One of them is if it’s a new database that I’m architecting from scratch or new tables that I’m building out where I have control over it. Oftentimes they may end up being different, the clustered index versus the primary key. But if it’s a, I don’t know, third-party application database that I’m jumping in to do some maintenance on or something. Like 99.9% of the time those are the same.

Carlos: Ok, and so I think, that’s been my experience as well most of the time, right, so that 99.9% of the time. It’s pretty much like when I’ve identified an issue or they have for an instance a GUID as their primary key and not in the clustered index, then we decide, “Hey, maybe we should make change here.” I got into a little bit of, so I’ve got some feedback. I’ve just put out this book “Zero to SQL”. And one of the things, actually I put a sentence in there. I said, when talking about non-clustered indexes, I say, “We don’t have to include the primary key of the table in the non-clustered index because this is automatically gets included in the index.” Now, somebody kind of took me to task, and they’re ultimately correct in the sense that, it’s the primary key that gets added, it’s the clustered index that gets added. I just thought they’re just curious I’m like how many times is your primary key not your clustered index. That’s kind of where I was going with that.

Randolph: Yeah, we’re designing from scratches as Steve said. You might have a reason for that.

Steve: So I guess you just went into it a little bit, talking about how you said in the book something about not putting the same columns that are in the primary key into your non-clustered indexes because they are included there as well. What was the feedback you got on that?

Carlos: So technically correct. It’s not the primary key that gets added, it’s the clustered index that gets added to the non-clustered index.

Randolph: And in fact, to go a little bit deeper into that, it gets into the leaf level guaranteed, but if it's a unique index then it's not in the intermediate levels, if that matters.

Carlos: Ok, and so here we start talking about the part of the index that will be a little bit hard to describe over the audio portion. One of the things we came up with is the idea of the game show The $10,000 Pyramid, or even Pyramids, the card game, if you're familiar with that: the idea that you have a structured top and the data follows that pyramid format, and that's one of the things Randolph mentions there. So there are leaf nodes and other types of names, if you will, for some of those sections, so that as SQL Server is looking through the data it can decide, do I need to keep going or not.

Steve: Yup, so really there when the index is being used, it’s that pyramid or like a tree type structure. And it uses that to be able to quickly jump to the pages that you’re looking for rather than scanning every leaf node. It goes through that tree to get to the data quickly. Randolph, did you want to talk about the B-Tree and how that part works?

Randolph: Well, it’s difficult to speak about with just audio. But if you think of the top of the pyramid there is only ever one page there and that is called the root node. Only ever one page and it’s also an 8K page. And it would be a type of page instead of being a normal data page. It’s called an index page. But otherwise it’s exactly the same as a table structure or a clustered index because remember your clustered index is your data. So your non-clustered index and your clustered index have a similar structure. You’ve got the single page at the top and then inside that page are references to other pages. And those pages contain the index information. So if you need to go to Stedman, you would look on the root page for where the S starts and then tell you which page to should go to where the actual data is and then inside there it will be able to find the data much quicker than having to read the entire table to find Steve Stedman’s number in there.

Carlos: Almost like an index of sorts, right, like a database.

Randolph: It’s exactly what it is, right? It’s just an implementation detail of exactly what you’re talking about. You flip to the back of the book, you look for S, there’s Stedman and you know which page to jump to. That’s exactly what’s happening.

Steve: So then one of the differences with that is the amount of time it takes to get there. If you're going through a heap, or a clustered index that isn't clustered on last name, and you're searching for me, Stedman, and there are a million pages in your table, you have to touch all million of those to find the rows with my last name on them. But with the index and that tree structure, you might have to read two, or three, or four pages, depending on how deep that index is, to get to the S section, and then scan through just a few pages once you're there to find Stedman. And the difference is you're touching maybe 8, maybe 10 pages rather than a million.

Randolph: Yeah, and even ten is a lot for a seek. What you've just described, if you're checking the entire white pages or the entire index, is called a "scan", and if you're just looking for a particular record, because you know how to get there through the index, that is called a "seek". There's a whole conversation here about which is better. The answer is neither, because it depends on what you're doing. If you need 10,000 records, a scan may be much more efficient than it would be if you just needed one. A seek would be more efficient if you're just looking for one or two records, but if you're looking for a million out of your 10 million, then a scan would be much more efficient because it would end up reading fewer pages overall.

Steve: Yup, good point there. I think that’s one of those that it comes back to really what is it you’re trying to do. Sometimes the query optimizer has to guess at what is it you’re attempting to do with your query. Usually it gets it pretty close but sometimes it’s not quite right and you end up with a seek when a scan may have been better and vice versa.

Randolph: Yeah, and that gets into conversations about are your indexes correct? Are they stale? Do they need to be updated? Do they need to be maintained? There’s a whole bunch of stuff here that we can talk about. There’s statistics as well.

Carlos: I guess we can jump into some of that. Before we do that I think one of those ideas is, are my indexes good enough? Because the optimizer can do some different things based on the information that it has, right? And then there’s this idea of a covering index. Take us through that.

Randolph: What happens there is, as we spoke about briefly with the structure of the index, you've got your root node, you've got your leaf level where your data is, and then there are these intermediate levels. What happens with an included column is that if you have a table where you need to bring back the last name, the first name and the telephone number, but there's a whole bunch of other data in there as well, then your index would only need to contain those three columns. But let's say that some of your queries, ones that happen quite often, would also require the middle name. You could add the middle name as an included column. That middle name would appear in the leaf level, the bottom level of the index, but it wouldn't appear in the intermediate levels, and that's called an included column. Your index doesn't have to be huge just because it has four columns; you wouldn't be using it for all of your queries, but it would be a suitable covering index for the ones that require the middle name. I hope I haven't confused everybody by saying it that way.

Steve: Just a little bit more detail on the covering index, because sometimes when I talk about covering indexes people say, "Oh, how do I create that? Is that a different create statement than a non-clustered or clustered index?" And the answer to that is really, no. A covering index is simply an index where all of the data that you're asking for in your query happens to be in that non-clustered index, so it can return it all back to the query without ever touching the heap or the clustered index. If it's not covering, what that means is it uses that non-clustered index to find the rows that you need, and then it jumps back to the heap or the clustered index to bring in the data that you're looking for. So that last step of jumping back to the heap or the clustered index is skipped with a covering index, and that can save quite a bit of time too.
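
A quick sketch of the middle name example as an actual covering index, again on the hypothetical dbo.PhoneBook table:

    -- Covers: SELECT LastName, FirstName, MiddleName, PhoneNumber
    --         FROM dbo.PhoneBook WHERE LastName = 'Stedman';
    -- so no key lookup back to the clustered index is needed
    CREATE NONCLUSTERED INDEX IX_PhoneBook_LastName_Covering
        ON dbo.PhoneBook (LastName, FirstName)
        INCLUDE (MiddleName, PhoneNumber);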

Randolph: Yup.

Carlos: And that’s in your execution plans you’ll see that lookup, right. And that’s what’s doing. It is basically, I don’t have the information I need so I need to jump and grab it, right. So whether you should then create a “covering index” will depend on how many times a query gets run, performance to that query and all those kind of things.

Randolph: And how many columns there are in the lookup and all sorts of stuff. Yeah, it’s an art. It’s not a science.

Carlos: Right.

Steve: So then with that when you’re looking at the query plan and there is a suggested index. And I see this quite often where it says suggested index with two or three columns and then include, and it includes like 25 columns in it. That kind of thing would certainly give you a covering index but what are your thoughts when SQL Server is suggesting those types of indexes?

Randolph: Don’t do that.

Steve: And why?

Randolph: Ok so, when SQL recommends an index, a missing index, it is only specific to that query. If we know anything about query plans, you can add a space to a query and it will suddenly get a different query plan, maybe. So don't just create missing indexes as they come up. What you need to do is evaluate them against other missing indexes or other recommendations for indexes on that table, and maybe combine the different recommendations into one or two new indexes, or modify existing indexes, to try and accommodate as many of those recommendations as possible. With an index that includes all of the columns of the table, you might as well just be recreating the clustered index. It's just not efficient.

Steve: So one of the things that I see oftentimes when I'm doing performance tuning, and I'm looking at indexes that are inefficient or not used, is indexes named something like the suggested missing index name, which is a clue that somebody just copied and pasted the suggested missing index and ran it, and they end up with an index that may not really be what they're looking for.

Randolph: In fact, this goes back to Pinal's interview that you did in December, where he said his reference point is, "Do you have more than ten indexes on a table?" He's going to start looking into that and see why. On a legacy system, or even a vendor system, third-party stuff that you're not allowed to touch, people have over the years gone and added indexes because it helped with a particular query. You can end up with duplicate indexes, or indexes that only differ by, say, an included column. You've got all of that maintenance going into keeping these indexes up to date, and statistics and all the other stuff that goes with that, when in fact you can consolidate and reduce your indexes quite easily just by analyzing the system and how those queries are running.

Steve: Ok, so then let’s say I’ve got too many indexes like you’re talking about there and some of them are not efficient, or they’re not being used. They may help someday with some query but what are the negatives to having too many indexes specifically what’s going to slow down when you have too many of those?

Randolph: Well, there are two things that I would like to address. The first thing is, if this index is useful once a month, then create it once a month and then drop it again. If you've got a reporting style query that has to run once a month to generate a report, and without the index it would take ten times as long, go ahead and build that index just before the report runs and then drop it afterwards, because it's not in use the other 28 or 29 days of the month. There's nothing wrong with creating and dropping an index when you need it, as long as it doesn't impact business hours, obviously. That's the first thing. The second thing, to answer the question, is that every time you modify the table it has to make sure that the indexes are up to date. So if you've got 10 or 15 indexes on a table, every time you insert, update or delete a row (a delete is effectively the same as an update because it's changing the data), it has to make sure that the change to the row is replicated in each one of those indexes. So it has to lock the table and it has to lock the indexes while it's updating them, and that can cause performance problems in and of itself. It can also cause problems because you are locking data, so you cannot make changes and you cannot select from it, unless you start using tricks that you shouldn't be using, query hints, which I'm not even going to mention. All because you have too many indexes and you're trying to satisfy a possibility that might happen in the future, when in fact you've got statistics and DMVs that tell you exactly what's going on with your database. Use the information available. It's not SQL 2000 anymore. We have better ways of analyzing how your queries are running.
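
A sketch of the "build it for the report, then drop it" pattern, with hypothetical names, as something you might wrap around the report job in a maintenance window:

    -- Just before the month-end report runs
    CREATE NONCLUSTERED INDEX IX_Sales_MonthEndReport
        ON dbo.Sales (ReportDate)
        INCLUDE (Region, Amount);

    -- ... run the reporting query here ...

    -- Afterwards, stop paying for the index on every insert and update
    DROP INDEX IX_Sales_MonthEndReport ON dbo.Sales;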

Steve: Ok, so then one of the things that often comes up, and it's highly debatable in places that I've been recently, is fill factor, and the concept of what a good default fill factor setting is and what it really means for your indexes.

Randolph: Ok well, fill factor in theory is about leaving enough space in the data pages themselves so that if you do need to insert rows, and it is a random type of key like we spoke about before with GUIDs, there is room for them. It leaves space in the data page so that it doesn't have to split as often, and it can insert data into those pages without having to resort the entire index every time. That's kind of what it's about. Do you want to add anything to that definition?

Steve: Well, I was just thinking, I heard a debate a while back where someone argued that with SSDs and faster storage your fill factor choices change compared to spinning storage. I don't know if I entirely agree with that.

Randolph: I completely disagree because all that an SSD is doing is making your problems go away until a later time. The problem is still there. It’s just being hidden by the fact that you’ve got faster drives. In fact, the same goes for adding more memory. It’s just that happens to be cheaper than trying to fix your indexes.

Steve: Yup. Ok then, so with fill factor, I think what we end up seeing oftentimes is that people have a default fill factor setting on their database and they think it's going to cover whatever they need, that it's going to be right for all occasions. But really it comes down to this: fill factor needs to vary depending on the data type, the frequency of change, and whether the data going in is sequential and ever growing or completely random. Fill factor really depends on the data.

Carlos: Do you have different fill factor settings for your clustered versus your non-clustered indexes? That would kind of be a good sorting factor, in the sense that, in theory, we talked about our clustered indexes. Again, assuming that they're unique, that they're ordered and ascending, an identity column being a good example, you shouldn't be seeing as much fragmentation there, because they're kind of already in that order. Is it the non-clustered indexes that are going to be a little bit more random, if you will, where you could see a little bit more shuffling?

Randolph: The magic word you used there was fragmentation. This used to be a huge problem, as Steve mentioned, with spinning drives. While I don't think it's okay to not worry about fill factor at all, and you should worry about fragmentation, you should worry about these things purely from the point of view of knowing that your database is in good condition and well maintained. At the same time, though, index fragmentation with faster drives doesn't matter as much as, for example, making sure that your statistics are up to date. In fact, I now say in my performance tuning session that if you want to get the best benefit from your maintenance window, it is far better time spent updating your statistics than, for example, rebuilding an entire index. Firstly, from an I/O point of view, you need size-of-data space for large tables if you're going to be doing an index rebuild. It's going to need transaction log space, it's going to need space in the data file itself, it's going to need a lot of resources, and memory, and all of those things. And if you're replicating, it's going to replicate all of that. It's a big deal to rebuild a large table, whereas it's a much more effective use of your time to do a statistics update, and weighing the benefit of an index rebuild against your query plans being more effective, fragmentation is less of a problem now. So from that point of view, Steve, yes, I will agree with your statement before about fill factor. But the one thing I wanted to mention on fill factor is that there was confusion a few years ago about a fill factor of zero versus a fill factor of 100. What's the difference? There is no difference; they're exactly the same thing. A zero fill factor means there is zero space left free in the page, and a fill factor of a hundred means it's 100% used, so it's the same thing. I don't know why they decided to distinguish them, but who knows, it is Microsoft. A fill factor is a percentage from 1 to 100, so zero is a hundred. What you want to do there, as you mentioned, depends on the type of column you're addressing, less so the type of index you're using and more the type of column. So a GUID, as we spoke about before, is a random value, so you would want a lower fill factor on an index for that type of data, because that reduces the number of page splits, which reduces the amount of fragmentation. However, you will still need to maintain that index from time to time to make sure that the fragmentation is kept down.
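
A sketch of setting fill factor per index rather than relying on the server-wide default; the numbers here are illustrative only, not recommendations, and the names reuse the hypothetical dbo.Customers table from earlier.

    -- Leave some free space per page for a randomly inserted key
    ALTER INDEX IX_Customers_CustomerGuid ON dbo.Customers
        REBUILD WITH (FILLFACTOR = 80);

    -- An ever-increasing key can stay fully packed
    ALTER INDEX PK_Customers ON dbo.Customers
        REBUILD WITH (FILLFACTOR = 100);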

Steve: One of the interesting things I've come across there is a scenario where somebody has a GUID as their primary key and as their clustered index, and it might be too big to change that clustered index within a given maintenance window, or too time consuming to change. One of the workarounds I've found there is to use NEWSEQUENTIALID() rather than NEWID(), so that you can turn the GUIDs into sort of a pseudo ever-incrementing value.
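
A sketch of that workaround on the hypothetical dbo.Customers table; NEWSEQUENTIALID() can only be used as a column default, and any existing default on the column would have to be dropped first.

    ALTER TABLE dbo.Customers
        ADD CONSTRAINT DF_Customers_CustomerGuid
        DEFAULT NEWSEQUENTIALID() FOR CustomerGuid;  -- roughly ever-increasing, unlike NEWID()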

Randolph: Right, and that’s 16 bytes wide so.

Steve: Yup, it’s still big.

Randolph: It’s still big but it is sequential as you mentioned.

Steve: Yeah, it’s better than random at that point.

Randolph: It’s better than random but the reason people use GUIDs is usually from a development first perspective so the developers want to manage the index, sorry the primary key in the application as opposed to the database handling it. For number of legacy reasons that was a good idea 15 years ago. It’s not a good idea now. So there are a lot of habits and this is the way we do things happening from a development side that are not being, well the new stuff in SQL Server is not being adapted as effectively because there is 15 year old news on the Internet that says SQL is not good at this. The other thing is we have to educate people to say, listen there is a reason these RDBMS is in the top 3 in the world. There’s Oracle, there’s MySQL and there’s SQL Server, those are the top 3 RDBMSs on the world, Relational Database Management System. So SQL Server is in the top 3 for a reason. It’s a good product. It’s in fact one of the best products out there to handle your data so you should be using the newer features and the newer compatibility levels and things like that. And get away from the old habits of 15 years ago because they’re bad habits and things have gotten better in the last few years.

Steve: Yup, and that is oftentimes a challenging debate to have. I've found that showing examples, coming in and showing what the performance difference is between one way of doing it versus the other, is a good way to demonstrate what's better there. Ok, so then another one of the different options around indexing that we have is the concept of filtered indexes. And I know I threw this term out when we were talking about the phonebook and the business names in the yellow pages. So, filtered indexes: any thoughts on pros and cons, and how they work, that you want to talk about here?

Randolph: Well, the idea of a filtered index is that you have a very particular condition in the WHERE clause of your query. Let's say that you want to grab all of the Stedmans in the phonebook where their middle name equals J or starts with a J. If that WHERE condition happens often, then you could create a filtered index that uses that WHERE condition, and the index will only include those rows that satisfy the condition. The problem with the filtered index is that if you use a parameter in your WHERE clause, it's not going to use the filtered index. So it's solving a very specific problem, in my opinion. It's a good problem to solve, but it's a very particular way of solving it, and you have to be very careful when you create your indexes and write your queries that you're using them correctly.

Steve: Yup, so a real world example I came across where filtered indexes were very handy was around a very large table and needing to purge data beyond a certain date. I found that when I just ran a DELETE statement to delete anything older than a given date, it was forcing a scan of a large index to find those rows, and it was taking something like 40 minutes to delete a thousand rows. What I did instead was delete one day at a time beyond the date range, and I added a filtered index just for that one day. That filtered index gave me the shortcut to be able to delete a thousand rows at a time in a split second, less than a second to delete them, rather than an extended amount of time. I found in that case I could add that filtered index, do the work I needed to do, then drop it and add a new one when I was working on a different day, and that was a really handy way to use it to improve performance there.
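
A hedged sketch of that purge trick, with hypothetical table names and dates:

    -- Filtered index covering just the single day being purged
    CREATE NONCLUSTERED INDEX IX_EventLog_Purge_20170101
        ON dbo.EventLog (EventDate)
        WHERE EventDate >= '2017-01-01' AND EventDate < '2017-01-02';

    -- Delete in small batches so the transaction log stays manageable
    DELETE TOP (1000)
    FROM dbo.EventLog
    WHERE EventDate >= '2017-01-01' AND EventDate < '2017-01-02';

    -- Drop it once that day is gone, then repeat for the next day
    DROP INDEX IX_EventLog_Purge_20170101 ON dbo.EventLog;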

Randolph: Yeah, that's an excellent example of when a filtered index would make a lot of sense. Another example is if you have a very large table with a bit-typed, nullable column, and only a few of those rows have a value of true or false. You could have a filtered index on either the null values or the populated values, depending on what type of query you're writing, and it can identify those rows that have that bit column set much more easily.

Steve: Yup, yup, another good example there.

Carlos: Exactly, yeah, that’s been my experience as well.

Steve: So then let’s jump a little bit more into the care and feeding side of things, and I know we touched a little bit on rebuild and a little bit on statistics but if we look at what are the general things that we needed to do to take care of indexes and keep them well structured on an ongoing basis?

Randolph: I'd be interested to hear your side because I've been speaking a lot. I'll chime in if I have any feedback.

Steve: Ok. I mean, really I think on that there's the concept of reorganizing versus rebuilding as you get fragmentation over time, and then there's statistics. Let me start with a very bad way of doing it that I see quite often, which is using one of the default maintenance plans. In the maintenance plan you just go and click: let's rebuild my indexes, let's reorganize my indexes, then let's rebuild statistics. The rebuild basically rebuilds the entire index from scratch; if you're on Enterprise Edition you can do that in online mode, but if you're not on Enterprise Edition that's an offline operation. So people sometimes check all of those boxes. The rebuild gives you the best-structured index you're probably going to end up with, and it also rebuilds your statistics. But then they reorganize the indexes, and then they rebuild statistics with something like a 5% sampling. In that example you end up doing a whole lot of work that's actually giving you a negative impact. Whereas if you had just rebuilt them to begin with, and not done the reorg and not done the statistics rebuild afterwards, you would be in a lot better shape.

Randolph: So, you know, the reason those defaults happen is that the maintenance plan wizard has got rebuild, then reorg, then statistics in that order in the list.

Steve: Absolutely, yeah, people just click the checkboxes or select them and that’s what you get.

Randolph: So when I see that I blame Microsoft. That's purely Microsoft's fault, because generally the people using the maintenance plan wizard are using it because they don't know how to do it with, say, Ola Hallengren or the MinionWare tools, just to name two; there are many. So that's Microsoft's fault, and every time I see that I do not blame the DBA, or the accidental DBA, or whoever is managing it. It's not their fault. But it is causing a whole bunch of work that is unnecessary.

Steve: Yes, it may not be their fault but it's their pain they have to deal with. When I see that I generally do a little bit of education to explain why that's bad and turn those off. And like you said, replace it with the MinionWare or Ola Hallengren scripts, both of which are far better solutions for this than the default maintenance plans.

Randolph: So the one thing that I wanted to mention, Steve, is that an index rebuild is, as I mentioned before, a size-of-data operation. So if your table is 10GB and you need to rebuild it, a clustered index rebuild is going to require 10GB of space in the data file and then at least 10GB in the transaction log. And if you're rebuilding a clustered index it is, by default, going to rebuild all of the non-clustered indexes on that table that refer to that clustered index, and that means all of them. So if you've got a 10GB table and you've got 30GB of non-clustered indexes on that table, it's going to require at least 40GB in the data file and 40GB in the transaction log, because it's going to rebuild everything. And if you've got a maintenance plan that says do a clustered index rebuild, and before that it already rebuilt the non-clustered indexes, it's going to do them again anyway, so it's a nightmare. This is why I say more regular statistics updates are going to be a lot more effective than index rebuilds over the long term. You still do need index maintenance, but let's not do it every night.

Steve: And that's where I like the third-party solutions that we mentioned, because you have options to say, given a certain level of fragmentation, I want to reorganize this, or if it's even more fragmented then we're going to rebuild it. And some of them even have the option of only doing this for a certain amount of time.

Randolph: Can you go into detail on what the difference between a rebuild and a reorg is?

Steve: Ok, so the rebuild is technically the equivalent of dropping and recreating the index. It's going to be completely rebuilt from scratch, and it will take up the log and disk I/O like you talked about a moment ago. With the reorg, what it's going to do is go into the index and restructure or compact the index pages as needed. It will go through and shuffle them around to compact them down into a smaller amount of space, and it does that page by page as it goes through. That can be done while things are online, even in Standard Edition. But it's slow, and the end outcome is not as good as a complete rebuild. It might get you real close, but it's not going to be as good as a complete rebuild; the benefit is you can keep it online.
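
In T-SQL terms, the two operations being contrasted look like this (the index and table names are the hypothetical ones from the earlier sketch):

```sql
-- Rebuild: recreates the index from scratch. It is a size-of-data operation,
-- and ONLINE = ON requires Enterprise Edition on the versions discussed here.
ALTER INDEX IX_PhoneBook_LastName_MiddleJ ON dbo.PhoneBook
    REBUILD WITH (ONLINE = ON);

-- Reorganize: compacts and reorders the leaf pages in place. Always an online
-- operation, even on Standard Edition, but less thorough than a rebuild.
ALTER INDEX IX_PhoneBook_LastName_MiddleJ ON dbo.PhoneBook
    REORGANIZE;
```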

Randolph: And you still need to update the statistics after you've done a reorganization, which is something that people tend to forget as well. The statistics need to be up to date so that the query optimizer knows where the data is, how much of it is in there, and what the distribution of the data in that index is, so that it can make the right choice about what type of plan to create, because a lot of queries come down to sorts and joins and that's where you find your biggest performance bottlenecks. A sort is a blocking operation, so everything in the query has to stop until the sort is finished. So if your index is up to date and it's sorted correctly then it doesn't have to do that operation; it's already done. There's a whole bunch of things about loops we could talk about here as well, which I'm not going to. An up-to-date statistic is much more valuable to the query optimizer than a defragmented index.
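
A minimal follow-up to a reorganize, again using the hypothetical objects from above, would be a statistics refresh such as:

```sql
-- REORGANIZE leaves statistics untouched, so refresh them afterwards.
-- FULLSCAN is the most accurate sample but also the most expensive.
UPDATE STATISTICS dbo.PhoneBook IX_PhoneBook_LastName_MiddleJ WITH FULLSCAN;

-- Or refresh every statistic on the table with the default sampling rate:
UPDATE STATISTICS dbo.PhoneBook;
```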

Carlos: There's another point there: we've talked about care and feeding of the indexes, rebuilds vs. reorgs. However, we can also have statistics that are outside of the indexes, on the table. Those are also providing feedback to SQL Server, to the optimizer. And if we neglect those we're still in kind of the same peril, because the optimizer may be using statistics we're not giving that care and feeding to.

Randolph: Correct. So it's valuable to have a, I hate this word, holistic approach to your care and feeding: you need to look at not just indexes, and not just fragmentation, and not just statistics, but all of them, and in a way that is effective and efficient for your maintenance window. You don't want to go and rebuild indexes every night because it doesn't make any sense. But you also don't want to stop doing statistics updates.

Steve: Yup. One thing I commonly see with that is updating statistics where somebody has a job that goes through and blindly updates every single statistic in the entire database every night. And that can have a pretty big performance impact, depending on the size of your database. A better approach is to look at which tables or indexes actually have changes and only update those that have a fair amount of changes on that frequent basis. And then maybe on a regular basis, once a week or once a month, you do an update across all of them.

Randolph: Yeah, that's a very good point. Talking about the amount of data that changes in a table, we're going to go on a little bit of a segue now: by default, automatic statistics updates are enabled. There are other settings you can choose in 2016; there's a whole bunch of new stuff per database. But let's talk about everything before 2016, because that's what most people are still using. Statistics will only update themselves if a certain percentage of the table has changed. The threshold has changed across versions recently, but the fact remains that statistics will only update themselves once a certain percentage has changed. So if you've got a really large table with tens of millions of rows and only one million of them have changed recently, statistics will not be updated automatically. So it does pay to use one of the third-party tools we've mentioned, like MinionWare or Ola Hallengren's maintenance solution, where it will go and check how much has changed, and then you can update the statistics there and that will help your query optimizer as well.
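
For anyone who wants to see how much has changed without a third-party tool, something along these lines checks the modification counter per statistic; sys.dm_db_stats_properties is available from roughly SQL Server 2008 R2 SP2 / 2012 SP1 onwards, and this is only an illustration of the idea, not how the tools mentioned above implement it:

```sql
-- Statistics with a large number of modifications since their last update.
SELECT  OBJECT_NAME(s.object_id)  AS table_name,
        s.name                    AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   sp.modification_counter > 0
  AND   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;
```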

Steve: The other side effect you can get with that is that if you do hit the threshold where it decides it's time to update those statistics, that might be in the middle of your peak load during the day. Right when you've got a lot of traffic coming to your database, an automatic statistics update kicks in, and that can have an impact on things too.

Randolph: That in fact is one of the recommendations for SharePoint-style databases. There are a lot of things I don't like about SharePoint, but the fact is a lot of people use it, so one of the recommendations is to turn off automatic statistics updates on any SharePoint-style database. That includes CRM, it includes Great Plains, all of those. What's the other one I used? Whatever it is, there's a whole bunch of them where you should turn off automatic statistics updates and then include statistics rebuilds in your maintenance plans. So just keep in mind that each database is different.

Steve: It will probably apply to Navision and Dynamics as well. 

Randolph: Yes that’s the one. Yeah, Dynamics is the one I’m trying to remember. Thank you! So any SharePoint style database they do recommend turning off statistics updates automatically and to do them in a separate maintenance window. So it pays to do your research to make sure that you’re doing the right kind of maintenance plans for your database and your instance.

Steve: Yup, very good point.
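
The database setting in question is a simple toggle; the database name below is hypothetical, and you should of course check the vendor's own guidance before flipping it:

```sql
-- Disable automatic statistics updates for a SharePoint-style database and
-- rely on a scheduled maintenance job instead.
ALTER DATABASE [SharePoint_Content] SET AUTO_UPDATE_STATISTICS OFF;

-- Then, during a maintenance window:
USE [SharePoint_Content];
EXEC sys.sp_updatestats;
```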

Carlos: So we've talked about all of these components, and now let's talk about some of the benefits. All of these things require us to go in with that holistic approach, to know our data and get more familiar with it. Ultimately, to what end are we doing that? We've talked a little bit about performance, so I guess we should probably talk about how that performance gets into the system, if you will, or the mechanisms that cause the improvements.

Randolph: Ok, so when I do my performance session, what I say is: you've got expensive items like your CPU, you've got less expensive items like your RAM, and you've got even less expensive items like your hard drives. Your CPU is the most valuable thing to you because the SQL license is per CPU core, and you want to use your CPU and memory as efficiently as you can. An index, as we spoke about before, is a copy of the data, so you want to keep your indexes as efficient as possible: if you've got a large table you don't want your index to be large as well. You want it to be smaller so that less of it has to be in memory, because that's what this game is about. SQL Server is all about keeping as much data as possible in memory. For Standard Edition, even up to 2016, you've only got a certain amount of memory that you can access. In 2016 Service Pack 1 the limit is still there, but it applies to the buffer pool as opposed to everything. The fact is that there are limits to the amount of memory you can use with SQL Server, specifically with Standard Edition, because not everybody can afford Enterprise. So you have to manage the amount of data that's going into memory as much as you can, and that is most effectively done by having the right kind of indexes for your queries. That's also why you don't want duplicate indexes, because they will also be in memory; why you don't want wide indexes, because they will take up more memory than they need to; why included columns are a handy way to keep your index keys small; and why we have filtered indexes. All of these things try to reduce the amount of data that is in memory so that we can do a lot more with what's in the buffer pool.
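
As a small illustration of the "keep it narrow" idea, an index can keep its key small and carry extra columns only at the leaf level via INCLUDE; the table and columns here are hypothetical:

```sql
-- Narrow key, with columns the query only reads pushed into INCLUDE so they
-- are stored at the leaf level rather than widening the whole B-tree.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate, TotalDue);
```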

Carlos: So then how do we go about determining that? We talked about looking at the execution plan, which kind of gives you a recommended index. What are some good strategies for deciding which indexes get created and which ones we should ignore?

Steve: One approach I take is, if I'm looking at an execution plan and it suggests an index, and it's a development environment, one of the things I'll try initially is to just create the index and see how it helps, and then I'll drop it and go look to see whether there's a similar index. I mean, if the index it suggested was beneficial, I'll go and see if there's a similar existing index that could be modified or added to that would give the same effect the suggested index was giving. Sometimes that means taking an existing index and just adding an include to it, or taking an existing index and adding another column to it.

Carlos: Ok, so that’s kind of a dev environment, one query at a time.

Steve: Yup, and the other approach is to go and look at some of the missing index DMVs. Actually you look at DMVs for a handful of things. One is to figure out what the missing indexes are and the overall cost associated with those missing indexes, and then come up with some good ones to add that are not going to be duplicates of other indexes, and see how that improves performance. You can also use DMVs to figure out what your duplicate indexes are. If you have duplicate indexes, oftentimes you'll see that one of them might not be used at all while the other one is being used a lot, and you can go in and drop one of them. However, you need to be careful when you're looking at unique indexes: whether it's a unique, clustered or non-clustered index, you want to look carefully and drop the right one. Another thing to track down is big clustered indexes: which clustered indexes are really wide, because a really wide clustering key ripples through all the non-clustered indexes with that extra size. You can also track down unused indexes: what are the indexes on your system that are not being used? Now, when you look at that you need to be very careful, because that only covers the period since the SQL Server instance restarted. Also, if you rebuild an index I believe it clears those index usage stats.

Randolph: Sometimes.

Steve: Sometimes, exactly. So if you see there's an index that is not being used, you might want to track it over a few weeks to a month and confirm that it really isn't being used, and then go and take a look at dropping it.
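
A hedged sketch of the kind of unused-index check being described, against the usage-stats DMV (object names are generic, and as noted above the numbers only cover activity since the last restart):

```sql
-- Non-clustered indexes with no reads recorded since the last restart.
-- These are candidates to investigate over time, not to drop immediately.
SELECT  OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name,
        us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND   i.type_desc = 'NONCLUSTERED'
  AND   i.is_primary_key = 0
  AND   i.is_unique_constraint = 0
  AND   ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) = 0
ORDER BY us.user_updates DESC;
```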

Randolph: That goes back to one of the best practices, which is to have a baseline. Know what your system is doing, track that over time, and then you can refer back to that baseline and say, "Well, this is exactly what's happening; this is different," and go from there. Yeah, that's a good point.

Steve: Yup, and that reminds me of a project, Carlos, that you and I worked on, where we set up a monitoring component that ran for a month and kept track of unused index details, logging them every few hours. Then we went back at the end of the month, reviewed the ones that had no utilization over the entire month, and suggested those as possible candidates to be dropped.

Randolph: That reminded me of Pinal's comment as well, that he'll only look at a database that's been running for a week. The one thing I wanted to mention about the DMVs you're referring to: there's a group of diagnostic queries that Glenn Berry, Glenn Alan Berry from SQLskills, uses and maintains, and so does Brent Ozar and a whole bunch of other people. Brent Ozar's is called sp_BlitzIndex, and Glenn Berry's is included in his diagnostic scripts. It's a very similar query, and what it does is weight your missing indexes according to number of scans, number of seeks, and something called average user impact. Now, that average user impact number, I think it's unitless; I'm not quite sure how that number is calculated. But if it's a really high impact, like tens or hundreds of thousands, then usually I will go and look at that first and say, "Well, how many times has this index been requested, in terms of number of seeks and scans?" And if it's a high-usage or high-impact missing index, then I will usually create it without too many modifications, as long as it's not too wide and doesn't have too many included columns.

Carlos: I think it’s the number of times requested multiplied by the number of hits.
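
The exact formula the various tools use isn't spelled out here, but a commonly used heuristic for ranking the missing index DMV output multiplies the average cost, the impact percentage and the number of requests, along these lines (purely illustrative):

```sql
SELECT  d.statement AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        s.user_seeks,
        s.user_scans,
        s.avg_user_impact,
        s.avg_total_user_cost * s.avg_user_impact * (s.user_seeks + s.user_scans)
            AS estimated_improvement
FROM    sys.dm_db_missing_index_details     AS d
JOIN    sys.dm_db_missing_index_groups      AS g ON g.index_handle = d.index_handle
JOIN    sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
ORDER BY estimated_improvement DESC;
```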

Steve: Yup, and just to jump in there, I'd feel a little bit left out if I didn't have the chance to mention Database Health Monitor. In Database Health Monitor there are many indexing reports similar to what Randolph has described there.

Randolph: Certainly. Steve Stedman's Solutions has a magnificent and free tool called Database Health Monitor, which I have used; I don't even get paid for this statement. It's a good product. It's free, which is even more amazing. This is the great thing about the SQL Server community: there are a lot of free tools out there that are adding value all the time, and all that people ask is that you recognize them, and I recognize Steve as a valuable member of our community. That ends the sponsorship message.

Steve: Thank you for the shameless plug.

Carlos: So it's interesting, and maybe I'm being a little bit paranoid, but when I'm looking at that impact number I'll take it into account along with the other indexes I'm interested in adding. Of course I want to do what Steve mentioned: look for duplicates and see what's already there. Then, am I going to go over my threshold of ten or whatever, just taking into consideration what additional load I'm putting on this table by creating the index? And while you want to test that out, most of the environments I've worked in just don't have a good mechanism for creating a load similar to what I have in production. So when I go to implement that index, one of the first things I'll start to look at is those usage stats, because I want the usage stats on the index I just created to be going up. If they're not, it's, "Ok, wait a second, what did I do? Is this still a really good candidate?"

Randolph: Yeah, that’s a very good point.

Carlos: So other than how we're going to go about creating them, a couple of things we didn't really get into are some of the other index types, like columnstore or even XML indexes. Any thoughts about when to start looking at those, or should we even try to approach that now? I feel like maybe we should punt on some of those.

Randolph: I can briefly mention XML indexes. The way they work is that they're going to be larger than your table, or at least your XML column. The reason is that an XML index creates an internal table according to the structure of the XML document or XML field that you're indexing. It actually expands the XML data out into an internal table and then indexes that. So you could have a column that is, say, 100KB maximum, which is a small one, and end up with an index that is 500KB, or a gigabyte, or tens of gigabytes, because it's creating an internal table under the covers, a system table, that it then indexes. So be wary of indexing XML columns: if you're going to index them, make sure they're sensible indexes that only index certain nodes inside the XML document. I'd also like to add here that if you use the XML data type, the data going into that XML column is not going to be the same as what you get out. It does modify the XML data going in, for various reasons.

Carlos: Say that one more time.

Randolph: If you have XML data that you store for auditing reasons, for example, when you pull it out it will have been modified somehow. Either the tags will be slightly different or the spacing will be slightly different, so do not use an XML data type to store audit records of XML documents. If you want to keep an absolute record of what your XML looked like as it came out of a web service, for whatever reason, store it in a varchar or nvarchar column instead, because then it is identical. If you put it into an XML data type it will modify the data. It is still the same data semantically, but it will be slightly modified; the tags will be slightly different or whatever.
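
To make both points concrete, here is a small sketch with an invented audit table: a primary XML index (the internal-table behaviour described above), a narrower secondary PATH index, and a round-trip showing that the xml type stores a parsed representation rather than the original bytes.

```sql
-- Hypothetical audit table with an XML column.
CREATE TABLE dbo.AuditLog
(
    AuditID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Payload xml NOT NULL
);

-- A primary XML index shreds the whole document into an internal table,
-- which is why it can end up far larger than the column itself.
CREATE PRIMARY XML INDEX PXML_AuditLog_Payload
    ON dbo.AuditLog (Payload);

-- Secondary XML indexes (PATH, VALUE, PROPERTY) narrow what gets indexed.
CREATE XML INDEX IXML_AuditLog_Payload_Path
    ON dbo.AuditLog (Payload)
    USING XML INDEX PXML_AuditLog_Payload
    FOR PATH;

-- Round trip: the XML declaration, attribute quote style and insignificant
-- whitespace are not preserved byte for byte.
INSERT INTO dbo.AuditLog (Payload)
VALUES (N'<?xml version="1.0"?><order id=''42''>  <item/>  </order>');

SELECT Payload FROM dbo.AuditLog;
-- Returns something like: <order id="42"><item /></order>
```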

Steve: So then as far as column store indexes I think that’s something we could probably dedicate an entire episode just to talk about.

Randolph: We definitely can.

Carlos: I think you’re right.

Randolph: A columnstore index is neither an index nor clustered in the traditional sense, so it's complicated. In fact, there is a series of posts, I think it's Niko who has done it, over one hundred posts on how columnstore indexes work. We could talk for days on that. They are fascinating and completely subvert what you think you know about indexes, and data, and row-level storage and all that kind of stuff. It's fascinating stuff.

Steve: Yeah, and just a few notes on that I think that it’s one of those things that is great for data warehousing or OLAP type things. And may not always be the best option for your OLTP side.

Randolph: At the same time, you could have reporting-style queries in your database, and with 2016's optimizations for columnstore, where columnstore indexes can be updated and can be clustered and all sorts of stuff, you could have some things in your OLTP environment that use OLAP-style indexes. Yeah, there's so much to think about there.

Carlos: We saw that in Episode 78 with Brian Carrig. They talked about using their column store index in a transactional environment.

Steve: Yup, and then one of the misunderstandings I've seen around columnstore, because columnstore does the compression on each column, is that people think, "I don't have to have any non-clustered indexes if I have a columnstore index." And that's not true at all. If you're looking at a bigger table and you're using columnstore, you may still need to have some non-clustered indexes on it as well.

Randolph: Oh yeah, so do your research. Read all one hundred and whatever posts. Become an expert, then implement them.
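
As a minimal sketch of the point Steve makes (table and index names invented): SQL Server 2016 allows ordinary nonclustered rowstore indexes alongside an updatable clustered columnstore index, so analytic scans and selective lookups can coexist on the same table.

```sql
CREATE TABLE dbo.Sales
(
    SalesID    bigint        NOT NULL,
    CustomerID int           NOT NULL,
    SaleDate   date          NOT NULL,
    Amount     decimal(18,2) NOT NULL
);

-- Columnstore for the big analytic scans.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Sales ON dbo.Sales;

-- A rowstore index is still useful for selective, OLTP-style lookups.
CREATE NONCLUSTERED INDEX IX_Sales_CustomerID
    ON dbo.Sales (CustomerID)
    INCLUDE (SaleDate, Amount);
```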

Carlos: So we've talked a little bit about the implications and how we know what to use. Space considerations we talked about in the rebuild process, right, that we're going to need additional space in our data file and our log file, things like that. I think we've had a pretty good conversation there. I guess one last idea I'll throw out that we can bat around a little bit: we've talked a lot about having multiple TempDB files, and one of the thoughts I've seen out there is, if you're looking to spread out or move your indexes away from where your data resides, one way to do that is to create another data file, potentially on separate storage, and then rebuild those indexes into it.

Randolph: Ok, there are two things I can talk about there. The first thing I wanted to mention is that if you need to move your table into a different file or filegroup, all you have to do is a clustered index rebuild and target that filegroup as the new location for the clustered index, because remember, the clustered index is your data. So that's a handy way to move your table into a different filegroup. The other thing is that, by the same principle, you could have your non-clustered indexes in a different filegroup than your data, and you might want that for improved I/O performance, or if you've got your read-only stuff in a different filegroup, or all sorts of reasons. It's very handy for splitting your load at the I/O level. It's less of a problem these days, but for large tables it's a nice thing to split your non-clustered indexes from your clustered index so that you're not reading from the same portion of the disk, or the I/O subsystem, or both.

Carlos: Right.
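
A rough sketch of both techniques, with hypothetical filegroup and object names (and note the off-row LOB caveat discussed just below):

```sql
-- Rebuild the clustered index onto a different filegroup to move the table.
CREATE CLUSTERED INDEX CX_EventHistory
    ON dbo.EventHistory (EventID)
    WITH (DROP_EXISTING = ON)
    ON [ARCHIVE_FG];

-- Non-clustered indexes can be placed on their own filegroup the same way.
CREATE NONCLUSTERED INDEX IX_EventHistory_EventDate
    ON dbo.EventHistory (EventDate)
    WITH (DROP_EXISTING = ON)
    ON [INDEX_FG];
```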

Steve: Just a note on that index rebuild for a clustered index to move it to a different filegroup: although that will move the table and the entire index, one of the things that can be left behind when you do that is any of the varchar(max) or nvarchar(max) or other large objects that are stored outside of the data pages for that table.

Randolph: Yeah, off-row data will be affected. Steve, I don't know because I've never tried: what happens if you tell it to do the index rebuild with LOB compaction enabled?

Steve: You know, that’s a good question. I have not tried that for a long time.

Randolph: I smell a blog post.

Steve: Yup.
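
For reference while that blog post gets written: LOB compaction is exposed as an option on ALTER INDEX ... REORGANIZE (where it defaults to ON) rather than on REBUILD, so the question above is presumably about combining the two operations. The object names below are the hypothetical ones from the earlier sketch.

```sql
-- Reorganize with LOB compaction explicitly enabled.
ALTER INDEX CX_EventHistory ON dbo.EventHistory
    REORGANIZE WITH (LOB_COMPACTION = ON);
```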

Carlos: Ok, very good. Well, awesome. I think great conversation and obviously there’s a lot more that we could talk about indexes but I think this was a noble attempt at covering some of the basics and getting into some of the nitty gritty as well.

Randolph: Yeah, the fact is that indexes don't stand by themselves. They are a very important part of everything, including statistics and everything else, so don't think that once you become an index expert you become a performance tuning expert, because that's not true. You have to have a very broad knowledge of how things work in a number of different areas to get the best performance out of your system. And there's nothing wrong with good enough. You don't have to have 100% defragmented indexes. You don't have to rebuild indexes; you can have them reorganized, or in certain contexts not even reorganized at all, if your statistics are up to date. There are a lot of tradeoffs that you have to think about when doing your maintenance plans, and indexes form just a small part of that.

Steve: Yup.  

Carlos: Great.

Steve: Very good point.

Carlos: So shall we do SQL Family?

Steve: So how did you first get started with SQL Server, Randolph?

Randolph: Steve, that's a good question, because when I originally got it I had to think very hard. I have been fascinated on a personal level with organizing data. Back in my youth I would create lists and lists of things, CDs, books, all that kind of stuff, and I would do them in the paper equivalent of Microsoft Excel. Then when Lotus 1-2-3 came along I started playing with that, because my dad worked at the bank and he was a 1-2-3 expert. If I'd known about VisiCalc I probably would have used it. Then I got into Microsoft Access because it was there; it was a very old version of Access, I think version 2, that I started with. Then I started working at a PeopleSoft implementation partner in South Africa, where I am from, and that's where I first worked with SQL Server and Oracle at the same time, and I was not a fan of SQL Server, I have to be honest. It was version 6.5 and version 7 around the time I joined, so there were issues, because as some of your listeners may know, SQL Server's engine was rebuilt around the 6.5 to 7 timeframe. In fact, if you look on MSDN you can download 6.5 and you can download 2000, but you cannot download version 7; it is very difficult to come by. There are maybe good reasons for that. So I first got exposed to SQL Server around 1997, but then I didn't really play with it until I got to the bank; I was more of an Oracle guy. I got to the bank in 2006, so there was a large gap of not playing with SQL Server, and then I couldn't figure out why the keyword DESCRIBE wasn't working in SQL Server. For all of you Oracle folks, you'll notice there that I prefer SQL Server now, because as I said before a lot has changed. It is a much better product than it was in 1997. In fact, I think it has surpassed Oracle. I think it's better than anything else as well, and that's because I'm trying to keep abreast of all the new stuff. I don't want to be stuck in the past and have assumptions about the product; I want to play around with the new stuff. So that was a long way of saying 20 years or so.

Steve: Ok. Well I know that in the late 90’s Oracle was a much better product than SQL Server.

Randolph: It really was.

Steve: Yup, and that has changed. That has changed significantly in the last 17 years.

Randolph: Yeah, it has.

Carlos: If you could change one thing about SQL Server what would it be?

Randolph: I had a pat answer for this but I'm going to change my mind. The pat answer is in-memory processing in SQL Server, in-memory OLTP, which was code-named Hekaton. There is a transaction log operation that happens even for stuff that you don't want kept; it's still being backed by disk somewhere. It may not be in the transaction log itself, but it is still being backed by disk somewhere. I would like to have in-memory objects that do not have anything on disk backing them at all. They must just be there, and if the server goes down, oh well, tough luck.

Carlos: Alright. Yes, I remember seeing you in the Bob Ward presentation at SQL PASS talking about the in-memory OLTP.

Randolph: There's some very, very cool stuff with in-memory OLTP that I've been playing with, and I've been blogging about it as well on my blog, bornsql.ca, which covers some of the in-memory things I've found. Interestingly, I found another issue that I've spoken to Jonathan Kehayias about, where temp tables are slower than table variables in certain cases. So I'm investigating that at the moment, and it had to do with my in-memory investigations.

Steve: Oh interesting, I’d like to see what you find there.

Randolph: So that was my pat answer about what I would change in SQL Server. What I would actually change is having a TempDB per user database as a configurable extra.

Steve: That would be nice.

Randolph: In other words it doesn’t have to be default but if I need a TempDB for a particular database I would like to have that as a separate entity so that I could manage it separately and keep track of the objects that are happening for that particular database. The architecture for that is going to be really difficult to do so I don’t know if Microsoft is going to do that but that’s what I would like.

Steve: Alright, so what's the best piece of career advice that you've received?

Randolph: Oh, it's going to be a long answer as well, sorry. "Let your work speak for itself" is the first and foremost one. It doesn't matter what people say about you; if your work can speak for itself then you don't have to say anything. That's the best piece of advice I've had. But the other is "Everything you say and everything you do is being recorded somewhere," so please treat people with respect. Treat people as you would have them treat you, and don't say things on voicemails that can be played back to you at a meeting with the CEO, with them saying, "Randolph, don't do that."

Steve: I like that too.

Carlos: We won’t ask for the personal experience that led you to that advice.

Randolph: At the end of the year, or every now and then when it's a quiet time on my blog, I write about career-limiting moves, and that was one of them. So there are a bunch of career-limiting moves that I can recommend you don't do, not because it's good advice but because I've done them, and that will be one of them.

Steve: Alright.

Carlos: Our last question for you today. If you could have one superhero power what would it be and why do you want it?

Randolph: I thought about this and I honestly don't know. I liked Pinal's answer, where he could touch a server and know what's wrong with it. The problem is, whenever I touch a server, if it's going to break it's going to break then and there. Maybe I want the reverse of that superpower.

Steve: To just know without touching the server.

Randolph: Yeah, you know, the cost of maintenance is one thing. But if I'm in a room and it is going to break, it is going to break while I'm there, which is good I guess in a way, because then you know you can figure it out then and there. But I'd like the reverse of that; that would be cool.

Steve: Ok.

Carlos: Well, awesome. Randolph thanks so much for being with us today.

Randolph: Well, thanks for having me. It's a topic near and dear to my heart, indexing and maintenance and stuff like that, so I'm glad I got an opportunity to speak to you guys about it.

Episode 80: Containers

When was the last time you deployed code and then found an issue with the version of a stored procedure, a setting, or even a service pack?  While we data folk don't generally have as much trouble with environment issues as our developer counterparts, it can still be a big deal.  Spinning up a VM may not be such a big deal anymore; however, most of us still have to request one and wait.  What if you could take care of the OS and the SQL Server and not worry about setting up another development environment?  Our guest today is Andrew Pruski and he talks to us about how he is using containers to support his environments and the flexibility it provides to him and his co-workers.

While the Linux containers seem to get lots of love, one unique thing about Andrew's setup is he is running Windows containers with older versions of SQL Server.  What is cool to me is there are tools out there that can help those of us running Windows get up and running without having to wait on our infrastructure to upgrade to Windows Server 2016.  If you are using containers, I would love to hear about it.  Use the hashtag #sqlpodcast and let us know!

Episode Quote

“Containers are getting more and more popular, with more and more diverse software that you can run in them.”

Listen to Learn

  • What containers are and how SQL Server can live in them.
  • The benefits of containers
  • How Andrew gets containers to run on Windows
  • Resources for more information

Andrew on Twitter
Andrew’s blog
Windocks
SQL Server Containers

About Andrew

Andrew Pruski is a SQL Server DBA with over 5 years in database development/administration. Originally from Great Britain, he now works in Ireland. He is a regular speaker at SQLSaturday events.

Transcription: Containers

Carlos: Andrew, welcome to the show.

Andrew: Hi guys! Thanks for having me.

Carlos: Yes, so one of the reasons I reached out to you is that our topic today is containers, and you had put out a series of posts on your blog about containers. I wanted to have you on because I think containers are going to impact database administrators in the coming months, if not years. Most of what we've been hearing about containers comes from the Linux side, but it seems like the Windows side is starting to speed up as well. The analogy that I heard was that containers are to environments much like VMs were to servers: carving them up and making them their own little islands, if you will, so that you can then move them around much more easily. That is what the container idea brings to a specific application, and in this case we are talking about SQL Server in a container. Is that a fair representation or understanding of containers? Why don't you start us off and take us through that definition of what containers are and how you've been using them?

Andrew: Yeah, definitely, I'd say that's a good analogy. They're on the rise; the Linux side especially has been out for longer and is the more established technology, but the Windows side is coming up fast. And in, I'll just say, months or years we're going to be looking at containers getting more and more popular, and more and more diverse in the amount of software that you can run in them, especially with Microsoft announcing support for SQL Server in containers. It's going to be an interesting time next year to see what happens. The way I've been looking at containers as opposed to VMs is that if you have a lot of different apps you want to run in an environment, maybe you have a VM with one app running; but if you have the same app that you want to run multiple instances of, then containers can definitely be the way forward for you.

Carlos: Right, so it’s kind of that encapsulation of an application that I can then move the application around.

Andrew: Yup, definitely. The portability features containers give you are very, very good. I mean, with SQL Server I think the main advantage you're going to get with containers is the ability to spin up an instance in a very, very short period of time. So if you need quick, throwaway instances of SQL Server to do some development or testing on, containers are definitely the way forward.

Carlos: Right, so talk to us about how you’ve started integrating containers into your environment?

Andrew: My QA and Dev departments have a couple of old servers sitting in a backroom somewhere that they've installed a hypervisor on and have access to. They're constantly building VMs, testing their apps and then blowing away the VMs, and they do this every single time. Each time they do that they're also installing a local instance of SQL Server. Now, the servers they're running on aren't exactly high spec; we're talking 7.2K disks and CPUs that are five years old, probably older, so the install time for these VMs was taking up to an hour. They would install SQL and then start restoring all their databases from baselines kept in TFS, and this process wasn't exactly reliable either. When it did fail, they'd have to go trawling through error logs and whatnot. I was looking at this process and always thinking, you know, there has to be a better way, but what? I mean, this process had been in place for a while and I didn't know what other options to give them, until probably about this time last year when I started reading about containers and what options are out there. I started to look around at the companies in that space, Docker being the most prevalent one, and researched containers and how SQL can integrate with them. What I basically did was go off and have a word with a couple of my system admins and ask if there was a spare box lying around that I could provision. I wiped it, installed an OS on it and then got the containers up and running. The idea is that instead of these guys building their VMs, deploying their apps and then installing SQL Server, they deploy their apps to their VMs but the apps connect to a container that's spun up on a separate host, our container host. And the benefits that we saw were, well, we went from an install time of 40 minutes to an hour to spinning up a container in about two minutes. That's a new instance of SQL Server with all our databases available. Bear in mind they are small databases, but they're the same databases we were using for the install. So from 40 minutes to about 2 minutes, that's a huge saving.

Carlos: Right. Now, does that require an application change? You mentioned instead of pointing them to the database they could point to the container. Does that mean that the application, instead of being given port 1433 and a server name, is given a container name now? Is there another layer there?

Andrew: No. One of the prerequisites I had was that I didn't want them to change their process too much. All I wanted to do was remove the install of SQL Server and have everything else stay exactly the same. What we did was, I've written a PowerShell script that goes off and creates the container by calling the Docker daemon, and then it creates a bunch of SQL client aliases on the application server, because all our applications use DNS entries to contact the server. So instead of using DNS, the client alias overrides the DNS entry with the new container port number, and the app doesn't know anything about it. The app still thinks it's talking to an instance of SQL Server when really it's talking to a container.

Carlos: Got you.

Steve: Ok, so then to take an example like what I do is I work on a Database Health Monitor Application and part of that is testing it on lots of different versions of SQL Server everything from 2005 all the way through 2016, and different installs from Express, to Standard, to Enterprise. And to do that I have a number of VMs that I’ve spun up and then I also have stacked instances on each of those VMs. So it sounds like if I was using containers instead of having all those VMs and stacked instances I could just have a container that I spin up with that test environment. Do what I needed to do and then turn it back off again. Am I following this correctly?

Andrew: That's exactly it, yes. Instead of having your instances there, you could literally just kick off a little script that says, create me a new container from a saved image that I've got especially for this one test purpose, and off you go.

Steve: Ok.

Carlos: So again, the components there are, so there’s a host server, right? That host server is going to have an operating system and then I’m installing the containers on top of that. And my application could be a container and my database is a container and they can talk to each other.

Andrew: Yes, absolutely. So instead of having another layer with a hypervisor there, you have the base OS and then you have your container daemon running, which creates your containers for you. It's basically the service that you call to say, spin me up a container.

Carlos: Very cool! And so we talked about Docker being kind of the leading technology there. It seems like, as we've mentioned, all the love was kind of going to the Linux world, but you're doing all of this in a Windows environment.

Andrew: I am. So one of the things that tripped me up at the start was, great, we're getting support for containers on Windows, which is actually brilliant. I think Microsoft announced their partnership with Docker way back in 2014, so we've all been sitting there going, come on then, when is it going to happen? But it started to progress with the recent Windows Server 2016 community technology previews, and Windows 10 as well: if you're on the Insider preview track with the Enterprise Edition you can run containers on Windows 10. So yeah, as you said, Linux was getting all the love, and we're finally starting to see some of that come through now. We're getting the same functionality that those guys have been playing with for a while.

Carlos: Right. So now, what are the apps or components that you're using on the Windows side to get that done? Because it seems like, from an operating system perspective, Windows Server 2016 was the operating system where that finally arrived.

Andrew: Yeah, that's the operating system where Microsoft announced it as a first supported feature. It is actually an in-built feature that you can switch on when you install the operating system, so it's just like anything else, like .NET and things like that, and you're away. You could be up and running with it in less than 10 minutes.

Steve: So when you say less than 10 minutes is that less than 10 minutes for someone who is a pro with Docker or less than 10 minutes for someone who’s just getting started.

Andrew: Well, it literally is enable the feature and run, I think, two scripts, and you've got your service up and running, so it's very, very simple.

Steve: Ok, so today I know if I'm working with a VM to sort of make my own environment, I remote desktop to that VM, I do whatever I need to with it and then I can shut that VM down later. With containers, is it similar in that you can connect to that container and then you have a choice of saving what you've done or aborting it and going back to the original state?

Andrew: Yeah. So what you can do is initially connect to the Docker engine, the Docker daemon, and you can create your own base images from templates. You have the base OS, which I think is Windows Server Core on 2016, and then you can get a SQL image, download that and spin up your SQL container. Then what I did was install a load of databases, shut that container down and save it as a new image that I could then create new containers from. So instead of having to restore databases each time into each container, when I spin a container up those databases are already in the state I want them in.

Carlos: Well, I guess I'm not sure that I quite understand that, because I thought the databases were not actually part of the container. I thought they were separate, like I could move my container around and those database files would not be part of the container.

Andrew: Yup, ok, the portability feature of containers is something that I haven't actually played with that much, so I'm looking at getting into it. With the software that I'm working with, which I mentioned a bit, called Windocks, I don't think you get that portability functionality, because it is obviously dependent on the I/O subsystem. One thing I have tried is that with the Windows containers, the ones supported on Windows Server 2016, the databases and the SQL instance are all wrapped up in the container, which is I/O independent. I need to double-check that, I'm afraid, guys, but you can basically push your image up to the Docker Hub, save it there and then pull it down onto a different server, and the databases will all be there.

Carlos: Right, and again that's what I've been seeing from the demos; that's part of what they're talking about. It seems like from an application perspective it's a little easier to understand and grasp, but from the database perspective it seems like there's a bit of a leap there still.

Andrew: Yes, it's still very, very new. I mean, obviously apps are stateless, so it doesn't really matter where they live and they don't depend on the I/O subsystem. But of course SQL Server does depend on the I/O subsystem, so it's a case of getting your head around how it's actually all working in the background.

Carlos: Right. And you also mentioned the Windocks tool, and one of the things you like about it, the reason you're using it besides the fact you're on Windows, was that it helps support previous versions of SQL Server.

Andrew: Yes, so that was another thing that sort of tripped me up from the start as well. With this new functionality available on Windows Server, what you've got in the Docker Hub at the moment is Express Editions of 2014 and 2016 and then the full edition of the next version of SQL Server. I think all the features are available that are available on Windows Server Core, so it's a pretty fully fledged database engine there. But again, it's a higher version of SQL than I'd say anyone is particularly using right now. There's no point in getting my QA and Dev people testing on a higher version than I have in production. We're running Windows Server 2012 R2 with SQL Server 2012, and as far as I know we have no plans to upgrade in the next year anyway. So what I had to do was go out and start looking: is there a software solution out there that will give me SQL 2012 running in containers? I eventually found this company called Windocks, who have basically built a custom port of the open source software that Docker put out, which allows earlier versions of SQL Server to run in containers on earlier versions of Windows Server; I think it is 2008 upwards in both cases. I mean, I was pretty chuffed with that because it was exactly what I needed, so I went off and reached out to them, and they've got a free Community Edition which I went and grabbed. I touched base with my system admins, grabbed an old beat-up server, installed Windows Server 2012, and followed the Windocks installation instructions, which again were pretty simple. They've even simplified it more since I had my first run with it; it's basically a PowerShell script to get the engine up and running, and once you've done that you're good to go. That has allowed us to get SQL Server 2012 in containers on Windows Server 2012, so I've got my Dev and QA teams now working with containers at the same version and patch level as my production environment.

Carlos: So what’s the difference between the Community Edition and in the paid version?

Andrew: It’s basically the number of containers that you can have.

Carlos: Ok, so what’s the Community Edition limit you to?

Andrew: I think it limits you to two. Literally it’s just a soft gateway to show you how the functionality works and what you can do.

Carlos: Got you.

Steve: Ok, so in that solution then, if you’re downloading the container with the SQL Server version to match what you’re using in production. How is the licensing then done on that? Are you then having to put your own license codes in for SQL Server?

Andrew: We’ve used the Developer Edition so we neatly side step all of that.

Steve: Oh, perfect.

Andrew: I think the way it's going to work with, say, Standard — I think it would be nuts to go with Enterprise in a container at the moment — is the same way you would license your host, and you can build from there.

Carlos: Yeah, I think that's a great way to ease into adoption and get comfortable with it, using containers in those lower environments, because they do tend to move. You want to spin them up, you want to take them down. You want to shake and bake them a little bit more than your production environments, obviously, and that's actually what containers are really, really good at as well. That's an interesting approach.

Andrew: Definitely treat the containers as throwaway objects. If something goes wrong with one, don't spend time restoring databases into it and fixing things. Blow it away, run a script, build a new container.

Carlos: Right, so now what happens if you wanted to add a new database to the container? What’s that like?

Andrew: It handles and works exactly like a normal instance of SQL Server through Management Studio. You don't get the Agent, so that's the only difference you'll see. You create a database, and if you then want to replace your base image, you stop that container, blow away your original image and save it as a new image, and you've got an updated image with your new database. I mean, that's part of one of the reasons we went for containers instead of deploying to VMs. The guys were deploying our databases from baseline backup files kept in TFS and then applying a load of updates to each and every database to get it to the level of production. Whereas what we're doing now is: every week we have a certain number of releases, and at the end of the week I deploy all our releases to our base image, update that image, blow the old one away and save a new one. Then the next week they can all blow their containers away, build new ones in a couple of minutes, and they've all got up-to-date containers exactly the same as the production environment.

Carlos: Got you.

Steve: So then for that development environment you're able to get those containers up to date a lot quicker than if you had, say, a standalone SQL Server Developer Edition installed on every developer's machine. You're able to just grab the latest container, and then everyone is on the same baseline rather than each developer maybe getting out of sync somewhere.

Andrew: That's it, yeah. So we can say to them: you've all got your Dev VMs pointing at the containers; your containers are now out of date; blow them away and build new ones, and every single one of you has got a replica of exactly the same image, which is up to date with production.

Steve: Yeah, I could see how that alone would just save a massive amount of time.

Carlos: And confusion.

Steve: Yup.

Andrew: The deployment worked on my VM but it's not working on yours. Why is that? Because there was a mismatch there, which we've now managed to get rid of.

Carlos: Yeah, exactly. How many times have you heard that? Right, "well, it works on mine."

Steve: Yeah, that’s one of the standard developer lines I thought.

Carlos: So I guess we've talked a little bit about some of the pieces that you've used. Any other advice as far as getting up and running?

Andrew: I mean it is very simple to get it up and running. Bear in mind that this is brand new technology, we’ve had a few issues with integrating it into our own processes. I think the bit of advice I would give to anyone looking to go with containers is don’t go with containers just because you want to use containers. Have a look at your businesses processes and practices and see how they’re going to fit in there. Don’t try and work containers into them if it’s not going to be right because they are great technology to work with but you really need a business case for them there.

Carlos: Ok. But I guess I'm going to have to say that I think, particularly with that scenario you just described, almost everybody has that circumstance with their lower environments. Well, assuming they're development shops, right? If you're installing SharePoint, for example, and that's your application, then maybe it wouldn't necessarily apply. But I think in most development environments you could make a case for that.

Andrew: Yeah, agreed. I would suggest having a look and seeing what they can do for you. I'm very guilty of seeing a technology I want to use and wanting to play with it, but I really had to put a business case together to get this in. It's been working pretty well. I mean, we've had some issues due to the fact that it's new: sometimes a container that was up and running has crashed and we have to redeploy, and we do that a lot. But we've got into the habit of treating them as throwaway objects: get rid of them, rebuild, and you've got another one straight away.

Carlos: Exactly.

Steve: Alright, so then it sounds like containers have been around with Linux for much longer than they have with Windows, obviously, because this is brand new on Windows. But are there any real conceptual differences between containers in Linux versus containers in Windows, other than the obvious operating system differences?

Andrew: I wouldn't say there are any conceptual differences. I think the main thing is that, as you said, they've been around on Linux for a lot longer, so that technology is a little bit more mature in some of the features. I think some of the networking stuff isn't available on the Windows side yet, but it's really just a case of watch this space, I think.

Carlos: Right.

Steve: So then on the networking side, I know if I spin up a VM with a hypervisor, it's creating a virtualized network interface for that VM. Is there something similar for containers, where they're getting a virtual network interface, or are they using the network card of the OS directly?

Andrew: I think they use the network card; I'm not too sure there, I'm afraid. The way I've been working with the containers is that each SQL instance listens on its own specific port that you can specify when you create the container. You map a host port to the container port, so locally you connect on the host port, and when you connect remotely you use the individual port specified when the container was created. That's the one thing we did when we were building the containers, and the aliases I mentioned: we use aliases because you can't use, say, a plain host name, because you can't specify a port with it. But with an alias you can say connect to this server on this port, and that takes them straight into the SQL instance.

Carlos: Right, so that mapping actually happens in the container, where you can specify: when the incoming port is this, send it here.

Andrew: Yes. So it’s just a little switch when you specify, say, run, create a container from this image, this port to this port, off you go. It does mean there are differences when you connect, say, if you’re on the VM where the host is running, where the containers are running. You’ll be connecting on different ports than you would if you were outside of that VM.

Carlos: Ok, so then for someone like me who has not used containers before, but it sounds interesting and I want to go and give it a try. Is there any advice that you would recommend, anything to watch out for or be aware of on that first time giving it a try?

Steve: Well, I think if I was going to do that, just to jump in here Andrew, I think obviously on the Windows side, right, you have the Community Edition to get started with. And then I would jump to Andrew’s blog and start there.

Andrew: I mean there are blogs there I’ve written from Part 1 to Part 4 that will take you through initial setup and then playing around a little bit. And then actually working with the Docker hub as well which will allow you to upload and pull repositories down and things like that.

Carlos: Alright.

Steve: And that site is dbafromthecold.wordpress.com. Is that right?

Andrew: That’s it, yes.

Carlos: Yeah, and we’ll have that available in our show notes as well which will be available at sqldatapartners.com/containers.

Steve: Perfect. Well, I think that wraps up the questions I’ve got about getting started.

Carlos: Ok, awesome. Well, I mean, great conversation, great, you know, interesting technology. Thank you for coming on and walking us through some of these use cases that you’re using it for. I think particularly on the Windows side, because admittedly that’s where I feel more comfortable, I’m interested in trying it out on Windows and kicking the tires there, and seeing how it goes.

Andrew: Yeah, thanks for having me guys. There are a couple of things I’ll come back on, especially about the network adapter. Those were interesting questions, although I wasn’t too sure. And I also want to double check the upload and download of containers into the Docker hub and see if it will actually just keep the actual state of the SQL instance or take the databases with you. So I can come back to you on those if you would like.

Carlos: Alright, yeah, that would be great.

Steve: Or I’ll have to keep an eye on your blog and see if it shows up there I suppose.

Carlos: So shall we go ahead and do SQL Family?

Steve: Yeah, let’s do that, so how did you get started first using SQL Server?

Andrew: I started using, well, the first job I got was in local government coming out of university, and it was writing Crystal Reports. I didn’t have any idea what the backend was. I was just writing SQL queries. I thought it was Oracle at first; it was actually SQL Server. I was doing research, you know, things were running slowly and I started doing a lot of googling around how to speed things up, and basically how to do certain things with SQL I didn’t know. I eventually stumbled onto sqlservercentral.com. I found this huge wealth of information about this whole entire industry of professionals that I’d had absolutely no idea about. That just piqued my curiosity, and eventually I decided that I wanted to become a DBA.

Carlos: Very cool. How then did you get your first job as a DBA?

Andrew: I think completely out of luck. I was very, very inexperienced. I was applying for jobs and eventually got in with a place called the United Kingdom Hydrographic Office. The interviewer there gave me a chance and gave me my first job as a DBA. That was about 5½ years ago now.

Carlos: Very cool, ok. Now if you could change one thing about SQL Server what would it be?

Andrew: I want to have a dark theme for Management Studio but it’s not going to happen anytime soon.

Episode 79: SSIS Catalog

Although the SSIS Catalog DB is created with Management Studio, it does not behave like other databases. Our conversation in this episode revolves around the catalog, SSIS packages, and some of the complexities of migrating packages. Steve and I are excited to chat with our guest Andy Leonard about his thoughts on the catalog and how this feature provides some really interesting benefits for ETL architecture.

Episode Quote

“[The SSIS Catalog] is really there to facilitate a lot of Enterprise functionality that is needed in any data integration engine you use.”

Listen to Learn

  • The benefits of project deployments
  • Some of the issues migrating packages
  • Why restoring the catalogdb can be problematic
  • The various levels or components of a package
  • How you can view reports without going into SSMS

Restoring the SSISCatalog DB
Integration Services stairway series
WorldWideImporters Sample Database
DILM Suite

About Andy Leonard

A former chicken farmer, Andy is now a Data Philosopher at Enterprise Data & Analytics, an SSIS trainer, consultant, and developer; a Business Intelligence Markup Language (Biml) developer and BimlHero; SQL Server database and data warehouse developer, community mentor, engineer, and farmer. He is a co-author of SQL Server Integration Services Design Patterns, and author of Managing Geeks – A Journey of Leading by Doing, and the Stairway to Integration Services. He lives with his family in Farmville, Virginia.

Transcription: DILM Suite - SSIS Catalog Viewer

Carlos: So Andy, welcome to the program.

Andy: Thank you!

Carlos: We wanted to have you on the show ultimately one because you’re pretty close to Richmond. And of course you’re Richmond famous, if nothing else, for helping start the local group, and we’re very appreciative of that. You’ve been doing quite a bit in SSIS, written several books on it, been kind of putting together this DILM Suite which we will talk about here in a minute. But ultimately, I guess our conversation today is going to revolve around the SSIS Catalog. So let’s go ahead and maybe jump into it and kind of give the 101 overview of what the SSIS Catalog is and what it’s going to do for us?

Andy: Well, first I want to thank you, Carlos, for that warm introduction. I’ll just say that there are a lot of people involved in the Richmond, Virginia SQL Server community and just all of our technical community. We have a great collection of really different communities there, from JavaScript across to the .NET community, to SQL Server. And as you know there’s a lot of interaction between the groups. And I’ll just also give you props for the amount of work and the amazing work you’ve done in the past few years after stepping in. I stepped away. I’ve been an attendee and a person who’s been enjoying a lot of the events that you all have put on recently. So thank you for that; it’s not just us. There’s a huge team of people involved and we could name names but they know who they are. So with that said, I’ll jump into a Catalog 101. And this is aptly timed: Kent Bradshaw, a friend and co-worker at Enterprise Data & Analytics, and I earlier today put on a free webinar where we talked about this very topic, an introduction to the SSIS Catalog. So what the catalog is, it’s an SSIS framework. And what do I mean by that? It manages SSIS package execution, logging and configuration. And before the SSIS Catalog was introduced in SSIS 2012, a lot of people wrote their own, and I was one of those people. But it’s really there to facilitate a lot of Enterprise functionality that is needed in any data integration engine you use. It doesn’t matter if it’s Informatica or SSIS or one of the others. You always need some kind of framework. You need something to manage execution, something to log, something to manage configuration, so at a very high level that’s what the catalog is.
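
To make the “manage execution, logging and configuration” idea a little more concrete, here is a minimal sketch of the kind of information the catalog keeps for you. Both views are part of SSISDB; the execution id in the second query is just a placeholder.

    -- Recent package executions the catalog has managed:
    SELECT TOP (20)
           e.execution_id, e.folder_name, e.project_name, e.package_name,
           e.status,            -- 2 = running, 4 = failed, 7 = succeeded
           e.start_time, e.end_time
    FROM SSISDB.catalog.executions AS e
    ORDER BY e.execution_id DESC;

    -- Messages the catalog logged for one of those executions:
    SELECT om.message_time, om.message
    FROM SSISDB.catalog.operation_messages AS om
    WHERE om.operation_id = 12345        -- placeholder execution_id
    ORDER BY om.message_time;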

Carlos: Now, that’s an interesting topic. I didn’t realize that you could use the Informatica packages with the SSIS Catalog.

Andy: You cannot but Informatica has its own functionality built in, its own framework, so it does a similar set of functions as the SSIS Catalog.

Steve: So then, with the SSIS Catalog, people can still run SSIS packages without it but you get a lot more functionality in the logging and capabilities there if the catalog is installed and used. Is that correct?

Andy: Yeah, absolutely you can, Steve. Microsoft, the way they built SSIS starting with 2012, they realized they were offering this big shift in paradigms for execution, and they kept the existing way of operating. They made that backwards compatibility very easy to accomplish inside the new paradigm. So what they did was they gave the old way of executing packages a name. They called it the package deployment model, and that is the old way of doing things. And then they created a new default, the project deployment model. And so in order to execute packages in the catalog you need to have them in that project deployment model, and that’s a property of the Visual Studio solution, the SQL Server Data Tools or SSDT solution that you built. And they created this really rich, very robust wizard, and I’ve never seen it fail once, for walking through, first, importing packages into SSIS 2012, 2014 or 2016, and they come in as the package deployment model if they are earlier than 2012. And then, once you walk through that wizard they’re now in the package deployment model, and after that if you want you can convert that project to the project deployment model and deploy it to the catalog and run it there. But you’re right Steve, you can do everything the way you’ve been doing it for 10 years if you’d like, even in 2016.

Steve: However, it sounds like there’s a lot of benefit to using the catalog and not doing it the old way, or the package deployment way.

Andy: There are some benefits to doing it that way; I’d say logging. And really the way the logging engine has been architected for the catalog, that was a big step in the right direction. Microsoft decoupled a lot of things around how that logging works, and just because of that you get a package performance boost anywhere from maybe 20% to a couple hundred percent. So you can see just a performance boost by moving the projects from, say, running in the file system to running in the SSIS Catalog, and again, it’s all around that logging.

Steve: Interesting, that’s a good point to know then, so if you’re having performance issues it might make sense to just try moving them into the SSIS Catalog and see if that improves things.

Andy: Absolutely. You know, a lot of people schedule jobs in SSIS, especially if you’re running against SQL Server, and if you’re using SSIS you have to have SQL Server. Although the engine itself is free and you can develop and run in the debugger anything you’d like, in order to have it run outside of that development environment you need to have a SQL Server license on that machine; that’s always been the case. But certainly, if you pull SSIS projects that are pre 2012, or even if you’re running 2012 projects, 2014 projects, in package deployment, pull those into the Visual Studio designer. You do the upgrade right there in Solution Explorer, you convert to a project deployment model and then you can right click the project and deploy it right over to your catalog. Yeah, you’ll see performance improvements. It’s not just the logging, but the logging is probably the largest one. Probably the next largest performance boost you’ll see is they’ve got a different execution engine that they built specifically for running SSIS packages that are executing in the catalog. So it runs a little faster, which isn’t surprising, and when the catalog executes packages, it executes them in that engine. And I don’t know if you’ve had this experience, but I have. I’ve built a version of something, and then come back a few years later, been asked to upgrade it, make a new version. And that second version is always better.
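
As a rough illustration of what running a package “in the catalog” looks like from T-SQL, here is a minimal sketch; the folder, project, and package names are made up, and the logging level is set to Basic.

    DECLARE @execution_id BIGINT;

    -- Create an execution for a package deployed to the catalog:
    EXEC SSISDB.catalog.create_execution
         @folder_name      = N'ETL',               -- hypothetical folder
         @project_name     = N'SalesWarehouse',    -- hypothetical project
         @package_name     = N'LoadSales.dtsx',    -- hypothetical package
         @use32bitruntime  = 0,
         @reference_id     = NULL,                 -- or an environment reference id
         @execution_id     = @execution_id OUTPUT;

    -- Ask for Basic logging (0 = None, 1 = Basic, 2 = Performance, 3 = Verbose):
    EXEC SSISDB.catalog.set_execution_parameter_value
         @execution_id,
         @object_type     = 50,                    -- system parameter
         @parameter_name  = N'LOGGING_LEVEL',
         @parameter_value = 1;

    EXEC SSISDB.catalog.start_execution @execution_id;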

Steve: Oh, sure. Yeah, you know the mistakes, or the things that you would do differently the second time around. And it sounds like that’s what they’ve done, is that they’ve just done a better job that second time around.

Andy: I think so. There are spots in it where it could improve. And that’s part of the definition of software development, right? Anytime you build a database or piece of software you can always improve it. And I’m sure, I know for a fact talking to some of the folks involved with the project, they’ve looked back and said, “Gosh, yes, we should have done that, we should have done this”, but seriously, kudos to them. It is a stable engine and it does what they built it to do, and it does it well I think.

Carlos: I am getting flashbacks to the 463 exam prep with all this talk of converting from package to project deployment.

Andy: Yeah, I mean it’s involved. But once you’ve done it a couple of times, Carlos, I don’t know if you had that experience, but it’s really not hard. The wizards are, you know, in my opinion the wizards are a work of art. It’s well architected and it’s something that is really easy to, you know, kind of fall out of a project plan: the importance of those conversion wizards for either upgrading projects or changing them. They’ll go back and forth; you can go to a project deployment and back to a package deployment. Not only that, but the deployment wizard that they built for deploying stuff into the catalog, again it’s another snappy piece of kit. There are a lot of Enterprise features in there. And if nothing else, I’m really glad we’re talking about this because I don’t think people realize just how well some of that software works and how flexible, and well architected, well designed it is.
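
For anyone who would rather script it, the deployment wizard Andy describes also has a T-SQL equivalent; a minimal sketch, assuming a hypothetical folder name and an .ispac file sitting on the server.

    -- Folders are the logical containers inside the catalog:
    EXEC SSISDB.catalog.create_folder @folder_name = N'ETL';

    -- Read the .ispac produced by the build into a varbinary and deploy it:
    DECLARE @project_stream VARBINARY(MAX) =
           (SELECT BulkColumn
            FROM OPENROWSET(BULK N'C:\deploy\SalesWarehouse.ispac', SINGLE_BLOB) AS ispac);

    DECLARE @operation_id BIGINT;
    EXEC SSISDB.catalog.deploy_project
         @folder_name    = N'ETL',
         @project_name   = N'SalesWarehouse',
         @project_stream = @project_stream,
         @operation_id   = @operation_id OUTPUT;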

Carlos: Yeah, when we talk about the flexibility, I think particularly the big benefit I’ve been able to see is the ability to change the environment parameters, and then have it execute in different places without actually having to create all these copies of the project or those packages and then have version issues and all that kind of stuff.

Andy: It’s definitely more aware of lifecycles than any previous version. There are still some corners there when you start thinking about it from a lifecycle management perspective, especially for Enterprise software. You know, some of it is just, some of it is not Microsoft’s fault or anybody’s fault. Really, it is things like the nature of XML. So with just about any XML engine, any document object model, it is semantically managed; position, relative position in a file for instance, doesn’t really count. What counts is where it is in the tree. And so when you start doing things that you would expect in application lifecycle management, like compare between two versions of a package, you may have identical versions of a package functionally. But internally the text may be moved around because it is XML. And from a semantic perspective it is identical, but coming up with a good semantic compare is not easy.

Steve: Yeah, and I think that’s one of the interesting challenges that I’ve come across in trying to do source control around SSIS packages, is that you commit it to source control and then it tries to do a text based diff and you end up with a lot more diffs than what you might have expected because things shifted around in the XML.

Andy: It is very challenging. Yeah, and even the act of deploying an SSIS package to the catalog alters some of the values that are stored in there. Some of those are just insignificant things, like there’s a last deployed field that may be updated. But there are some other things that actually happen to the package XML itself; for instance the package protection level is set to a value that in 2012, 2014, 2016 does not appear in the list of protection levels. So in the old days there were six, and it’s all around sensitive information, so there was don’t save sensitive, encrypt sensitive with user key, encrypt sensitive with a password, encrypt all with a user key, encrypt all with a password, and then the sixth one, which isn’t in the new stuff, was one called server storage. And that primarily existed so that you could set your packages to be encrypted on server storage, and that was only used when you deployed to the MSDB database, which you can still do as part of package deployment. But what they did when they created project deployment is they made the SSIS Catalog and the SSISDB database. It uses encryption, and in order to create it you have to supply a password that is used for the encryption. In order to restore it you also need that password, and we’re going to talk about that in a minute. But what happens is, in the act of deploying, it changes whatever protection level you set for the development environment into the equivalent of server storage. But again, that’s not in the list. You can’t pick that as one of your options on your side, but the catalog will switch whichever of those five you pick. It will change it to server storage and it’s using SQL Server native encryption, transparent data encryption. And, you know, that’s pretty rock solid. I know a lot of people have moved to that and I was absolutely thrilled when SP1 came out. At the time of this recording, we’re doing this at the end of December 2016; back in November at the MSConnect Conference they announced SQL Server 2016 SP1. They made encryption available across all of the editions and so much more. It was really nice to see that.
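
Since so much hinges on that password, one small insurance policy is to back up the SSISDB database master key as soon as the catalog is created; a minimal sketch, with a placeholder path and password.

    USE SSISDB;
    -- Back up the database master key that protects the catalog's encrypted values:
    BACKUP MASTER KEY
        TO FILE = 'C:\backup\SSISDB_master_key'
        ENCRYPTION BY PASSWORD = '<strong password kept somewhere safe>';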

Steve: Prior to that announcement, the TDE option or transparent data encryption was only available with Enterprise. Did that mean that you had to have Enterprise Edition of SQL Server to use the SSIS Catalog in this encryption mode?

Andy: Yeah, that’s a great question, Steve; you did not. They had encryption, I don’t know how they did that, but you had it in Standard. What happens when you create an SSIS Catalog is there’s a backup of the SSISDB database that ships with SQL Server. It’s actually still there if you go look. Any place that you’ve checked the box for Integration Services in your installation, if you go dig through Program Files into the DTS directory, you’re going to find an SSISDB.bak file. And when you use the GUI that they supply in SSMS to create a catalog, you have to enable SQL CLR. It has to have that if it’s not already enabled, because all of the execution and a lot of validation and other functionality that occurs in the catalog is actually happening down in the .NET Framework, so there are a lot of SQL CLR stored procs in there. So that part has to happen, and you have to supply a password because you’re going to be using that encryption.
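
The CLR prerequisite Andy mentions is just a server configuration option; a quick sketch of enabling it, and then peeking at the catalog's settings once it exists.

    -- CLR must be enabled before the catalog can be created:
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;

    -- Once the catalog exists, its settings (including the encryption algorithm) are here:
    SELECT property_name, property_value
    FROM SSISDB.catalog.catalog_properties;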

Steve: Ok, so that’s a good transition, I think, for us into the topic of restoring an SSIS Catalog database. And I think that’s one of the things where, as a DBA, a lot of people might be really familiar with a normal backup and restore, but they may have never done it with the SSIS Catalog. What kinds of things do you need to know or do differently when you’re doing that?

Andy: Well, there’s a great article out on MSDN, and I think there’s a companion article over on TechNet. Both say basically the same things. So they walk you through this process; in a lot of ways it’s like restoring any database that uses encryption. So whenever you’re using encryption you have certificates, and you have passwords and all of that sort of stuff. The catalog has all that as well. In addition to that, there are some users, some logins that you need to create to fully restore an SSIS Catalog to operational status. One of the things that I get calls about, it’s a tragic call, is someone will have lost the database and they will have been backing it up for years, and SSISDB is out there and it’s all backed up. And they say, “I’m having trouble, I restored it.” You can actually restore it without the password; you can do a standard restore. And it will come in there and a lot of your metadata will be present and you can get to it and read it. But when you start executing packages from that, or when you try to export a package for instance, or deploy a package to that catalog, all of a sudden you’re getting these errors you never saw before. And what’s happened is the encryption is broken. And in order to make that work, and work correctly, you have to follow those instructions out there on the MSDN and TechNet sites. I wrote a blog post about it, and I know you guys sometimes include blog posts. I blog at SQLblog.com, and if you go to SQLblog.com and click on my name you can search for “restore SSIS Catalog”. I took it from the perspective of what it’s like when you try to deploy an SSIS package or project to a catalog that’s been restored incorrectly. So I started with those error messages and kind of worked through, “This is the problem you have now and here’s how you can solve it.” I included my template script that I use all the time. I gave it away, but it creates the users and the logins you need. I put dummy passwords in for a bunch of stuff and I mark everywhere you need to update it, because I don’t want you using the same passwords. But if you do that it restores in about 5 seconds. Well, an empty one restores in about 5 seconds, let me say it that way. Your mileage will vary; the more stuff is in the catalog, the more your mileage will vary. But the key here is really understanding the importance, like so much in our field, of things like the password for the encryption. You’ll think, I’ll never need that again, right? Export the keys, export the encryption keys; you’ve seen these warnings all the time for years with Reporting Services. You know, all of these products are great, but you can really leave your Enterprise in the lurch if you haven’t done these best practices.
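
For reference, the broad strokes of the walkthrough Andy describes look roughly like this; the backup path and password are placeholders, and his template script and the MSDN article cover the additional logins and startup procedure in full.

    -- 1. Restore the database itself:
    RESTORE DATABASE SSISDB
        FROM DISK = N'C:\backup\SSISDB.bak'
        WITH REPLACE;

    -- 2. Re-attach the encryption: open the master key with the password that was
    --    supplied when the catalog was created, then re-encrypt it with this
    --    server's service master key so the catalog can open it on its own.
    USE SSISDB;
    OPEN MASTER KEY DECRYPTION BY PASSWORD = N'<catalog creation password>';
    ALTER MASTER KEY ADD ENCRYPTION BY SERVICE MASTER KEY;

    -- 3. Recreate the catalog logins and users (for example ##MS_SSISServerCleanupJobLogin##)
    --    per the MSDN walkthrough or Andy's template script.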

Steve: I think it shows there is a difference between, I won’t say a normal database, but maybe an application database, and one that’s kind of dug in to SQL Server Management Studio, that is so tightly integrated with several of those components that it kind of requires its own special handling.

Andy: It really does, and you know, for the longest time, and I’ll still say this, in fact I said it today on the webinar, it’s just a database, the SSISDB database. But I would hang an asterisk on that and say, you know, with encryption; and then you kind of need to hang another asterisk that says, with a special set of users, you know, logins that happen to be hard coded into some of the interactions that happen between the database and the applications, especially those .NET Framework classes. And they’re specific to each version of the catalog, by the way, so you can’t mix and match too much with it. Yeah, you can get yourself into a world of hurt there just by skipping a step, and it’s a very easy step to skip. And like I said, I’ve had the tragic calls from clients saying, “How do I hack this password out of here?” And I’m like, “I don’t know.” I send them to someone like you guys, and say it’s now a database issue. These guys can help; if anyone can help, they can. That’s over my head.

Carlos: The links that you mentioned earlier will be available on our website for the show notes today at sqldatapartners.com/ssiscatalog. We’ll make sure to put all of those components there.

Andy: Awesome!

Steve: Now, one more thing that you mentioned when we were talking before we started recording the show was your Data Integration Lifecycle Management or DILM Suite work that you’ve been doing. What exactly is that and who would want to use it?

Andy: Well, it’s great timing, Steve, to bring that up at this point in the conversation, because an alternative to remembering all of your old passwords and stuff that you may or may not have available is to export this metadata. And the SSIS Catalog has a slew of metadata associated with it. Carlos mentioned earlier about environment variables and parameter values. You can store all of this configuration metadata in the catalog itself. And as Carlos said, if you set up a lifecycle where, say, you’re moving from dev, to test, to hopefully QA, and then production, something like that. I know years ago, where I worked, we ended up setting up six of these tiers all the way through. And, you know, the more the merrier. It’s actually a good way to do it. I imagine you could get crazy with a dozen or something, but I always say, “You need a minimum of three tiers in your environment.” And you want to run as close to production in each one of those as it makes sense to. And different people are going to define what makes sense differently. Here’s what 10 years of experience with SSIS has shown me makes sense. I want three environments at a minimum because I want to be able to build it in one environment. And I’m actually going to deploy it to the catalog in that environment and I’m going to run local unit tests and integration tests there. I’m going to look for functionality. I want to make sure that the rows are moving and any kind of transformation is happening as I want. Then I want to deploy that to something other than production, and here’s why. When I build it I’m going to design it with all of these defaults configured. I’m going to have default connection strings pointing either to my local laptop, or a virtual machine, or some development server that I’m logged in to at a client. I’m going to have everything pointed natively to that. If I deploy it straight from there to production and I have forgotten to externalize something, maybe a connection string or variable value, a path, then I’m going to figure that out when I try to run it in production, and that’s called testing. And we don’t want to test in production. Certainly not any application, but definitely not databases. So you want to send it to some third location that’s not production, that’s not where you developed, and when you test it there you make sure that it does what you wanted it to do. So when you start making that transition there are really five things you need to think about in an SSIS Catalog. There are the folders, because there’s a logical separation of objects in there. All projects have to be in a folder. All environments have to be in a folder. We just talked about environment variables. There are projects and packages of course, and those are really managed well by .ispac files, which wrap up the deployment in a nice little container. You double click it, it starts that wizard. I was going on about it earlier, I like that wizard. Those are managed well, that’s number two. So we’ve got folders and projects, and projects and packages mostly go together. The third thing is those environments that Carlos mentioned earlier. They have a collection of environment variables inside of them, that’s number three. Number four is really some catalog specific overrides. In the catalog they are called literals. And the fifth thing is really a reference, and this is how you get environments and projects and parameters talking to each other. So think about projects and packages: they both can have parameters in them. And then over here we’ve got environments.
And you can’t see me but I’m doing hand signs for this. Maybe I should turn my camera on. A reference is really a bridge between this project and this environment. And then beneath that, just like you have environments and environment variables, you have these little attributes that I call reference mappings. And what they do is they connect the environment variable to the project parameter or the package parameter. So if you think about the hierarchies kind of going on there, if you’re talking all the way down to, say, a package, you can have something that’s kind of linked through about six or seven layers.

The project and the package Microsoft covers well; migrating the folders, not so much. And as you get down to environments and the references, the environment variables and the reference mappings, the story from a lifecycle management perspective really just sort of falls apart. But to get to the environments you have to double click on an environment, or right click and go to properties. And you may have a couple of dozen environment variables in there, data types configured and values configured. Some of them may be encrypted. There’s a sensitive bit you can set for every value individually, so you can choose what’s encrypted and what’s not. And you open that up after you’ve configured it; it’s different from when you first configured it. Well, the DILM Suite, when I started working on that, I started working on something else called frameworks. And I’ve built frameworks forever, but I was working on an interface for my framework, so a GUI. And I realized after about three weeks, you can’t move this metadata in the catalog. There is no clean way to do this. And I needed that working before I could do anything with frameworks, so I wrote a product called “SSIS Catalog Compare”. If you go to dilmsuite.com, there are like six projects on the page. That’s the only one that I sell. The rest are either open source or just flat out free. And one of them is called “Catalog Browser”, where it’s a tree view. It will show you everything that is associated with those SSIS projects deployed to the catalog. It will show you the reference mappings, it will show you the references, it will show you environment variables and their values and their data types, it will show you environments. And if you’ve ever looked at what is in that node in SQL Server Management Studio, you can get to everything I show there, but you have to open different windows to see everything. And this just presents it in one view. And that’s one of the free tools out there called Catalog Browser.
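
The folders, environments, references, and reference mappings Andy lists can all be scripted against SSISDB as well; a minimal sketch with made-up folder, project, environment, and parameter names.

    -- An environment full of variables, in the same folder as the project:
    EXEC SSISDB.catalog.create_environment
         @folder_name = N'ETL', @environment_name = N'Production';

    EXEC SSISDB.catalog.create_environment_variable
         @folder_name      = N'ETL',
         @environment_name = N'Production',
         @variable_name    = N'SourceConnectionString',
         @data_type        = N'String',
         @sensitive        = 0,
         @value            = N'Data Source=PRODSQL01;Initial Catalog=Sales;Integrated Security=SSPI;';

    -- The reference: the bridge between the project and the environment:
    DECLARE @reference_id BIGINT;
    EXEC SSISDB.catalog.create_environment_reference
         @folder_name      = N'ETL',
         @project_name     = N'SalesWarehouse',
         @environment_name = N'Production',
         @reference_type   = 'R',        -- relative: environment lives in the same folder
         @reference_id     = @reference_id OUTPUT;

    -- The reference mapping: point a project parameter at the environment variable:
    EXEC SSISDB.catalog.set_object_parameter_value
         @object_type     = 20,          -- 20 = project, 30 = package
         @folder_name     = N'ETL',
         @project_name    = N'SalesWarehouse',
         @parameter_name  = N'SourceConnectionString',
         @parameter_value = N'SourceConnectionString',   -- name of the environment variable
         @value_type      = 'R';         -- R = referenced, V = literal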

Steve: Ok, so the Catalog Browser and Catalog Compare those are then applications that are built using the DILM framework. Is that correct?  

Andy: Yeah, that’s a fair way to say it.

Steve: Ok, so if somebody wants to do something that was maybe different but wanted to access those same pieces they could download and use the DILM Suite. And that’s your community edition, right, so people could write code to do that. But if they wanted to jump in and use the Framework Browser or the Catalog Compare they could just download those or purchase those from you and use them as well.

Andy: They certainly could. So what Catalog Compare can do is, it’s really built to manage that metadata. It’s built to facilitate and support DevOps, anybody practicing any kind of lifecycle where they are promoting code. You know, stage control is a piece of this, source control is a piece of this, although I’m not touching source control. I’m just looking at what’s in catalog 1 and catalog 2. And I’m not so much focused on the piece that Microsoft really does a great job on, which is the projects and the packages. I’m focused on the rest of that stuff, the folders, the environments, environment variables, the references, the reference mappings, because like I said, it looks like they got to about that spot and stopped, for whatever reason. And without using some of these very clever and cool solutions that other people have developed out there, you have to either write your own clever and cool solution or you’re doing a lot of copy and then paste. But the new version is coming out, I’m hoping some time in the next few weeks. I’m testing it now. It’s going to include a command line interface. And everything in the Data Integration Lifecycle Management Suite is all built around this idea of achieving a piece of the puzzle so that we can do continuous integration with SSIS. That’s really what I’m after. It’s not going to have all of the pieces. I’m never going to be able to touch testing, because frankly Pragmatic Works’ LegiTest does a fantastic job of testing SSIS. And they’re hooked right into the testing harness that comes with Visual Studio; John Welch and Brian Knight and the crew down there have done a bang up job. I don’t need to build that, that’s already built. But what I’m hoping to do with this command line interface, one of the commands is DCD for Deploy Catalog Differences. So Carlos, you may be developing SSIS packages on your laptop, and Steve you may be developing on an integration server in the office, and what I want to have is the ability for you guys to run this script every night that says, compare this to the catalog on this integration server over here, and then just move whatever is different. So whatever I have that I built today, since the last time that the script ran, go ahead and deploy those changes to this integration server. Now we’ve got that piece of the continuous integration going, right? We’ve got a target server where everything syncs up. So imagine that piece, and then imagine having some way to automatically execute SSIS packages and then have tests go look at the results of those executions and provide feedback right into TFS or whatever the testing framework can talk to in Visual Studio. And you’ve got a big piece of the picture now, right, for continuous integration.

Steve: Oh yeah. I can see how that would really save a lot of time in getting that continuous integration environment setup there for your SSIS work.

Andy: That’s what we are hoping for. I kind of mentioned in passing you need to execute the packages. Well, if you’re looking at the dilmsuite.com website, that SSIS framework community edition is a free framework that will actually do that. You have to do a little more work than if you pay for the commercial edition of the framework or the enterprise edition but that’s what it’s there for.

Steve: Wow, good stuff.

Carlos: Very cool!

Andy: Thank you! Yeah, thank you, trying to solve that problem.

Carlos: You know, that’s right. So thanks for the conversation, Andy. Should we go ahead and do SQL Family?

Andy: Absolutely.

Steve: So Andy, how did you get first started with using SQL Server?

Andy: So it was in the 1990s, and I was doing manufacturing automation at the time. On one of my projects we were storing the outputs of tags, as they’re called in Human Machine Interface software, in an Access database. I believe it was Access 2. And to stress test it, I turned on all of the tags, it was over a thousand. And I set the software to record the value every second, and I left for a holiday weekend. I came back and there was a 3GB MDB file on the drive, and when I tried to open it, it wouldn’t open. I went to altavista.digital.com and I typed in Microsoft database and searched, and I saw this thing called SQL Server. This was about 20 years ago, maybe a little more. Carlos, I don’t know how long you’ve been around the Richmond area, but back then they used to have these computer and software sales and shows over at the showplace in Mechanicsville. I went over there, and I’m going to say this because it’s been 20 years, I think the statute of limitations is over, but I purchased an NFR copy of SQL Server 6.5 for like $20. It was like a week after that that I took it down to this plant and we installed it, and I did the same test and it succeeded. I was able to open up Enterprise Manager, I believe it was Enterprise Manager, maybe it was Query Analyzer, I was able to open up stuff and still run queries against the tables and get data out of it, even though it was 3GB of data. When they saw that that worked, that the promises were true, they bought a licensed copy. Actually they bought 7.0 because it was out. It had just come out, I believe, and we upgraded, and of course tested it again, and it ran. That’s what got my toe in the water with SQL Server.

Steve: Interesting. I’m amazed at how often we hear that: started with Access, it didn’t do what we wanted to do, so we did something different from there and that’s how we got to SQL Server. It’s been a pretty common theme along the way with other guests as well.

Carlos: If you could change one thing about SQL Server what would it be?

Andy: It would iterate better. So all software has corners, right? Everything has something that it does really really well and something it doesn’t do so well. I do training; Tim Mitchell and I actually deliver the SSIS immersion events these days for SQLskills, and one of the questions I’ll ask students is, are cursors bad? What do you think? Are cursors bad?

Carlos: Depends on the usage, right? Your mileage may vary.

Steve: I know, in my experience I’ve seen some cursors used very badly. And I’ve seen some cursors that have been used more appropriately and they’ve worked well. However, they’re not as efficient as I’d like to see them.

Andy: And that’s why I’m going with the whole thing about iteration. It’s not just about cursors; it’s anytime you’re doing any kind of looping inside of the engine. Again, you know, Microsoft loves it when I say it this way: cursors aren’t bad, SQL Server is bad at cursors. I feel that’s accurate and I get why the advice is out there, you know, don’t use cursors. I had this experience about 10 years ago, or maybe it was more than that, where I was converting some Oracle code over from PL/SQL to SQL Server. And there were some cursors in it that were like three layers deep and they were nasty. And at that time, I knew as much as I do now, and I promise I’m not picking on anybody because I’m still learning. But I looked at that and went, “This is just wrong. How could this ever perform?” And we had a test server set up there with it, and I ran it, and darn it, it flew. You know, I rewrote it obviously. I think that may have been my first experience with, and the name of the tables flew out of my head just now, but Jeff Moden wrote about it back about that time, I was looking for it. Some kind of tables, but I’m getting old, I can’t remember what things are called. But basically you create a little lookup table and you just get what you want; they’re lightning fast.

Carlos: Tally tables.

Andy: Tally tables thank you. I can never remember what they are called. Yeah.

Steve: There we go.

Andy: Yeah, Tally tables, so I had my first experience with those, but we had a solution that worked and I checked that box and moved forward. Yeah, that would be my answer to that question. I’d like to see it do that, as long as they can do that without breaking any of the other cool stuff, because SQL Server does an awful lot really well.

Carlos: That’s right.

Steve: So what is the best piece of career advice that you’ve ever received?

Andy: Oh gosh! I’ve received a lot of really good career advice. I’ll share what has probably impressed me the most recently. Do what you are, and you know, I’ve been a big believer in doing what you’re passionate about. I’ve modified that belief since listening to Grant Cardone and other speakers talk about this advice where they say, “Become passionate about what you’re doing.” And I agree with that more. I think I was very fortunate in that I’m able to make money doing what I’m passionate about, technology. And, you know, it kind of just fell into place for me. But there are a lot of folks out there who want to get into technology, they’re not particularly passionate about it, and they come into the field kind of hesitant and resistant and afraid. And I’ve been tailoring most of my career advice these days to those people and saying, “You know, find a way to get passionate about this.” And I’ve been sharing some of the things that I like and I don’t like.

Carlos: Andy, our last question for you today is if you could have one superhero power what would it be and why do you want it?

Andy: Oh wow! That’s a toughy. I’m a Batman fan, ok, I’m 53 years old. My wife has a picture of me in a kiddie pool when I was about 18 months old with a Batmobile toy, ok. I am a half century into Batman. I have always been a Batman fan, but the truth is Batman isn’t so much of a superhero. Well, I’ll say it this way, he doesn’t have powers like I think you’re asking about here. He just trains really hard and he has the discipline to do all that, and his tragic past did drive him to it. I’ve been a huge Batman fan. But I don’t think he qualifies as having a superpower, so I’m going to say that first, because first and foremost I’m a Batman fan.

Carlos: There you go.

Andy: I would want to fly. I think Superman’s powers are cool but I would want to fly. You know, I like the idea, I like what Dr. Strange does, right, with folding time and being able to manipulate stuff. But as I’m saying that I’m thinking about what I just said about failing, and I would probably blow up the universe. Probably not a good choice after all. Maybe, you know, my third choice, I’ll go with speed. I’m not a fast person at all in any aspect, mentally or physically. And I’ve always been a really slow runner and a poor athlete. It would be cool to run as fast as the Flash. I want to stay with that one.

Steve: Nice I like that.

Carlos: Well, Andy, thanks so much for joining us in the program today.

Andy: Thank you guys! It’s been a great conversation. I hope it turns into a fun show for people listening. I really appreciate the opportunity. I think what you guys are doing is great. I’ve listened to several of your podcasts. I haven’t listened to all of them but it’s great to listen to people and learn more about what they’re thinking. And I love the SQL Family questions, those are awesome.

Steve: Alright, well thanks, Andy.

Andy: Thank you guys!

Episode 78: Conversion to SQL Server 2016

As database administrators, we will all upgrade our environments at some point; however, we don’t normally have the opportunity to upgrade to the next version of SQL Server before it becomes available to everyone else. In this week’s episode of the podcast, Steve and I chat with Brian Carrig about the journey ChannelAdvisor took to implement SQL Server 2016 in their environment, what it was like working with the SQLCAT team, and how they go about making use of some of the new features. Brian shares with us some of the struggles they were having, along with how the 2016 version helped address those issues.

 Episode Quote

“SQL 2016 is probably the version of SQL Server that has had most attention paid to performance improvements in quite some time probably since the 2005 release. They, I believe, I won’t say they promise but it’s not atypical to get 20% performance gains just right out of the gate.”

Listen to Learn

  • What it was like to work with the SQLCAT team
  • The features channeladvisor was after, but also how they went about using them
  • How the In-Memory options help with bursting
  • Why Brian had to create a Connect item and what it deals with

Brian on Twitter
bwin In-Memory OLTP whitepaper
Using memory optimized table variables
Memory Estimates for memory-optimized tables
Connect Item to store query store in a separate filegroup
SQL Server setup checklist in Github

About Brian Carrig

Brian Carrig is a Microsoft Certified Master of SQL Server and manages a team of talented DBAs at leading e-commerce cloud solutions provider ChannelAdvisor. In a previous life, Brian spent some time as an academic and holds a PhD in Computer Science. He is a native of Dublin, Ireland but now lives with his wife and two daughters in Cary, North Carolina.

Carlos: So welcome compañeros, this is Episode 78. I am Carlos L. Chacon.

Steve: And I am Steve Stedman. And today’s guest is Brian Carrig.

Carlos: Yeah, Brian is the DBA Manager over at ChannelAdvisor. He is currently living in Raleigh, North Carolina. But you’ll notice an accent; I know that he is not from that area. It’s great to have him and we’ll be chatting with him a little bit about the 2016 features and some of the issues, or challenges perhaps, experiences that they’ve had in rolling 2016 out. They’re actually a preview customer, so before it was released they had access to it. And so we’ll be going through that. A couple of things we wanted to talk about first, and that is I wanted to remind everyone that our SQL Server checklist is out on GitHub, and so if you’re interested in contributing to that we’d love to have you. Again, that’s an opportunity there for you to contribute, kind of give your ideas and two cents, and make it a little more available for others as things continue to change. We’d love to get your experiences there as well.

Steve: Yup, and you know what I think is nice about that is if we get more input on that and update it, we’ll be able to put that checklist out maybe every couple of months or once a quarter, as in, here’s the update with everyone’s suggestions. And perhaps build one of the best SQL Server setup checklists out there.

Carlos: There you go. Yeah, it’s interesting. Even though people aren’t necessarily installing SQL Server all the time, it’s still something just to reference as they’re going in and reviewing environments. It’s just kind of, “Ok, am I set up correctly?”

Steve: Absolutely, and I’ve found that, I mean a lot of those setup items get missed when a SQL Server initially gets built. People who set it up are not always the ones who know the right things to do for the SQL configuration.

Carlos: That’s right. Ok, so with that let’s go ahead and welcome Brian to the show. Brian, welcome!

Brian: Alright, thanks!

Steve: Yeah, Brian, good to have you on the show.

Brian: Yeah, good to be here.

Carlos: Yes, thanks for coming in. I know we’re all a little bit under the weather. The winter weather has kind of come in and it’s at its worst, so thanks for taking a little bit of time with us. So ultimately our conversation today is about your experience in migrating to SQL Server 2016, what that experience was like and some of the challenges that you had. So I guess, first, set us up with that story, what is ChannelAdvisor doing? I guess, why did you feel the need to upgrade to SQL Server 2016 before it was released?

Brian: Ok. Well, just to give some background, I guess not everybody will know who ChannelAdvisor are and what we do. So we’re an e-commerce software solution provider. We’ve been a software solution since 2001, probably since before the name existed. What we do is we help retailers and brands to sell online, right, on marketplaces like eBay, Amazon, Walmart, Jet. And there are lots of different, there are actually hundreds, possibly thousands of these marketplaces throughout the globe. Basically, we help companies to sell across these marketplaces, manage their inventory and all that kind of stuff so that they don’t have to worry about writing APIs for the various marketplaces. They can just interface with us and we will manage their inventory, manage their online sales, digital marketing, all of that. And so we power, I think, about 3,000 retailers.

Carlos: Oh wow!

Brian: Yeah, with some pretty big customers: Macy’s, Samsung, Under Armour, Staples. Yeah, there are quite a lot of big names in there. Before I moved to the US, you mentioned my accent; when I go home to Ireland people say I’ve lost all my accent so. Apparently not.

Carlos: Yes. Isn’t that the case? My dad is the same way. My dad is from Costa Rica. He came here when he was 15, lived here for 30 some odd years. Goes back to Costa Rica now and people are like, “Is Spanish your first language?” He was like, “Yup.”

Brian: Well, when I go home they say I sound like an American now so.

Carlos:  You’ve been Americanized.

Brian: Yes. So we recently just went through what we call Cyber Five, right at Thanksgiving, that whole weekend through Cyber Monday. And during that weekend, last year we were close to a quarter billion in sales. I won’t say this year’s because I believe the results are not actually official yet; I don’t want to get in trouble with the SEC. But suffice it to say we do a lot of processing during that weekend, and that’s kind of what drives our use of SQL 2016, or why we wanted to move to SQL 2016.

Carlos: So that’s your peak usage time?

Brian: Yes. Yes.

Steve: So then, given that that’s the peak usage time, I would assume that probably in July or August you’re really starting to ramp up for that. Is that about the right time for it?

Brian: Yeah. Generally, we’re trying to get the infrastructure in place probably around July or August and kind of gearing up then for the load in the 4th quarter. And then based on just natural growth, we might see a bit of a dip in Q1 of 2017, say, and then we start to ramp up just from natural growth and we kind of get ahead of that. And SQL 2016 is probably the version of SQL Server that has had the most attention paid to performance improvements in quite some time, probably since the 2005 release. They, I believe, I won’t say they promise, but it’s not atypical to get 20% performance gains just right out of the gate. So that’s what interested us. We’re quite sensitive to cost even though we have quite a large SQL infrastructure, because we make money when our customers do. So if we just go crazy and expand and deploy a huge amount of infrastructure we can impact our own profitability.

Carlos: Sure. Now, is any of that using cloud technologies or is it still on-premises?

Brian: Pretty much everything we do is on-premises. We do have a footprint in AWS. We have some customers where we run SQL Server in AWS because we have geographic restrictions on where their data can reside, so we leverage AWS’s data center in Ireland actually to run their workload.

Carlos: Ok, got you. So you’re an early adopter; you had it before it was available. I guess, what was that process like?

Brian: It was fantastic. I can’t compliment the SQL CAT team enough. It was a real eye opening experience. I have had terrible experiences with Microsoft support prior to this, but this was another level entirely. So yeah, it was really enjoyable.

Carlos: Well, so you mentioned you were looking for a 20% increase, so that’s what the marketing was telling you, right? So you get it in, you get the SQL Server 2016 version. How do you go about testing that and kicking the tires to see what it’s going to do for your environment?

Brian: Right, that is difficult, right? Currently, we have 118 SQL Server instances in production. I think it’s close to 3,000 databases, and our environment peaks at about 1.6 million transactions per second. It’s really, really hard to get a dev environment that reflects that. But you know, we do what we can. In particular, we knew we had some problem areas, and one of them was In-Memory OLTP. We’ve been using that since SQL 2014. While the performance gains were incredible, the feature was kind of limited in terms of what data types and SQL syntax you could use. We also had a lot of stability issues with In-Memory OLTP, particularly around the checkpoint engine. So we had cases, sorry go ahead.

Steve: Oh no, I was just going to say, what type of issues did you run into with that? That’s interesting.

Brian: Oh ok. I think I found the Connect item on this, or somebody did. It was fixed in a hotfix. But we had an issue where if you ran out of transaction log space, so basically your transaction log filled, even before Auto Growth could kick in, the checkpoint thread for the In-Memory OLTP engine would die and fail to respond. So the only way you could get that process to restart was to take your database offline and bring it back online, which in highly transactional environments is problematic.

Steve: And given that the checkpoint files are really the only place that new data is going to exist, I could see how big of an issue that might be.

Brian: Yeah, so the first time it happened we did not notice, and the transaction log on the server was 600 or 700 gigabytes and it did not get truncated until it filled because of this. We basically had a full transaction log that was 600 gigabytes in size. It ended up being quicker to restore the database from backup rather than first taking the database offline and having it start up and go through crash recovery.
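
For anyone wanting to keep an eye on that, a couple of quick queries show how full the log is and why it isn't being truncated; the database name is a placeholder, and the first query runs in the database in question.

    -- How full is the log right now?
    SELECT total_log_size_in_bytes / 1048576 AS total_log_mb,
           used_log_space_in_bytes / 1048576 AS used_log_mb,
           used_log_space_in_percent
    FROM sys.dm_db_log_space_usage;

    -- Why isn't the log being truncated?
    SELECT name, log_reuse_wait_desc
    FROM sys.databases
    WHERE name = N'InMemoryDb';          -- placeholder database name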

Carlos: Oh wow!

Brian: Yeah. That’s a lot of roll forward.

Carlos: Yeah, exactly, a scenario you see every day. Now, was that on the 2016 environment or was that 2014?

Brian: That was on the 2014 environment. They actually fixed it in the service pack for 2014, but I guess that was the point at which we were kind of looking around and saying, you know.

Carlos: Ok, we need something different.

Brian: Yeah, we heard that they had rewritten a lot of the engine for In-Memory OLTP for SQL 2016. So that really prompted us, apart from performance gains. That was another thing that kind of prompted us to go, “Hey, we’d really like to get into 2016 as quickly as possible.”

Carlos: Ok, so then, I guess, were there any other changes to the In-Memory objects, if you will? Or was it really just, I need a new version?

Brian: There was the stability side but also availability of constraints. I think that using In-Memory OLTP has become a lot more viable now that constraints are available in terms of foreign keys, and default constraints, and unique constraints.
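
To picture what 2016 allows here, a minimal sketch of two memory-optimized tables with a foreign key and a default constraint. The names are made up, the database needs a MEMORY_OPTIMIZED_DATA filegroup first, and as far as I know the foreign key has to be between two memory-optimized tables rather than across to a disk-based one.

    CREATE TABLE dbo.OrderHeader
    (
        OrderId    INT          NOT NULL PRIMARY KEY NONCLUSTERED,
        CustomerId INT          NOT NULL,
        OrderDate  DATETIME2(3) NOT NULL DEFAULT SYSUTCDATETIME()  -- defaults allowed in 2016
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

    CREATE TABLE dbo.OrderLine
    (
        OrderLineId INT NOT NULL PRIMARY KEY NONCLUSTERED,
        OrderId     INT NOT NULL
            FOREIGN KEY REFERENCES dbo.OrderHeader (OrderId),      -- FK allowed in 2016
        Quantity    INT NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);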

Carlos: Now, it reminds me, so the foreign keys, the In-Memory objects. Can I have a foreign key from an In-Memory table to a disk-based table? What are they calling them? What do we call them, the not In-Memory tables?

Brian: Disk based tables.

Carlos: Disk based tables. Yeah.

Brian:  I actually do not know. I have not tried that.

Carlos:  But you’re talking about foreign keys between two In-Memory tables?

Brian:  Yes.

Carlos: Oh yeah, ok. I wasn’t sure. I didn’t think that was an option and I thought ok, maybe that had changed. So ok, so then I guess, can you tell us how many In-Memory tables you actually have?

Brian: Right now? I want to say 3 or 4. So it’s not extensive. It could be more. Kevin designed a lot of them so he’ll probably correct me after this confessional.

Carlos: Another excuse to get Kevin back on the podcast.

Brian: There you go. He can come back and explain everywhere I was wrong. I want to say 3 or 4 of the regular In-Memory tables in the database. But what we’re using extensively now, and I don’t know if enough people know about this, is that you can create a table type that is memory optimized and use this as a table variable. Either as a table variable in a procedure or, you know, if you use TVPs you can pass it in as a TVP. We’ve seen some incredible performance gains from this. And it’s really simple to implement. I think I can give an example where we reduced the CPU consumed by a procedure by about 75%.
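
A minimal sketch of what Brian is describing: a memory-optimized table type and a procedure that takes it as a TVP. The type, procedure, and table names are made up, and the database needs a MEMORY_OPTIMIZED_DATA filegroup.

    -- Memory-optimized table types need at least one index:
    CREATE TYPE dbo.OrderIdList AS TABLE
    (
        OrderId INT NOT NULL,
        INDEX ix_OrderId HASH (OrderId) WITH (BUCKET_COUNT = 1024)
    )
    WITH (MEMORY_OPTIMIZED = ON);
    GO

    -- Used as a TVP, the rows live in memory and never touch TempDB:
    CREATE PROCEDURE dbo.GetOrderStatus
        @Orders dbo.OrderIdList READONLY
    AS
    BEGIN
        SELECT o.OrderId, o.[Status]
        FROM dbo.Orders AS o                 -- hypothetical disk-based table
        JOIN @Orders    AS t ON t.OrderId = o.OrderId;
    END;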

Carlos: So when you’re doing that at that way basically you’re kind of skipping TempDB for the type of jobs or the type of tables that would normally go into TempDB and you’re doing it as a memory only, In-Memory OLTP table, is that right?

Brian: That’s right. It doesn’t touch TempDB at all. So what we were finding after we deployed SQL 2016 and we got those kinds of performance gains that we mentioned is that, like always here, you’re just shifting the bottleneck to somewhere else. So where we started to see some stress was contention in TempDB around reading and writing from the system tables. And the way we looked at reducing that was basically to take some load off of TempDB by using memory optimized TVPs.

Carlos: Always the big system. How do you take into account the memory considerations now? Because obviously one of the things with memory optimized objects is that you have to account for that memory now, which, you know, previous to that it was just, ah, we’ll allocate, you know, 85% or whatever the percentage was of the memory to SQL Server and be done with it. Now we have to carve that up a little bit. Can you talk to us about how you decided to do that, or what were your considerations there?

Brian: Yeah, certainly. So if you read Books Online, they say for your fixed tables, right, not temporary objects, that you should allocate about 3x the size of the table. But really what it depends on is what your frequency of insertion and deletion from that table is. So basically how much work does the background garbage collection process need to do, and how much overhead do you need to handle all the row versions, essentially. And typically if you’re looking at In-Memory you probably have a really high frequency of insert and delete and update type operations.

Carlos: Right, that’s why you’re using it, right?

Brian: Right, it wouldn’t be a problem if your table wasn’t heavily accessed; you wouldn’t be looking at this. And also, you know, concurrency, because the more sessions that are doing this, the more versions that will be held in memory. So we have found that in cases you might need to allocate as much as 10x. Yeah, and we talked to a company called bwin. These are like the pioneers of In-Memory OLTP. They’ve worked a lot with SQL CAT. They’ve got some great white papers out there about using In-Memory OLTP to really boost your performance and the amount of transactions that you can support, and they had similar experiences.

Carlos: I admit I don’t have extensive experience with the In-Memory objects, but the recommendation is to plan for 3x and you’re seeing as much as 10x. Is it the DMVs, or what are you using to monitor how much memory the tables are actually taking up?

Brian: So I use Resource Governor and there are also DMVs that will tell you how much memory it is using. And one of the tricks that you can do and I recommend doing this because in a lot of cases your big concern is that you will run out of memory and you will cause the server to crash. If you have just memory optimized objects in one database you can use Resource Governor to bind that database to a resource pool and fix the amount of memory that it can use. So if you’ve run out of memory that you’ve allocated to that database, queries will fail but your server will stay up.
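
As a rough sketch of what Brian describes (the pool name, memory cap, and database name are made up), you can bind a database that contains only memory-optimized objects to a resource pool, and watch consumption with the DMVs:

    -- Hypothetical pool name and memory cap; adjust for your environment.
    CREATE RESOURCE POOL Pool_InMemory WITH (MAX_MEMORY_PERCENT = 30);
    ALTER RESOURCE GOVERNOR RECONFIGURE;

    -- Bind the database to the pool; the binding takes effect the next time
    -- the database is brought online.
    EXEC sys.sp_xtp_bind_db_resource_pool
         @database_name = N'SalesDB',
         @pool_name     = N'Pool_InMemory';

    -- Check how much memory the memory-optimized tables are using.
    SELECT OBJECT_NAME(object_id) AS table_name,
           memory_allocated_for_table_kb,
           memory_used_by_table_kb
    FROM sys.dm_db_xtp_table_memory_stats;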

Carlos: Will stay up.

Steve: That’s a very creative approach there. I think oftentimes Resource Governor gets downplayed a little bit as not being that useful, but that sounds like a great way to be using it.

Brian: I love Resource Governor. There’s not many people that use it. But we use it extensively here.

Carlos: Interesting, that might be another topic because I agree. It seems that Resource Governor is kind of on the wayside, but scenarios like that do make it very appealing.

Brian: Yeah, because we have a lot of different product teams that often use the same database infrastructure. We use Resource Governor sometimes without doing any actual limiting or governing of resources. What it does is it splits your plan cache. So if you happen to have teams that use procedures in different ways, they get their own plan cache and you can reduce parameter sniffing issues.
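
A bare-bones sketch of that kind of session classification (the group names and application names are invented, and no limits are set; sessions are simply routed to separate workload groups as Brian describes):

    -- Create in master. Groups use the default pool unless one is specified.
    CREATE WORKLOAD GROUP TeamA_Group;
    CREATE WORKLOAD GROUP TeamB_Group;
    GO

    CREATE FUNCTION dbo.rg_classifier()
    RETURNS sysname
    WITH SCHEMABINDING
    AS
    BEGIN
        DECLARE @grp sysname = N'default';
        IF APP_NAME() = N'TeamA.App' SET @grp = N'TeamA_Group';
        IF APP_NAME() = N'TeamB.App' SET @grp = N'TeamB_Group';
        RETURN @grp;
    END;
    GO

    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
    ALTER RESOURCE GOVERNOR RECONFIGURE;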

Steve: Oh, interesting. So another question around the In-Memory OLTP and this is kind of the concern that I came across as I’ve been exploring it, is that and this probably comes from my background with database corruption and checkdb and things like that. But checkdb and checktable won’t check In-Memory OLTP tables. And I guess is that something that you’ve run across or have problems with?
Brian: We have not encountered any problems with that. Yeah, it’s definitely a concern. I mean, the tables that we use for In-Memory OLTP can largely be thought of as staging tables for ETL processes, and obviously we extensively use the memory optimized TVPs. And hopefully not to frighten some of your listeners too much, if you’re just using the memory optimized TVPs, these are all temporary objects, so you probably won’t even notice the amount of memory being consumed by In-Memory OLTP because they’re short lived.

Carlos: Yeah, exactly, it will be almost like Temp tables, right, you know, in a sense.

Brian: Yes. But no, we haven’t encountered any problems there. We have encountered corruption on our disk based tables but nothing in the In-Memory OLTP space yet.

Steve: Well, ok, that’s good news on the In-Memory OLTP side.

Carlos: Yeah. So I guess that’s maybe an interesting point that could take us off on a tangent. So corruption on disk happens, you know, something gets switched there. But as long as the database knows what records are there, how could corruption occur with an In-Memory table? Your memory would have to become corrupt, which would point more to a hardware problem.

Steve: Yeah, you mean with the In-Memory tables, Carlos, right?

Carlos: With the In-Memory tables, yeah.

Steve: Yes, so I think the thing I’ve seen, and experimented with a little bit, is if you have corruption in one of the checkpoint files, where all of the changes to that In-Memory OLTP table are tracked. If one of those is corrupt, the only way you’re going to know it is when you try and back it up, either with a full or differential backup. At that point your backup is going to fail, and it will usually report a checksum error. Then you can’t actually back up the database, and your only option at that point is to go to a previous backup of that database.
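
One way to surface that kind of storage problem sooner, sketched here with a made-up database name and path, is simply to back up frequently and with checksums, so failures show up at backup time rather than at restore time:

    -- WITH CHECKSUM validates page checksums on disk-based pages as they are read.
    BACKUP DATABASE SalesDB
    TO DISK = N'X:\Backups\SalesDB.bak'
    WITH CHECKSUM, COMPRESSION, INIT;

    -- Optionally re-verify the backup file itself.
    RESTORE VERIFYONLY FROM DISK = N'X:\Backups\SalesDB.bak' WITH CHECKSUM;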

Carlos: Do you have anything else you want to add there, Brian?

Brian: I mean the checkpoint files are really just used for recovery purposes, right, and also for the backups. So in that scenario where you encounter that the backup has failed and you have some kind of storage corruption, presumably your actual data will be fine, right?

Steve: Well, if you’re pulling that data from another table that is a disk based table, yeah, it will be fine. But if it’s going into an In-Memory OLTP table, that data may only exist in that checkpoint file until it gets backed up.

Brian: Oh, ok.

Carlos: Got you. Yeah, crazy.

Steve: Yup. In-Memory OLTP is definitely a huge performance gain there, and it sounds like that’s what you’re using it for and you’re definitely seeing those results.

Brian: Yes. I mean, I think our plans for 2017 definitely involve more In-Memory OLTP.

Carlos: But that’s not the only feature that you’re using in 2016?

Brian: No, another feature that we’re using extensively is Query Store.

Carlos: Right, and this made waves recently because they had enabled it by default in Azure SQL database.

Brian: Oh, that’s right. Yeah, that’s right.

Carlos: So, who knows, maybe in SP2 it might become enabled by default. This is pure conjecture, not an announcement, but I could see that coming in future versions. Currently it’s not.

Steve: Yeah, I could see it, and even during the preview program they were changing the defaults almost on a monthly basis. I don’t know if anyone was paying a lot of attention to the CTPs, but almost each month they got tweaked.

Carlos: Yeah, I know when I talked with Borko, you know, the PM, about that, they were looking at those usage patterns. They wanted to get that feedback, see how people were using it, and then be more responsive if you will.

Brian: Yeah, for us, because we have a lot of ad hoc queries, the switch to clean up the queries that are not frequently used was important.

Carlos: Ok, so I guess we should review quickly what the Query Store option is. From my perspective, right, it’s basically tables inside the same database that are going to keep a record, or a history, of all the executions that have happened on your database for further analysis. Is that fair?

Brian: Yeah, I think that’s fair. I think Borko describes it as a flight recorder for your database. And I think that’s a really good description.

Carlos: Yes, that’s right.

Brian: So it keeps a record of all of the queries as they execute and the plans that they use. The primary use case is probably that, right, looking for plan changes that are problematic, but there’s a lot of rich data in there that you can use for other things as well. Even cases where queries are aborted, or cancelled, or timed out for some reason, that’s all data you can pull out of there.
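
For anyone who hasn’t turned it on yet, a minimal sketch (the database name and settings are illustrative, not the values the guests used):

    -- Enable Query Store (SQL Server 2016).
    ALTER DATABASE SalesDB SET QUERY_STORE = ON;
    ALTER DATABASE SalesDB SET QUERY_STORE
        (OPERATION_MODE = READ_WRITE,
         QUERY_CAPTURE_MODE = AUTO,   -- skips insignificant ad hoc queries
         CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30));

    -- Example: recent queries, their plans, and runtime stats.
    SELECT TOP (20) q.query_id, qt.query_sql_text, p.plan_id,
           rs.avg_duration, rs.last_execution_time
    FROM sys.query_store_query AS q
    JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
    JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
    JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
    ORDER BY rs.last_execution_time DESC;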

Carlos: Right, I know, from the DMV perspective, you know, that you can only see either cumulative totals or the last execution counts. And so that scenario where, “Oh, this morning I was having a problem but now it seems fine”, you know, not much that you can do about it now.

Brian: Yeah, so prior to Query Store we would typically run Adam Machanic’s sp_whoisactive every minute and dump that to a table, so we would have some kind of history of what was executing. When a developer or a customer comes to us and says, you know, “There was an issue an hour ago or two hours ago, what happened?”, right, it’s really hard to tell. So that’s what we used that for, and Query Store is kind of like a more enhanced version of that.
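
For reference, that logging pattern usually looks something like this (sp_WhoIsActive is Adam Machanic’s free procedure; the destination table name here is made up):

    -- The procedure can also generate the destination table's schema for you via its
    -- @return_schema / @schema parameters; see its documentation.
    EXEC dbo.sp_WhoIsActive
         @get_plans = 1,
         @destination_table = 'dbo.WhoIsActiveLog';   -- run from an Agent job every minute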

Carlos: Ok, interesting. So, I mean, you talked about aborted queries, you know, that you’re starting to look at or I guess understand better. Other things that you’re getting out of there that maybe you haven’t thought of before?

Brian: We have used plan guides, at one point extensively, and we certainly still have areas where we use them. And if anybody has worked with plan guides to force plans, I mean, Query Store is amazing, right. You execute a procedure, you pass in two parameters, and your plan is forced. None of this messing around with huge blocks of XML or trying to capture the correct plan from the cache.

Carlos: Ok, so now you’re actually saying I can go back into Query Store. I can capture the plan. I have two plans, one is good, one is bad. And I basically say, “I want to use this one.” And that’s your plan guide?

Brian: Yeah. It is a force. It’s not, “Hey, would you mind using this one?” You’re forcing it to use this plan.
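
That procedure call is as simple as it sounds; the IDs below are placeholders you would pull from the Query Store views or reports:

    -- Force the known-good plan for a query (IDs are placeholders).
    EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;

    -- Undo it later if the data or schema changes and the plan is no longer right.
    EXEC sys.sp_query_store_unforce_plan @query_id = 42, @plan_id = 7;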

Carlos: Sure. Well, that’s kind of an interesting option, and I say this only because I was just writing test prep questions and so I’ve been looking at force. Force is kind of an interesting word because any plan guide is really a force, right. You’re telling it, “Hey, you need to use this” instead of what it thinks it should use.

Brian: Sure, but you’re not giving the optimizer a choice. So the optimizer is being told, “Here is your plan, execute that.”

Steve: And I think that’s a much more direct and guaranteed way of using plan guides rather than saying here’s the plan guide and here are some suggested hints to use.

Brian: Yeah, certainly what we have found is that we’re often using hints to try and get to the plan that we want. As you say, this is more direct: here, I know the plan I want, use it.

Carlos:  Right, right, right.

Brian: I mean that should be a pretty rare scenario. Usually the best thing to do is to leave it to the optimizer to figure out the best plan. But there are always those cases that you have to deal with.

Steve: Yeah. I’ve seen that where, like, if somebody adds a bunch of hints into a query that then make the query behave poorly, you can use plan guides to override those bad hints that were given to the original query. And if you can pull that information out of the Query Store, that would be extremely useful.

Brian: Yeah, and you could use it to address the same problems that plan guides have largely been used to address. Cases where you can’t edit the query to put hints in, where you would use plan guides to force the behavior that you want, you can use Query Store in the same way. So if you deal with a lot of ISVs and you’ve got issues with parameter sniffing, this could be a way to address it.

Carlos: Very good. So the last feature we want to talk about is the column store indexes.

Brian: This is one you’re definitely going to have to have Kevin back in for. He is our columnstore guru.

Carlos: Yes, so the example that they frequently give for column store indexes is like state. Right, and ok, I get it, state makes sense. I’m constantly grouping by that, I’m reporting by that, you know, things of that nature, so that makes it a good fit for a column store index. But outside of that, is it just basically looking for common things that I am either grouping by or reporting on? And that’s where I want to go for my column store indexes?

Brian: Possibly. I mean, one of the things that maybe gets overlooked a little bit is just how good the compression can be on column store, and a lot of your performance gains might come from that. We had a case where a 4TB fact table was reduced to about 600GB in size.

Carlos: Wow!

Brian: By making it a clustered column store index, yeah. Also you can generally reduce the number of nonclustered indexes that you need when you basically rearrange your table as a column store.

Carlos: And I guess that’s the new feature in 2016 is I can now create that as a clustered index whereas before I couldn’t.

Brian: Yeah, you can basically mix and match now, so you can have a clustered column store index with nonclustered B-tree indexes. Or you can have, probably less common, a regular B-tree clustered index with nonclustered column store indexes on the table. Yeah, and one of the gains now as well with 2016 is batch mode execution for column store.

Carlos: Right, now batch mode meaning being able to update and write to.

Brian: Yes, it will execute in batches and you can get considerable performance gains there.
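
A bare-bones illustration of the mix-and-match Brian mentions (the table and index names are invented, and the table is assumed to start as a heap):

    -- Rebuild a fact table as a clustered columnstore index,
    -- then add a nonclustered B-tree index for selective lookups (SQL Server 2016).
    CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
        ON dbo.FactSales;

    CREATE NONCLUSTERED INDEX ix_FactSales_OrderId
        ON dbo.FactSales (OrderId);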

Carlos: Right, very cool. Very cool! Well, awesome. It’s always good to talk with other folks about why they’re doing certain things and how they’re doing them. A lot of times on our podcast we get kind of carried away; well, I think we always try to dive into why we’re doing something, but discussions like this help bring the problem a little bit more to the front and then how we go about solving it. Should we do SQL Family?

Steve: Yes, let’s do SQL Family then. So the first question there is on keeping up with technology. How do you keep up with technology and all the changes that are happening, specifically around SQL Server?

Brian: That’s kind of easy here because we have so many people working here who are really into SQL Server, so everybody can be everybody else’s filter. Everybody is kind of looking at blog posts and so forth. We have a very, very active user group in the Triangle area. So yeah, basically just conversations at work will usually keep you firmly in the loop.

Carlos: Yeah, it’s interesting. Brian, you’re actually the third person from ChannelAdvisor we had on the podcast. We’ve had Mark. Mark Wilkinson was actually the first, and then Kevin, and then now yourself. Yes, we run into your folks from time to time of course being on the East Coast.

Steve: Yeah, I think you’ve got like, maybe 10 more SQL professionals to go if you want to get the whole set.

Carlos: Only 10, there you go. How did you first get started with SQL Server?

Brian: In a previous life, I was a Linux administrator, mainly working with MySQL. The company I worked for at that time didn’t have any in-house developers, so when they took on a project that was going to be developed in Delphi against a SQL Server backend, nobody was around to manage it, so I did.

Carlos: Ok.

Brian: I started from there.

Carlos: Alright, and then if you could change one thing about SQL Server what would it be?

Brian: Apart from the licensing cost? I will say, TempDB, and I think that’s a major bottleneck for SQL Server still. I would definitely like the ability to allocate TempDB on a per database basis.

Steve: Yup. Alright, so little bit of background then there, Carlos, this question about changing one thing with SQL Server. Where did that originally come from?

Carlos: Yes, we reached out to the audience and asked them what questions we should be asking, and Brian came up with that one. And it’s been a good one, right, lots of different thoughts there.

Steve: Give credit where credit is due.

Brian: Thank you! I only realized before the show that I didn’t necessarily have an answer myself.

Carlos:  Yeah, it’s much easier to ask the questions than it is to answer sometimes. What’s the best piece of career advice you’ve ever received?

Brian: Alright, well, so I’m going to credit the leader of the user group in Dublin for this piece of career advice, a guy named Bob Duffy. Some of you in the SQL community have probably heard of him. The advice was that I should sit the Microsoft Certified Master exam. When they announced that it was going to be closed, I kind of assumed my chance was gone. He was the one who really said, no, just go for it. And I kind of squeaked in at the end there.

Carlos: Wow, ok, very good.

Brian: Early days before it finished.

Steve: So if you can have one superhero power, what would it be and why would you want it?

Brian: Ok, I was going to say the amazing power to reduce SQL licensing costs; I don’t know if that counts. But actually, now that I think about it, I think flight would be good. I’d fly back to Ireland and, you know, when you’re paying for tickets for four people it kind of adds up. So if I could just fly everybody over with my superhero power, that would work out great for me.

Carlos: There you go.

Steve: Yup. And I will be looking for that new comic book of the new superhero that reduces the SQL Server licensing cost.

Carlos: Marvel, if you’re listening, it was our idea first. Ok, well, Brian, thanks so much for being on the show today.

Brian: Thank you, happy to be on here.

Steve: Yup, thanks Brian. It’s been fun.

Episode 77 Reporting Improvements SQL Server 2016

We invited Paul Turley on the show to chat with us about some of the new features in SQL Server Reporting Services, but were pleasantly surprised to have the conversation take a different direction.  Paul talks about some of the new training available for those interested in visualizations and why this community can be hard to define–and connect with.  There are so many visualization options, and even within Microsoft we have many products that overlap in functionality.  In addition to talking about the pick list of new features, Paul gives some guidance on why you should choose certain products and what makes for good use case scenarios for some of the new features.

With the new analytics features now available in SQL Server 2016 via SP1, I think there are going to be additional requirements for data professionals to provide better analytics, and this episode is a good starter for how to frame those conversations.

 Episode Quote

[There is] “this challenge where the industry is very fragmented. It’s moving in different directions because businesses are adapting technology in different ways.  The reason there are different tools is because there are different needs and disciplines.”

Listen to Learn

  • Why it can be hard to connect with others in the visualization community
  • Some of the new features in SQL Server 2016
  • Why choosing a visualization tool can be difficult
  • The initial architecture of reporting services
  • Good candidates for mobile reports
  • High level architecture for mobile reports
  • The new architecture of reporting services and support for multiple browsers

edX course on data visualizations
Paul’s book on Reporting Services in 2016
PASS Analytics Day

About Paul Turley

Paul Turley is the owner of Intelligent Business, LLC, a consultancy with dedicated Business Intelligence practitioners, visualization designers, mentors, instructors, project managers and solution architects.  He is one of the leaders of the Portland SQL Server users group, and he is both a Microsoft MVP and a Dun & Bradstreet MVP.


Episode 76 Availability Group Improvements

Availability groups provide some exciting features mixing high availability and disaster recovery; however, from a performance perspective, there are a few drawbacks.  With the advances in SQL Server 2016, our guest, Jimmy May, set out to test the features and see if he could get really high performance out of an availability group with two synchronous replicas.  In this episode he talks with us about some of his findings and some of the pains associated with getting there.

Episode Quote

“I’ve been a big fan of flipping the faster bit ever since even before I was a member of SQLCAT.”  Jimmy May

Listen to Learn

  • What improvements have been made to the log transport process
  • Changes to the log redo.
  • A new startup parameter -k and what it does
  • Why hardware validation is still important
  • Jimmy was the guest of what celebrity from the 90’s?

Jimmy on Twitter
Get the whitepaper with all the notes from Jimmy’s test

About Jimmy May

Jimmy May is a SQL Server technologist for SanDisk, a Western Digital brand.  His primary skill set is related to tuning and optimization of SQL Server databases. He formerly worked at Microsoft on the SQL CAT (Customer Advisory Team).  When he is not working on databases, you can find him skiing wherever there might be snow.

Carlos:             So Jimmy, welcome to the program.

Jimmy:            Hey guys, thanks for having me. I really appreciate it.

Carlos:             Yes, it’s nice of you to come out. Always interesting to talk to the former Microsoft folks. Guys have worked on the SQLCAT team, lots of experience under your belt. So thanks for coming and being willing to share a little bit of your time with us.

Jimmy:            Sure. Thanks!

Steve:              And I know, I’ve seen Jimmy present a couple of times and I’ve always had a great time learning along the way so I hopefully will have that same experience today.

Jimmy:            Crossing our fingers.

Carlos:             And one last thing just to butter this bread a little bit more. One of the nicest people I think in this SQL community always willing to talk, kind of get some feedback, or share your thoughts, opinions with those at conferences, SQL Saturdays, you know, what have you. And so if you ever get a chance to go and say Hi to Jimmy take advantage of it because you won’t regret the time.

 

Jimmy:            I really appreciate that guys.

 

Carlos:             Ultimately what we want to talk about tonight is your experience putting together an Availability Group, trying to understand some of the pain points that customers are experiencing, then trying to figure out how to get around those pain points, and kind of putting the proof in the pudding, if you will, on some of the advancements in SQL Server 2016. Yeah, so let’s go ahead and dive into that. Give us a quick recap of how you put that together and then we’ll dig into what you found.

Jimmy:            Ok, well, the goal, as you stated, was to put together an Availability Group architecture that was actionable. We weren’t just trying to set super high world record numbers, something we could toss over the transom for an arbitrary customer to read and implement. And along the way, of course as you know, I work for a company that sells a lot of flash, and I’ve been a big fan of flipping the faster bit ever since even before I was a member of SQLCAT. So that was a big part of the picture, and there are some challenges even on flash, storage latency, things like that. And we have a handful of lessons learned. We proved out the really exciting log transport improvements; that was a bottleneck in Availability Groups in 2012 and 2014, and now we can get that data across the wire in near real time. However, there are still some issues with the log redo at the high end of performance. Along the way we also implemented some other findings. The –k startup flag is something that virtually any installation can take advantage of. We’ll go into it in more detail later, I hope, but it’s a startup flag, fully supported, not well documented, that throttles checkpoints at a level that you designate, and that’s pretty exciting. There are a number of lessons learned.

Carlos:             So why don’t you take us through I guess what that setup looked like. You have a couple of boxes and ultimately you wanted to use Availability Groups and you wanted synchronous commits from the primary to both secondaries.

Jimmy:            Yeah, the setup itself was largely pretty vanilla, and we did that by design. The only “exotic” pieces were some of the hardware components to make sure we didn’t run into bottlenecks. We had three 2U boxes – HP DL380 G9s. Our friends wouldn’t like us to call them commodity boxes, they’re super fantastic and amazingly performant, but they were off the shelf boxes. Two sockets; we used Broadwell processors. We also started with Haswells, but during the duration of the project we upgraded. 256 gigs of RAM, nothing crazy. And we had some pretty high network throughput; we didn’t come close to hitting the bandwidth that we had. For availability, we put in two network adapters [inaudible – 4:15], 40Gb apiece, and we teamed those using standard Windows teaming. Pretty simple to set up and configure, something with which I had not a whole lot of experience, but it turns out to be very easy, and even a non network geek could do it with a lot of ease. So we had a total aggregated bandwidth of 80Gb if we wanted to use it, and that would serve us well if we wanted to put together multiple AGs, Availability Groups. And what else can I tell you? Oh, the storage of course, the star of the show. Flash doesn’t just come in what a lot of us are used to seeing, the old Fusion I/O style PCIe cards, right? Flash comes in a 2½ inch standard small form factor flavor now. And what we did was we started out with eight, and decided to end up with ten, 2½ inch flash disks. Only 800GB apiece, relatively small; there is a 1.6TB flavor coming, and in fact our company makes 4TB 2½ inch disks. It’s amazing to hold a 2½ inch disk in your hand and know you’re holding 4TB. And very, very soon we’re coming out with 8TB disks. By the way, these were a total of ten 800GB disks.

Carlos:             Inserted right to the box?

Jimmy:            Right to the box. Yeah, that’s one of the amazing things about these “commodity” off the shelf servers. A variety of vendors, including HP, make these boxes with 24 slots. Not a special order; you just say, “Hi, I want the box with 24 2½ inch slots in the front,” and you get it. So think about it: put 4TB disks in there and you’ve got 96TB of raw storage. Put the new 8TB disks in there and you’ve got almost 200TB of raw storage in a box that’s about as big as two pizza boxes. It’s amazing. It’s really amazing, and you’ve got the new processors from Intel; it’s just amazing that you have the CPU to drive that capacity and the performance. Anyway, we used six 800GB disks for the data and four 800GB disks for the log.

Carlos:             And then it’s all yours. You’re not going to share those disks.

Jimmy:            That’s correct, it’s a non-shared storage. Oh my! All yours.

Steve:              I’m just listening to you describe that, and I’m thinking about configurations we have where there’s shared storage or a bunch of virtual machines sharing the same hardware. The configuration you’ve got there is certainly going to be way more performant than what I typically see.

Jimmy:           No question about it. Well, wait until I start talking about the performance numbers. You know, again, as a guy who used to dedicate his career to flipping the faster bit, we got some latencies that were so low that I had to double check the math. We will get into that later, hopefully.

Carlos:             Ok, so let’s go back and set the stage for some of the pain points you managed to resolve there. Ultimately, on our podcast we want to make sure that we level the playing field and try to explain some of these concepts. So with the Availability Group, you have a server. You’re setting it to synchronous commit, meaning you have a second server. The transaction has to write to the first server, be sent to the second, commit on the second, get a reply back, and then the transaction is complete.

Jimmy:            Exactly, except that we have three replicas, not just two: one primary, two secondaries. We actually have to get the ack from two servers before we proceed.

Carlos:             Right, the ack meaning an acknowledgement?

Jimmy:           Yes, yes, before the data can be hardened or committed back on the primary log.

Jimmy:            Well, the way it works in terms of AGs is the log is flushed from the primary database. Let’s call this the main server, the primary server. The data is flushed to the Availability Group’s log; in this case we have a simple setup, one database, one AG. Ok, and the data in that log is what is shipped across the wire. It’s received by the secondary AG logs and then it’s applied to the replica data files on that side.

Steve:             So when it’s applied on that side, that causes the write to the data file appropriately, and then that log gets saved as the local log file. Is that correct?

Jimmy:            Yeah, it’s the local log file. And on failover, for example, that log file becomes the primary log file. On the secondary, the recovery queue is emptied, the AG does a little bit of internal manipulation to prepare itself to be the new primary, and then what was the secondary becomes the primary and the user transactions transpire there just like they did originally on the primary.
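
For reference, the synchronous commit behavior being described is just a replica option on the availability group; a minimal sketch with placeholder names, not Jimmy’s actual configuration:

    -- Run on the primary; AG and replica names are placeholders.
    ALTER AVAILABILITY GROUP [AG_Sales]
        MODIFY REPLICA ON N'SQLNODE2'
        WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

    ALTER AVAILABILITY GROUP [AG_Sales]
        MODIFY REPLICA ON N'SQLNODE3'
        WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);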

 

Steve:              So then you mentioned some enhancements with SQL Server 2016 around that process of getting the logs transported then the logs redone on that end.

 

Jimmy:            Ahh! We were so pleasantly surprised. As far back as CTP1 we were seeing these enormous performance improvements in getting the data across the wire. Historically, and this is part of the “it just runs faster” hashtag, SQL 2016 has had a lot of amazing performance improvements implemented. Again, this is Jimmy May, former Microsoft employee, former member of the product team, but third party now, ok, so I can say this with complete impartiality: huge fan of what the SQL Server product team has done. SQL Server 2012 is when AGs were introduced, and there was a bottleneck. It was tragic. AGs were pretty cool conceptually, you know, an enhancement of database [inaudible 11:09], but we were limited to 40 or 50 MB per second across the wire. No matter what hardware you had, no matter how many CPUs, no matter how much memory, no matter how broad your network pipe was, there was a hard coded bottleneck built into Availability Groups. It was a vestige of the database [inaudible 11:30] code buried very deeply in the bowels of it. When that code was written, way back when, there was no hardware in the world that required the kind of throughput you could get 4-5 years ago, and certainly not today. So in SQL Server 2012 and 2014 there was this hard bottleneck that you just could not work around, 40-50 MB per second. In 2016, they unleashed the hounds. Again, with no special tuning, same server hardware, you just use a different version of SQL Server – 2014 vs. 2016. And I’ve got some great charts. In fact, I’ve got a whitepaper; you guys need to publish the link to it as part of the podcast. It has the documentation for all of this and what I hope is a very lucid presentation. And the appendices are amongst the most valuable pieces of the whitepaper. You’ve got the usual “yada-yada” in the body of the paper, but the appendices have some very interesting findings that a geek like me and like you guys would be interested in, including charts comparing head to head matchups of the SQL Server 2014 log transport vs. 2016. Again, just changing the version of SQL Server on the same hardware, we went from that 40-50 MB per second up to a quarter of a gigabyte per second. Boom! And that genuinely expands the scope of applications that are suitable: data warehouses, ETL, any kind of index maintenance, etcetera. It was easy to hit that bottleneck in 2012 and 2014. And the problem, let me back up a little bit, the problem with this bottleneck isn’t just that you’re bottlenecked and can only get 50 MB per second across the wire. While that data is being generated by the primary application, it is queued up on the primary, waiting to get across the wire, waiting to get inserted into the log transport mechanism. And if something happens to that primary, you’ve lost data. I mean, it’s possible you can regenerate it depending on your application, but at a fundamental level that data is gone. And that’s not a formula for high availability. Whereas now, easily up to 250 MB per second, and with a little tuning you can get more, is sent in real time across the wire via the log transport mechanism and hardened on the secondary logs in real time. It’s just amazing. So on failover, boom, the data is there. No data loss. So not only do we have better performance, you’ve got better HA and DR. Ok, so that’s the log transport mechanism. Do you want me to dive into another aspect of the AG log related stuff?

 

Carlos:             Yeah, I think we are going to talk about the log redo?

 

Jimmy:            Yes, exactly. So we’ve just been talking about the log transport and the exciting performance improvements that 2016 gave us. Log redo is the other piece of the Availability Group log related mechanisms that is required for the process to work properly, and the log redo is simply a continuous restore process. Your data that’s shoveled across the wire is hardened on the secondary log and then, via the redo process, it’s applied to the constituent secondary replica database files. This is something you alluded to a little while ago, Steve. Nothing too exotic about it, just a continuous restore process. The challenge here, though, is that at the high performance levels we were generating, the redo can’t keep up. Now the SQL 2016 team did a great job; they’ve turned it into a parallel process. That helps, but it still just can’t keep up, and so at the maximum levels we were driving, the redo was falling behind. That’s not a formula for true high availability. Disaster recovery, yes, but in the event of failover, if the recovery queue on the secondary is building up, when you do try to fail over that recovery queue has got to empty before the server is actually available to accept transactions. Now the good news is that at this point there aren’t very many applications that require the kind of throughput we were throwing. You know, again, it’s pretty impressive: these 2U boxes, two sockets, two CPUs, a little bit of flash, a workload generator throwing 250 MB per second across the wire. How many applications require that kind of performance? Not that many, frankly, today. Even among the apps I used to work with at SQLCAT, just a handful of them. So the great news is, as always, the SQL Server team is very receptive. They are aware of the issue and they are actively working on it. So unlike the log transport issue, which took them two versions of SQL Server to remediate, this issue, which again is not a problem for hardly anybody in the world today, is likely to be remediated in the relatively medium term, before it becomes a problem for the rest of the world, for your day to day application.

 

Steve:              So then just to recap to make sure I understand that bit. If you’re pushing the data through on the log transport at up to that quarter of a gigabyte per second, then your log redo queue is going to fill up on the secondary side and get backlogged to the point that it may take a while to catch up when you fail over.

 

Jimmy:            Exactly. Exactly correct. Good summary, very pithy there; I wish I could be that pithy. Yeah, and so the log redo recovery queue just continues to grow at these high rates of performance. Most high performance applications aren’t getting even close to that kind of throughput and you’re going to be ok, so this is largely a theoretical event. It’s really interesting, though; again, I mentioned the appendices in the whitepaper we just published. The appendices cover this in detail. They actually show some of the internals, how you can interrogate what’s going on on the secondary replica, etcetera, to see for yourself.

 

Steve:              And we’ll have to include the links to that in the podcasts notes.

 

Carlos:             That’s right which will be at sqldatapartners.com/jimmy.

 

Steve:              Alright, so on to another topic. I know you mentioned the –k startup flag. Can you maybe jump into that a little bit and what you found there?

 

Jimmy:            Ok, this is one of the most exciting takeaways of this work. I know during our podcast, when we were level setting, I asked you guys if you’d heard about it, and the truth is very few people have heard about it. I first heard about the –k startup flag in SQLskills training, during Immersion Event IE01 with Paul, Kim, Jonathan, Erin, Tim, Glenn, etcetera, recently.

 

Carlos:             I went and looked it up on MSDN and it’s not listed in the database engine service startup options.

 

Jimmy:            It is fully supported, and it has been, believe it or not, since 2005. So we need to get that fixed; that would make a great Connect item. It is hard to find documentation. In fact, I remember first using it when I was involved, again as a third party person post Microsoft, when I was invited by the CAT team to come in and assist with a lab about a year ago. I first saw it implemented in real life with one of the customers that does crazy cutting edge stuff. My friend Arvin, one of the first MCMs in 2008, was implementing it with this customer, and so I got the chance to see the behavior of –k in real life. And it’s amazing. It’s something that I’m pretty sure a lot of places that need cutting edge performance are going to start adopting, and here is why. Even in conventional shops, you know, back in my old day job as a consultant for Microsoft services 10 or more years ago, big fast databases, checkpoints were a problem. But wait, we haven’t said what –k is. What is –k? It throttles the checkpoint to a user defined level. Many of us who have any experience with I/O are familiar with the checkpoint spikes that occur: SQL Server flushes the dirty pages in the buffer pool often enough to keep database recovery on restart reasonable, independent of Availability Groups. Now, this –k is important not just for Availability Groups but for any standalone SQL Server instance that experiences challenges with checkpoints overwhelming the disk I/O subsystem. And it’s very, very common; it’s been in for years, and these challenges can go anywhere from a spike when the dirty pages get flushed to disk, to a stalagmite, you know, a little bit of a pyramid where the checkpoint basically never finishes. And so 5, 10, 20ms latencies for spinning media, or 1 or 2ms latencies for flash, can suddenly turn into 100ms, 500ms latencies over a long period of time. It just hammers performance, and plus you’ve got consistency problems: during the duration of the checkpoint the whole system is throttled. So in our case, for example, with the default behavior, without implementing –k, we were getting 99,000 transactions per second between checkpoints. During checkpoints we were only getting 49,000 transactions per second. So if you look at the chart in the whitepaper you’ll see what looks like literally a rollercoaster, just high, whoosh, high, whoosh; it’s quite a ride but it’s not something that your users want or your system wants. You implement –k by simply adding it as a startup flag like any other, followed with no space by an integer representing the number of megabytes per second of checkpoint throughput you want SQL Server to provide, but no more. In our case we did several tests and the sweet spot was 750. -k750 says, “Hey, SQL Server, I don’t care how many dirty pages you have, never shove more than 750 MB per second down to my disks.” And doing so turned the rollercoaster ride manifested by the default behavior we all know and hate into remarkably flat system performance. Checkpoint pages per second are throttled at 750 MB per second, and because you do the testing and define the sweet spot, suddenly CPU is basically flat, your transactions per second are basically flat, log flushes per second are basically flat and consistent. And the great news is that not only do you get consistent behavior, but the average is far higher than the average of the rollercoaster ride. And so, yeah, I’m telling you, I posted the paper to the MCM distribution list when it was published and I had several comments back about it, “Oh, where’s this been all my life,” and that kind of stuff. I think you’ll start to see some adoption.
And so again, one of the biggest takeaways of this Availability Group paper isn’t just the great things that we can do with a 2U box, a little bit of flash, and the improvements in 2016; another outcome is, for the first time I’ve seen it documented anywhere, the measured impact of the –k startup flag.
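
For anyone who wants to experiment, -k is added as a startup parameter rather than enabled in T-SQL; a rough sketch, where 750 is simply the value Jimmy’s testing settled on, not a general recommendation:

    -- Add "-k750" to the instance's startup parameters (SQL Server Configuration Manager ->
    -- SQL Server service -> Startup Parameters), then restart the service.
    -- Afterwards, checkpoint behavior can be watched with the performance counter DMV:
    SELECT counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Buffer Manager%'
      AND counter_name IN (N'Checkpoint pages/sec', N'Background writer pages/sec');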

 

Carlos:             And when we start talking about flags like this, one of the things we want to make sure we understand is the downside, or at least why it is not enabled by default. And if I think about it, what you’re potentially going to increase, and your mileage may vary depending on your sizes and whatnot, is that when that instance restarts you may have to go through more of the log to recover. Is that fair?

 

Jimmy:            Ah, that’s a good guess. That’s a very good guess, but first let me be clear: this –k isn’t directly related to anything involving Availability Groups. It’s basically…

 

Carlos:             I guess I said redo log when I meant the log, right. So you’re going through the log and rolling forward or back.

 

Jimmy:            Right, so you could set a –k that is too low and you basically get a sustained checkpoint that never actually finishes, and that’s part of what we tested, seriously. And you would run into exactly that situation: the recovery on database restart, on server restart, will take far longer than you want. But it turns out, when I refer to the sweet spot, part of the sweet spot means not just limiting the value so that it’s below what the disk I/O subsystem can sustain in terms of how many dirty pages it can take per second, but also making sure the value is sufficiently high that you don’t run into that situation where you restart the database and it takes forever to recover. So that is a potential downside. You know, we didn’t talk about this beforehand, so I’m impressed that you came up with that. But the reality is, with just a little bit of testing, and we tested thoroughly and it took less than a day, half a day probably, you get a number where recovery is normal, you can restart the database in a few seconds, yet you’re not overwhelming the disk I/O subsystem. You asked the question, Steve, why isn’t this on by default? That’s something I hadn’t thought of, but the answer came immediately to my mind. The SQL Server product team, by its nature, does things conservatively. Do no harm, etcetera. It’s like trace flags 1117 and 1118. How long have we wondered why these aren’t the default? Why don’t they create multiple TempDB files, four or whatever, by default? Well, they were conservative, right? They finally realized it in 2016. The case was made: “We’re going to implement 1117 and 1118, at least in the context of TempDB, by default.” It’s a known bottleneck. It’s been the best practice for years for users to implement it. We’re going to save them the trouble and finally make it the default, even though it’s been the best practice for 3, 4, 6 versions, who knows; I’d have to think really hard about how far back we have to go to discover when they were introduced. So I think it’s kind of like that. Especially as flash becomes more and more the predominant host for SQL Server data and log files, and with the crazy performance we’re able to provide on commodity servers, it’s a shame that people would have to suffer needlessly with the default checkpoint behavior. So it won’t surprise me if in some subsequent version of SQL Server we see something like this implemented by default.

 

Steve:              However though you would still need to have a way to adjust it to your specific environment even if there was a default for it. I would assume.

 

Jimmy:            No question, no question. Then again, maybe SQL Server can be smart enough. Because, first, it’s do no harm: it would be easy to implement a default that in some cases could be harmful. You’re right there, Steve, absolutely. But SQL Server is getting smarter all the time, so who knows what they’ll think of next.

 

Steve:              Ok, so on that then, how about any other best practices that you learn while you’re going through this process of building this out and testing it?

 

Jimmy:            Oh man, I could go on. At least give me time for two, please, hopefully a bit more. The next two most important ones: first, hardware validation. Again, this is independent of AGs, it’s independent of SQL Server 2016. It’s been a long-time best practice of mine as a geek parachuting into cities all over the planet for Microsoft’s consulting services. I evangelize, evangelize, evangelize: validate your hardware before you put it into production. You know, if you flip that production bit on that big million dollar SAN before validating it thoroughly, and then you realize you have a problem, you’re in trouble. Seriously, I promise, it’s not going to be fixed very easily, and no one is going to be happy about it. So the time to validate performance is before you put your hardware into production, whatever the hardware is, and especially storage. So I would say, when you buy new hardware you need to hit two different thresholds. One, you need to hit the specs that are documented, which you can download from whoever your vendor is. The second one is to hold the sales geek’s feet to the fire for that vendor: make sure that the promises they made are promises you’re able to hit, and if you can’t hit them, get their people in there until you do. And so, circling back to the work that we did, we were struggling. We comply with our own best practices, believe it or not; the cobbler’s kids have shoes in my shop. But I wasn’t able to hit the numbers I needed. I said, “Wait, here are the specs, what’s going on here? I’m consistently 20% below the nominal specs.” And we struggled for a couple or three days until we realized we had the wrong firmware on the SSDs. We weren’t expecting that and had to call in some bigger geeks than I am. For the record, validating the hardware wasn’t on my plate; it wasn’t my job to do that.

 

Steve:              Someone else’s job, but it’s still your problem, right?

 

Jimmy:            Yeah, exactly, but the point is we discovered this not halfway into the hardcore testing, where I would have had to restart all the experiments; this was early on, at a fundamental part of the work, and we got it done. The takeaway is not merely, yes, validate your hardware, but also, and this is very important, this is flash related: most of us don’t upgrade the firmware or drivers for our spinning media, ok. I certainly never have. Occasionally, when I’m asked, I say, “Yes, yes we do,” but ask no questions and I’ll tell you no lies. Flash, like most of the other electronics in our servers, requires updates, both drivers and firmware. So make sure you have the latest and greatest. Test and verify and all of that, of course, but get the latest and greatest stuff out there. And I know from experience working with SanDisk, now Western Digital, and formerly Fusion I/O, that stuff can make a big difference. So, two takeaways there. One other one I want to throw out, one of the takeaways I have: we have the luxury of working in an all-flash environment and using crazy parameters to do our backups and restores. We back up and restore multi terabyte databases routinely, as a matter of course, in a matter of minutes, and it’s pretty cool. Whereas in environments I’ve worked in in the past, a multi terabyte database could literally take hours, overnight, multiple days. So it’s cool to be able to sling those bits around in a matter of minutes. We do this by modifying some long-standing parameters that have been available in SQL Server for basically over a decade, well, gosh, a decade and a half, two decades: the number of devices, max transfer size, and buffer count. My historical default has been to use 8 devices for data files. Max transfer size you can’t make more than 4MB, and the default is 64KB; so at every request from a data file the backup process will grab a 4MB chunk instead of a 64KB chunk. And also buffer count, which is how much memory the backup process is allowed to use; the default varies, as I understand it, based on the hardware configuration the database is on. But to maximize backup performance, we multiply the number of logical cores times four. This is something my old buddy and mentor Thomas Kejser taught me, and I’ve been doing it for years. Well, you may remember, circling back, the purpose of this whitepaper was to provide some actionable guidance for people to implement, and part of a real life, real world scenario is doing backups, wherein we chose to implement our log backups every five minutes. That meant we needed to find values for both database backups and log backups that wouldn’t impact the latency of the application. So instead of tuning purely for backup performance we had to balance it with application performance, which is something I wasn’t used to because I haven’t been a production DBA for well over a decade now. And that was a lot of fun. So if you download the whitepaper you will find the parameters we used; I don’t need to go into the details here, but we used different parameters for the data files vs. the log files. Easy to implement, and the point is we did some of the heavy lifting for you and provided a template that you could use in your own installations.
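
As a rough template of the kind of backup command Jimmy describes (the paths are placeholders, fewer stripes are shown than his eight for brevity, and the buffer count assumes 16 logical cores x 4):

    BACKUP DATABASE SalesDB
    TO  DISK = N'X:\Backup\SalesDB_1.bak',
        DISK = N'X:\Backup\SalesDB_2.bak',
        DISK = N'X:\Backup\SalesDB_3.bak',
        DISK = N'X:\Backup\SalesDB_4.bak'
    WITH MAXTRANSFERSIZE = 4194304,   -- 4MB, up from the 64KB default
         BUFFERCOUNT = 64,            -- 16 logical cores x 4 (assumed)
         COMPRESSION,
         CHECKSUM;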

 

Steve:              Yeah, that’s great. I would bet that probably 99% of the backups I see out there are just going on with the defaults on those.

 

Jimmy:            And it’s tragic.

Carlos:             Yeah.

Steve:              Absolutely.

Jimmy:            I know tragic is probably overkill as a metaphor, but we are talking squandered resources. Okay. So there we go.

Carlos:             Well, so Jimmy, thanks for coming on and chatting with us today. I know that you’re willing to take the pains and kind of take one for the team and then share with us so that we can learn from your experience.

Jimmy:            Are you kidding? This is the best job I’ve ever had in my life. I mean, SQLCAT was a lot of fun, but this job is just a blast, so I don’t mind. If these are the kind of hits I have to take, fine, keep them coming.

Carlos:             Before we let you go shall we do SQL family?

Jimmy:            Sure.

Steve:              Alright, so the first SQL Family question is on keeping up with technology. How do you go about keeping up with all the changes that are continuously happening? Besides writing whitepapers.

 

Jimmy:            That’s really a tough one, you know, because SQL Server is so broad and deep, and, you know, the impostor syndrome, a lot of people have it. Gosh, I could go 20 minutes on this. I’m keenly aware of my deficits; they are broad and they are deep. Man, I just fake it until I make it. I’m at a point in my career where I can be candid about my deficits. And my buddy John Stewart, our shiny new MVP, he told me, I think it was John who told me, “If you stay one chapter ahead of your customer, you’re an expert.” That’s part of the secret, and the truth is I’m actually taking on new tech. I live an intentional life; seriously, I create a vision statement from month to month, quarter to quarter, year to year, and I craft my goals. One of the reasons I’m very excited about this December is I have a couple of really hardcore weeks off, and between ski days I’m going to craft my vision statement. And that includes a hardcore plan, not just “I’m going to read PowerShell [inaudible — 37:18]”; I’m going to plot when I’m going to do those chapters. Michael Fall, another shiny new MVP, highly recommended that book, among other people. And also I’ve enrolled in, and have just started, the Microsoft Data Science professional curriculum, and my goal is to finish it by the end of December. So to answer your question: besides faking it until I make it, besides being candid, besides not buying into impostor syndrome, I live an intentional life and I plot out my goals. And one more thing, speaking of impostor syndrome: a dear, sweet, beautiful gem of a friend, Marilyn Grant, during a training session at Microsoft Consulting Services, said the following, and this is very important. It really helped me in my dark days when I thought I wasn’t smart enough to tie my shoes, surrounded by brilliant people within Microsoft and on the customer side. And that is the following: you are as smart as they think you are, or you wouldn’t be here. That’s a very valuable aphorism for me. Okay, I hope that answers it; not pithy like you guys, but I hope you enjoyed it.

Carlos:             So you’ve worked with SQL Server for a long time, Jimmy, and you did talk about the log redo, but if there is one thing you could change about SQL Server, what would it be?

 

Jimmy:            That would be it right now: the log redo. It’s the big bottleneck. It’s the thing right now that has the potential to keep SQL Server from actualizing its true capabilities. Now again, not a lot of people are hitting it right now, but it’s limiting us. I can’t invest time documenting AG solutions if we have this bottleneck. If that bottleneck were remediated tomorrow, we would be off to the races, with various OEMs documenting one Availability Group solution after another.

 

Steve:              So what’s the best piece of career advice you’ve ever received?

 

Jimmy:            Man, you guys are hammering me. This is actually a question I’ve heard before, so I know the answer to this. Back in grad school there was a guy, I don’t remember his name; we were doing our thing that day and, without solicitation, he gave me this advice. He said, “Specialize. Find a niche and dig deep.” And that’s paid off for me. You may have heard other things from other people, but it’s really paid off for me. Some of you may know my original “claim to fame” was that I published the stuff related to disk partition alignment. I evangelized it. Hey, you know, I didn’t invent that, but I was, let’s call it, clever enough, smart enough, had enough of a head on me, that when I heard about it I was just gobsmacked and couldn’t believe it existed. And, you know, no one really knew about it. So I searched for an opportunity to actually do the testing, seized that opportunity, validated it for myself, and we were off to the races. That is what brought me to the attention of the SQLCAT team, and the SQLCAT team has been a very valuable part of my success and put me in a position to be successful at SanDisk, Western Digital, for example with the Fusion I/O people. So specialize, find something and go for it. One other thing, I’m sorry, can I interject two other things? They just came to mind.

Steve:              Of course.

Jimmy:            Oh, thank you! Focus on the fundamentals. If you notice, when you go to your SQL Saturday sessions, etcetera, some of the sessions on fundamentals are some of the most well attended. And it’s true, fundamentals are important; you can’t get too much of them. And speaking of those sessions: community, community, community. You can’t get too much community. You know, you guys referred to the SQL Family questions; this whole set of questions is based on SQL Family, this close knit group of folks we hang out with. Pretty cool.

 

Carlos:             So you’ve mentioned a little bit about your background and some of the places that you’ve been. But give us the nickel tour how did you actually get started with SQL Server?

 

Jimmy:            Oh my gosh, I could get in trouble with this answer. Okay, you said nickel: I went to grad school, went to California, came home a few years later with my tail between my legs and got a job. That’s the part that gets me in trouble. But I got a job way back in the days of Windows 3.1, and we did everything on paper and I said, “We can computerize this.” I didn’t know anything about computers, but I thought I could figure it out. And the computer that I wanted to use, the one that had Excel on it, was always busy. So I went to this other computer; it didn’t have Excel but it had this thing called Access on it. And I thought, “Well, that looks close enough to a spreadsheet to me. I’m sure I can figure it out.” And I had a hard copy manual of Access 3.0. I smuggled it home with me and curled up with it every night. I never slept alone.

 

Steve:              So Access was your gateway to SQL Server.

Jimmy:            Yeah, Access was my gateway. In fact, I remember the first time I saw SQL Server. I sat down and I thought, "Where's the GUI? What am I supposed to do now? What is this crap?"

Steve:              Alright, so I think our last question is if you could have one superhero power. What would it be? And why would you want it?

 

Jimmy:            Oh, I wish I had more charisma. I know I'm a nice guy, but I wish I was more charismatic. But the truth is, I want to be indefatigable, man. I need my sleep, you know, as I approach my middle age, wink wink. I need not only my 8 hours, or I start turning into Mr. Pumpkin. In fact, I was invited to go to the Microsoft Server holiday party the other night, and my friend got an implicit commitment from me that I wouldn't start whining after 9 o'clock, that I would man up and hang out until 11. So that's my thing: if I could get by with six hours of sleep on a consistent basis instead of 8, I'd be a different man.

Steve:              I'm with you there, making it until 11 is much harder than it used to be.

Jimmy:            Yeah, but I will say, though, related to health is the importance of fitness. I know, Steve, you've lost some weight. Carlos, I don't know what your situation is. I've lost 85 pounds three different times. Speaking of a yoyo, you know, a rollercoaster ride, three different times I've gained and lost 85 pounds. So I can't tell you how many times I've gained and lost 50 pounds. I have finally gotten myself to a point where I am relatively stable, and healthy, and I make fitness a priority. My claim to fame, I mentioned my misspent youth a while ago, where I came back with my tail between my legs. While out there on the left coast, Richard Simmons, you guys have heard of Sweatin' to the Oldies?

 

Episode 75: Testing Storage Devices

Storage testing and validation is something we need to add under our roles and responsibilities as DBAs. Every database we ever manage is going to need storage, but how often do we kick the tires? Many times we're basically told to go verify that array, or we're doing a POC testing new storage, but are we really only testing connectivity?  In this episode of the podcast, we chat with Argenis Fernandez about how he goes about testing a SAN array and the important metrics you should consider for your storage.  If you are still using SQLIO or even DiskSpd to test the IO for your SQL Server, don't miss today's episode.

 Episode Quote

“Storage testing and validation is actually something we end up doing as DBAs.”

Listen to Learn

  • Why the now deprecated SQLIO tool was poorly named
  • Why you should question your storage vendor about allocation unit sizes
  • Why you should consider garbage collection when you test SAN arrays
  • How compression and deduplication make testing more complex
  • Why testing storage in the cloud is difficult

Argenis on Twitter
Argenis blog at Pure Storage
Hyper-convergence 
DiskSpd Utility
VDBench

About Argenis Fernandez

Argenis is a Solutions Architect with Pure Storage, a Microsoft SQL Server MVP, VMware vExpert, and a well-known face to the #SQLFamily community, especially since he's now a Director-at-Large of PASS. He's a regular speaker at PASS events, including the PASS Summit. He also founded the PASS Security Virtual Chapter.

Transcription: Storage Testing

Carlos:             Argenis, again, welcome back to the podcast.

 

Argenis:          Thank you so much for having me again. You guys, it's always awesome. You know, I actually requested this one. So to be upfront with everyone in the audience, I did tell you guys to please go ahead and schedule me again because I wanted to go on yet another rant. Like, I went on one rant on the previous one, and I'm ready to go on another one, man. Let's go!

 

Carlos:            Very good, and as you all know, compañeros, what Argenis wants, Argenis gets. And of course we're happy to have him back here on the show today. So last time we talked it was kind of just an SSD conversation, talking a little bit about disk. And today our topics might be wide-ranging, but we want to kick off with testing: testing your arrays or testing the disks that you're using in your servers.

Argenis:          Yeah, so storage testing and validation, which is actually something we end up doing as DBAs. You know, a lot of times we're basically told, "Yeah, you need to go verify that array. Now we're doing this POC, we're testing this new storage, you need to go ahead and test it." And what is it, you guys answer this for me, what's your favorite tool to use when validating storage?

 

Carlos:             Yeah, so they deprecated SQLIO, so DiskSpd is the other one that comes to mind there.

 

Argenis:          Ding, ding, ding, right, so that's what everyone looks for. So SQLIO, I love SQLIO, rest in peace now. Because that was the completely wrong name to use, quite possibly the most awful name for that tool, because it actually had nothing to do with SQL Server whatsoever. It just so happens that somebody on the SQL team wrote that utility, but it was actually never meant to validate SQL Server patterns. It was just a storage benchmarking tool, basically. It wasn't even good for baselining because, you know, I guess you could save the outputs and then compare them to some other output that you took in the future, right, to see how you're doing, etcetera.

 

Carlos:             Exactly

 

Argenis:          But you don't see those things in SQL Server like that. For example, you would get all these fancy scripts from people out there, you know, that would run SQLIO at multiple block sizes and multiple threads. You would get some numbers, and what's funny is that as DBAs we would actually never know if that was good or not. We would be like, "Yeah, we got some numbers. They looked alright. Let's go." It's actually kind of funny; at least that was my experience way back when. I was given an array, I can't remember who the manufacturer was, and I was basically told, "Here, you go test this stuff." I was like, "Ok, what do I do?" So naturally, I found SQLIO, and SQLIO is what I ran, and I got a whole bunch of numbers and a bunch of pretty graphs, and then I showed them to the storage guy and the storage guy is like, "Yeah, yeah, yeah." Ok, what does that mean? Is that good, is that bad? If I put my databases on this thing, are they going to work? Or, you know, what is it? So, in the era of magnetic storage, and this is changing super fast, like way faster than any of us expected. As you guys know, I work with flash arrays all the time, so I'm not really motivated or particularly interested in validating storage on magnetic anymore. But back in the day when we had the old HDD-comprised arrays, magnetic arrays or spinning rust or whatever you want to call them, we wanted to test using something that made sense. So SQLIO would be a thing because it would actually generate some workload against the array, regardless of the fact that it would be a patterned dataset, which is actually very important to understand. SQLIO would generate a pattern of data to be sent down to the storage array, not even close to a real dataset, not even close. At that point, whatever performance characteristics you saw on your storage array you would be happy with, because you would basically ask the storage guy, "Hey, I'm getting, I don't know, 800 a second on that thing. Is that good?" The storage guy would be like, "Yup, that's good." "Alright, I'm done. I validated that my performance was ok." You would look at the latency and see if the latency was acceptable at different block sizes. And you would commit the most frequent mistake of all, which would be tying your allocation unit size on NTFS to the I/O block size that gave you the best latency. That's what everyone would do; they would actually make that mistake right there and then. You would go ahead and see what SQLIO told you in terms of latency for a given I/O block size, and you would say, "Yup, I need to format my volume at this particular allocation unit size because that's what's going to give me the best performance." That couldn't be any further from the truth; literally, I have no idea why people got stuck on that myth. And I actually have a blog post that I'm sure we can reference from the podcast here that covers I/O block sizes and SQL Server in general.

 

Carlos:             So Argenis, just to be a knuckle dragger here.

 

Argenis:          Yeah, please.

 

Carlos:             Because everybody knows, isn't it that you just run this test and it's telling me, "Hey, this is the allocation unit size that you should be using." And then you're like, "Ok, well that…"

 

Argenis:          That's not what the test is telling you. The test is telling you that at a given I/O block size your storage array behaves in a certain way. The I/O block size, as you all know, has nothing to do with the allocation unit size that NTFS uses. They are two completely different things, so it makes no sense. If you got the lowest latency at 4K, you're not going to format your NTFS volume with a 4K allocation unit. That's just not it, right? Because allocation units are meant to be used for allocation, that's what they are for. And so larger allocation unit sizes, so 64K, and quite possibly larger than that with newer file systems like ReFS, which is becoming a thing nowadays. You would not consider using smaller allocation unit sizes because you want less metadata for your file system, as long as that metadata doesn't become a contention point for allocations. This should not be a contention point because this doesn't work like DFS [inaudible –] You're not going to be continuously allocating new extents for your database all the time, like one by one, right? It doesn't become a contention point for your database in terms of growth. Yeah, Steve, you wanted to ask something, go ahead.

 

Steve:              So then with that, I mean, it used to be you would look at SQLIO and then format the partition on your disk to match whatever gave you the best throughput. And then came along the general rule to always format it with 64K.

 

Argenis:          Just go 64K. Yeah, that's what you should do. There are other storage arrays that still tell you to format at a different allocation unit size. I literally have no idea why they ask you to do that, because I don't even know if those people are guided by the same things that I'm mentioning here, right? At my company, we just told everyone: format at 64K, you'll be fine. They are going to perform just fine. Other storage array vendors tell their customers to go 4K on the transaction log and 64K on data files. I have no idea why they say that. I do not have an idea of how they architected things at that granular level that it actually matters. To me it shouldn't matter on any storage array for the most part. Now, would you see differences in SQLIO? Yes, because you are allocating a file upfront for SQLIO. That's one of the things that matters, right? Like, SQLIO actually takes time to allocate that file, and DiskSpd does that as well. So were we actually swayed by the fact that SQLIO was creating a file, or what? I think that is actually part of the problem.
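
For reference, the "just go 64K" advice translates to formatting the data volume with a 64K NTFS allocation unit. A minimal PowerShell sketch follows; the drive letter and label are placeholders for your environment, and, as Argenis suggests, confirm the setting with your own storage vendor first.

# Format a data volume with a 64K NTFS allocation unit size.
# Drive letter and label below are placeholders -- adjust for your environment.
Format-Volume -DriveLetter E `
              -FileSystem NTFS `
              -AllocationUnitSize 65536 `
              -NewFileSystemLabel "SQLData" `
              -Confirm:$false

# Verify the allocation unit size afterwards (value is bytes per cluster).
Get-Volume -DriveLetter E | Select-Object DriveLetter, FileSystem, AllocationUnitSize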

 

Carlos:             Other examples that might include is if I’m moving or upgrading from one server to the other. I already have a larger file and I have to allocate all of that space, right? In the case of a growth, I start with a very small database and I’m growing in chunks. Those patterns are also different.

 

Argenis:          Well, I mean, at that point you need to start thinking in terms of the file system, right? You have a set of database files that are sitting on a given file system. If you're migrating a database from one server to the other and you're just using a restore, that destination file system doesn't have to match the allocation unit of the source file system by any means. If you're using something like Storage vMotion, or Storage Live Migration on Hyper-V or whatever it is called in the Microsoft world, you would just migrate something and you would end up with that same allocation unit size because you're performing basically a block-by-block copy of the entire thing, including the file system. Alright, so, new ways of doing things, etcetera, etcetera, but what matters in the end is, me personally, I say ask your storage vendor what it is that they like. If they like 64K, you know, it's probably safe to end the conversation right there and format at 64K. If they ask you to format at 4K, ask them why. Why is that a thing? Are they doing the right thing by telling you to format at 4K? Anyway, rant over on that one, guys, let's not go into that. Let's take a step back. We started talking about SQLIO, right? And we said rest in peace, it had the wrong name to begin with, it generates patterned data, it was replaced by DiskSpd. But guess what, I love DiskSpd and I kind of hate DiskSpd at the same time. Because you know what DiskSpd does? It also generates patterned data. Why does that matter? Most of the storage arrays they sell today, pretty much every single one out there, has intelligence built into the controllers, so whenever you send a patterned dataset to it, it will basically just mark a little metadata entry for it and nothing else, so you actually wouldn't be stressing, and this is actually super important for everyone to understand, you would not be stressing your final media. So if you're laying data down, pushing data down to your hard drives, you wouldn't be stressing them. If you're pushing data down to SSDs, you wouldn't be stressing those either, because the controllers will basically drop all that data at the head, at the front end, and basically just say, "Yup, I got some activity going on here, but it wasn't enough to merit me using the SSDs or the hard drives at all." That's kind of very important, right?

 

Steve:              So then in that case DiskSpd may still be good for testing, like, local drives that are installed in a server, but not the storage array? Is that right?

 

Argenis:          Here is what DiskSpd is good for: yeah, if you have local storage, obviously DiskSpd is good. However, if you have something like, I don't know, Storage Spaces Direct or whatever that might be, that might have some other backend behind the scenes, you still have to go through all of that, right? You may have an intelligent controller on the other side managing disks for you. You don't know that. As a DBA you're not told exactly how your storage stack is laid out. So if you're connected to some storage array, they're not giving you the specifics of whether that stuff is doing something a certain way or not. So your job as a DBA is to make sure that you test with the best dataset possible and the best workload possible. So is DiskSpd the thing that's going to generate the most realistic workload for you and the most realistic dataset for you? I think the answer to that is a flat, resounding NO. This is why I only use DiskSpd to validate plumbing, meaning I'm getting good throughput, I'm getting good latency really quick, and my channel is good. I have a good line of communication between my host and my storage. That's all I am testing with DiskSpd. If you have an old-fashioned, just dumb SSD, for example, that you attached directly to a server and you want to run DiskSpd against that, yeah, that's good, but is that actually going to test everything on that SSD? I will dare say no, because there's more to it. An SSD is actually comprised of NAND, right, like NAND chips. And the NAND chips actually get garbage collected, right? So overall, the SSD has a controller, and that controller will decide at some point to perform garbage collection on that NAND, because NAND is not byte-addressable. NAND is actually addressed kind of like database pages: entire pages are written at a time. You don't write on a byte-by-byte basis. So that actually makes a huge difference, right? When you're testing something that is byte-addressable, like a regular hard drive used to be, right, like sector-addressable, you wouldn't trigger garbage collection on an HDD because there is no need for that. But on an SSD you will trigger garbage collection, and the only way to stress an SSD so that you make sure you trigger garbage collection on it is by priming it, so filling it with data and starting to write data to it again, so you trigger the garbage collection mechanisms, ok? Do you see how storage testing is a complex story? It's not something that's straightforward. It's not just running DiskSpd and seeing what kind of numbers you get. It is so much more than that. It's running a continuous workload at different I/O block sizes with a given reducibility of the data. Actually, we haven't even talked about this. Reducibility of the data is a thing on all-flash arrays and it matters a whole lot, because most of the storage arrays out there are doing deduplication and compression of some sort. Like ours does, right? The company that I work for does it. EMC does it, SolidFire does it, Nimble Storage does it, THL does it. Everyone does compression, right? So when you want to validate that a storage array is doing the things that you want, it's going to reflect certain performance characteristics for a given workload and a given dataset. You want to test with the real workload and the real dataset. So again, DiskSpd doesn't take you there. DiskSpd only helps you test the plumbing. Make sure that, you know, whatever is between you and your final storage device is not a bottleneck somewhere. It is really good for that, and it is also really good for testing dumb byte-addressable storage. Here is another thing that a lot of people don't know: DiskSpd actually has a way to generate random datasets. You can tell DiskSpd, "Here, go ahead and generate a random buffer," but getting it to work, there's actually a huge bug in DiskSpd, getting it to work is actually really complicated. I don't remember exactly all the conditions you have to meet. Like, you have to use a file of a certain size and then the buffer that you pick has to be evenly divisible by the size of the file, or something like that. I can't remember what it was. Because it's buggy and it's not quite perfect, you have to meet all these conditions for it to generate a random workload. But then that random workload will be so random that it wouldn't actually be representative of your real dataset either. So it will be like the complete opposite of what I said before. You could start with a patterned dataset, which is very easy for the controller to drop, right? And then at the complete opposite end of the spectrum you're sending a completely garbled, high-entropy dataset that makes no sense whatsoever and is not actually reflecting your real workload. This is why I keep telling people you want to test with a real workload and a real dataset. So restore your database, replay your queries against it. That's the best way to actually validate that your storage array is going to do the things you want it to do. And you can see how it reacts to your workload and keep it on a continuous loop, right? Run it over and over and over. Increase the frequency of your workload if you can. There are multiple things you can do to kind of replay your workload and kick off more threads and stress the storage array a little bit. Or if you can't do that, there's synthetic testing that you can do against the storage array that will kind of mimic database access patterns, but it wouldn't be your real workload. In the end, what I want people to do is forget about running DiskSpd, forget about running SQLIO. If you want, run DiskSpd really quick just to make sure that you've got good connectivity against your storage array. But in the end what you want is to replay your production data. Your production datasets, so restore your production dataset and replay your production workload against that storage array. That's really what's going to give you the best picture.

 

Steve:              So at that point, when you say replay your queries, replay your workload, I assume you're talking about the Distributed Replay controller at that point then?

 

Argenis:          That's one way of doing it, right? You also have Orca, which is another way of doing it. And there are other tools out there, you know, from vendors, that allow you to do that. I believe Quest Software has one, if I'm not mistaken. There are others, right.

 

Carlos:             What’s that second tool that you mentioned?

 

Argenis:          I'll just mention the three that I know about. And even if you don't have time to do that, you can just run an ETL, right, which most people have, or you can write very intensive read queries or write queries, provided that you do things cold cache. So you flush your buffer pool with DBCC DROPCLEANBUFFERS, then run your I/O-intensive queries against that restored copy of your database and see what kind of behavior you get. But more importantly, when you're testing and validating a shared storage array, it's not one workload that's going to be hitting that storage array, so things get even more complicated. And this is why, you know, I could be talking about this stuff literally for weeks.
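
A cold-cache test along those lines could be scripted roughly as follows. This is a minimal sketch: the instance name, database, table, and query are placeholders, it assumes the SqlServer PowerShell module for Invoke-Sqlcmd, and it should only be run against a restored test copy, never production.

# Requires the SqlServer module for Invoke-Sqlcmd.
Import-Module SqlServer

# Clear SQL Server's buffer pool on a TEST instance (placeholder names).
# Note this clears SQL Server's cache only, not any SAN-side cache.
Invoke-Sqlcmd -ServerInstance "SQLTEST01" -Database "SalesCopy" -Query @"
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
"@

# Time an I/O-intensive query against the restored copy; the table name and
# forced clustered index scan are purely illustrative.
Measure-Command {
    Invoke-Sqlcmd -ServerInstance "SQLTEST01" -Database "SalesCopy" -QueryTimeout 600 `
        -Query "SELECT COUNT_BIG(*) FROM dbo.BigTable WITH (INDEX(1));"
} | Select-Object TotalSeconds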

 

Carlos:             Even then, Argenis, you just flushed the buffer cache, but that's on the database server. And you mentioned there's also caching at the SAN level. Now, the first time you pull that data it's not going to be in that cache. But how do you…?

 

Argenis:          Well, that's another thing, right? If you have caches on the SAN side, how do you invalidate those caches, right? If you're going to have a cache in your SAN, do you want to use all that cache for writes or do you want to use all that cache for reads? So it depends on your workload. What I tell people is, you know, what kind of workload do you have? If you have a data warehouse, there are going to be times during the day when you do ETL and you load data into that data warehouse, when you want that cache to be used just for writes as much as possible. And then at some point you're going to start pruning that database, and at that point the cache is better used for reads. Is that the best way to set it up? It depends on your storage array. It really, really does, and this is why working with the system engineers from your storage vendor matters a whole lot. You need to understand what knobs are available to you, and even whether you want to tweak them or not. When you're validating a storage array it's very important that you understand whether there's a cache or not. So ask that question of your storage guy: "Hey, is there a cache involved in this thing?" Because if there is a cache, you want to make sure that you do enough testing that you overrun that cache and see what the behavior will be after you overrun it and it starts operating at a different speed. Because that's something that storage people love to do. They love to abstract you from the fact that you're running on slow storage by throwing another tier of storage in there. Is that solving the problem? Maybe, right. Maybe it does fix your problem for you. But is it a long-term solution for your workload that's continuously growing and continuously getting more aggressive? Maybe not.

 

Carlos:             Well, it’s the old adage. I mean, it’s just a new version of throwing more CPU and memory at the server, right?

 

Argenis:          Right. Tiers for storage have always been a thing. And even at the CPU level you have different kinds of caches. You have an L1, an L2 and an L3 cache for instructions and data on the CPUs. So this is not a new concept, right, by any means. It's just, you know, how many times do you have to change your storage so you can get to a solution, which is also a very big thing, right? How many times do I have to migrate my database before we actually run on a thing that actually supports my workload? It's a pretty common problem out there, right? A lot of people hearing this podcast will probably identify with this. You know, I migrated my database once because we switched to new storage, six months into it we realized that it sucked, and we had to migrate it again, so data migrations are very much a thing. But in the end it's all the storage testing and validation that you did that's going to give you the confidence that you're using that storage array the right way and making the right purchasing decisions. It's very important, you know, and going back to the point that I was talking about before, reducibility of the workload is very much a thing, right? If you just create one database that's completely empty, what are you testing there? You just created a database that's completely empty. What are you testing there? Absolutely nothing, right? The performance characteristics of a TDE-encrypted database that you're placing on an array that performs deduplication, compression and things like that are going to be different than if you send a database that's in the clear, or one without any compression whatsoever. The performance characteristics will be different, so you want to validate both. You want to make sure that your array reacts a certain way for your datasets. So if you have TDE-encrypted databases, make sure that you restore TDE-encrypted databases to that array and run your performance tests on top of that. The future is filled with arrays that perform data reduction all the time, everywhere. And, you know, even if your array only offers certain features and you know you're going to use some of those features, for example there are some arrays out there that have data services turned off but let you pick compression only, then make sure that you enable compression on the volumes that you are going to use for your databases and drop your data in there in whatever final form it's going to be. So if you place compressed data on top of an already compressed dataset on a volume that will perform compression, you know, what kind of performance characteristics are you going to see out of that? In general, it can be kind of eye-opening to see how storage arrays react to different datasets. A lot of people don't think about this because they just think that, you know, running a quick DiskSpd will get them out of the way quickly. That's not what you want.
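
If you want to know what "final form" your data is actually in before you test, a quick inventory of TDE and data-compression settings can help. A sketch, assuming the SqlServer module; the instance and database names are placeholders:

# Which databases are TDE-encrypted (this changes how reducible the data is on the array).
Invoke-Sqlcmd -ServerInstance "SQLPROD01" -Database "master" -Query @"
SELECT name, is_encrypted FROM sys.databases ORDER BY name;
"@

# Which tables/indexes already use ROW or PAGE compression in a given database.
Invoke-Sqlcmd -ServerInstance "SQLPROD01" -Database "SalesCopy" -Query @"
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       p.data_compression_desc
FROM   sys.partitions AS p
WHERE  p.data_compression <> 0;
"@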

 

Steve:              So let's say you've got your production server running on a new storage array that's shared, and you start out with SQL Server there and everything is running great. And then over time the SAN administrators decide to load that storage up with additional systems: a bunch of backups written there, or file stores, or other systems using it. One of the things that I've seen when this happens is that your SQL Server performance degrades over time. And I guess as far as testing goes, one of the things that I'm really curious about is how do you test on a regular basis to know that your production performance is not degrading on the actual I/O side of the storage?

 

Argenis:          So, I mean, you nailed it, right, when you mentioned that this happens very frequently, and that's just because people are not doing their homework, right? Every single storage array has limits. There's a given amount of workload that you can fit on it and at first it works just fine. As you add more workloads into that array you start seeing different performance characteristics from that array, as you're adding different workloads with different patterns. That is just natural. You know, all of a sudden you have your data warehouse running at full throttle on that array, you start adding a bunch of VMs that are going to do, I don't know, you're running Exchange on those, or SharePoint or whatever, and you're going to see a different kind of behavior from that storage array. So how do you test that? The only way to test it that you can be comfortable with is mimicking what it is actually going to be as much as possible, and as much work as that takes and as painful as that sounds, it is the only way to make sure that your storage array is performing a certain way. So take your real workloads with your real datasets and hit your array with that. If you want to fit an additional workload and you don't even know what that workload looks like, then you do have some homework to do. You need to understand what your current load is, how busy your storage device is, and whether it will accept that additional workload without tipping over. Or will it actually cause it to, you know, start generating a lot more latency, or will your throughput drop because of that additional workload, because now, you know, we had one big fire hose before and now we have two people drinking from it and not just one. It kind of works like that. Every single computing element in your data center has limits, and the storage array or your storage devices are just one more. You need to understand what those limits are, and when you are going to feed more workloads into it you need to make sure that you do your homework and understand how busy that storage device is, and when you are ready to drop that new workload in, that it fits nicely and is not going to tip it over.
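
One lightweight way to do that homework from the SQL Server side is to snapshot per-file latency on a schedule and watch the trend. A sketch, assuming the SqlServer module; the instance name and output path are placeholders, and the counters are cumulative since the instance last started, so consecutive snapshots need to be compared:

# Capture per-file read/write latency as SQL Server sees it.
Invoke-Sqlcmd -ServerInstance "SQLPROD01" -Database "master" -Query @"
SELECT  SYSDATETIME()                                         AS capture_time,
        DB_NAME(vfs.database_id)                              AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_writes,
        vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
        vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id;
"@ |
Select-Object capture_time, database_name, physical_name, num_of_reads, num_of_writes,
              avg_read_latency_ms, avg_write_latency_ms |
Export-Csv -Path "C:\Baselines\io_latency.csv" -Append -NoTypeInformation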

 

Carlos:             Yeah, it's interesting. Even recently we put out our SQL Server checklist, and you know lots of administrators are trying to standardize their processes: you know, cookie cutter in a sense, repeatability, automation, right, that's the big mantra. And it almost sounds like, "Eerrk, you know, hold the presses here." There are certain areas where, not that you can't automate all of it, but at least you're going to have to understand it. You're going to do some homework first and then choose your own adventure, if you will.

 

Argenis:          You nailed it. That's exactly what happens. And you know what, a lot of people think that they can get away with not doing this in the cloud. "Oh, it's just the cloud, so it's fine." Oh, it's not fine, alright, let me tell you. In the cloud you have a bunch of, you know, Amazon may not tell you, Microsoft may not tell you, Google may not tell you, but they do all these things that we talked about, right? They do their caching, the tiers, and they have devices behind the scenes that do a whole lot more than just storage. They might actually have data services involved in all that. So it's not as straightforward to test storage in the cloud either. What I like to do is just tell people: remember the basics. You could get some idea of the throughput and what the latency will be like by using kind of dumb synthetic tools like DiskSpd or Iometer or, I don't know, CrystalDiskMark, which in the end uses DiskSpd under the hood, or ATTO, or whatever. There are like a hundred of them. You can definitely use those. They'll give you an idea, but testing with real workloads and real datasets is going to be the thing. That's why I tell people, I literally have a friend who was spanking me on Twitter the other day. They bought this hyper-converged solution. It looked great when they ran SQLIO or DiskSpd against it. But guess what that thing was doing? The exact same thing that I mentioned before, right, it was dropping the patterns at the controller side. So they never actually saw it perform with real workloads and real datasets. They put their workload on it, they started seeing all this latency, and the workload was actually behaving kind of the same way it did when it was running on magnetic. Because they didn't test the fundamentals, which is: test with your real workloads and real datasets. I keep hearing hyper-converged, by the way. It's a thing now, like everyone is talking about hyper-converged. I need to remind people that hyper-converged is great, but not necessarily for databases. One big problem with databases is that they need low latency. You're going to be doing transaction log writes. You're going to be doing a lot of reads, and those reads had better be performed quickly, because that quick turnaround from storage is the thing that's going to give you better performance.

 

Carlos:             And so, say I've been hiding under a rock, Argenis. Describe to me: what is hyper-convergence?

 

Argenis:          So hyper-convergence, it's kind of the term that has evolved for describing a new kind of architecture where compute and storage are all the same thing, and even networking is kind of hidden. On the same compute nodes you can have storage attached to them and, you know, everything is kind of flat, and it gets you that kind of cloud experience when it comes to provisioning storage. But in the end you're sharing resources between storage nodes and compute nodes. So you can end up with a situation where your databases are running on the same exact node as your storage device, and the actual serving of storage takes compute resources, and those compute resources end up kind of colliding with your databases. There are obviously other kinds of designs or features in hyper-converged where you can have storage-only nodes, but that's really no different than, you know, having your own storage device out there, so they're basically spinning storage off again into its own node. It's basically, you know, having fewer things to configure on your network, so it's more like the appliance approach to computing, where you just buy something, hook it up to your network, and two seconds later you have it up and running. Well, provisioning is part of it, right? Provisioning and getting things to run fast is one part of it, of course, but you know it's the ongoing operation and ongoing performance that really matter a whole lot. So if you're looking at hyper-converged solutions, please, please, please make sure you test with the right tools. And if I can mention one synthetic tool that actually works really, really well. I personally hate Oracle, but this is one thing that comes from Oracle that's quite decent. It's called Vdbench, V-D-Bench, so Victor David Bench. That is quite possibly the best storage testing and validation tool for getting, you know, a better idea of how your workload and your dataset are going to behave on whatever storage device you're testing. It actually allows you to specify deduplication and compression settings for your workloads. So you can say, oh, this is like a database, so it will dedupe 1.2 to 1 and it will compress 2 to 1, or it will compress 5 to 1. Or I'm testing VDI, so I'm going to have a lot of reducibility in my workload, it's going to reduce 19 to 1, so I can test it that way. And then you can also generate workloads by saying I have, you know, these kinds of I/O block sizes at this time of day; I have these peaks and valleys. You may actually specify that the target device needs to be filled with data before it can actually be tested. So there's a whole bunch of sweet, sweet, sweet features that you would love to leverage when testing and validating your storage.
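
To give a feel for what those knobs look like, the sketch below writes out a small, hypothetical Vdbench parameter file with deduplication and compression ratios and runs it from PowerShell. The install path, target file, ratios, and workload numbers are all illustrative assumptions; check the Vdbench documentation for the exact parameters your version supports.

# Hypothetical Vdbench parameter file: dedupe/compression ratios, one 20 GB
# test file, 64K transfers, 70% reads, fully random, 10-minute run.
$parmFile = "C:\Tools\vdbench\sqlish.parm"
@"
dedupratio=1.2
dedupunit=4k
compratio=2
sd=sd1,lun=E:\vdbench\testfile.dat,size=20g,threads=8
wd=wd1,sd=sd1,xfersize=64k,rdpct=70,seekpct=100
rd=rd1,wd=wd1,iorate=max,elapsed=600,interval=5
"@ | Set-Content -Path $parmFile -Encoding ASCII

# Run Vdbench against the parameter file (install path is a placeholder).
& "C:\Tools\vdbench\vdbench.bat" -f $parmFile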

 

Steve:              Ok, great. So one of the questions I have here is that it seems like when we talk about testing with an actual real workload, that often, in my experience, happens after someone has already made a decision on purchasing the storage, and it gets to the environment and it's "here's what you have, figure out how to best use it." I mean, is there a process that you generally go through, like pre-sales, when people are looking at different storage, where you can do this kind of testing?

Argenis:          So most of the storage vendors out there will want you to perform a POC. Some of them actually don't like it because it takes resources away from them, right? Like, you know, they have to ship you an array, they have to be on top of you, and you only have so many days to complete that POC, etcetera, etcetera. Most of the storage array vendors, or the storage vendors in general, not just storage arrays, will happily give you something to try, and it's during that period that you are able to validate that storage array or that storage device. If your storage vendor doesn't do that and your management was sold on that storage array and you get it right away, I need to remind you that most things you can actually return. So if you get something, test it right away. Like, drop everything you're doing and test it, because you don't want to pay for this in the long run, especially if you're the DBA who's going to be babysitting processes that are going to be running on top of this thing. Because imagine you end up with a storage array, or a storage device, that does not give you the performance you need. You are going to pay for that dearly. You're going to pay for that with your weekends. You are going to be suffering. You are going to have to watch those backups, and watch that maintenance, and watch those update stats. It's going to be painful, right? So just make sure you test things as much as possible, that you inject yourself into the process of acquiring the storage as much as possible, that you become knowledgeable on the storage side as much as possible, because storage is fundamental to every database out there. Every single database is backed by some sort of storage, and if you don't understand how that storage works, and you don't get into it a little bit, even a little bit, then you're going to pay for it down the road.

 

Steve:              And I think that's where, if you're working in a shop where there's a wall that's been built between the storage administrators and the DBAs, that's where you get the most trouble. But when it's the same person doing both sides of it, or there's no wall and there's very good communication between the DBAs and the storage admins, then you're able to make those kinds of things happen.

 

Argenis:          This is something that you have to foster. You have to make sure that whatever walls exist in your environment today, you can overcome them. Like, you become best friends with the storage people. You know exactly what they have. The storage people know why you care about certain things. They understand why databases are the pain in the butt that they are for storage. They are, they are a pain in the butt; the storage people hate DBAs for a reason, because databases are nasty, especially, you know, when you take backups, right? Who has a storage guy that loves to see the performance of the storage array when taking backups? Nobody, right, because backups are just nasty on every single thing. Backups are nasty on the source storage array, backups are nasty on the network, and backups are nasty on the target storage array, or a JBOD or whatever, it may not be an array, but whatever you end up backing up to. So it's a thing: people need to understand that if you don't talk to the people who work in your company who do different things than you do, but who in the end have power over and oversee the things that matter so much to you, you're just hurting yourself. That's actually part of, you know, we could go on another rant about why you should be best friends with your developers as well. But it's basically the same thing, right? Everyone is working towards the same objective, which is making sure that your company continues to operate at the highest level and that you can crank out features and crank out new functionality as fast as your business wants, so don't become an obstacle right there. That's actually something that I advocate to everyone. Don't become an obstacle on the database side. Don't become an obstacle on the storage side. Don't become an obstacle on the virtualization side. Offer solutions. Tell people, "Hey, this isn't working, let's go and see what we can do to make it better." Engage more people and make sure that everyone understands what's actually going on in your environment. Because the last thing that anyone wants is surprises: "Oh, we never told anyone that this actually wasn't working." You need to make sure that everyone knows the status of your environment out there.

Steve:              Right, right.

 

Carlos:             Sounds like very good career advice.

 

Steve:              Absolutely.

 

Argenis:          Alright, man, we went on a little rant over there. Man, I need to catch my breath here.

 

Steve:              Oh, it’s all good stuff.

 

Carlos:             Well, thanks again for being with us today, Argenis.

 

Argenis:          It's my pleasure. Thanks for the time, you guys. I really, really, really wanted to do this because, you know, I wanted to make sure people got a taste of what it's like to pay so much attention to the little details on the storage side. A lot of us have been kind of complacent in the past and said, "Oh, I just got some storage array, some storage device that I got from this guy. I'm just going to quickly run something. Ah, numbers look good. I'm moving on." That's not it. There's so much more to it. So you need to understand how things work and why testing in a different way matters so much.

 

Carlos:             That's right. So before we let you go: we've had you on the show before, and we've done our SQL Family questions with you already. I guess for those who may not know all of your history, give us the brief story of how you first got started with SQL Server.

 

Argenis:          The year was 1998. I remember that Google was barely getting started back then. I remember when Google came out I was like, "Oh, I'm not using Yahoo anymore. I'm not using AltaVista anymore," whatever we were using back then. That felt awesome. I remember it was SQL Server 6.5 that was powering the backend for this product that I was working on. I was working as a data administrator for an ISP, an internet service provider, the largest one in Venezuela, and the mandate was to migrate from a FreeBSD environment to a Windows NT 4.0 environment. So you guys can imagine, right, that was really, really, really controversial, so we got a lot of pushback there. But in the end that was the mandate that came from management, and we purchased this product from Microsoft called Microsoft Commercial Internet System. It's called MCIS, right? And the version of that was version 2.0. So that suite of products was NT 4.0 with the Option Pack, so IIS and all of that, and you had an additional set of binaries for running an SMTP server, a POP server, an LDAP server. And that LDAP server was powered by Site Server Commerce Edition, Site Server 3.0. So, you know, ancient folks like me that worked with that technology way back when will remember all of this stuff. Actually, I can't remember if it was Site Server 3.0 or an earlier version on the first version of MCIS that I used. I can't remember. The truth is that it was powered by SQL Server 6.5 at first, and then they developed support for SQL Server 7.0. So that's kind of how I got started with it, maintaining that database, and I was basically one of the, there were many of us, accidental DBAs for that database. That's kind of how I got started with it. And so we were sent up here to Redmond, that was January 1999, when we came here to Redmond for the first time. We spent a month here training on that and we were basically given a crash course on Windows administration and SQL Server administration. So that's how I got started with SQL way back when. And you know, I changed shops a hundred different times and, you know, made my way through sysadmin work a little bit, and I did development for a little bit, and I did kind of management for a little bit, and I did consulting, even for Microsoft at some point. Through all of those positions I was always working with SQL Server in some way or another. So you know, it's been 18 years now. It's been a great, great, great career, and I just went to the MVP Summit. I know some of you guys were there. Man, the MVP Summit was awesome because, you know, they laid out in front of us the future of SQL Server and what it looks like. It's now going to run on Linux, it's already running on Linux, so if you guys out there haven't seen that, you need to go take a look, because I think it's going to be all the rage. SQL Server on Linux is going to be a huge, huge thing.

 

Steve:              Oh yeah, I think so too. I know I’ve got it running on a virtual machine here on Linux and it’s pretty nice.

 

Argenis:          It is, it is. So that’s kind of the backstory on me starting with SQL Server way back when.

Steve:              Excellent.

Carlos:             Awesome. Well, again, Argenis thanks so much for being here and taking some time with us today.

 

Argenis:          Hey guys, thank you so much for letting me do this again. I really, really, really appreciate that you guys gave me another slot on your podcast, which is wonderful by the way, you know, to get some more of my opinions out there. And I know the previous podcast actually generated a lot of controversial discussions out there on where my stance is. I actually owe some people blog posts to follow up on some of the points that we talked about, and they will probably ask me for a blog post on the things that I talked about today, so I promise, I promise, I promise I will make those blog posts happen and we'll get people really, really interested in testing their storage the right way.

Steve:              Sounds good. We’ll definitely look forward to reading those.    

Episode 74: 2016 in review and 2017 goals

As the year 2016 comes to a close, Steve and I share our thoughts on our accomplishments for the year and what lies in store for 2017.  Steve reviews some of the changes to the Database Health Monitor tool and Carlos shares details about his new book and how it came to be.

 Episode Quote

“2016 has been a busy year”

Listen to Learn

  • Some of the accomplishments Steve and I have had over the last year
  • What we are looking forward to in 2017

Carlos has a new book – Zero to SQL
The SQL Server checklist repository

Transcription: 2016 in review

The transcription will be available shortly.