Episode 86: TsqlT

Based on listener feedback we got from episode 72, a request was made to have a conversation about testing frameworks and in this episode we are going to focus on tSQLt.  We are pleased to have Sebastian Meine, the creator of the tSQLt unit testing framework, on the podcast with us.  We talk about the basic principles of tSQLt Framework when it comes to unit testing. Sebastian will also be discussing about the do’s and don’ts as well as the different aspects to be considered when using the framework. Sebastian makes the case for why you should take the time to create unit tests and we discuss the challenges culture might play into the use of the framework.

 Episode Quote

“Anybody who earns a living with writing SQL modules should test those modules…and tSQLt is the best way to do that”

“For a contribution to be accepted it’s important that it is well covered with tests”

Listen to Learn

  • What the tSQLt framework is
  • The case for using unit testing in database projects
  • How the framework has been integrated into third party projects
  • How you can participate in building the framework

Sebastian on Twitter
tSQLt framework
tSQLt project on GitHub
Sebastian’s Website

About Sebastian Meine

tSQLtSebastian Meine, Ph.D. is the SQL Stylist behind sqlity.net llc. In 2010 he founded sqlity.net to fully engage in his passion for SQL Server.

Armed with a Ph.D. in Computer Science from the University of Linz in Austria and years of experience in architecting DBs for a global Fortune 50 company, Sebastian now provides training and consulting services for his clients, specializing in SQL Server Security, Test Driven Development and Performance. Sebastian is an experienced public speaker, published author, president of the Philly PASS Chapter, PASS Regional Mentor and coauthor & lead developer of tSQLt.

Transcription: TsqlT

Carlos: Well, Sebastian, welcome to the program.

Sebastian: Hello!

Carlos: Yes. I think we have some interesting things to talk about and not only technology but how you decided to go ahead and out this together. I guess, first our subject is tSQLt and so why you won’t give us the nickel tour here on how you explain to somebody what tSQLt Framework is.

Sebastian: The tSQLt Framework here that is a unit testing framework for SQL Server. And what I mean was that a unit testing really means you go down to the smallest unit of code so in SQL Server that would be stored procedures or functions of use and you’ve write test for them in isolation. tSQLt, of course, a lot of functionality around that that makes particular the testing in isolation easier.

Carlos: Ok, I guess can you give an example there of a way that someone would might create a unit test around a stored procedure for example.

Sebastian: How detailed do you want me to get it’s early on?

Steve: Maybe we could take a little bit different approach there of higher level of like around test driven development. One of the things that I’ve seen with that is the sort of strategy of you first create a failing unit test and then you write your code to pass, and then you write another test and you write the code pass that test. Is that what people are doing with tSQLt?

Sebastian: Test driven development and I have to prefix this, I love it. Test driven development has a better reputation of that. A lot of people think it’s scary and it takes too much of work and we can talk about that a little later. Yes, tSQLt allows you to do test driven development. In fact, tSQLt itself was developed a kind of bootstrap using itself completely test driven. Important in test driven development is that you take a really small piece of functionality. You write a test such as that functionality isn’t there. And you implement a functionality and then you run the test again and it hopefully shows you now that the functionality is there that really does a few things. First, writing the test for us making sure that it fails shows to you that the test actually is testing something. It’s really easy in unit testing in any language to write a test that doesn’t really test anything. A test that can fail, so by writing that test first have the assurance that you have written your test correctly. And then by implementing just the minimum piece, minimum amount of code required to make that test pass. You end up with very modular code which in turn is really easy to maintain and you also have to test that tells you everything was still working.

Carlos: Ok. And you’ll forgive me right, so being a little bit of knuckle dragger that I am, you know, I guess it makes sense to me that you could, you know, a piece of functionality is not there and you want to test for it so let’s just say I’m adding a parameter, right. Or adding another column that you would return or something like that, right. Ok, that seems fairly straight forward. Am I using this also to test the results or to interrogate what gets returned from a store procedure or from a view as well, or is this just limited to the object itself?

Sebastian: Let’s talk about the view example. So I have a view and I want that view to return some aggregated data so I might have a count column and a sum column let’s say.

Carlos: Ok, right.

Sebastian: And so, tSQLt allows you to compare result sets. It actually compares tables not result sets so the content of tables. So what you do is insert your result set into one table and then you insert the expected result into another table and then you use set equals table to compare the two tables and it will tell if there is a difference or if they are equal.

Steve: Ok, so then in that case you would know based of the contents of the table that would be for instance being inserted from a view if the view is producing the right result or the wrong results.

Sebastian: Correct.

Steve: Based off of comparing that to a known expected outcome.

Sebastian: Correct.

Steve: Yup, ok.

Sebastian: And when starting out in general I recommend that you follow test case heuristics and you write your test cases. And for testing results sets, for sets in general you want to follow the zero one sum heuristic. So you start out with zero rows and the table that the view is accessing. Then make sure that the view is returning the right thing. So say you’re starting from scratch you haven’t written a view yet. The first thing you would do is you write a view that contains a select column say the sum column from the view and put that into, I tend to use temporary tables into a Temp table, and now the view isn’t there. The table that the view is accessing to be assumed that they’re ready, the view isn’t there yet so when you compile a store procedure you get a warning. And then if you run the test case you get an error since the view isn’t there. Let’s you first call it call to action to now implement the view. Now, what the test at this point is asking for is you have one column called sum, probably a little more expressive than sum. And the test cases are accessing the view as a name so the view has a name and the two things that are required at this point, so you implement the view. You don’t have any logic at his point because it has not ask for anything yet so you might just return a hard coded empty result set.

Steve: Ok, so then once you’ve done that you’ve confirmed that the zero case that if there are zero rows return that you’re getting back the empty set that you’re expecting.

Sebastian: At this particular example I guess you shouldn’t make up examples on the fly and sum as an aggregate so you will get one row back. In this case we would expect zero in the sum or the count column input is empty.

Steve: Right.

Sebastian: That is by the way another thing that test driven development really good at. It makes you think for you requirements so you catch something like this early on whereas often this particular case you want no row to be returned or you want a zero to be returned goes unthought-of and then the result is whatever it is but that might not match what the stakeholder actually requested.

Steve: Ok, so then once you’ve got the zero rows situation figured out I assume you move into the one row.

Sebastian: Correct. Zero one sum so the next test is the one row so you write a new test case. You insert one row into the base table then you expect the sum column to return the value that is in that row in the column that we are looking at.

Steve: Ok, and then, so then you would have your, what was the set method you mentioned that compares the tables?

Sebastian: Set equals table.

Steve: Ok, so a set equals table would make sure that your two tables are the same for your one row and then I assume you would just continue on with that for the more than one rows, the sum.

Sebastian: The sum rows, yes. And sum really means stick with the few. I usually recommend three or so. I rarely run into a case where you really need more than ten rows. And important to note here is unit testing is not performance testing. So you want to set your test data as minimal as possible. You want your test cases to run as fast as possible. In fact, you want to run all your test cases every time you make a change. The only way that is going to happen because we all are impatient and lazy. If executing a test case is easy and fast.

Steve: Right. We had the same recommendation in Episode 72 with Lin. We were talking about test automation for data warehouses. Now is kind of here thing is like find a representative sample, right. Don’t need the whole database, find your sample, couple of customers that you’ve worked with and start there. It sounds like you’re saying the same thing.

Sebastian: Absolutely.

Carlos: Ok, so then one of the next questions that we had was around who was using this? We’re not really looking for specific company names, what we are looking for is like what type of people, what type of roles are the people? Are they software developers, or are they DBAs, testers? Who is really using the product and getting the most value out of it?

Sebastian: Anybody who is writing SQL modules should be using it. And I don’t have company names so I couldn’t even answer that. Anybody who earns a living with writing SQL modules should test those modules. While I’m extremely biased I would say SQLT is the best way to do that.

Steve: You know, a good example is that I was working on a store procedure for a client recently where after the 1st of January it behaved differently than it did in the months that we’re near the end of the year, and after seeing the problems that we ran into there because of somebody doing month addition or subtraction. It would have been really nice to have a tSQLt test case for that so that we could have run through and done extensive testing there beyond what it originally done with. That’s something that I could think would be great use for it.

Sebastian: Yeah, that’s a boundary condition that crossing from one month into another one. There is a heuristic for that too. You want to make sure that code is behaving correctly at those boundaries and following test driven development and heuristics would have caught that problem most likely before it actually became a problem.

Carlos: Yup, ok, so if somebody is listening and they think, “Oh, I want to give this a try.” Where would you recommend the good place to get started or steps to follow or where to learn about using this? How do you jump in using tSQLt?

Sebastian: It’s called tSQLt and the website for it is tSQLt.org. It is open source so you can just go there download it and start playing with it. There is a small introduction manual if you want to call it, that too also a complete functional manual almost complete. There are few functions I haven’t gotten around documenting yet.

Carlos: Yup, ok. So even more than I think just to jump in a little further there is, you talked about, so the case, you made the case that gosh if I had unit tested this I could have avoid the problems down the line. But it’s almost a mentality shift because, you know, just use that example. I have a store procedure that needs a view. Well, I’m going to be thinking, “Ok, let’s see what my view needs to have so that I can feed this procedure.” And it sounds like I need to reassess what it is that I’m asking or what it is my requirements are. And so how do you begin that mind shift to say, “Ok”, instead of just jumping in and creating the view let me think about how I could setup this use cases or what questions I need to ask myself to get them setup. Does that make sense?

Sebastian: Yeah, that make sense. So unit testing, automated unit testing is a complete mind shift. It is hard for people to get use but once you have done that a little bit regularly. It’s not actually hard. It’s scary at first but you get use to it very quickly.

Carlos: Yes, I guess you can tell where I fall into my automated unit testing practice.

Sebastian: Yeah, just think about what does this view need to do for me then pick the thing that seems easiest for you right now to write a test for that functionality. And then write the test and then implement the functionality and then pick the next thing.

Steve: Ok. Now, I know I’ve seen at a couple of SQL Saturdays. I’ve seen Steve Jones with Red Gate presents on tSQLt and the Red Gate tools that integrate with it. Can you maybe share with us what part of that are the Red Gate products versus what part of it is the tSQLt framework?

Sebastian: Red Gate provides that easy button. tSQLt is a bunch of SQL scripts and to write your test cases you write store procedures that follow a naming convention. Actually, you need to create a what I call a test class. The reason I call it a test class is because that’s a naming convention that JUnit Framework, the first unit testing framework put in place and tSQLt is following the JUnit convention. So a test class is really a schema that has been marked by the framework as a schema that contains test cases and then to mark that test case store procedure in that schema you just need to start their name with let us test. That allows you to put other help a store procedure in that same schema. That makes it very flexible even though naming convention seems rigid at first. Now, to execute your test cases you have to run the store procedure tSQLt run all. That runs all the test cases in the database. A run takes parameters what you specify what you want to run. And that can be at times a little cumbersome that you have to actually call that store procedure. What Red Gate did is they provided a log in for Management Studio that makes that easy to execute test cases. You just have to select the test case you want to run and click the run button, and there’s a nice GUI on top of tSQLt. And I have to say Red Gate isn’t alone anymore. There are now I think three other vendors that offer similar functionality, but Red Gate was the first.

Carlos: Interesting. Now, so you’ve mentioned the documentation. Is there an example, I mean, so I think again, a knuckle dragger that I am It would be helpful for me to see some examples. Like, “Ok, here’s a view, here’s a procedure.” Let’s walk through that. I know you mentioned you had the procedures there to kind of get going. Are there good examples out there that people could reference to kind of get a handle on this?

Sebastian: Yeah, so there’s the user guide which is really the manual of the procedures and the processes. There’s also a quick start guide on tSQLt.org which is a very basic introduction into how to use the framework and the unit testing in general.

Carlos: Very good.

Steve: So for the unit testing to work, is this something that the people use and need to have, like if it’s developer sharing a database that they have to have their own individual instance or database they’re working with. Or can multiple developers work against a shared database using tSQLt.

Sebastian: As a general best practice, you want to have your own development environment. And SQL Server Developer Edition now being free there’s really no reason anymore to not have your own development environment.

Carlos: Very good point. Yeah, ok.

Sebastian: And the database code has to be, it’s not even an option anymore, has to be on source control and it’s easy to run, install your own copy of the database.

Carlos: So when you say it has to be in source control does the tSQLt do something to integrate to pull things from source control?

Sebastian: No, tSQLt is only for testing. But Red Gate has a nice plug-in, Red Gate Source Control, that even manages things like migration scripts.

Carlos: Yup, ok. So with tSQLt being an open source project. Is that something that you have other people contributing with or is it currently just you or where’s that at?

Sebastian: It started out with Dennis and me. And Dennis got lost somewhere on the way, alright, now it’s only me and every once in a while I get somebody asking if they can help out but they lend us up here too for whatever reason. So yes, I would love to have help on the project and the project is on GitHub. You can find it there if you search for tSQLt. I have been stretched a little been over the last few months so there has not been a lot of progress on tSQLt. I always doing a little bit but I haven’t actually put out a new version in a while.

Steve: Yeah, I’m seeing this, there are some issues and it looks like people are contributing some issues. 

Sebastian: for additional functionality for problems to be fixed.

Carlos: With that, part of the beauty of GitHub and being open source is that when people asked for things to be fixed they can also contribute.

Sebastian: Yes. And for a contribution to be accepted is it’s important that it is well covered with tests.

Carlos: That’s a great point since it is the SQL testing tool. It would be silly for someone to submit a change or a feature that isn’t well tested, right.

Sebastian: Correct and you will be surprised what I sometimes get.

Steve: So one thing that you have said is part of the story there was and I think you mentioned earlier, was that tSQLt or unit testing is not intended for performance work but it seems like it’s a great way when you’re doing the performance work to now if you broke anything or if you’re given the right results.

Sebastian: Correct. Anytime you have to change your code you want to be sure that you haven’t broken anything. And have you ever been in that situation where you change a little bit of code and of a store procedure because of a requirement change and other feature that you thought was totally unrelated suddenly broke?

Steve: Oh no, that never happens.

Sebastian: Unit testing is really good at showing you those scenarios while you’re still are in the code. So instead of hearing about it months later, “What did happen there? I don’t remember.” You get that feedback within a few minutes so you still know what you did just change and then you can easily understand hopefully easily as a connection figure out how to actually make it work. And that’s really one of the things behind Agile Development that we haven’t talked about yet. Unit testing is part of that Agile Development mantra. Agile development is all about reducing the lengths of your feedback loops. If something breaks I want to hear about it as soon as possible. The longer it takes, the more time passes the more the cost, call it exploit, it is actually exponential.

Carlos: The risk increases of your ability to quickly resolve that.

Sebastian: Yeah.

Steve: And it’s far cheaper to fix the bug that you find when you run your unit test than it is to test the bug that a customer reports after you released the code.

Sebastian: Correct and so far has like three 0s.

Carlos: So a lot of people like to pay lip service to that and you know particularly organizations, you know. I mean a lot of people say they’re running Agile and they really just kind of have a lot of meetings, right. What kind of numbers are you seeing from the tSQLt side, I mean, just from downloads. You’ve been working at this for a couple of years now at least from the GitHub persepective.      

Sebastian: It is still very low. I would like to add a zero to the number of people who downloaded it.

Carlos: So you need a few more people to get in there. And again just for me I think there’s still a pretty tall barrier to entry.

Sebastian: That’s the thing with unit testing in general. Test driven development in particular but even if you just do unit testing makes it a lot easier to maintain your code in the long run. You spent about 80% of your time maintaining existing code and only 20% of that writing your code. That makes sense to figure out a way to reduce the cost of maintenance. And if you have a good set of test cases it makes maintenance so much easier. It actually makes you more likely to implement a change too because you’ve learned now that it didn’t break anything. It reduces a fraction too good thing there. But anyway and to write that code of unit test that’s a lot of work so the actual writing of the code becomes more expensive. The problem there is that that cost us upfront. Now it’s hard to get into developer’s minds, it is really worst investing the time now because it will save you later. But it’s also hard to get that into the management mind because a few showed that in the old way would have been, let’s call it done, tomorrow is now going to take three or four days to be done.

Sebastian: Yes, so management needs to buy and it’s also hard to get them agree to it because it’s not going to take one day to finish the code but it’s now going to take four days. But the thing is you have the code done in one day and it breaks in a month from now and it is installed in a customer. You cannot really call it done in the first place. From my perspective code is done if it is proven to be correct. And that’s when you achieve this automated unit testing.

Carlos: Sure, and often times though when management buys in and say they want to be an Agile company or business unit. They don’t always buy in to the extended time that it may take to make some of those things happen and that’s where the challenges come up.

Sebastian: That is correct, yeah. Management has always more than one first priority.

Carlos: That is so true.

Steve: That’s right.

Carlos: So shall we go ahead and do the SQL Family?

Sebastian: Sounds good.

Steve: Alright, so Sebastian can you share with us how you first got started with SQL Server?

Sebastian: Yeah, that was the same company so when I was done with my Ph.D. in Germany. The job market had just tanked so I decided to try my luck in the US and ended up getting a job with a single phone call. That was luck. But the one thing that I told the guy in the interview was that I really like programming but I don’t want to have anything to do with databases.

Steve: Yeah, look at you now.

Sebastian: So he said, “Ok, that’s great.” And he ended up putting me in the database team.

Carlos: It must have been an accident, right, the accent, you know, it didn’t quite get across.

Sebastian: And to be fair my first assignment was C++ code there. I was working for the driver of those 128 store procedures. But then at some point, as our team member said, maybe we should ask the Ph.D. to help us with this performance issues and that’s how I got pulled into actually working on SQL Server. Somehow I learned liking it really quickly.

Steve: Very cool. Now, if you could change one thing about SQL Server, what would it be?

Sebastian: Yeah, a lot. What we really didn’t talk about today yet is testing an isolation. And to test and isolation in other languages you use things like test doubles, mark objects, so replacements of the actual object that behave in an expected way so that you can write your text around that. tSQLt provides that functionality by providing fake tables and spy procedures and fake functions and to make that work you have to go really deep into the internals of SQL Server and two things SQL was not written for. I would really love if that could be easier. By example the feature that’s most requested right now that I haven’t been able to implement completely yet is to deal with schema bound object.

Carlos: Alright, yeah, I noticed that on the request list.

Sebastian: To get a schema bound object replaced was a test double I need to first replace a schema bound object with a non-schema bound object. So after that I need I need to script out the object which we know SQL Server has that functionality built in but it doesn’t make that accessible to see for our code. I have to hand code that script thing again while there are really not a lot of mistakes or bugs in tSQLt. The ones that keep sneaking in are things like, I forgot to account for, it was a scale in the new date at DATE_TIME data type because I hadn’t use it.

Carlos: Got you.

Sebastian: And honestly I shouldn’t have to think about it because SQL Server should provide that functionality. And another thing that would be great is there would be a way to measure code coverage. Which there is one framework out there now that is doing something like code coverage based on extended events but It would like to get that a lot more fine grains in that solution.

Carlos: If you would forgive me, what would you mean by code coverage?

Sebastian: Code coverage means if I have my code and I have a bunch of test cases. I want to be able to pin point the areas in the code that have not been covered by a test.

Steve: So for instance a chunk of dead code that’s part of an IF statement that never been hit or something.

Sebastian: Part of an IF statement but likely will be hit in real life but I’ve forgotten to write a test for it. If you follow test driven development very closely that will not happen. Most people don’t and then the code coverage really becomes handy.

Steve: What is the best piece of career advice that you’ve received?

Sebastian: Keep learning.

Carlos: Yup, never stops.

Sebastian: Don’t stop learning, you know.

Carlos: Sebastian, our last question for you today, if you could have one superhero power, what would it be and why would you want it?

Sebastian: Yeah, so there was this movie ones very long time ago. It was about this Robot #5.

Steve: Short Circuit.

Sebastian: That goes back to keep learning. I love to read and I suck at it. I would really like to be able to read that fast.

Carlos: Ok, speed reader, very good. Well, Sebastian thanks so much for being on the program today.

Episode 84: MVP

The Microsoft Most Valued Professional or MVP award–while it’s hard work and dedication to be called an MVP, what are the benefits? Well, Kathi Kellenberger is a great member of the SQL community, traveling to many SQL Saturdays, speaking at conferences about SQL server, and this time she is here with us to tell us what it really means to be an MVP and the great journey it has been to becoming one.

Do you actually know the percentage between men and women MVPs in the world and US? How many posts do MVPs have on forums, helping others? How do they keep track of all the SQL Saturdays, User Groups, conferences they have been to? Kathi will be sharing her story, the process that she went through until she became an MVP, how you get nominated, and how to get renewed!

Episode Quote

“Do these things because you love to do them. Don’t do them because you want to be an MVP.”

“Women, you’ve got to get over your reluctance to promote yourself.”

“It’s not a job, nothing paid […] We’re kind of a free support for people in a way. A lot of what I like about it is that I can share my enthusiasm about SQL Server.”

Listen to Learn

  • What it’s meant to be an MVP?
  • Benefits of being an MVP.
  • MVP categories
  • The process and criteria on how to be awarded as an MVP.

Kathi’s Blog
Kathi’s Kerberos and SSRS course
Kathi on Twitter
Kathi on Pluralsight

About Kathi Kellenberger

Kathi is an independant SQL Server MVP and Teammate with Linchpin People. She is author of “Beginning T-SQL 2008” and co-author of “Professional SQL Server 2005 Integration Services,” “Beginning T-SQL 2012,” and “SQL Server MVP Deep Dives.” Kathi enjoys speaking and writing about SQL Server and teaches the occasional SQL Server class.

Transcription: MVP

Carlos: This is Carlos Chacon.

Steve: This is Steve Stedman.

Kathi: This is Kathi Kellenberger.

Carlos: So Kathi, welcome to the show.

Kathi: Thank you and thanks so much for inviting me. I’m really excited about it.

Carlos: Yes, I’m glad that we finally were able to put this together, and of course, you know, the rest as I say is history. But this is something that we’ve been trying to get together since October of 2015. It’s a long time of coming, we’re glad to finally have you on the show.

Kathi: Yeah.

Carlos: And as your Twitter handle suggests @Kathi, right. You’re a family member to all and a great member of our community and so we’re excited to chat with you.

Kathi: Thank you.

Steve: We have an interesting, oh, go ahead Carlos.

Carlos: No, no, go please.

Steve: And we have an interesting topic to cover today and that topic is?

Carlos: Yeah, we want to talk a little bit about the MVP, Microsoft MVP. What it means to be an MVP and what’s it meant to you and so I guess let’s go ahead and jump into that, and give is a little bit of your story there. How long have you been an MVP first, I guess? 

Kathi: Ok. I was first awarded the MVP in 2008, and then in 2010 I took a detour in my career and ended up at Microsoft for about a year and a half. So I lost it during that time and so then I got it back again in 2013 and have had it every year since then. Kind of a weird back and forth but once I was eligible again I got it again. That’s how long.

Carlos: Now ultimately looking at the MVP website, Microsoft is looking for two professionals to provide support and guidance if you will to the community kind of taking their time to help others along their way particularly with the Microsoft technology stack. I guess what do you see or kind of the qualities of being an MVP?

Kathi: Yeah, again, probably should clarify that. It’s not a job, nothing paid. It’s an award that Microsoft gives so I feel like it’s in some ways we’re an extension of Microsoft sales because the things that we do I’m sure help companies use SQL Server more effectively. We’re kind of a free support for people in a way. A lot of what I like about it is that I can share my enthusiasm about SQL Server. I have this problem where when I learn something, the first thing that I wanted to do is go tell somebody else about it. And this is a great way to be recognized for that because really it’s like a 20 hour per week unpaid job in some respects because I feel like that’s about how much time I spent a week on MVP type activities. Some of it you get paid because if you write a book you get paid a little bit but for the most part it’s like a part-time unpaid job.

Carlos: Interesting. I had never heard that number before, 20 hours a week, that’s quite a bit. But when you think about some of those other things, I mean, you mentioned about writing a book, blog posts, speaking in front of people, preparing a presentation.

Kathi: Right, yeah, you know Perl site courses. I volunteer at a non-profit to teach people about SQL Server, User Groups, SQL Saturdays. Yeah, you know, some weeks are more than others but being an independent consultant right now if I only am billing 15 or 20 hours that week I am going to fill in the rest of the week with the MVP type stuffs.         

Steve: So then when you’re dealing that and you’re doing your 15 or 20 hours that’s focused on that. Are you keeping track of that and somehow having to report that to Microsoft or is that just what you do to be out there and be recognized as an MVP.

Kathi: Right, right. You don’t have to keep track of it in an hourly type basis. I did keep track of it for about two months just because I was interested in how much I was staying. But then that time keeping, you know, that gets to be exuberant in itself so I just wanted an idea so I kept track. What you have to do as a current Microsoft MVP there is a website where you are to upload your activities. So if I speak at a SQL Saturday, I have to fill out a form and said that I spoke at SQL Saturday, there were 30 people in the room, you know, linked to it. So you have to record your activities and that’s for renewal. So let’s say you are awarded the MVP, you don’t have to do anything for the next year. You’re an MVP, you could drop off the face of the earth as far community stuff is concerned. There are no requirements but to get it renewed you have to do things. You do report but it’s not an hourly type thing.

Steve: Ok, so then the title MVP stands for Microsoft’s most Valuable Professional, is that the right?

Kathi: Something like that, yeah. Most Valued Professional I think. 

Steve: Yup. And then so you’ve done all these work and you need to keep doing that work in order to keep getting it year after year. I mean, that’s a lot of things you need to do there and what are the benefits that you gain for doing that?

Kathi: My favorite is a free MSDN subscription so it would include Azure credits. That’s probably my favorite because I can really use and have available to me very easily any of the developer tools, SQL, Windows Operating Systems, things like that for training, or experimenting, that type of things. That’s really my favorite and other probably my second favorite which I don’t always take advantage of is the MVP Summit up in Redmond. That’s the last couple of years that’s been like either overlapping, one year is overlapping the past summit, and the last couple of years it has been like a week after, couple of weeks after. But that’s just really awesome because you were on the Microsoft campus. You’re surrounded by, you know, in my case lots of my friends are MVP. I’m pretty lucky. Get to hang out with friends a lot more and learn about the new things that are coming up. That’s probably the third thing that’s a really great benefit is you get a lot of insider information about what’s coming up, what SQL version is next, or all of the things that are not public knowledge at this point. Obviously you got to keep those things to yourself but that’s pretty cool. They also have webinars from time to time about things that are coming up. In fact, the last year or so they’ve really increased the frequency of those webinars so if there is a topic you’re interested in just watch and chances are there will be information about what’s coming up, or maybe what’s in the next service pack, or what are the new features in the next version of SQL. There are really a lot of benefits that way.

Steve: Ok. Now, I’ve seen that there’s like different categories of MVPs in each country like you might have a Data Platform MVP or an Azure MVP. Am I understanding that correctly?

Kathi: Yes. In fact, last year they revamped all their categories. We were SQL Server MVPs and now we are Data Platform MVPs. Another really interesting thing is that you can now be awarded MVP in multiple areas so for example if somebody is really focused on SQL Server but they’re also really focused on Windows Server they might get an award in that area.

Carlos: Got you. Yeah, PowerShell is probably another category that overlaps quite a bit as well.

Kathi: Yeah, yeah, and I don’t know all the categories. I just look for what I do so at this point everything has been SQL Server. However, I’ve overlapped a tiny bit into the security area because I did a course for Perl site and I’ve spoken on it at a few places on Kerberos so I just want to have and put those in sure security. Whenever you add your activities you have to kind of put them in a category so through people that have different areas. Probably one of the big, I talked about the tangible benefits but for probably another thing is if you want to be a consultant this is probably not necessarily that great to if you are just like a database administrator at one company. I was when I became an MVP originally and my company did not care. They really care less. But as a consultant or somebody with your own company it is gold. It’s going to get you noticed and really be beneficial.

Carlos: One of the things or criteria they talk about creating on as well or is included if you will of contributions and that is form replies and answers. You know, so just having, you got somebody who searches forms and what not or put questions out on forms. I am always amazed at the people who can respond, the #sqlhelp is probably another good example, right? They’re just like, they’re waiting, right, and they just. I mean, almost it seems like and what’s also interesting is that they have really good answers. So couple of people that kind of jump to mind is Aaron Bertrand on Stack Overflow. I don’t know how many points he’s got. Robert Davis is another one. He’s probably on the SQL Help. Seems like he’s always charming and every so often. I have never been able to get into that because I guess maybe a knuckle dragger I guess I don’t know all that much.

Kathi: Yeah, I’m sure you knew. 15 years ago I was on SQL Server Central forums all the time and answering lots of questions. In the 90s, I was on whatever was back then for Visual Basic and constantly answering questions and I don’t know if as I’ve gotten older I can’t multitask as well. I just don’t get, not only that, not only forums but the same people are on Twitter and on Facebook. And all these things all day long and I’m like, “Man, I’ve got work to do”, so that is one area where I did not contribute much. I’m very grateful for the people that do contribute but I do not. I’m more likely DB writing, recording, or presenting. Those are the things that I’m going to do but as far as answering questions, man. And the bad thing I really like doing the TSQL questions on at one time and actually at Pragmatic Works when I was there. They really encouraged us to answer question when we were on the bench. The problem is which I wasn’t really on the bench that much luckily. But the problem is for me someone puts a question out there. Maybe they have the schema and the example and then they, maybe they don’t so you have to asked them about it and you become so, it’s such a commitment to the question. You know, getting out the information from them and maybe you got a solution. Well then you’ve got a, you know, create a little temporary database, not a temporary database, create a little database and populate it and try to work it out. I just feel like it’s too much of a commitment so I don’t do it. But yeah, I pretty much stick and obviously Microsoft does look at that. That’s one of the things they look at but luckily I do enough writing, recording and presenting, and volunteering that doesn’t hurt me that I don’t answer questions.

Carlos: Yeah, the metric they give on the MVP side is this guy from Brazil. He had 728 helpful posts.

Kathi: That’s awesome.

Carlos: That must be total, over the year, 185 answers and 407 posts. Like, holy cow.

Kathi: I guess you can do that. You know, I do remember at one time and I was at jobs. I was getting paid and I was able to do some of that but, you know, I can’t.

Carlos: Sure, that’s true.

Kathi: I can’t, so I don’t know maybe I just can’t multitask anymore.

Steve: You know, I feel that same way too. I think that if I’m going to go and answer questions on Stack Overflow or whatever out there I’ve got to focus on that. And that’s something that I can’t really do when I’m doing work that I’m getting paid to do and there’s got to be that separation there. That bench time that as you mentioned it, if someone has a lot bench time well they might have more opportunity there but I think it’s a balance to figuring that out.

Kathi: Yeah, absolutely.

Carlos: Part of putting together those presentations or even the courses, I mean, you mentioned your book and having just put a book out myself kind of that time and energy that it takes to do that. But it also kind of helps you like the side benefits there is that you that become more familiar with that topic and can talk to it more rather than this specific TSQL problem that they’re having or they’re trying to loop or, you know, whatever the specific implementation is.

Kathi: Right, right, yeah. I just like what I do to scale. Occasionally people will send me a question personally. I will try to help them if I can.

Carlos: Right. And that’s different, if they send it to you personally that’s different.

Kathi: Right, yeah and sometime. I just had a gentleman who I don’t know if they were not familiar with SSIS but they had wrote a post on my blog site about SSIS, and had a couple of questions and I was able to help them very easily with that question and I felt good about that. But sometimes I feel like and I haven’t had anything like this for quite a while but sometimes I feel like they’re trying to get free consulting out of you and it is just too much. There are times whenever I’m asked personally and I say, you know, this is bigger than we can do, you know email it. Maybe you want to put this question out on this other forum for people. You know, if somebody sends me a question about replication or something. I mean, I’ve used it but I’m not an expert on replication. That would be better to go ask that on a replication forum.

Carlos: Yeah, there’s a fine line, right there. You know, particularly as the MVP, you mentioned the time you have to put in and then kind of chasing some of these specific problems when all the hundreds of myriads of variations that could take place can be difficult.

Kathi: Yeah, and that time just to clarify, that’s just an estimate of what I do. I don’t know what other people do and there is no requirement. That’s just pretty much what I do on average.

Steve: Ok now, you’ve mentioned that you’ve been an MVP twice and that you originally have that and then you went to Microsoft and you became an MVP again after that.

Kathi: Right.

Steve: Now, how did you go about getting that? Is that something that somebody nominated you for or what is that process and how does that happen?

Kathi: Right, so what usually happens, and this is kind of funny, but you will be at past summit or SQL Saturday and people start asking you, “Hey, are you going to the MVP Summit this year?” And then you will say, “Well, no, because I’m not an MVP.” And then they’ll say, “Well, you should be.” And then they will nominate you so that happens a lot. I guess I can say who nominated me originally because this happened in, I guess around 2007, people started saying this to me. Andy Leonard, and there were like three or four people who said I’m going to nominate you. Well, one person did and that was Andy Leonard. And I knew you just had him on a podcast the other day because I’ve listened to a part of it. He’s such a sweetheart but he nominated me. Of course I didn’t get it the first time. It took a couple of times for it to go through. But that’s usually it, you fill out this form. You can nominate yourself but that’s kind of weird but somebody who thinks that you deserve it will nominate you, and then you may or may not get contacted from Microsoft. And if you do then they’re going to ask you to fill out a form listing all your activities for the last year, and then you may or may not hear anything back again. It is kind of weir because it just seems like, if you don’t hear anything. If you’re not having a conversation back and forth don’t expect to get the email on the MVP Award Day so what I would think, you know. I would think you probably have some conversation back and forth with them. But even if you do that is no guarantee that you’re going to get it.

Steve: Yup, so mentioned the MVP Award Day and I saw recently, right after the first year, I think there are a bunch of MVP renewals and new MVPs out there. Is that an annual thing or a quarterly thing? How does that work?

Kathi: It’s quarterly. So it’s January 1st, April 1st, July 1st and October 1st. The really funny one is October 1st because it’s, not October, but April 1st because it’s April Fools’ Day, you know, that the really funny one.

Steve: We could all be MVPs for a day on April 1st.

Kathi: We could be, we could be, yeah, so it’s a pretty exciting day for you if you’re up for renewal or you’ve been nominated. You’re going to be checking your email every 10 minutes. But the frustrating is it comes out late morning or early afternoon. Usually because it come out, so they’re coming out of Redmond probably so that’s specific time so, you know, you’re waiting. It is an exciting day if you get renewed, so far so good. I need to tell you something really cool about me in particular with MVP. So I first received an MVP in 2008 and at that time I asked the MVP lead, kind of liaison at Microsoft that deal with it. I’m like, “How many women are MVPs of SQL Server MVPs?” He gave me these numbers. He knew at that time they were, this is 2008, there were 4,000 MVPs in the world of any area.

Carlos: Total?

Kathi: Total. At that time I think there are around 250 or 260 SQL Server MVPs at that time, and I don’t what it is now. And there were 60 SQL Server in the U.S. and 6 of those were women. So I was like, “Wow!” The numbers are just fluctuating constantly partly because Microsoft keeps hiring MVPs. They’ve hired a bunch of them recently.

Carlos: That’s true. They have been on the hiring spree.

Kathi: Right, but I think women have, you’ve got to get over your reluctance to promote yourself. I’ve had that. I feel like women or girls are taught from early on, you know, work really hard and be really smart but don’t let people know about it. I feel like we’ve been taught that from childhood and so I had to get over that and start learning how to really promote myself.

Carlos: eah, that’s interesting. As the father of four girls, that’s probably something I need to key in on because, yeah.

Kathi: Yeah, we were really taught to be exceptional but hide it. You know, I saw that in my own daughter when she was a teenager. Especially if boys were around, all of a sudden she was not quite as smart. Yes, you know.

Carlos: Interesting.

Kathi: Yes, because she doesn’t want to be intimidating. But I think that one of the luncheons at past summit one year they talked about becoming an MVP and they started and saying, “Hey, you need to let people know what you’re doing.” And then there were quite a few of us that became MVPs around that. I don’t remember what year it. I think the number is still pretty small compared to the guys. But I think we’re represented probably about the same percent that actually is working in the field so I guess that’s not bad.     

Steve: When you talk about letting people know what you’re doing or promoting yourself, what type of things did you start to do differently at that point in time?

Kathi: Yeah, I think it was a gradual process. But what I would do is, like say for example, I have a book coming out. I would contact Steve Jones and say, “Hey, I want to write an article that’s going to promote my book.” So I started to think of ways to promote myself. I’m not much on Twitter but if I have a new blog post, or have a new Perl site course, or whatever I’m going to get out there and promote that. I think part of it is speaking. Trust me I’ve got a C in Speech in college and I was a very shy, terribly shy. And speaking in front of people is not something I would do but I made myself do it. Those are ways to I think at least that I did myself out there. I look for opportunities that can help me in multiple ways is really what I do.

Steve: Ok, so how many times in a year would you say that you generally speak at a SQL Saturday or other big events.

Kathi: Let’s see, darn.

Carlos: Put you on the spot there.

Kathi: Yeah, yeah, I’m going to say at least 15. You know, I’ve had to go back out to my MVP thing because I have everything tracked there. But I try to go to at least 6 SQL Saturdays a year. I’ve spoken at past summits every year for the last four. I don’t know how long my luck is going to hold out there but somehow I’ve managed to do that. I speak a lot for the local User Group, plus I do a lot of remote so I’m going to say between 15 and 20 presentations a year.

Steve: Oh, that’s really good out there. I mean, that’s more than one a month and that’s a big commitment.

Kathi: Oh yeah, but you know what, to me once I started speaking it became a little bit addictive. It’s thrilling, you know.

Steve: I think I have that same feeling on the addictiveness of speaking. It’s just the travel side of it that slows me down.

Carlos & Kathi: Yeah.

Steve: Ok, so I guess at this point then is there anything else that you want to share with us?

Kathi: Yeah, I think that making MVP is a pretty, there’s a pretty high bar to make MVP and I don’t want people to be discouraged if they don’t make it. Keep doing the things that you’re doing. When I was at Microsoft even though I wasn’t eligible to be an MVP I continued to do the same kinds of things I was doing because I love it. Do these things because you love to do them. Don’t do them because you want to be an MVP. I think an MVP is got to be, the MVP award is a side effect of doing these things you love. I don’t love posting on forums but I’ll do it. But luckily if for some reason I was not going to be an MVP anymore. You know, that’s always a possibility, would I stop writing books, you know, maybe. But I’d still be writing blogs. I’d still be doing Perl site courses and speaking just because I love to do those things.

Steve: Ok, that’s sounds great. Shall we move on to the SQL Family questions then?

Kathi: Sure.

Steve: Can you share with us how you first got started using SQL Server?

Kathi: Yeah, it was kind of crazy. I had just become a developer. I was a mid life career changer and this was 1997 and I was working in a really tiny company that there were only four of us at this little startup. And they had bought a package of training courses for this other guy that worked there. And he was wanting to take, now they call them Systems Center, but one of those type classes Operations Manager, one of those type classes and the course just kept getting canceled and the expiration date was coming up and they said, “Kathi, how would you like to take one of these courses at this training company because it doesn’t look like this guy is going to use it.” So I said, “Sure.” And I saw there was this SQL Server class and I want to had took that class so that was early 1998. So if it wasn’t for that fact that that guy couldn’t get his Operations Manager class in. I don’t know how my life would have changed. I’m sure it would have got in you know. So that was a SQL Server 6.5 class, so that was my first actually getting to do anything or seeing it. And then shortly after that, well then I started actually using it, implementing it at customers.

Carlos: Sure, interesting. Now, in all that time when you were on SQL Server, if there was one thing you could change about SQL Server what would it be?

Kathi: Well right now, since Service Pack 1 came out and all of these cool features have been downed. I don’t know what you want to call it now in Standard. There is one feature I’d love to see back ported down to Standard Edition, and that’s read-only replicas and AG so that you can do availability groups in Standard and you can actually do that in 2016 before Service  Pack 1. But they’re only, they’re only for fail over. You know, I would just love to have this read-only replica in Standard. That’s what I would love to see.

Steve: Well, perhaps we’ll see that in the next Service Pack or the next version, who knows.

Kathi: Yeah, I’m not going to hold my breath though.

Steve: Ok, what is the best piece of career advice that you’ve received?

Kathi: Yeah, so like I said was a mid life career changer. I had automated an auction at my kids Grade School. I was a pharmacist at that time but I automated it with that Microsoft Access. And this other parent saw what I was doing and he worked for a company that produces training materials for technology. And he saw what I was doing and he said, “You could be a developer.” That was really the first time anybody, this was like early 90s and somebody said something like that to me, “You could be developer”. And he, you know, got me some training materials, encouraged me to take a certification test. So I would say that was probably the best advice, just someone saying, “Hey, you could actually do this.” That really got me started down this path.

Steve: And that was actually triggered the change from Pharmacist to IT side of things, developer, SQL Server? 

Kathi: Yeah, there are a lot of things that happened along the way. Probably the first thing that happened was I was in my last year Pharmacy college, working my Bachelor in Pharmacy back then. And there was a computer course where they had TRS-80s in the room and the first time I saw a little program, we had to type a little programs in, and record them on a tape recorder. The first time I actually typed a little program in that TRS-80 I was hooked from that moment on. It was like a light switch went off in my brain. I had programming and things like that as a hobby to just write some things for home up until I started doing some things for my kid’s school. And then really that automating that auction was the project that got me started down the path, you know, before I felt like it was something that was out of reach.

Steve: Very interesting.

Kathi: Yeah.

Carlos: Our last question for you today Kathi. If you could have on superhero power what would it be and why would you want it?

Kathi: So I’m going to change this up a little bit. I’m not really into superheroes that much but I love Star Trek.  I love Star Trek and I watched the entire Star Trek: Next Generation when I was on the treadmill last year. And this year I’m double dipping. I’m watching Voyager and Deep Space 9 both at the same time this year. I always love Star Trek but I didn’t get a chance to watch it a while when my kids were little, so busy, so now I’m actually getting to do it. Seeing episode here and there but my favorite technology on Star Trek is the replicator. If I could have a replicator and it’s going to look like a little microwave that I can just say to it, “Computer, you know, tea hot, Earl Grey ”. You know, they can get you any food that you want or any piece of clothing that you want just like that. I just think that would just be amazing. That’s what I want.

Carlos:  There you go. I think Amazon would pay a lot of money to have that not be built.

Kathi: You’re right. But we have the 3D printers now that are not quite the same as replicator technology but we’re heading down that road. People are printing all kinds of things, you know, so. Someday we will be there. In fact, Apress got a book. I don’t have the title in front of me but Apress got a book about Star Trek technology.

Carlos: Really?

Kathi:  Yes. And how they came up with the ideas, where the Science is today, where it’s going. There’s a chapter on each technology.

Steve: Oh yeah, it’s really quite amazing how much technology has come out of things that were surely science fiction. Like the communicator, and then we have the flip phone and now flip phones aren’t even new technology anymore.

Kathi: Oh yeah, these little computers in our hands that we walk around with are better than the SIRS I was using in the late 90s. You know, they’re amazing. They’re better than what they had on Star Trek, you know. Everything you could ever want to know is right in there. Then you can to talk it to if you want to.

Carlos: Well Kathi, thank you so much for joining us. It’s been a pleasure to have you.

Kathi: It’s been a pleasure for me. It was a lot of fun.

Steve: Yeah, thanks Kathi, this has been fun.

Kathi:  Yup. Thank you so much.

Episode 83: Data Visualizations

We’ve all been there – you have some data and you need to make it pretty for the report. With the all the new reporting options you have probably been tempted to find a new way to visually present information.  In this episode of the podcast, we are talking about data visualizations and our guest, Jonathan Stewart, has some issues with the way we currently go about it.  Jonathan has been on the show as a panelist and I have really enjoyed his perspective on things so we invited him on the program to discuss some ways you can go about improving the data visualization of your reports, without adding additional burden on process.

Psychology plays a big role in data visualization. To make a good report, before even starting your tool, you should be able to summarize what the report presents. One of the major aspects of psychology is color and how we perceive it. Jonathan explains how we can use this while visualizing data.  We discuss color blindness and I haven’t heard this discussed too much elsewhere and I was curious to know how we can solve this problem with an easy button.

 Episode Quotes

“One of the major aspects of psychology is just color, right? What colors you’re using? And this goes back to my storyboard and you see everything comes back to storyboard, right because that’s like. Actually I have everything done before I ever open a tool. Even the colors picked out should be done before I ever picked a tool.“

“Don’t think of it as, you know, I have to this extra work. Think of it as you’re doing it to prevent extra work. If you develop a framework, you develop your own storyboard and you get your stuff there“

“Out of 30 people in the room, there’s 3 people at least in that room that are colorblind. And then too, there’s not just one type of colorblindness. There are multiple spectrums.“

“We’re datafying everything. We datafy everything from our footsteps, to our calories, to our heart rate, to traffic on the road, we datafy everything because we like to be able to see and analyze things.“

“Are we using the numbers to prove our bias or we’re using the numbers to disprove our bias and to show the proper thing?“

Listen to Learn

  • How to do a storyboarding before even opening a tool to create a report
  • How to include storyboarding in data visualization process without investing too much time
  • How you can prevent additional time spent on report modifications
  • Color psychology behind data visualizations
  • How to create a report that colorblind person can understand
  • How the quality of the source data affects the end result
  • Predictions how the reports and data will be used in the future

Jonathan on Twitter
SQL Locks LLC
Edward Tufte and data visualization
Colour Blind Awareness
See you charts as a color blind person would
Building a data visualization storyboard

About Jonathan Stewart

Data VisualizationJonathan is a SQL Server professional of 18 years. His experience ranges from BI architecture, data modeling, data warehousing, database architecture, database administration and strategic IT planning for Fortune 100 companies. He sees data in everything and is constantly looking for ways to organize, read and analyze whatever data set he can get his hands on. He is passionate about data and making sense of it. He is passionate about using data to effect positive change in his local community of Columbus, Ohio. When not doing data things, you can find him mentoring youth in his community.

Transcription: Episode 83

Carlos: This is Carlos Chacon.

Steve: I’m Steve Stedman.

Jonathan: I’m Jonathan Stewart.

Carlos: Jonathan, welcome to the program.

Jonathan: Thank you for having me. I appreciate you having me on. It’s a great honor.

Carlos: Yes, always interested at chatting with you. And ultimately our topic of discussion today is data visualizations. Now, this isn’t necessarily a reporting discussion but ultimately I guess the way we present and potentially consume reports or as we say data visualizations. So I guess let’s jump into that. And you have an interesting story why is this an important topic for you.

Jonathan: Well, first off one of the issues that I’ve always seen is that whenever we look at data visualizations first thing we think about is a report, a chart, the tool whether Excel or PowerBI, Tableau, Qlik doesn’t matter.

Carlos: Sure, SSRS, a lot of our people using SSRS.

Jonathan: Yeah, Reporting Services. And that’s how we see it but one of their problems, you know, we end up with problems of was the report right and stuff like that. All those problems we end up having with it can be hedge upfront if we look at it from a different view. I begin to look at things from the focal point, the first point. Look at it from the problem and then begin to look at it as a story versus just a report. And how can we get the story better, how can we tell the story better? So I begin to look at, I’ve always been a psychology junkie, so I always love stuff like how these colors make us feel, how this, you know, things like that. All these things begin to build in me about visualization so I begin to study it more and that became almost my focus. I was already a business intelligence consultant so I was already building reports and stuff like that so I got to see things of how people receive certain reports. You know, why pie charts suck. I got to see all these things as I was building out my career and building my focus. It became my interest. It’s a huge interest to me. Everything from how we perceive the world, color blindness, all kind of stuff like that begin to take shape and I get to see it. And as I learn more, and this is just me in general, as I learn more I want to share more to people. I was like, “Oh, this is amazing. Let me share this with them. This is amazing.” So then it became, you know, somebody was like, “You should just speak about it”, so that’s actually one of the things.

Carlos: There you go. Help us here because we’ve had discussions on this show about many of the different reporting options that are available to us, right? And ultimately while there are different needs that’s why there are so many different tools. Why are we fumbling, on what ways are we fumbling when we actually talk about the presentation layer if you will?

Jonathan: One of the major issues that I see in the problem of visualization is that the current tools are sold as self service BI, self service this, self service pumping your gas, and stuff like, of self servicing. The pumping your gas joke is something that Edward Tufte in one of his blogs he made a comment about. He has a huge issue with people like Qlikview, Tableau, even PowerBI, you know you have our tool you can make this. He’s like it’s not like pumping gas. You can’t just take data and just show it because if it has no context, if you don’t know what it means then it is still is a bad visualization.

Carlos: Got you, so we need to be able to better represent what the data means when we present it.

Jonathan: Right, so one of my focus is that I bring it into, like when I go to a client and they’ve never see things like this before I actually storyboard before even I open up the tool. I’ll go and I‘ll interview. If I get a request, I need this, so I’ll go and I’ll sit and talk with them. And it will be a legit interview, almost like I’m building full blow data warehouse. I’m interviewing businesses and stuffs like that. One of the first questions that I ask is, “What problem are you trying to solve with this report?” So before I even open up the tool, well, one of the things too that we do. When we open up a toll, as soon as we start coding something or writing something whether it’s five minutes, or an hour, or 5 days a week whatever, if it’s wrong we really don’t want to change it because it’s our baby, right? Like we don’t want to change it. We write a certain procedure. I really don’t want to just throw it away. We want to modify it. So even before I get to opening any tool, it doesn’t matter what the tool is, I want to make sure that I understand the problem.

Steve: So one of the things you mentioned there was Tufte and I know, I’ve looked at some of his books and I’ve seen some truly amazing visualizations. And I know one of the disconnects that I’ve had is going from here’s the tools we’re given, SSRS or whatever, or maybe that we’re using to, how do we get from that to some of these amazing things that you’re seeing there? And that’s where; I mean even with PowerBI, there’s a lot more features in there. But to really go to some of the stuff that he shown, it’s a big jump.

Jonathan: Right, right. And I think that one of the things as I can get more people to think about in terms of a story, storyboarding, you know, you’ll get to that. So you better see what’s the problem we’re trying to solve? You know, who’s the audience? What action are we trying to take? And then what type of mechanism? Are we doing PowerPoint or PowerBI then, understanding your tone, stuff like that. Are we showing success or failure? Once you have all these questions answered then you can say, “Ok, how can I display that properly?” And then you can start getting to those amazing visualizations that Tufte and all those people show then you can start showing all those things. But once you have all that stuff answered upfront then you can solve. I think one of the problems that we try to jump to that is we say, “Oh ok, I think I can show that in this amazing histogram.”

Carlos: Yeah, exactly. I’ve got some data now let me just throw it in this visualization tool.

Jonathan: Right, and try to make it look pretty, and then it ends up not being what we wanted. It may not be exactly what we want and we try to tweak it. It can still in that being wrong. One of the examples that I use is you have somebody panting for gold out in California. And over the course of a week they have 10,000 tries and over the 10,000 tries let’s they say they find 147 pieces of gold. And so now somebody says, “I want a report of that”. And you say, “Ok, well what do you want in a report? Do you want me to show all 10,000 tries?” You may not want to see all 10,000 tries. You may only want to see the 141 successes. But you may be somebody who’s developing the tools that did the try so yeah you do want to see all 10,000 tries. So which is the right answer? And a lot of times too when somebody gives us a report requirement it may be three reports that we may have to develop, two reports. But we don’t know that until we actually go back and actually interviewing this and figure out who the audience is and stuff like that because at the end of the day it’s not just show us the data.

Carlos: Right, and we’ve talked about this before on the show and kind of maybe the new wave or the new technologist another differentiator is that I think a lot of times like you say,  we’re technologists I have the data I’m going to toss it into the report. I’ve done my job kind of a thing. The new wave or I think what executives are looking for. We’ve had CEOs on this show talking about what they’re looking for in data people. And they want people that understand the data. They want people who understand the business, right? Why are we collecting data this certain way? And I think this could then be a major differentiator to be able to go in and say, “Well you know what I am the guy who develop, use your Gold Rush example. I am the guy who, I understand how to develop this gold developing software, process whatever therefore I can give you better insights into that 10,000, or know how to display that kind of a thing.”

Jonathan: Right, and then two, know who your audience is for too because if you just don’t report for another person who’s going to be out there panting for gold that’s a whole different report than your CEO. Understanding all those things upfront before you ever open up the tool will help ensure that your visualization becomes useful because at the end of the day that’s the whole point to it.

Carlos: Right, that’s not the only problem.

Steve: So one of the things that I see working with clients specifically around reports is that oftentimes you’ll get a request that comes in from one of the customer of the client or an internal customer of the client that says, “Take an existing report that’s like a simple grid type report, and add to it or modify it or do these things to it.” And oftentimes what we run into there is that people know what they’ve seen and they know how to ask for alterations to what they’ve seen but they don’t necessarily know to ask for what they’re really looking for. Do you run into that similar?

Jonathan: I do and I have a couple of tricks that I will share. One of the things I do is like you said, you see that a lot. We have Report A, we want to extend Report A this and this. Once I have all these can I make a one sentence summary of it. Kind of make a three minute story of that. If I take what I have there and I give it back to them and they say, “No, no, wait. That’s not really what I want.” That’s what you will get when you start engaging the user by saying, putting stuff until a story is formed because we understand stories because we like stories. So once you start explaining that to them, they say, “Oh, wait, wait, that’s not it.”

Jonathan: One of the things that we always run across is that we think we know something and somebody may have, not that we don’t have, so it will be I think they know that. And it happens with all of us. We say, “Do such and such.” And we think that everybody may know all of the answers and they don’t. Putting it into a story form and presenting it back then we say, “Ok is this, and this…” And they’ll say, “Oh, wait I forgot to tell you such and such.” So you’ll prompting them to give you more information as well before you ever modify that chart.

Carlos: You’re now assuming that they know something that they may not know, or you know you’re working on different assumptions.

Jonathan: Right, because at the end of the day regardless of what kind of visualization you’re doing. What, you’re doing an infograph, simple chart, histogram doesn’t matter. You should be able to summarize that in words real quickly to whoever is presenting it to you and they just say, “Yeah, that’s exactly what I want.” Whether it’s the grain, everything should be summarized quickly to your user back and say, “Ok, yeah, that’s exactly what I want.” And at the end you could open your tool.

Steve: Sounds like some great advice there. You mentioned psychology earlier, can you tell us a little bit about how psychology plays into what you do with reporting and visualizations?

Jonathan: Yes, definitely. One of the major aspects of psychology is just color, right? What colors you’re using? And this goes back to my storyboard and you see everything comes back to storyboard, right because that’s like. Actually I have everything done before I ever open a tool. Even the colors picked out should be done before I ever picked a tool. So you go back to the tone of your visualization. Are you showing, you know, did you company lost $10 million last quarter? Do you really want to show that with happy colors and stuff like that? Maybe, right, you know. Right, maybe? But you probably don’t want the whole report to be red either. The color connotations of red, you know, how red makes us feel? How certain things we have innate biases to it. So like the color red, the color red has a connotation of power, energy, war, strength and stuff like that. So we use that in that aspect and that’s great. But it can also be violent, and brutal, overbearing, anger. It’s all about understanding the color that we use when we’re using it. And you see a lot of stuff, the visualization, you see the psychology of it and things like company logos, right? Like there’s a reason why majority of company logos have red, blue or white in it, if you’ve ever thought about that? Majority of company logos have red, blue and white because they wanted to show, you know, the red they want to show power, passion, strength, desire. The companies that have white want to show purity, and innocence, and simplicity. And the blue, like Microsoft obviously, right. They want to show confidence, and intelligence, and trust, stuff like this. So they’re already doing these things, you know, showing you these type of things. Even in movies there are certain actors, directors that use color to show feeling because we automatically have these innate feelings of a color when we see them so not going too far from the subject of visualization. We want to go back and make sure that we’re showing, that we’re using the right color. We are not using green because we like the color green. We’re suing green because we want to show nature, or down to earth, growth, vitality. But we got to be careful with it too because it can also be devious, and selfish and greedy as well. So want to make sure we’re cautious on how we use colors in the certain times. You can take a perfect visualization that shows exactly what the user wants and change the colors, and it changes the whole meaning of the report.

Steve: It makes me think I work for a company years ago where one of the marketing directors had basically a rule that nowhere on the company website would green ever be used. I never understood why but thinking about some of the things you’re talking about here maybe they have some kind of psychology behind not using green, and they fail to share that with the development team.

Jonathan: Well, what was the industry they are in?

Steve: It was in sort of self service websites like back in around 2000-2005 sort of early on.

Jonathan: Yeah, because I was trying to think of what kind of negative meaning has green would have or something like that, and they have them in something else that they were thinking about. Yeah, there’s all kind of reasons why people want to do that and then you get the stuff that’s. One of the funny things is like the color orange. Orange is historically a color that you either love or hate. But kids are drawn to it like that’s why Nickolodean’s Logo is orange, right. it signifies like happiness and stuff like that so you get things like Amazon’s logo and stuff to.

Carlos: Well, so it’s funny you mentioned that because people would ask me growing up like what my favorite color was, and I could never really pick one. But I always enjoyed like orange candies and flavor so I chose orange.

Jonathan: Because kids love orange, and this one of those things people don’t think about it until you’re like, “Oh yeah, you know what that’s true.”

Carlos: I think there’s one other area that I want to jump into but I guess I want to maybe stop and say. Well, look all of these sounds well and good and I think yes it make sense. Like we want to present the best data because at the end of the day we want to make sure that our user are happy, that people are understanding the information well. But all of a sudden this sounds like a lot of work. An additional task that I have to that now take on, right, in addition to this. Where’s my easy button, right?

Jonathan: Alright, so here’s your easy button for you.

Carlos: Ok, here we go.

Jonathan: Preventing rework, number one, because at the end of the day if your visualization is wrong you can do it again then you just wasted everything, right? Don’t think of it as, you know, I have to this extra work. Think of it as you’re doing it to prevent extra work. If you develop a framework, you develop your own storyboard and you get your stuff there. You develop your go to 5 questions. Because for me, like I have probably, maybe 8-10 charts and graphs that I use all the time. I already have those in my head. I can throw stuff into them and just go with them. I don’t have to think about those because I already have them. And if anybody who has done reporting sure you guys both have your favorite charts and graphs that you like. So you already have those, you don’t really have to develop them because you already have them and know them. Now if you develop your own…

Carlos: I don’t say, you know, showing the lazy man that I am, right? I mean, 99% of the time I’m taking the default, the default theme, whether that’s in Word, right. Whether that’s in like, you know. So whatever the chart is, I mean, like Excel might give me a couple of options but I was just like, “Ok, I want to take that one and then if they don’t like it then I’ll change it, right?” I admit that, and I think maybe to your point that idea of developing a process around how you’re going to create them and then coming up with some of your favorites. Once you’ve kind of gotten couple of winds there you can go back and make that process a little bit faster.

Jonathan: And you know what thinking of that too, what I’ll do, and I’m committing this now right on air so I have to do it. I’ll write a blog post of like let’s say 5 questions. A quick process that a professional can go through and may take them 20 minutes to gather these answers that will help frame a quick storyboard to build into their platform. It would be basic that you can extend it yourself but I’ll create a couple of questions that you can have and take in that way to help you get to that point so that you can eliminate a lot of the rework that we typically have. I’ll do that and put it in my blog.

Steve: Yeah, that would be great. If you give us the link to that we can include that on the show notes which will be at sqldatapartners.com/visualization.

Jonathan: Ok, yeah, definitely.

Steve: Ok, so now that we’ve talked about colors. What about people who are color blind? How do you handle that?

Jonathan: Ok, that’s actually a big big thing. And it’s actually dear to me too because I grow up with people in my family who suffer from Color Value Deficiency. And one of the things, I even remember as a kid and things that you would never think about unless you actually live with them or actually experience them in day to day life. Like one of the things are cakes, birthday cakes. I remember a birthday cake. We were doing a birthday cake and we have to actually stop and think about the color difference between the cake and letters on it because the person we were creating the cake for wouldn’t be able to see the difference. I was probably like 8 or 9 years old and I was like, “Wow, that’s…” So at an early age it was impactful for me to be able to see that.

Carlos: What are the stats on colorblindness?

Jonathan: I believe it’s something like 1 out of every 8 men are colorblind. 1 out of every 12 men, 8%, so yeah. 1 out of every 12 men and 1 out of every 200 women in general are colorblind. So if you take a SQL Saturday with 30 people in the room, there’s 3 people at least in that room that are colorblind. And then too, there’s not just one type of colorblindness. There are multiple spectrums. There is the Color Vision Deficiency, there’s different types of Protanopia. Those are the three major ones. But there are like Green Cone Deficiency. One of the things that we hear too when we think of colorblindness, we think that the person can’t see the color red but they lose the whole spectrum. So somebody has a green deficiency they lose the whole green spectrum. They could lose the whole red spectrum. And one of the things that I show in one of my talks is that, and I do this purposely because like as I said it’s dear to me, is I show people what other people see because until we actually see how other people see things we can’t really empathize with them. So I’ll take a picture and I’ll show it in the different types of spectrum. I’ll say, “Ok, this is how we see the world.” Then I’ll show, “This is how they see the world. This is how this person sees the world.” And they’re like, “Oh my God I didn’t think about that.” There’s actually some cool things that we can do though. Outside off the bat we can immediately, we can use shading and gradients which are tools that are already available to us to be able to show differences. We can avoid using reds and greens together when possible. But there are other tools too, like one is the ColorLab. And I will give you guys the links so that you can put it with the notes, the ColorLab. The ColorLab is actually pretty cool because you can go through and you take your company’s color palette because that’s a lot of thing too. A lot of companies have color palettes that they have to put their reports in these palettes.

Carlos: Right, they have to meet the criteria, sure.

Jonathan: Right, so they probably better use shading and gradients. But you can put the colors in here and you can go through and change the color spectrum, the CVD spectrum and see how people see these colors truly. And there’s another site, color-blindness.com, it’s called COBLIS, it’s the Colorblind Simulator. And so what I do is when get done with my visualization whether it’s my Reporting Services, PowerBI, Tableau doesn’t matter. I’ll screenshot it and I’ll upload it there and I’ll look through their simulator to make sure that I can still see what I want to show. Does my report, does my visualization still show the story that I want to tell so, am I loosing tone because I lose the color? And if I am then I can go back and change it. Those are quick things that we can do to make sure that we include people from the CVD spectrum because one of the things is that as time goes on we have more and more data and everything is becoming “datafied”. You’re going to have more and more people depending on data. And the one thing that you don’t want to do, right now it’s not written into the ADA, the American Disability Act, but I can see it being used later on because somebody say’s they can’t do their job because you don’t show them a CVD friendly report and they can’t do their job. And that’s actually something that is simple as changing the colors. You could be potentially liable in the future for stuff like that. I see those types of cases coming because I see lawyers chasing stuff like that. That’s a whole other spectrum to chase and make money. To hedge is bad. What you don’t want to do is to be the person who developed that report that got the company sued for $10 million. Right, I mean.

Carlos: No, that’s wasn’t me, that was the last guy, right?

Jonathan: You know, that’s stretching it, that’s an extreme outlier but the possibility exists. So just to prevent it let’s include them anyway. Because then too you will never know who you’re affecting. If you can make somebody’s day by including them without actively going out and searching them that makes somebody feels good. At the end of the day you can help somebody out. That’s what this is all about anyway, right? So 5 more minutes of your time to help include somebody who has CVD, that’s cool.

Steve: Who knows that person you’re helping out by making it look good for the colorblind is maybe the decision maker on whether you come back for that next project as a consultant.

Jonathan: Definitely, yeah.

Steve: That could make a big difference to your future as well as or my future.

Jonathan: Right.

Carlos: So what are some of the tools doing to help us with some of that, right? Again, I’m a lazy knuckle dragging Neanderthal. Alright, I want to kind of out of the box to be that way. What are the tools doing to help me there?

Jonathan: Well, I’m interested to see how PowerBI is going to directly address that. One of the things that you can do right now and like Excel and PowerPoint, and PowerBI as well, you can change gradients and stuff like that. I mean, right now, you got to be actively thinking about it. I could see in the future having a switch that if you have CVD, you could hit the switch and it would change for you. Actually that would probably make a lot of sense to do something like that because that’s your easy button, right? I don’t know if they’re going to do that but I could see them doing that because that’s simple. They could say, for this color, this RGB color we could show this color because they have the budget to be able to delve into that and figure out what works for who and what works for not. I could see them doing something like that in the future.

Carlos: Right. Yeah, just translating that so they do have the different colors like you mentioned the reds and greens where it just translates to gradients or shades of blue or whatever.    

Jonathan: Right, because the nature of the PowerBIs and the Tableaus stuff like that and then with the HTML 5 enable you to do stuff on a fly because it will then be just a quick button that will let you change the styling or shade something like that change the colors on a fly. Obviously a Reporting Service report is static. It would be able to do that. But the new HTML 5 reports will be able to do that. You know what I mean? I could see that coming in the future. We don’t know yet but they’re actively working on it so we know that they know what’s the problem and we will see something soon.

Carlos: Right, now will you talk a little bit about, again, kind of data everywhere, right. And you know SQL Server 2016, the integration with PowerBI and Mobile Reports. We may already feel that there are enough reports out there but it seems like that’s only going to expand, right? Will you talk a little bit about, you mentioned the dangers of datafication. I mean, you feel like the role of reporting and the need to go through this is just expanding. We’re now just starting to kind of get into some of this?

Jonathan: Oh yeah, this is the forefront of it. Like we’re datafying everything. We datafy everything from our footsteps, to our calories, to our heart rate, to traffic on the road, we datafy everything because we like to be able to see and analyze things. We want a quick answer. We want to be able to say, “Oh, I get this.” And it’s good though. It lets us make changes and decisions we can plan. They’re using it to be able to track diseases. They can track the spread of Zika virus using data. That’s amazing stuff you can do but there’s dangers in that though. There are extreme dangers in that. We need to be careful especially with us being the person that’s, the people that are actually showing the data because it falls back on us. I’ll give you two quick examples too. One is a mechanical issue and the other one is a human issue. And so at the end of the day as data professionals we love the term garbage in and garbage out. We live by that. So thinking along that I’m actually writing a blog post about it now so the story will be in there as well. I had a client that was a retail client and they want to track store attendance with a mechanical piece. And as every time someone came in it will track and count so they would know how many people came in and out of the store. So they would do everything, they would base stuff on store bonuses and stuff like that based on that number there. And they would based even their finances is bases on that attendance in the store because they could do store attendance divided by sales and all that kind of stuff like that. So as I was working on one of the analytical project there I was interviewing one of the business people, well I was interviewing the business people in general, and I begin to hear them call it the random number generator. And I was like, what? But it was known that the numbers we’re wrong and it was a joke. I’m like, wait a minute. Your basing your business on something you know that’s wrong. They like, “Yeah, we know it’s wrong.” I’m like, that’s not a problem? Everything they were doing in the future was based on these incorrect assumptions so the analytics was right, the visualization was right but the source was wrong. So you have garbage on the end because your source as wrong so that’s one of the dangers as having an incorrect source, having low quality input. The other one is the human factor and this is little touchy because how it transpired out but we’ll use the election. One of the things that we saw up until the point of the election, a week before the election, most major poll showed Hillary Clinton winning between 3-6% points. You know, ok that was great. And she had based the fact that, she didn’t have a concession speech because the numbers says she could not loose. Obviously we know how that went out. So the first thing that they went back to, they were like, “The polls are wrong, the polls are garbage.” So that reflects back on people like us writing the report so that will be the analog of we wrote the report wrong. No, we didn’t write the report wrong, the data came in wrong. What end up happening was it caused a social desirability bias. People says stuff what they think you want to hear not what they’re really are going to do. So people were saying that they were going to vote for her when obviously they didn’t. But like I said we won’t go into that, that’s also psychology as well. I told you I’m a psychology junkie. But see even with that, people knew we collected the data, they did the full random sampling. You have faces like the Gallup poll and USA Today poll so they’re doing all the stuff that’s correct. They’re doing the proper distribution of people, the proper distribution of the population to be able to get exactly what’s needed to be able to give accurate results and stuff like that. They go through, do all these and have the proper margin of error, you know, everything, right? And they showed, “Ok, 3-6% she should win. Ok, great let’s work on that.” And then that doesn’t end up happening because the source was wrong. One of the things that I want to caution is as data professionals we need to make sure that we’re comfortable in our sources. Is our source correct? Was there a bias in collecting it? Were there errors in the machine that collected it? Was there possibility of errors coming in with the data corrupted coming in, all kind of stuff like that because at the end of the day…

Carlos: Are we looking for the number that will help us prove our point rather than looking at the numbers a little more objectively to help us answer the question.

Jonathan: Yeah, I think so and that’s a great way to put it. Are we, in essence that’s another bias. Are we using the numbers to prove our bias or we’re using the numbers to disprove our bias and to show the proper thing because that’s the end. Like you said, they were using the numbers to prove their bias because they wanted her to win so the numbers proved that so that works let’s go with it. What nobody ever said was, are people truthfully telling the story that’s actually happening. Are they lying to us? Are they truthfully going to vote for her.

Carlos: Right. Man, well some interesting stuff and I always see, I think. This is not going away, right? The reporting options are not decreasing that are available to us.

Jonathan: Not at all.

Carlos: And so I think it will be kind of interesting to see how some of these plays out. I’m interested to start playing with even some of the colorblindness components. Because when I heard those numbers I thought, “Wow that is much higher than I would have guessed.”

Jonathan: One of the things too with the influx of data and how it’s going to become. I could see these principles being used for everybody from the Administrative Assistant to the Executive Director and CEO and stuff like that because everybody at some point is going to become natural to be creating reports. We need to make sure are we comfortable with the source because it’s not just going to be us creating them in the future. As more and more self service tools come out, you know, the iPhone 15 is going to have a way to show reports of your tweets and stuff in real time. You know, Facebook is going to be like, “Hey, show my data.” I mean, I can see that. That’s coming because it is just the next natural evolution of it because as more and more people understand how valuable data is the more and more people are going to show their data. More and more people are going to show what they’re doing with their data. I mean, think about it. Even your FitBit, you know the FitBit, get your own reports so that you can share with your friends. You know, “Look, I had 10,000 steps today. I’m averaging 9.2 hundred steps a week.” You know, stuff like that. So even those small stuff, even now we’re beginning to see people wanting to show their own data. So it’s just going to get more and more in the future as we get further along.

Carlos: So that’s an interesting point. And I think ultimately as data professional we can continue to provide value there if we’re willing to educate people on how to do that and do the best practices rather than I am the creator of the reports, here’s some guidance on how to do that.

Jonathan: And I think that’s the big thing at the end of the day is willing to see control for the greater good. Because at the end of the day you’re more valuable when you provide true benefit to the enterprise than just you know, “I created this”. If you can show more people to do it better than just having, because you’re a bottleneck no matter how good you and me, we are bottlenecks. But if ten of us can do it then we’re better. That’s how I see that.

Carlos: The value there again is an understanding the data and kind of where it comes from and then the presentation that’s like the final piece. You can push that down a little bit and just have that be taking off your plate as the tools become easier and what not. Yeah, good stuff.

Steve: I think what is interesting on that too is that oftentimes the visualizations or the reports that are shown will drive behavior. And when you’re asking someone to build the report that’s going to drive behavior oftentimes they don’t know the complete understanding of that behavior. And I think that goes back to what Jonathan said in the beginning around the story, building the story and storyboard. But if that is pushed out to everyone who’s able to get to the data and build their own visualizations then people really have to build their own stories in that case so that they’re showing the right data. Otherwise, you’re going to end up with data that’s driving business decisions that’s all over the spectrum. One person’s report may contradict another’s based off of their biases or approach to creating that.

Jonathan: Correct. And another thing too with reports this is something that I learned a long time ago too is that a report does not have to answer your question. Sometimes a report makes more questions which eventually lead you to the right answer. That’s something that lot of us do is that we want to report to solve a problem. Of course it doesn’t always have to solve a problem. Make them lead to more questions and that’s progress. That’s what I was thinking about.

Steve: Yeah, that’s one of the things I love with reports is when somebody looks at it and say, “Well, why is that number that way?” And you have to do the research and figure out, “Oh, well it’s because sales are off in that region” or whatever it may be. And then you’re able to fix a business problem because of those questions that came up.

Jonathan: As a consultant when I’m talking to clients that’s what I tell them too. Don’t be afraid to have your results posts more questions because that’s good. You’re getting further into your data. You’re learning more about yourself. Everything doesn’t have to have a quick answer. You know, that’s our society, we want quick answers. But everything doesn’t have a quick answer.

Carlos: Well, awesome Jonathan. Thank you for the discussion.

Jonathan: Yes, thank you. I enjoyed it. I love talking about stuff like this. You could can geek me out for hours for psychology and stuff.

Steve: This has been a lot of fun.

Carlos: Shall we go ahead and do SQL Family?

Jonathan: Sure.

Steve: So Jonathan how did you first get started with SQL Server?

Jonathan: I have a funny story with that. I was working on a help desk and I was trying to figure out what I want, I know I wanted a career in IT, because I wrote my first program when I was 6 years old. I was one of those weird kids. I had that Radio Shack TRS-80. I’m taking it back, right, showing my age. So as 6 years old I wrote my first program so I had been doing IT stuff all through school and stuff like that then got a job at a help desk and I knew that wasn’t really what I wanted to do. So I was doing research and research and end up finding, you know I was like, “I think this database thing is kind of cool.” So I went and sought out the DBA at the company and I work 2nd shift. He worked the regular 8 to 5. So I would come in everyday at 8 o’clock and sit with him until I had the work at 2 to learn SQL Server from him. And he was willing to actually, his name was Carlos too which is funny.

Carlos: Great name.

Jonathan: Yeah. And he was willing to sit with me and teach me SQL Server, and that was SQL Server 6, 6.5, that’s taking back too. They saw my drive and determination with that and the created me a Junior DBA role. And that was my first job as a true SQL Server professional, I was a Junior DBA at that company, and that’s how I got started. That was my first exposure to SQL Server and ever since then I just love SQL Server. I’ve been doing it for, if I tell you how many years ago you’ll know how old I am, so 17 years later this where we are today.

Steve: That’s impressive, going out of your way to sort of work the extra hours so just you can learn something new. If more people did that I think we’ll be in a whole different world.

Carlos: That’s right. Not the typical path, right, kind of falling into your lap, you know with me. It’s more of a, hey that’s pretty cool and let me go check that out. Now, in all that time with SQL Server, if there is one thing that you could change about SQL Server what would it be?

Jonathan: The one thing I will change is that they’re actually changing it. It was, you know, the staticness between versions and fixing things. We went to that whole dead zone, the desert between 2008 R2 and 2012, and that was like, “Oh my god”. You know, trying to figure out, and there were some great things professionals found. Ways around, solutions around problems but that was one of the biggest things between there and there. And then 2012 to 2014, ok that’s cool but to see them how faster change stuff like VNX is in CTP1 now. Is that what it is? They are at a rate where they could possibly even release VNX this year. That’s crazy and that was actually what I’m thinking was, what’s to be able to make changes faster on a fly is the community. And you know, users will say, “Hey, do this. Can do that?” You know, we’re getting that base for PowerBI seems like every week. It seems like every time I open up Management Studio there is another update for that. I mean, they’re doing the things that I would like to change and it’s changed faster. So that’s really cool to see as exciting. I can’t wait to see the future. I mean, imagine SQL Server, well the changes they’re making in just 3 years, 2-3 years it’s going to be amazing what’s coming.

Steve: Alright, so what is the best piece of career advice that you’ve received? So for me the best piece of career advice is based on, for me I had struggled personally in just being comfortable on who I am. And I remember one of my managers, his name was Keith. And me and Keith are still really good friends. I talk to him more of the time, I still do. You know, Keith told me, “Just embrace who you are. Be who you are. Don’t try to be somebody else.” And what’s funny is that’s actually the beginning of my logo. You know, you guys see my logo it’s my hair. That’s me literally embracing myself. And so that was the best piece of advice that I got because once I truly embrace who I am, my strength and my weaknesses. Still trying to fit into a box that I thought somebody else wanted me to be that’s when I begin to flourish. That’s when I begin to be truly happy. I’m truly happy because I’m being who I am. I love to share, I love to talk, I love to teach, and just embracing who I am. The best piece of advice I could give back to somebody is to tell them, “Doesn’t matter who you are. Whether you’re short, tall, you are good at this and good at that. Be who you are, embrace who you are.” And that will be your gift. That will be what people will gravitate towards you.

Steve: Sounds like great advice.

Carlos: You know, that’s right, that whole idea about networking kind of finding your own people. I think all too often when we come out of college and just like, “Oh, I need a job, right, so I’ll go anywhere. If we can identify what it is we want to do or place we want to go after. I mean, even this idea of colorblindness and making reports better to help that segment that kind of speaks to the work that you want to do and just a matter of finding where that’s important to other people as well.

Jonathan: Right, definitely. I think that’s even, actually even people taking that project and run with it. Someone selling a service to a business saying, “Hey, you know, we can convert your reports to.” Somebody say it’s like that you can sell that service.

Carlos: Our last question for you today Jonathan. If you could have one superhero power, what would it be and why do you want it?

Jonathan: So my favorite superhero doesn’t even have any superpowers. You can guess who it is?

Carlos: Batman?

Jonathan: It’s Batman, yeah. And what I always love about him is the detective comics because he uses his mind and that was a superpower. He just outsmarted people. Obviously he was physically fit, right. I mean, the ability to get to the in-depth of stuff. I love that investigative and that’s kind of like what I want to do anyway. Get to the problem, get to the root problem. Figure out what’s really going on. If anything I would want to be just Batman that’s my. I don’t need super speed and super strength. Nah, I just like the super mind and call it a day.

Steve: You know the other cool thing with Batman is he had pretty much unlimited budget, and he had all the toys, and tools, and gadgets that he wanted to do whatever he needed to.   

Jonathan: That is true.

Carlos: That will be a lot of fun.

Jonathan: Yes, I guess you could say that’s also super power too, right, the unlimited money.

Steve: Yup, right.

Carlos: Having a kin business sense or at least having a dad maybe who had one.

Jonathan: Right, right. That would definitely help.

Carlos: Well, Jonathan, thanks so much for being with us today. We do appreciate it.

Jonathan: Yeah, once again I thank both of you guys. This has been an honor to be able to do this. I like to thank the community for even, having you guys available to do this so I look forward to what’s coming in the future. Look forward to seeing you guys again at the SQL Saturday being able to panel and stuff. And I thank you for the opportunity and the time. You guys have a great day!

Carlos: Yup.

Steve: Thanks Jonathan, it’s been great.

Episode 88: SharePoint

Why can’t we just get along? There can be a real love/hate relationship between those of us with data responsibilities and those who are responsible for SharePoint environments.  We want to provide good service to those environments, but there are SO MANY things we can’t do on a SharePoint database.  To see how the other half lives, we invited Todd Klindt onto the program to give us his thoughts on how we can all get along a bit better.

Because Todd suggests the “SQL and SharePoint communities should understand each other a little better”, we will try to tackle some of the obstacles the teams face so you can be better armed to support your SharePoint team.

 Episode Quotes

“SharePoint is kind of like that pretty girl in High School…”
“A SQL alias is like fire or gunpowder. I have the respect for it. It’s a good tool but don’t play with it.”
“Bad things happen, stuffs is unfair, and how you handle it is the key.”

Listen to Learn

  • Why the SQL Server and SharePoint teams can’t seem to get along
  • Dos and Don’ts with SharePoint Server
  • How you can still save the day
  • The vocabulary you might use to approach your SharePoint team
  • What an interesting sense of humor Todd has

Todd on Twitter
Todd’s Blog
SQL Server best practices for SharePoint
Todd’s SharePoint Netcast

SQL in the news
Mongo API for DocumentDB
Microsoft Data AMP Conference

About Todd Klindt

Todd KlindtTodd has been a professional computer nerd for over 20 years, specializing in SharePoint for the last 14 years.  His love affair with SharePoint began one slow day at the office when he discovered SharePoint Team Services 1.0 on the Office XP CD that was holding up his coffee cup, and decided to install it. The rest is history. In 2006 he was honored to be awarded the MVP award from Microsoft for Windows SharePoint Services. He has had the pleasure of working with SharePoint farms both large small. He has written several books and magazine articles on SharePoint. Todd has presented sessions on SharePoint at many major conferences both in the United States as well as Europe and Asia and does the user group circuit, SharePoint Saturday events, and the occasional children’s birthday party as well.

Transcription: SharePoint

Carlos: So Todd, welcome to the program.

Todd: Thanks for having me, Carlos.

Carlos: Yes, it’s a pleasure having you here and when we reached out and chatted with you I knew we were going to have very interesting conversation. Ultimately our conversation is about SharePoint today. But your response was, and we wanted to talk about the interactions between SharePoint and SQL Server. And your response was, “Yes, I’d be happy to talk with you about the database lucky enough to host the SharePoint application”.

Todd: Yes, there are many database platforms out there but SQL is fortunate to be the one that gets to support SharePoint. It’s an honor above all honor. I hope SQL appreciates it.

Steve: Now, one thing you’ve said is that you’ve taught some classes in the past around this and around SQL for SharePoint admins and around the opposite of SharePoint for SQL DBAs.

Todd: Yeah.

Steve: And I think that’s one that would be very exciting to our listeners in that a lot of DBAs may not have the SharePoint experience.

Todd: Yeah, so the way that works is we’ve all heard the phrase, “In the land of the blind the one eyed man is king.” And so when it comes to SharePoint DBAs, not SharePoint DBAs, SharePoint Administrators, they need to know a bunch of things. They need to know SharePoint obviously but they also need to know IIS, PowerShell, and they need to know DIS and SQL. And all of those things SQL is the most complicated, the least forgiving and just the most easiest to screw up. And so I came in with a little bit of SQL knowledge. Just a little bit more than the x guy and so I became one of this people. When speaking to TechEd and things like that I would speak about, you know, for SharePoint Administrators the SQL part. And I would cover just the dumb, the stuff that’s just embarrassing for SQL folks but the SharePoint folks didn’t know it. And so I had somebody, I can’t remember what the event was like, “You know, we’ve got some SQL folks they would like it the other way around.” And I’m like, “Absolutely, this is easy.” And so I kind of twisted it on its head and what really it ended up being was almost like a counseling session for SQL DBAs because I was saying all these things that SharePoint was doing that they hated. And you know, like the ever popular using as a primary key. SQL DBAs love that. So it end up being, here’s all the ways that SharePoint is going to try to destroy your SQL Server and all the things that it doesn’t go along. When I would teach the class to SharePoint Administrators I was almost like a god because, you know, because I just knew all these crazy things. Like I knew when a transaction log was and how to keep them from filling up my hard drives. To the SQL guys that was just bad news. I was just like, “Here is another thing that SharePoint is going to that you’re going to hate. And here is another thing.” I think I already did that one once. I couldn’t take it. 

Carlos: When they got the pitch forks out that’s when you knew.

Todd: Yeah, I knew, by halfway through the session they turned on me. That’s the moment of pitch forks. Yeah, it was horrible.

Carlos: So ultimately I think we’re going to get to a little bit of that, right? We want to talk about. I have some questions about why SharePoint does the way the things it does and maybe you can help shed some light there. I guess another opinion and then of course the flip side is we would like to get some feedback on what the SharePoint Admins wished we were doing a little bit better there.

Todd: I have a long list of grievances. I think I can represent all the SharePoint Administrators when I tell you DBAs these things that you’re doing wrong. So I hope somebody got some pen and paper, yeah. I’ve got all that. So I’m happy to obliged you.

Steve: Well, it’s great to know you can represent all of the SharePoint Admins from that perspective.

Todd: We are one large homogenous group, yes. They elected me as their spokesperson so very likely yeah.

Carlos: Ok, so the first thing I guess just to talk about, I mean is that, so we’re talking about SharePoint, we’re talking about a couple of different databases and logging is probably the one that is going to get the most attention only because it’s so easy for that thing to get so big so quickly.

Todd: Yeah, there are a lot of ways that SharePoint Administrators screw up SQL. I mean there is just — Like if you guys had a list of all the ways that you can screw SQL up, SharePoint Administrators have done them all and they’ve invented a couple of new ones, and the logging DB is one of them. And the logging DB came from a good place. But the underlying SQL structure for SharePoint is incredibly rigid and incredibly fragile. And so from Day 1 when SharePoint set on top of SQL. Microsoft said, “Keep your hands out of SQL. Don’t write to those databases. Don’t read from those databases. Don’t even look at those databases. Pretend they are not there.” And that was because there is not a lot of referential integrity and so SharePoint assumes the things in certain way inside of SQL. And so the administrators that’s great but there’s a really good information in there that I would like to have. It’s only place that exist and you yell at me if I read it from there so let’s figure something out. So Microsoft, I think in SharePoint 2010 introduce this logging database where they just for everything that anybody would ever want in that database and they said, “Go wild. Read from it, write to it, delete it, add columns, whatever horrible things you could think of. You can do it to this database and this database only.” So that thing can get huge and depending on what kind of recovery model you’re using, the database can get big, the logs can get big. And most SharePoint Administrators if they installed SQL once they do that they forget about it. And so they don’t think to model those drives and look at the logs and all that. That database has taken down many SharePoint forum because it fills up all the drives and just takes everything down. So you’re absolutely right, that’s one that the DBAs need to keep track of. That’s one point that SharePoint Administrator should keep track too but they don’t.

Carlos: Sure.

Steve: So are there like standards or best practices out there for how to go about cleaning that logging database up or purging all data out of it or anything?

Todd: I’m not sure if there are any official ones. Microsoft stands on it is you can do whatever you want. So you can delete it, rename it do whatever things that you want to do to it. You can because they don’t read for anything so they don’t care.

Carlos: Sure. Correct me if I’m wrong but I believe that the default is that there is no expiration or there is no kind of archiving there. So going in and setting that would be like a good first step. Alright.

Todd: Yeah and that’s one of those things that Microsoft, at least the SharePoint team has done is anytime, no I can’t think of an exception to this but I’m sure there are. But anytime there is a fork in the road and one answer means that something might get removed and the other answer is it won’t. They always go to the do no harm so things like, you know, it will be great if we archive after 30 days because that will save disk space and all that. But they’re like, “No I don’t want to delete anything.” You’ll never know when someone is going to need something 31 days ago. You know, when the most important spreadsheet ever got deleted or something. So they let that stuff run wild. So I think, yeah that’s an important first step to figuring out. You know, if you’ve got a data warehouse somewhere you’re going to keep these things in. Four of you you’re going to setup your policies the same you can’t look back any further than 60 days or 90 days. That’s definitely something that SharePoint administrators should figure out or handle that database.

Carlos: And shame on us, you know, I guess if we’re a database administrator there for having those drives fill up. When you think it’s part of the core of what the SQL Server database administrators responsibilities are beyond backups is kind of keeping the system up and knowing how much drive space you have I think is king of critical to that.

Todd: I couldn’t agree more, shame on you DBAs. This is not the SharePoint administrator’s fault whatsoever. We are the collateral damage. No, I think a part of that comes from, you know DBAs have an idea of what their roles and you cover some of the basic stuff back-up, and monitoring and thing like that. But optimization, you know, figuring out the way to get the best performance out of the databases and those kinds of things. That’s just something that every DBA does and they go in every situation expecting to provide that service and the SharePoint guys come along. It’s the longest time SharePoint was the sexy beast out there. The SharePoint guys walk in. They had a swagger about them. It was just obnoxious and I say that as a SharePoint guy that has that swagger about it. And so we come in we’re like, “SQL guys, that’s adorable do you think you could help but don’t touch anything. Don’t look at anything. Don’t change. I know you think you’re smart. But don’t touch anything.” And after that happens the DBAs couple of time they’re like, “Fine. You told us not to touch anything then we’re not going to touch anything.” Yeah, I want to go back to playing Solitaire in my desk. And so that’s how kind of this dynamic thing got worked out. And either side really understood the other side very well. And SQL guys are told to keep their hands of and a lot of them did. I do have a one funny story with a customer. They did have a SQL team, a SharePoint team and SharePoint was running fine and then all of a sudden things started breaking and I can’t remember exactly what the symptom was because this was 8-9 years ago. But after asking everybody, “Did you change anything?” “No, we didn’t change anything.” Talked to one SQL DBA who has kind of a stroke of honesty about him. He was like, “Yeah, I was playing at the public role a little bit and then I took a bunch of things out. Is that bad? Is that wrong?” Well, yeah it was. And in his defense that industry with a very secure industry and that was just his job was to make sure that nothing had permission that they didn’t need to have and he didn’t know for sure. And he was trying to do the right thing. And we are like, “Yeah, don’t touch that, that DBA.” But that kind of stuff happens all the time. So I agree the DBA should keep their eye on that kind of stuff but there are a lot of things going on and there is not always a DBA out there. You might have real SQL things to worry about and that SharePoint might got stored up in some place where you didn’t know about. Kind of off the record there and that kind of stuff happens.

Steve: So as the DBA oftentimes we are put in a position where we are responsible for that server but then we’re told, “Don’t touch it, don’t change it.” And then things happen like databases got big because a lot of login and what do we do? I mean, we can’t touch it, we can’t change it but we’re running out of disk space so we just throw unlimited amount of disk space at it or more memory.

Todd: All of the above. I keep hearing how cheap disc space is and I hear that from people all the time. When I was a server admin like, “Why does it cost so much for server space? I can go to best buy and buy.” So a couple of things so Microsoft has some documentation to kind of helps with that. And I can give you guys the links to that. The SharePoint product team understands the tough position DBAs are put in so they have hard guidance on here’s the things that you can do to secure SQL. Here are the things that you can and can’t touch in SQL to give you guys sort of a framework on things that you can do. And so I’ll get you the links for those listeners, those of you in the shownotes these guys have or I can blog, toddklindt.com, either way. So a couple of things about that, one of the things that I would bring up is that when you create a SharePoint farm and out of the box I forget where it is now but there is like 19 databases to get created with just a regular run of the mill SharePoint farm. SharePoint makes some really bad assumptions about how things should be configured for instance it doesn’t understand putting databases in different drives. It doesn’t do that automatically. It sets your recovery model to simple. All these kind of things so as a DBA you can’t assume the things were setup in a way that makes sense because SharePoint doesn’t always do that. One of the things that you can do is go in and look at the recovery model and say, “Ok, we don’t need. Sets with full recovery model.” You would probably don’t need that unless you’re Probably doing some other kinds of smart backups truncating your logs and doing all these things. SharePoint doesn’t always do that correctly. It’s going to put everything on the default data drive, there’s a default log drive, there’s obviously some databases that you can move around that makes sense. I’ll put them on different tiers of space and different sizes so there’s some of those things. Really looking at the documentation and finding out is the best way to do it. One of the things that I have talked about when I speak to SharePoint administrators is I don’t expect them to be SQL experts. There’s just not enough time to be a SharePoint expert and a SQL expert. And so they should know a few things, they should know some vocabulary, they should know some of the terms so that when it comes to you guys or when you guys come to them that you can kind of speak to them and saying the same things. So if you come to somebody and say, “Hey, these drives are getting full what can we do?” There’s situations where content databases can have huge amount of white space in them and so then you need to have a conversation, “Should we shrink this databases, should we not shrink this database”, things like that. So it’s kind of a two way deal.

Steve: So even in that default setting, Tod, you talked about like simple mode. But even back at least in, and this is dating me a little bit. We couldn’t backup the databases and use that as a restore mechanism, right? It had kind of go through the SharePoint console. Is that change at all?

Todd: So that is kind of, let me go back and say that by default SharePoint creates the databases of this full recovery model. Not simple because full is the one that fills up the hard drive and makes everything crash. But as you can treat your model settings and play with that. But if you don’t know enough, if you’re a SharePoint guy and don’t know enough that can bite you. As for databases with the exception of one database for the most part all of the databases inside of SharePoint. When it comes down to as known what kind of database is that and what you’re trying to do with it. So the one that you really can’t touch is the config database and that is the one that has all of these form specific and machine specific settings. You can’t move that thing. You can’t move it from farm to farm. You can’t really even move it from location to location side of SQL. It’s just a vey fuzzy thing. There’s a lot of hard coded things in there and that’s the one that really destructs all the SharePoint servers. So if something is wrong in that database SharePoint can’t start. Outside of that though like your content databases where all your documents, and your lists and all that but those things are completely affordable to a point where I tell folks, when I’m talking to SharePoint folks, if you do nothing inside of SQL. Get SQL level backups of your content databases because if everything burns down, if your SharePoint Server is exploding, your SQL Servers explode. If you got copies of those databases I can’t make sure you don’t get fired because that’s where all the stuffs at. I’ve got this great story where I used to work at a company and I left and like 10 years later the guy that was my boss calls me on a Friday night at 6 o’clock. I’m like, “Hey, Todd, how it’s going? Been a while.” And he’s like, “Well we got this big SharePoint environment and we’re moving it to a hosting company. And we get everything set up and we have all these VMs, all these Hypervisors and all these SharePoint farms and the hosting company did a reverse migration.” Now I’ve never heard that term reverse migration but I think I had a pretty good idea what had happened. When they were set to copy their local virtual machines and databases and all that to the hosting provider the hosting provider did it backwards and copied the blank environment over top of that production environment.

Carlos: Oh man.

Steve: Wow.

Todd: Yeah, so it gets worst, so he’s telling the stories like, ok. So we talked to our SQL team, they had databases, they had all the databases backed up the SQL team rocks. Everything is good to go. He’s like, “Go DBAs!” And I talked to the Windows team and I said, “Ok, we need to restore all the VMs.” And the Windows team said, “No, we don’t take care of that. The virtualization team does.” I said, “Ok”, so I went to the virtualization team and I said, “Hey, I need to restore all these Windows VMs.” And the virtualization team said, “We don’t do that, the Windows team does.” So this guy was in this situation where he had like I think 9 SharePoint farms and close to 30 SharePoint Servers and all the servers, all the Windows images are gone. All he had was the SQL images and databases. That’s all that he had. And so I was able to take nothing but the SQL images and SQL databases. And the SQL team rebuilt SQL Servers and monitor everything. And I brought all that up full fidelity everything that he had the week before because all the good stuff, all the important stuff is in SQL. All of the data is in SQL. And the fact that they had just SQL backups, SharePoint backups. The fact that they had good SQL backup so I was able to rebuild everything. Bolt it all back together and as far as I know we didn’t go far. So where does the SharePoint part comes in is there are a bunch of SharePoint piece that aren’t in SQL and so if you don’t do SharePoint backups those things font get back up. And there’s also unit of scale inside of SharePoint. And a content database again which has all your important documents and list items and calendars and all that. They can have one or more site collections which just kind of the unit of scale inside of SharePoint. So SQL can’t backup anything in the databases. But if you do SharePoint level backups you can backup individual site collections and give some plenty of business there. But the SQL backups are the very first thing I tell anybody to do.

Carlos: There you go so companeros if you’re getting pushed out of the SharePoint love and make sure you’re still taking your backups so that one day you too can be the hero.    

Todd: Yes, recover from that reverse migration.

Carlos: So one of the things that I was going to ask about in the database itself. There are some settings that kind of draw these bunkers, right? And one of the things that it doesn’t want us doing is creating statistics. And these are statistics, again, so you have these very wide tables, of all these list, right, column 1, column 2 and we look at we go crazy. And then it’s like now you’re not going to let me choose, you know, create statistics at the database itself is going to create. I’m not going to create any on my own. I’m going to let your database create it and SharePoint is like, “No, no, don’t do that.” Where you get in trouble particularly even if you decide to go and to do is then when you want to upgrade it will say, “You’ve touched my database. I’m not going to let you upgrade.” My question is creating automatic statistics, is that still a no no? Is that something that we can turn on and then again assuming that we know enough and then we just remove them before upgrade time? Or what’s your choice there? Which side do you coming down on.

Todd: So I think that is covered in the best practices document that I promise that I would link to you guys. So the upgrade of the database stuff I’m not sure what the last version of SharePoint trying to upgrade from or to us. So I’m not sure if that particular one is still gets caught or not. These days when upgrade SharePoint so all you can do is database attach so basically you build a new SharePoint farm as you got SharePoint 2013 database as you built your SharePoint 2016 farm. There is a command lock that you can use to test the database and it will tell you all the things that you’ve done wrong and all the things that it does right. But more importantly it will tell you whether that particular thing will block the upgrade or not. And there are precious few things that actually block an upgrade. Now, play with the database schema that might be one of those things. But I don’t remember if the statistics, one if it just yells at you or if it blocks the upgrade or not anymore. I know when going from SharePoint 2007 to 2010 it was super mean about those kinds of things. I don’t know if that’s still the case or not, best practices thing, I think it’s in there. But yeah, you’re right and that’s why I feel for the DBAs is because there are so many things that you guys do to keep things run then SharePoint comes along and says don’t do that. Now I’m looking to the best practices document right now and it says do not enable auto create statistics at SharePoint content database. We will find you, we will pull your fingernails, put cockroaches in your underwear that seems unnecessary. I don’t know why they didn’t do that. Yeah, so don’t do that particular thing now that is the SharePoint 2013 best practices document.

Carlos: So you would still follow that as a best practice then?

Todd: Yeah, I would. And I’m sure your listeners are used to talking about things in Microsoft Support. And there is support in a capital S and support in the lower case s. So depending on what you’re talking about. There are different levels of what you can and can’t do. I’ve absolutely done things that I know that Microsoft hates. But they can’t necessarily catch me up and I’ve got plausible deniability. If this is one, I don’t know enough SQL to know if there is a way you could turn that on and then you call Microsoft support and they notice that if you could turn it off and undo potential damage. I don’t know that one well enough to comment on.

Steve: Then when we’re looking at the SQL Server best practices or the normal things that we’re doing for performance tuning and then you compare that to what SharePoint is going to allow you to do. What are the kind of things that we should avoid or that we can get into trouble if we do. Or what are the things that we can do and get away with and work well?

Todd: Yeah, SharePoint is kind of like that pretty girl in High School. She’s really pretty and you want to demanding. SharePoint is kind of like that when it comes to SQL so again that best practices document that I’m going to link and that you’re going to link that will walk you through basically that question as a well seasoned DBA. You got a list of things in your head that you want to do and here’s the list of the things that you can do which is very short and here is the list of all the things that you cannot do and that really guides you. So a couple of things that we talked about you can’t mock with things like indexes and fill factor and those kinds of things. There are some harder things that can’t do. I know one of the ones we talked about that you can’t auto create statistics that’s a big no no. A gigantic no no is you can’t mess with the maximum degree of parallelization or maxed up to its friends that has to be set to one and only one or SharePoint 2013 and greater will just, depending on when it catches you changing that it will just fall down and throw up its hands and refuse to serve anything. So if you don’t have maxed up set to one on your instance when you create your farm SharePoint will refuse to create the farm. It just won’t do it. There are different maintenance processes and stuff that will check for that. So if you try to change it afterwards it will be very angry with you. While that’s obnoxious it is less obnoxious not checking and not getting angry with you and corrupting your data because it thinks everything is running and one threat that’s not. I send that to that best practices document to SQL DBAs all the time. it normally brings them into tears. Microsoft has done a pretty good job because this question comes up all the time. SQL DBAs are better way longer than SharePoint administrators have. So you guys honed your craft a lot better than we have so we have to provide you with this improvision.

Steve: Sure and we just don’t have that cool swagger that you mentioned earlier.

Todd: It’s true. I mean, I’ve met some of the coolest SQL folks and you got nothing on some of the coolest SharePoint folks. It’s true story.

Carlos: And here I was starting to warm up to you Todd.     

Todd: I’ll take care of that.

Carlos: A Chill just entered the conversation here.

Todd: So that’s plenty, many years ago. So I’m Microsoft MVP and I have that for 10 or 11 years and along one of this flight to the MVP Summit I was sitting next to Tom LaRock. I know you guys have deal with him. I do mean deal with him, not meet him, he is a special individual. So on that plane, you know, it’s all nerds. That plane right out to Seattle just have the extension nerds. I’m sitting next to Tom and he pulls out his laptop and he starts doing SQL-y thing. And so I was like, “Alright, he’s probably an MVP and I start talking to him.” And to this time, and this is probably 10 years ago, Tom and I still talk all the time because when that conversation started up I’m like, “Hey, you’re a SQL guy.” And he’s like, “I’m a SharePoint guy.” And he kind of looked at me and he gave me the stink eye. I’m like, “No, no, it’s cool. I’m on your side.” What I can do is SharePoint guy to make SQL better. Help me understand the performance things that I can do and we kind of bonded over that. So there’s some amount of, we can mix the chocolate and the peanut butter but it works out.

Carlos: Now getting back into the reverse because basically out hands are tied and don’t stick your hands into the cookie jar, right, get out don’t do anything. So on the reverse side there how are we driving our SharePoint admins crazy and how can we do a little bit more and less yelling.

Todd: Yeah, well I think there’s a bunch of stuffs you guys can and a bunch of stuff that SharePoint people don’t understand. SharePoint administrators everything in SharePoint is the can’t to them. It’s the word docs, it’s excel spreadsheets that they are laser focus on that that they can’t see anything else. And as a DBA you understand that by the time a piece of data end up in a content database has probably gone through TempDB. It sat in a lot of cloud for a while. There are a bunch of things that happened before it gets into that database. Whereas to a SharePoint admins, you know, it goes from my mom’s computer to the screen directly into that SQL database, you know, no stops between. So one of the things you guys can do is on the backend thing about where database go, which drives TempDB goes on, how many TemDBs have, where to put the log, where to put the databases and things like that to make SharePoint shine. That’s one of the things and educate those SharePoint folks on that and why you’re doing it, and help them understand the value of that. That’s makes so much difference. Making sure that your SQL Servers have enough memory, have enough RAM to do the things that they need to do. And there’s some guidelines out there but again you guys, again it’s almost more art than science at this point. You guys can feel those things out and you understand the different metrics to look at and see where SharePoint is come flowing down and help guide those folks out.

Steve: Yup, so on that point then around moving data and log files around. I mean generally we see with SQL databases that if you can have your log files on a different drive than your data files. And you’re having I/O contention you usually can improve performance with that. And one of the things I’ve run into with SharePoint or SharePoint admins that you make recommendations say this is what we’re going to do, we put in a new drive, we’re going to move the log files over here, we’re going to move the data files whatever and there’s a lot of push backs saying, “oh you can’t do that with SharePoint.”

Carlos: Yeah, you can’t touch it, you know.

Steve: That will invalidate the warranty and the entire system will blow up.

Todd: Yeah, I think that comes from two things. I think one of the things is SharePoint administrators are very sure of themselves and so they have heard that you shouldn’t touch things in SQL and so they repeat that with great authority and feeling. And the other thing is I think most SharePoint administrators are scared to death of SQL. I mean just terrified of it. And so they’re afraid and if you go back there and fiddle some bits that it might just kill SharePoint and SharePoint will never recover. And so I think it’s SQL parts in those two things. SharePoint absolutely is fine with the transaction logs and databases being on separate drives. It’s absolutely fine with if you’ve got a content database that has really fast filling logs and you need to move those. It’s fine of taking that database offline in SharePoint which is different than dropping it in SQL the way SharePoint uses offline for databases is maddening. But it actually understands that moving the logs or backing it up it actually covers all these things. You can actually do all those things.

Carlos: Just to clarify there. So I go to my admin and say, “We need to make a change here.” We’ve gotten to a point where they will trust me and I’m not going to break everything. Am I using SharePoint to take the database offline or can I do that through SQL Server Management Studio and take the database offline and move the files around?

Todd: Yes. It depends on the database and what you’re doing on the SQL backend. So the content databases and the content databases alone have this idea of being offline and detached and things like that. A content database in SharePoint if you set it to offline what does that not mean is it’s not accessible to SharePoint. You know, the way offline means in every other of the tech world every time ever except that one instance. And to make it even more confusing there are two screens that that’s on. In one place the actions are offline and ready for our database, the content database. In the other screen the actions are stopped and started.

Carlos: Oh, interesting. Okay.

Todd: And it is the same setting. It’s been that way for 10 years. So as a DBA you hear that the database is offline and you’re like, “Hey, the database is offline.” When SharePoint, what it means for SharePoint is I can’t create new site collections there. This is what that means. In content database you can detach them and then SharePoint doesn’t look for it anymore. And them what they’re doing in SQL doesn’t matter. You can do whatever. But you can have different file groups and things like that for databases and depending on what you’re doing on the backend you may be able to move those transaction logs and create another log on a different drive. You know, SQL does it thing it tries to balance things and you can move things that way as long as that database is accessible. SharePoint doesn’t care.

Steve: We don’t recommend creating additional logs and files but moving them would be a preference there.

Todd: Yeah, if you could do that. If you could say this content database because they’re the ones that are going to get really big. Say, detach this in SharePoint so it doesn’t show up. You are going to have little bit down time but that’s ok. And then take your transaction log moving at the database back in. Yeah, SharePoint doesn’t care.

Carlos: Well, they shouldn’t be getting that big anymore because now we have permission to go and take the backups to the databases and we’ll help manage that size from that perspective.     

Todd: Yeah, and that’s so I’ve got, I don’t know how many hours I’ve spent with the SharePoint admins when I was consulting and they call and say, “Hey, SharePoint is down everything is ruined.” Never would be because the transaction log file was My favorite one, and this is like in 2008, and we are talking almost 10 years ago and a guy calls up all freaked out, “SharePoint is down. SharePoint is down.” I imagine him running around his office waving his hands in the air, his hair is on fire that ordeal. And so I jump on there, sure enough it’s another one of those farms that gigantic transactions logs had taken down. And this transaction log was 350 GB something like that. I don’t know how big transaction logs I know you guys have seen this days. But back in 2008-2009 that was a pretty big file. So I walked with my typical SharePoint swagger and I’m like, “Don’t worry about this man. I got this. I clean this up all the time.” And I went in, changed it from full to simple, and I know in the background what the database engines are going through and marking transactions and all that. So I did a little chit chat with the guy and told the administrator that I’m going to try and shrink the transaction and I told him what causes it’s going to be. We shrunk it like 20GB. Which 20GB that’s a huge. This guy is seeing 350GB file and now let’s say 330GB file and he’s freaking out. But it just goes to show that he didn’t understand the back end of SQL. He understood SharePoint but he had to think through all that. And I’m like, “Good luck on your database engines working as fast as it can.” But that’s kind of stuff happen all the time. Another fun one that I had, I was working with a customer we’re doing test migration and we talked earlier in this podcast about database are being affordable and do SQL database level backups. So I jump on his SQL box and make backups of all his databases. And I look at the SQL box and it’s got the amount of space I need times 1.5 free. So him and I start to running the backups and all of a sudden it shuts down. And he’s like well is the backup still going? I’m like, “Well, that depends a lot on why I doesn’t it?” Because your connection to the server went away probably still going on because if the server burst into flames probably not. So he tries to connect and he can’t connect it and he’s like, “I don’t know what’s going on?” And then he gets a knock on his door and everything is down, email is down, the whole thing. And he’s like, “Oh, sounds like our hypervisor went down.” Okay, so he has investigation. They have been thin provisioning their drives. And I sort of this multi gigabyte backups and filled up the drive on the hypervisor everything paused. Ok, good enough so he goes some stuff. Couple of minutes later we’re back up and going. We jump back on the server of course the backups didn’t finish because the machine shut down. And set backups again. We did about 10% in. You need to figure out some space for this. I can’t keep your taking your old company down like this. I feel bad. And I remember back in those days when I would speak in SharePoint conferences I will tell people, “Don’t virtualized SQL. Just don’t do it.” There’s a performance penalty. Of course today in 2017 everybody virtualizes everything everywhere. But that’s one of those things back then. I’m like, “This is just another reason not to virtualize SQL, going to have that drive space, going to have good I/O.

Carlos: You know, that’s right. If you don’t have drive space you don’t have too much. Especially if you want to add something to it. And I guess if you want it to be read-only that’s one thing.

Todd: I keep thinking and I feel bad about it.

Steve: Alright, well so great information there I think for the SQL DBAs as well as the SharePoint people.

Todd: Yeah, one of those things that I want bring up to DBAs and this is the story of SQL aliases. And do you guys use SQL aliases much?

Steve: Not often.

Todd: Ok, that is a debate inside of the SharePoint community a lot. And the way that I feel and again it’s not my information on this is dated but I first started touching SQL aliases probably in 2008-2009. And it was like black magic to me. I mean it was just like. I don’t remember the first time I pulled the hard drive and everything kept working. I’m like, “This is black magic there is no way this works.” I felt the same way about the first time I use SQL aliases. And once word got out that those were there people were like, “We should use SQL aliases all the time.” But for the longest time when I would get up in front of a room speaking to SharePoint I would say, “Who here knows what a SQL aliases is?” And like 3 people out of 50 or 100 will raise their hand. And I would say, this is why you shouldn’t use SQL aliases. Because SQL aliases work so well that if you don’t know what you’re looking for you’ll never find it. You’ll never know why things don’t work. So my guidance to folks has always been not to use SQL alias unless you need it. Unless you can articulate the reason why you need to use a SQL alias don’t bother with it. Because you’re going to go on with your career hopefully and get big promotion or win the lottery whatever the next guy might not be as smart as you and he might not know about SQL aliases. You can drop one in anytime. There is no penalty for using that when you build you SharePoint farm. That’s one thing I told folks not to do unless they need it is use a SQL aliases.

Carlos: Yes, it’s one of those things because the people are migrating. They want to go from server A to server B and they’ll introduce that alias but I kind of feel like, you know, it just bite the bullet and get that SharePoint setting reset. Like change your applications unless it’s really going to be that difficult it’s worth not using the alias because you’re then kind of giving control if you will to who owns that. Whereas with like even DNS there’s an owner like you know where that is going from. You don’t have that quite with an alias so I think that’s kind of the tricky part with there.

Todd: Well there is one very specific reason to use aliases with SharePoint. Well there are two but it’s the same reason. We talk earlier about that config database. SharePoint does not support moving that config database, and so if you have to move servers or migrate or something there is no supported way to move where that config database is. So if you want to move it you have to use an alias. There is just no way around it. And the other thing is if you’re using any kind of non-standard ports because you can’t do anything with non standard ports with DNS because that’s just a hosting level thing. So that’s the two reasons that I tell people. And again it’s very valid. If you’re doing one of those things rock on use an alias. But the thing that I tell people is there is no benefit in using the alias when you build the farm. You can always drop it in. Because aliases I’m telling still today I’ve got half of the SharePoint administrators don’t know what an alias is or know where to find it, or know where to configure it or know that there is two of them. Know that they need to do the 32-bit one and the 64-bit one. So I tell people to be afraid. It’s like fire or gunpowder. I have the respect for it. It’s a good tool but don’t play with it.

Steve: Yup, very good advice. So shall we go on into the SQL Family questions now?

Todd: Sure.

Steve: Can you tell us how you first get started with SQL Server and or SharePoint?

Todd: Yeah, so I got my first SQL Server cert in December of 1997. I got my administrating SQL 65 certification back then. It was one of those things I worked for a couple of company that use a little piece of a kind software. Maybe you’ve heard of it. Great Plains Dynamics and this was way before Microsoft purchased them. And the company that I worked for is a consulting company and they wanted to use Dynamics that was really expensive to buy. But what they found was if you’re a Great Plains reseller you got to use it for free. And so they said, “Congratulations everybody we’re going to become a Great Plains reseller.” And to get that you have to have somebody who’s certified on the software and that person Microsoft because that stuff also run on You had to be an MCSE and you had to be certified with SQL and then you go to this class. And I was already an MCSE so they said, “Congratulations Tod, you’re about to become the SQL Administrator. Go pass this test because you’re going next month.” That’s how I got started with SQL 20 years ago now. And then SharePoint, my introduction to being a SharePoint administrator was very similar. I was a server admin at that time. This was like 2002 and Windows Exchange that kind of stuff and my boss told me to setup a webpage that had all of our servers worldwide and put it up on a webpage. And I can’t program, I can’t design webpage, I can’t mesh my clothes for god’s sake. And I look into this task and I’m like, “Well, huh, I got my resume so that’s good. I’m going to need that.” And then I was drinking some coffee and I had like what the Office XP CD and it had a trial or a thing of SharePoint Services 2001. And I’m like, “I never seen something like this in Mac.” And I installed it and put this database out there and held up there and I was hailed as a hero and it was all great and then the company decided that we’re going to write some software on top of SharePoint 2003 when it came out. And my boss said, “SharePoint we’ll I’ve got a SharePoint guy.” And that was the day I became the SharePoint guy.

Carlos: That’s when the enter swag, right?   

Todd: Exactly. I stood up a little straighter, shoulders are a little back.

Steve: Wow, nice.

Carlos: Now, if you could change one thing about SQL Server what would it be?

Todd: Boy, there’s a lot of it. This I will tell you that SQL has gotten better with a bunch of things. I haven’t played with the SQL 2016 PowerShell stuff as much but I know with the previous versions the PowerShell was shall we say odd. As a guy that uses PowerShell for Windows and for SharePoint and for Office 365. The SQL PowerShell seems to dance through its own beat a little bit so that’s one of the things. I think for the most part of SQL are really matured products. And they’ve made a lot of mistakes but they made them a long time ago. So some of the SharePoint stuff maybe around is database sizing, the recovery models and things like that. One thing that always gets me for whatever reason the way I think is security piece is backwards. In my head if I want to give a user permission to the database that should be a function of the database. But it’s not, it’s a function of the user. I have to go in this security find the user you give them an access to the database. In my head it should be database, the other way around. Does that make sense?

Carlos: Got you. Sure.

Todd: Little things like that because there is a bunch of places in SharePoint we have to do that. And then because of that security stuff is in, the master database and the database is running I can see that coming through with the backups. But just little things like that but for the most as a SharePoint admin SQL treated me fairly well. Again like fire and gunpowder I have a healthy respect for it. It has been pretty good to me.

Steve: Ok, so what is the best piece of career advice that you’ve received?

Todd: Professional football is not for you maybe you should be a computer nerd. I think that was probably pretty good.

Steve: Was professional football an actual option at some point?

Todd: No, not even a little.  I was a nerd from the day I was born. So back in High School wasn’t really cool. I spent a lot of time hiding in my locker. Spent a lot of times wondering why girls wouldn’t talk to me and things like that. Turns out being a nerd works out for me. As I look back now it’s cool to be a nerd. No, there’s never a real option for me at all. I think I had a couple of them. One of them something we talked about before we go into air was just it’s all about people. It’s all about who you know and not in an equitism sort of way. But it’s all about whom you have relationships with so never burn bridges. Always give the other person the benefit of the doubt. One thing that I found over the years is anytime I have this idea who’s the idiot who thought of that? The answer to that question is usually a well intentioned smart person who just sees things you haven’t thought about yet. It humanizes some of these down decisions that you see people make. They see other aspects of the problem that you haven’t seen. And so you give other people the benefit of the doubt. And another piece that I got was about 5 years ago when I start working for I was part of an acquisition and I was a remote worker. I wasn’t in San Antonio. I wasn’t with everybody else. And I was like my first review. I had my goals and I didn’t get them all done. And I talked to my boss and he’s like, “How come you didn’t get all the stuff done?” And I’m like, “Well, I’m not in the office and this guy wouldn’t call me back and I couldn’t get hold of this guy.” And he’s like, “Yeah, that’s the truth. Every bit of that is true. Figure it out.” I’m like, “What?” He’s just like, “Yeah, try harder. I don’t care about what they’re doing I care about what you’re doing. Yup all these obstacles are in your way figure it out. Make it happen.” And I was like, “That’s kind of…” But it was actually the advice that I needed. You just got to go through. Bad things happen, stuffs are unfair, how you handle is the key. And this guy wasn’t very empathetic he was kind of a jerk. But it was a good advice.

Steve: Alright.

Carlos: Tod, our last question for you today. If you could have one superhero power what would it be and why do you want it?     

Todd: So it’s funny I was asked this question a few years ago and for my superhero power I wanted to be a SharePoint administrator and I got it.

Steve: Wow, I think you’re the first person on the show that’s had that aspiration.

Todd: And so I think it’s a self explanatory as why I want it. It gives me out on shows like this. No, a superhero power, you know I watch a lot of Sci-Fi and I’ve found almost every superhero power comes with a really negative side effect. I don’t think I have the morals to have any of those powers and use them for good. So I think, maybe the superpower will be I’ll always know whether or not so that I would know when to put that down. Maybe something like that because I swear to God I would screw up any superhero power I had. I cannot be trusted with superhero powers.

Carlos: Not too well. Ok, so we can trust you with our SharePoint but we won’t trust you with powers.

Todd: Yeah, that is it.

Steve: Alright, well it has been great information, Todd. I’ve learned a lot during this episode.

Todd: I’m glad. I think there is not enough done to get our communities together. I think the SQL community and the SharePoint community they need to understand each other a little better. I think that would be good thing.

Carlos: Well, awesome. Thanks again for being with us today.

Todd: And if any of your SQL folks have questions or whatever I’ve got a blog toddklindt.com and I’m on Twitter @toddklindt. I’m not very imaginative with my names obviously. But yeah, tell your folks to reach out to me. I’m happy to hear from anybody.

 

Episode 87: Resource Governor

Why do some features get all the fun?  I would like to be a fly on the wall at some of the Microsoft feature development meetings. There are many very cool features that didn’t get quite enough love to get the adoption they deserve.  In this episode we are talking about Resource Governor and before you roll your eyes, you may be interested to know some of the side benefits of using the feature.  We are joined by Mark Wilkinson from channeladvisor to give us an overview of the feature and how they use it to manage SQL resources and workloads.

One interesting story is how their team won the query challenge last PASS summit using Resource Governor so it is pretty powerful. Mark will also share his experiences using Resource Governor in different working environments. He will also be discussing the basics and tips about building the classifier function of Resource Governor. Furthermore, we will go through setting up I/O caps, resource pools and workloads, and the NUMA node affinity.

 Episode Quote

“I mean, It’s just a really cool tool to use”
“Say what you mean and always ask for exactly what you want”
“Always make sure people know how you feel about stuff and always be honest and straightforward”

Listen to Learn

– Managing SQL Resources using Resource Governor
– Resource Governor Classifier function
– NUMA node affinity
– Setting up minimum and maximum limits using Resource Governor
– Resource pools and workload groups
– DMVs and I/O operations

Mark on Twitter
Mark on LinkedIn
Mark’s GitHub repository
Mark’s Posts on Resource Governor

About Mark Wilkinson

Mark is a Database Administrator at ChannelAdvisor solving interesting problems on large databases.

He is a father of 4, and lives in Raleigh North Carolina. In his spare time he likes to read, speak, and learn new things.

 

Transcription: Episode 87

Carlos: Ok, so Mark, welcome to the program.

Mark: Thanks for having me.

Carlos: Yes, it’s nice to have you back after a good amount of time being with us in Episode 14. Today, we’re going to be talking about Resource Governor, and always to me at the, you guys over ChannelAdvisor using all the cool toys it seems.

Mark: Well, the cool toys, it’s kind of, Resource Governor is a little bit old but yeah.

Carlos: Oh yeah, well you know, but I’m impressed by all the features that you’ve been able to implement in regarding to SQL Server. And Resource Governor is another one of those features and that’s kind of the topic of our discussion today.

Steve: Seems like with Resource Governor, it’s one of those things that people either know use it and love it or they don’t know an awful a lot about it and never tried it. There’s not an awful in between there and the people I’ve talked with.

Mark: Well, it’s funny actually we were in Redmond for a week in Microsoft Labs doing some testing. And part of the gig when you go there is that you have to talk to their SQL developers and telling them the things that you like, things that you don’t like and stuff like that. When we listed the features that we use, I mean there were some features on the list that we had to explain what they were because they were so like, yeah we use CMS. What’s a CMS? Central Management Server. People definitely knew what Resource Governor was but, yeah, we use a lot of features and there’s definitely some stuff in there that not everybody is aware of. Yeah, I like to tell people about Resource Governor because we got a lot of use out of it and some need to kind of things we didn’t expect that came up as well.

Carlos: Sure, well I think that classic scenario and ultimately I guess the definition of Resource Governor is it allows you to manage SQL Server resources and workloads by putting limits on what either logins, or users, applications can use and have access to, right?

Mark: Yeah, anything that’s basically related to the session. You know, so like you said the application name, the user. There are a lot of different things you can classify on but pretty much anything that’s related to the session itself and you can use to determine which group or something it’s classified until limits is put on it.

Steve: Now can you use that for just limiting or can you also use it for some type of monitoring. Yeah, that was when I’ve mentioned that there’s some kind of neat side effects we didn’t expect. The monitoring capabilities, not really capabilities but the metrics you can get out of Resource Governor are kind of amazing. It keeps track of anything that goes through each of the workload groups, from CPU time, to amount of reads and writes, I/O operation issued, total number of requests, all kind of things like that.

Carlos: And you don’t always have limits on those. It’s just keeping track of them because it’s part of the session.

Mark: Exactly.

Carlos: Well then why did you want to implement Resource Governor?

Mark: Yes. It kind of came up because we’re having issues where a certain, actually it might make sense to go kind a level higher and then talk very vaguely about our architecture at ChannelAdvisor. It’s not going to seem alien to most people probably have experienced something like this but we’ve got a set of SQL Servers and then we have got a set of application servers and basically any team can submit work to these application servers. And the work is kind and all those servers communicate with the database instances. That being said we don’t have a lot of control over how many like requests can be made from a given application server or set of application servers. So we were starting to run into situations where maybe one team is using all of the instances and there wasn’t really much left resources for anybody else to use. So we had a find a way to kind of limit that, reduce that a little bit and that’s when we start using Resource Governor.

Carlos: Ok, so these are actual teams then, so you kind of separated by team or I guess by application or these are by login?

Mark: We’re very strict in our application naming in the connection server when you connect to SQL. So any application that communicates with any of our instances should have, I say should, we would love to evolve into it but, you know, sometimes things They should all have an application naming associated with them and the team name is part of the application name so, you know, Team A, their application with the Team A, you know, file process or something like that. So we can use that application name then to push them into the Team A workload group.

Steve: Ok, so if somebody is listening and now they’re thinking, you know, I want to give Resource Governor a try. What do they need to know about SQL Server versions or editions or anything like that as far as trying it out?

Mark: So as long as you’re on a version 2008 and higher you can try it out. Now the features you can try out are going to depend on which version you’re o. Obviously, 2016 has all new bells and whistles which that’s kind of good to mention as well. It’s not one of these technologies that seems like it’s going away. I would imagine they probably use Resource Governor in Azure to kind of separate workloads if you’re using like the Azure database or something like that. But they’ve been adding features to Resource Governor as of SQL 2016 so it’s definitely something they’re continuing to work on. At a minimum you’re going to be able to put some limits on CPU memory and then the concurrent requests. Concurrent request is actually where we started. Like I said we don’t really have a lot of control over the amount of work that’s coming from our application server and hitting the database. A lot of it is customer driven. So the way we originally kind of went at it was that we would reduce the concurrent requests per team so that everybody had some resources to use there. But once you get into later versions you can start doing things like limiting I/O so you can set a minimum, maximum amount of I/O. Like in that operations per second so it’s not like, you can’t limit like throughput or anything like that just the number of I/O operations per second.

Carlos: Right, and start creating additional bottlenecks, right. You’re like, you can only return this many records.

Mark: Yeah, one really actually kind of neat thing you can do, were you guys can’t remember it was kind of a whirlwind but you guys were ate PASS, the summit the 2016, yeah?

Carlos: Steve was.

Steve: Yeah, I was there.

Mark: So Sandisk had a query challenge setup where you had to tune. You had to make this query run as fast as possible but you couldn’t actually touch the query itself. You can only do things to the instance. You only had I think 5 minutes for your changes to be done and then you had to run the query and see what your time was. So we actually won that competition by using Resource Governor. Yes. One of the things you had to do is you had to, one of the things that really helped the performance in that case was rebuilding indexes but there wasn’t really enough time to rebuild all of the indexes that you needed to. But using Resource Governor you can set NUMA node affinity so we actually just dedicated a few CPU cores per index for the index rebuilds. So there’s a lot of really weird and interesting things you can do with Resource Governor if you take the time to learn all. NUMA node affinity is definitely one of those kind of edge cases that helped us there. I don’t know where you might use it in production but it was kind of a neat application, the Resource Governor.

Carlos: Yeah, so next year on your PASS you can’t.

Mark: And we might pull out some new tricks next year.

Steve: Ok, so then given that it’s been around since 2008 is that available on all editions of SQL Server or is it an Enterprise only feature, or who’s going to be able to use it there?

Mark: Unfortunately, it’s Enterprise only. I’m lucky enough to work in a shop where it’s Enterprise so that’s not a problem for me but that is an Enterprise only feature.

Carlos: That was interesting. And chatting with Jessica Moss, they were talking about the 2016 Service Pack 1 and how was that kind of opening up additional features. And taking a quick peek here it doesn’t looked like that one got opened up with the Service Pack 1.

Steve: No, I think I can remember when that Service Pack 1 announcement came out there were some people griping and wishing that Resource Governor had been included in Standard Edition after that. But I think it’s not there yet but maybe one day, who knows?

Carlos: Yeah, we’ll see.

Mark: Yeah, I mean even if they could release the subset of the features I think it would be useful for a lot of people. One, actually, this is super important thing to mention about it. And another thing that I wasn’t aware of when we first implement it at bit it’s been kind of useful, when you implement Resource Governor you set up what are called resource pools and then within each pool you setup workload group. The workload group is where the actual request get categorized to. But for every pool that you create it gets its own plan cache. So you can actually use resource governor as a way to kind of partition your plan cache. So cases where that might be really cool is if you’ve got like we do. We got multiple teams executing procedures. They might be running the same procedures with completely different parameters and causing some weird parameter sniffing issues but as soon as you kind of partition the plan cache those type of issues can go away. So that was kind of another neat side effect of implementing this.

Steve: Interesting. Now, you say they can go away but it seems like it might also hide them a little bit where if you have one plan that’s having parameter sniffing issues that’s only for one group of users. That might be more challenging to track down.

Mark: Yeah, it all depends on your workload obviously. And if somebody messes up and changes their connection string or does something weird that stops their work from being classified properly they might get dumped into the default workload group or something like that and we can see some issues there as well. Yeah definitely, when you’re using Resource Governor make sure that whatever you’re using to classify your workloads is something fairly stable.

Carlos: Well then where did you guys start with? I guess take us down that journey and let’s talk about, you know, I guess you mentioned setting it up. What were the first steps and how would you recommend people going about setting up Resource Governor.

Mark: So like, I think I said anyways, you kind of mentioned. You don’t have to setup any limits when you’re using Resource Governor so I think it’s really good like first step you can take is to just setup your resource pools and your workload groups and create your classifier. So the classifier is a function that lives in master and it is executed for basically every request that comes into the box. So you do want to make sure that it’s pretty efficient function because it’s going to be ran a lot. So yeah, if you’re just getting started you want to see if this is something that you want to use you could just setup the pool and the group. Get a classifier that sorts classifying work into the various groups.

Carlos: Now, you mentioned it should run efficiently but I thought it’s going to be like something that I just chose from setup. How am I going to choose an efficient one?

Mark: Oh no, yeah, yeah, there’s definitely nothing built-in. You’re going to have to write it from scratch.

Carlos: Oh, got you.

Mark: Yes, so when I say efficiently just make sure you’re not doing any extra work. There are some limits on what you can in a classifier function like in schema bounds you can’t access objects outside of master.dbo. So I think that’s actually there just to stop you from hurting yourself.

Steve: So then, just a back of a second, was that a stored procedure or a function you said you put in there?

Mark: Actually, it’s a function.

Steve: Ok.

Carlos: Got you.

Mark: The classifier is a function that you define.

Carlos: I feel like we just, I just found the reason why more people aren’t using Resource Governor.

Mark: Yeah, it could be it.

Carlos: You know, if you have to be building that function to do that for you I feel like that’s going to be a big hurdle to jump through.

Mark: So it’s actually not that hard. The function has one job really. So when I knew request comes in the Resource Governor classifier function looks at that request, looks at some different parameters of that request. So for example there is a few like variables that are setup by default, well not variables sorry, they are system functions. So like app_name fro example that we would use. So for your classifier function you would just look at app_name and then set up a case statement. That’s what I like to do. You could do if else wherever you like to do there but really all you’re doing is writing code that will look at that application name, figure out what group it belongs to and then just return the name of that group. So that’s all the logic there really using your classifier function. It’s one job. It’s just to return the workload group that this work should go into.

Carlos: Ok, so when the session comes in it’s just passing the app_name and then I will tell you what group you belong in.

Mark: Yeah, exactly and like I said, application name is just one example. There is a lot of different things that are available to you. Like username for example is another good one so if you’ve got like maybe all the stuff coming from your application you do want to categorize into these groups and put limits on them and all that. But if something comes in from the web interface maybe it’s using a different username you could then classify that to be unlimited and not have limits at all.  

Steve: So then you’ve got your classifier function set up and it’s sitting there in the master database and then how do you go about using that then through the resource governor.

Mark: So when you’re configuring, configuring is a little bit too much of a big word to say when you’re dealing with resource governor. When you’re initially setting it up, to enable it you just run Alter Resource Governor Reconfigure. That’s all, it is running. It’s not classifying anything but it’s running. So then you run another alter statement where you just set the classifier function and that’s it. So once you specify the classifier function, anything that’s coming into the instance is going to run through that function.

Carlos: We still do have limits here?

Mark: No, there are no limits here. So what you have right now is basically the metrics on your various workload groups. So there is a few DMVs you can look at for this information but it will show you the total number of requests that are coming through for instance. So if you were just interested to see how much work all these various groups are doing all that is available to you.

Carlos: And I think that’s a great suggestion.

Mark: Oh yeah, definitely. It’s a good way to get some insight into what’s going on on your instance without really having to do tons of work. You could probably get this running on maybe, you know, 20-30 minutes.

Carlos: And it’s base lining, right. I want to baseline my system and then I’m going to start making some tweaks.

Mark: Exactly. One thing to mention too is those DMVs are accumulative so as requests come in the number just goes up right. So if you want to monitor all you’re going have to, like I do, you’re going to have to setup a table and then just store the deltas in there from last time that kind of thing.

Carlos: Alright, can you purge those or reset them.

Mark: You can, I can’t remember if you, so I know you can reset them obviously like most things when you reboot the instance or restart the services. I don’t know if there’s any like DBCC functions you could use to clear that out. One of the DMVs though which is kind of interesting because I don’t know if many DMVs have this, but one of the DMVs will actually give you a date of when things were last cleared. So you kind of you know how long it’s been gathering statistics.

Carlos: Right. Yeah, I can’t think of another DMV that does that for me.

Mark: Yeah.

Steve: No, I can’t think of one either.

Mark: It would be nice if they did, but yeah.

Steve: Right, right. Yeah, especially those index ones. They’re having problems like the index get rebuilt and their counts and what not will get cleared. You know like, “What!”

Mark: it would be very nice.

Steve: Yeah.

Carlos: Ok, so can you then maybe elaborate a little bit more on sort of the idea of resource pools versus workload groups, and sort of how those.

Mark: Yeah, so this is kind of a piece of over engineering in my opinion.

Carlos: Interesting.

Mark: Yeah, so I like to think of, I like to think of this as kind of hierarchy rights. You got your SQL Server, underneath that you got your resource pools, and then underneath that you’ve got workload groups. The theory being is that you could have a resource pool with multiple workload groups in it. I’ve never seen anybody implement it like that. I don’t implement it like that either. I usually just do for every, again well run with my example of teams, right. For every team, they’ll have a resource pool and a workload group. The difference between the two is just what kind of limits that you can set at which level. So at the resource pool level you can set limits on the minimum and maximum CPU that can be use, the amount of memory that each request can use, you can set your I/O caps and your NUMA node affinity. And then below that at the workload group is where you can set like max CPU time in seconds so not percentage but actual time. You can also setup the concurrent request limits at the workload group and then also maxed up at the workload group. So there are different things you can set in each level but like I said I’ve never set it up where I’m not doing a single pool with a single workload group in it, you know, for all those various type of work that I want to classify.

Steve: Yeah, ok. So then once you’ve done the classifier function and you have things assigned to the appropriate resource pool and workload group as the connections come in. Have you ever run to any issues if you limit something too much that it causes blocking to go bad or I mean something takes longer than it did before and now other things or bottleneck waiting on that?

Mark: Oh yeah, that’s always a big risk. We’ve had issues.

Carlos: And almost by definition you’re asking for that because you’re now saying in the example of this reporting services query or whatever that hey I don’t want you to be taken So that means you’re going to have to take a little bit longer, right?

Mark: Oh yeah, definitely. And if you’re in the kind of place where you’ve got a lot of different applications hitting your instance but none of them are really aware of each other and you slow one of them down and it starts holding locks or something for longer than it was before. Yeah, I mean it can cause havoc. So it’s always best in my opinion with Resource Governor to start with very high limits that you don’t think you’re really going to hit right away and then kind of walk it down a little bit until you find a nice comfortable place. But yeah, it’s always a risk when you’re implementing something like this that you could kind of rick havoc for other people trying to use your server.

Carlos: So we talked about implementing it in production and do you have this in lower environments as well?

Mark: Yes, we do. If anything is awesome on your development environment to implement this with no limits because of all the reporting you get. One example that I like to give people because it kind of helps it kind of sync in. At any given time I can tell you exactly how much CPU time. One of our application teams are using on any of our instances. And that’s all using, you know, just stuff built into SQL. It’s not some special monitoring. I’ve got the application layer just with Resource Governor I can tell you, you know, what percentage of CPU time was spent by what team in our development environment. And that’s really powerful because then you can go to that team and be like, “What you guys are doing here?”

Steve: Yeah, that’s really interesting because in my experience that development servers are usually a bit slower than production. And you end up with developer griping about why it’s so slow? And often times it’s because there is just not enough hardware and horsepower there basically to service everything the developers are doing. But if you could come back and say, “Well, it’s slow because this team over here is using 90% of the CPU. That might give some interesting insight into who should be paying for the upgrade on that SQL Server. Exactly, another kind of good thing you can do with Resource Governor is you can kind of slow down bursting workloads. So this is also come the development environments, right, your QA tests, automated tests running midnight everyday or whatever. You can set up limits on like concurrent requests for example so that when those tests are running at midnight it’s not going to take everything else down with it. It’s only going to be able to maybe execute three or four requests at a time and everything else will be able to run as it was. It might not run as fast as it was before but it will still be able to actually run.

Carlos: Right, and those normally are performance test anyway. Can this execute, will it complete successfully.

Mark: Yeah, does it work. So when you’re in those type of situations, yes, it’s really a great tool. And you can kind of get more out of your lower resource instances because things are restricted a little bit more that can hit it as hard.

Steve: Ok, so can you share with us maybe a little bit more around the monitoring of the stats or the DMVs to better understand like where someone would go to start learning that piece of it?

Mark: Yeah, sure. I mean, not to self promote too much here but actually I’ve got a blogpost not just this topic – m82labs.com/resource-governor. I’ve got a whole post on, even download the procedure you would setup together the statistics. But I can walk you guys through what I do. Might not be for everybody but what we do in our environment is every, I don’t want to say it’s every 5 minutes I do this, but I’ve got a table set up that stores, it’s going to come from all the different stats and things that I’m interested in. Every five minutes a procedure runs that grabs the latest data from the DMVs inserts it into the table and also uses the existing data in the table to calculate the delta, the change in all these different statistics since last time it ran. And then I push all that information into elastic search. And then from there we can graph it out in any number of tools and that’s our primary way of kind of visualizing how things are performing in our instances.

Carlos: Interesting strategy there.

Carlos: Any other links that we might reference here we’ll have available on the show notes page at sqldatapartners.com/resourcegovernor.

Mark: One thing to mention too and this was kind of surprising. Not to brag on Microsoft or anything but the documentation for Resource Governor at Microsoft’s page is amazing. They’ve got very in-depth documentation on really every aspect of configuration and what all the different DMVs are for, you know, what the different field means. This is definitely one of those cases where I head over there to take a look. It’s a wealth of information there. They even give you some kind of like example scenarios where you might use it so it’s very useful.

Carlos: Ok, so as far as then other scenarios that you might have seen where the reason the people are using Resource Governor, maybe a little bit of background story or more information on like why people are using it.

Mark: Honestly, I haven’t run into a whole lot of people that use it.

Steve: Ok.

Mark: Yeah, I mean for our use cases, I would imagine our use cases are pretty standard where you would see most people using it for outside the monitoring piece. But yeah, I haven’t really seen a lot of people that are using it. I’ll see the occasional talk like a SQL Saturday. But yeah, I haven’t really met anybody else that uses it that could also be because it’s an Enterprise feature and not everybody is in Enterprise.

Carlos: Right, now, you said that in a classifier function that you were using the login name which is how you identify who’s on which team.

Mark: Well, both actually. I use the app name and then in some cases we use the login name as well.

Steve: Ok, the app name and login name. Ok, so even if someone is using a shared login amongst multiple applications you can use that app name to be able to classify it appropriately base off of where that app is coming from.

Mark: Exactly.

Steve: Ok.

Mark: Yeah, and there is nothing special about the classifier function really how you’re writing it. So you can do things like, you know, where app name is like this or things like that so you’ve got pretty much your full range of TSQL that you can use within the function there. Make sure it’s not too expensive.

Steve: Right, right, so any that you’ve come across or anything that like you just tried it out and things went really bad and people might want to know about before giving it a try.

Mark: Yes, definitely so the kind of, you know, derby dragons here kind of scenarios with Resource Governor is you want to make sure that you’re really really read about the different limits you can set because they may not behave how you expect. Max CPU is a perfect example. Like what would you expect max CPU to do. If you set max CPU to 50% you would expect that whatever is in that group can only use 50% CPU, right? That’s not the case. How it’s defined is that in cases of CPU contention it can only use 50%. And how CPU contention is defined is kind of a mystery. I’m not sure. I’ve never actually found a good definition. But it sounds like if there’s other workload running on the box it will try to make sure that this can only use 50% but if this task comes through and that’s being governed. If it’s running in the middle of the night when nothing else is going on in the box it could still use 100% of the CPU even if you’ve got that limit set.

Carlos: So it’s almost like a tapering effect, right? So, you know, my process is running 100 miles an hour and then oh somebody else wants to come onto the track. Ok, well, you got to slow down and so this guy can speed up kind of a thing.

Mark: Exactly, so it can definitely still be useful but if you’re expecting it to just hit a hard ceiling of 50% that’s not going to happen. That being said, in the later version they did add a CPU cap that you could put on that would do just that. So there’s actually minimum CPU, max CPU and then a cap. So there are three different values you can set as far as the CPU is concerned.

Steve: Ok, so then if there’s minimum load on the server and my connection gets assigned to a resource pool with a max of 50% CPU but there is no other load O might be able to use of upwards of 100%. But then if I have a long running job on that connection that maybe runs for an hour, hopefully not, but let’s say it does. And someone else comes along and says, “I need 50% of the CPU”, but that connects and starts using CPU is it then going to take my existing connection that was using around 100% and squeeze it down to 50% or is that going to find a new connection?

Mark: That I’m not sure. I could definitely see where it would have the opportunity to limit it, right. But I’m not sure. I’m not sure what happens to that existing stuff that’s running. That’s another thing I would have to test.

Steve: Since like an interesting blog post there Mark.

Mark: There could be a whole blog post just around the various behaviors of these things. Another interesting one is you can set the minimum and maximum I/O. So it will limit the number of I/O operations that can have per second. But I have found in my testing that even if I set the max I/O to something that is impossible like 10 million to 20 million operations per second. Even when I set the limit that high there are still I/O operations that are being throttled based on what the DMVs are showing so that’s something I going to be looking further but that’s really why those metrics are important to look at. So even if you think you know what some of these limits are going to do I highly recommend people set it up in a test environment and even run some load test against it if you can if you’ve got that kind of setup. And just see if the DMVs are responding how you would expect because the I/O thing I didn’t expect at all. And the reason I start playing with it is because when you enable the I/O throttling even if your limits are completely nuts and something you never going to hit. It actually opens up another DMV that you can look at that will show you the I/O operations per volume, per workload group which is really cool because if you’ve got TempDB on a different volume than your data files you can actually look at TempDB utilization per workload group using that DMV. But once I start seeing that’s limiting the I/O even when it shouldn’t I can go of nervous about that one. It’s going to take a little more research.

Carlos: Yes, that sounds like caution flag there.

Mark: Yeah, definitely and that seems to be the case of the lot of these. There is just this little weird kind of scenarios where you might want to do a little more research. Fortunately though like I’ve mentioned before, the Microsoft’s documentation on this actually covers most of this scenarios. Max up would be another interesting one. You can set a max up for workload group that’s higher than the max up for instance and it will go ahead and it will honor that. But if there’s a request that’s coming through that manually defines a max up that’s higher than what the group is it will not honor that. It will only go up as high as what you have set at the group level. There’s a lot of, you know, like tweaks that you can do but, yeah, and see how it gets applied.

Mark: If you’re using like a Developer Edition, since obviously this is an Enterprise Feature it should be available on Developer Edition as well. So even if you’re not going to be using it on production because you’re running on Standard if you could get it on your development environment it could have a lot of use there or even just from the reporting aspects and also potentially from being able to kind of throttle your work a little bit so that maybe you can get a little bit more use out of those testing instances.

Steve: So shall we go ahead and do SQL Family then?

Mark: Yeah, let’s go ahead.

Carlos: So Mark, tell us how you get started with SQL Server?

Mark: I’m pretty much your standard accidental DBA. I was in a company and the DBA left. There was really nobody else left to fill the shoes so I kind of stepped up and started learning about SQL Server. I want to say we are running on SQL Server 2000 at that time. And I just kind of stuck with that ever since.

Steve: So if you could change just one thing about SQL Server, what would it be?

Mark: We actually, I’ve be interested to see what listeners have to say about this one because we would have a use for this feature at ChannelAdvisor but when we talk to Microsoft about it they weren’t. I don’t know if they were entirely sold on the idea. We would desperately love to have a query hint called notindex. So instead of forcing an index you can actually force a query to use any index but this index. We’ve got a lot of cases where we have an index that’s been created for a very specific purpose. But other queries will pick it up and start using it and whenever they do performance just tanks.

Carlos: Sure, because they’re grabbing on the locks on the index.

Mark: Well, not even that. It might just be the index might not just be that great for this other subset of queries.

Carlos: Oh, really but the optimizer is still picking it.

Mark: Yes. I don’t know why it’s picking it. But yeah, we do have cases where the optimizer will pick an index. We’ll hop on the box to see what the issue is and like, “Oh, this index again, great.” But you can’t get rid of it because there are a lot of queries that actually benefit from it, so a notindex hint would be fantastic.

Steve: So basically what whatever index you would normally pick like exclude this from your list.  

Mark: Exactly.

Steve: Yup. I would be interest in that.

Mark: Yeah, we would love that.

Carlos: What’s the best piece of career advice you’ve ever received.

Mark: So I thought about this here. And I honestly haven’t received a whole lot of career advice directly. But indirectly just from kind of absorbing a manager that I had previously. I think what I could take away from that experience with him was to always say what you mean and always ask for exactly what you want. So don’t get caught up in the kind of the games, the political games that happen in the office or whenever you have in a work. Just always make sure people know how you feel about stuff and always be honest and kind of straightforward. Even if it doesn’t always work I think in the long run it is definitely good advice to follow.

Steve: If you could have one superhero power what would it be and why would you want it?

Mark: I don’t remember which superhero this is but there might even be a few superheroes that have the ability to kind of duplicate themselves. Like I may have mentioned before I’ve got four kids so it’s usually pretty hectic inside the house. If I could duplicate myself and get some more stuffs around the house that would be fantastic. And then of course we’ve got SQL Saturday’s every weekend. All kinds of conferences and all that stuff so yeah duplicating myself would be very beneficial.

Carlos: How many SQL Saturdays could you go to if you could duplicate yourself several times?

Carlos: The mental strain I guess of concentrating on all of those different places. It’s also interesting as I think about duplication. It seems like and I guess this is just from my limited experience that those villains generally that have the duplication ability are always using it for nefarious purposes, right? So to have someone to actually be able to use it for noble purposes would be a change of pace there I think.

Mark: There we go.

Carlos: Yeah. Well, Marc thanks so much for being with us today.

Mark: Thanks for having me it’s been fun.

Steve: Thanks Mark, definitely some good information about resource governor today and thanks for being on the show.

Episode 82: Indexes Part 2

In the previous episode we discussed the basics of using indexes, what they are, how to use them, along with some common problems and solutions. One of the problems you will certainly face as time passes is index fragmentation. While indexes will certainly help your database perform well, they do require care and feeding–meaning there is some administrative overhead.

In this episode, we continue our conversation with Randolph West and discuss what we can do to take care of indexes and keep them well maintained over time. There is an option to use default maintenance plans and to do fragmentation and updating statistics, but should you use them? What are the alternatives?  We didn’t forget to consider strategies for deciding which indexes to keep and which can be dropped.

 Episode Quote

“That’s purely Microsoft’s fault because generally, people are using the maintenance wizard are using it because they don’t know how to do it with [other] tools.“

“You still need to update the statistics after you’ve done a reorganization which is something that people tend to forget”

Listen to Learn

  • General maintenance needed to take care of indexes and keep them well structured on an ongoing basis
  • Good strategies for deciding which index is get included and which ones we need to ignore
  • Should you use index maintenance wizard
  • What column store indexes are

About Randolph West

IndexesRandolph West solves technology problems with a focus on SQL Server and C#. He is a Microsoft Data Platform MVP who has worked with SQL Server since the late 1990s. When not consulting, he can be seen acting on the stage and screen or doing voices for independent video games.

Transcription: Episode 82

Steve: So then let’s jump a little bit more into the care and feeding side of things, and I know we touched a little bit on rebuild and a little bit on statistics but if we look at what are the general things that we needed to do to take care of indexes and keep them well structured on an ongoing basis?

Randolph: I’d be interested to hear your side because I’ve been speaking a lot. I’ll open if I had any feedback.

Steve: Ok, I mean, so really I think on that there’s the concept of reorganizing versus rebuilding as you get fragmentation over time. And then there’s statistics and I think that, or let me start with a very bad way of doing it that I see quite often that is if you go and use one of the default maintenance plans. And the maintenance plan you just go and click let’s rebuild my indexes, let’s reorganize my indexes, then let’s rebuilt statistics. One of the things you end up doing is rebuilding which basically rebuilds the entire index from scratch without, basically if you’re in Enterprise Edition you can do that on online mode but if you’re not in Enterprise Edition that’s an offline operation. But what the default maintenance plan people sometimes check is that they say, “Let’s rebuild them.” And that gives you the best structured index you’re probably going to end up with and it also rebuilds your statistics. And then they reorganized them and then they rebuilt statistics with like a 5% sampling or something like that. And in that example you end up with a whole lot of work that’s actually giving you sort of a negative approach or negative impact. Whereas if you just rebuilt them to begin with and then not done the reorg, and not done the statistics rebuild afterwards you would be in a lot better shape.

Randolph: So, you know, those defaults that you’re talking about, the reason that happens is that because the maintenance plan wizard has got rebuilt then reorg, then statistics in that order in the list.

Steve: Absolutely, yeah, people just click the checkboxes or select them and that’s what you get.

Randolph: So when I see that I blame Microsoft for that. That’s purely Microsoft’s fault because generally people here using the maintenance wizard are using it because they don’t know how to do it with say, Ola Hallengren or MinionWare tools so just to name two, there are many. So that’s Microsoft’s fault and every time I see that I do not blame the DBA, or the accidental DBA, or whoever is managing it. It’s not their fault. But it is causing a whole bunch of work that is unnecessary.

Steve: Yes, it may not be their fault but it’s their pain they have to deal with. When I see that I generally going a little bit of education to explain why that’s bad and turn those off. And like you said, replace it with MinionWare or Ola Hallengren scripts. Both which are far better solutions at doing at than the default maintenance plans.

Randolph: So the one thing that I wanted to mention, Steve, is that an index rebuild is, as I mentioned before is size of data operations. So if your table is 10GB and you need to rebuild it, if you’re doing a clustered index rebuild it’s going to require 10GB of space in the data file and then it’s going to require at least 10GB in the transaction log. And if you’re rebuilding a clustered index it is going to by default rebuild all of the non-clustered indexes on that table that referred to that clustered index and that means all of them. So if you’ve got a 10GB table and you’ve got 30GB of non-clustered indexes on that table it’s going to require at least 40GB in the data file and 40GB in the transaction log because it’s going to rebuild everything. So if you’ve got a maintenance plan that says let’s do a clustered index rebuild and then before that you had the non-clustered indexes rebuilding it’s going to do them again anyway so it’s a nightmare. This is why I say it is statistic updates more regularly are going to be a lot more effective than doing index rebuilds over the long term. You still do need index maintenance but let’s not do it every night.

Steve: And that’s where I like the third party solutions that we mentioned there for the ability to rebuild those where you have options to say given a certain level of fragmentation I want to choose to reorganize this or if it’s even more fragmented then we’re going to choose to rebuild it. And some of them even have the option of only do this for a certain amount of time.

Randolph: Can you go into detail what’s the difference between a rebuild and a reorg is?

Steve: Ok, so with the rebuild it’s basically, technically it’s like the equivalent if you drop the index then recreate it, and then it’s going to be completely rebuilt from scratch and it will take up the log and disk I/O like you talked about a moment ago. But with the reorg, what that’s going to do is going to the index and basically restructure or compact the index pages as needed. It will go through and shuffle them around to compact them down to a smaller amount of space. And it will do that page by page as it goes through there and that can be done while things are online in Standard Edition. But it’s slow and the end outcome is not as good as a complete rebuild. It might get you real close but it’s not going to be good as complete rebuild but the benefit is you can keep it online.

Randolph: And you still need to update the statistics after you’ve done a reorganization which is something that people tend to forget as well. The statistics need to be up to date so that the query optimizer knows where the data is and how much of it is in there, and what the distribution of the data is in that index so that it can make the right choice about what type of query to create because a lot of queries is coming down to SORTS and JOINS and that’s where you find your biggest performance bottlenecks. A SORT is a blocking operation so everything in the query has to stop until the sort is finished. So if your index is up to date and it’s sorted correctly then it doesn’t have to do that operation it’s already done. Whole bunch of things about loops, we can talk about here as well which I’m not going to. An up to date statistic is much more valuable to the query optimizer than a defragmented index.

Carlos: There’s another point there even that, we’ve talked about current feeding of the indexes, right, rebuilds vs. reorgs. However, we can also have statistics that are outside of the indexes, right, on the table. Those are also providing feedback to SQL Server to the optimizer. And if we neglect those we’re still kind of the same peril because the optimizer maybe using the statistics but we’re not giving them that current feeding.

Randolph: Correct so, it’s valuable to have a, I hate this word, holistic approach to your current feeding is that you need to look at not just indexes, and not just fragmentation, and not just statistics but all of them. And in a way that is effective and efficient for your maintenance window. You don’t want to go and rebuild indexes every night because it doesn’t make any sense. But you also don’t want to stop doing statistics updates.

Steve: Yup. One thing I commonly see with that is rebuilding of the statistics or updating the statistics where somebody will have a job that goes through and blindly update every single statistic in the entire database every night. And that can have a pretty big performance impact on things depending on the size of your database. And a better approach to do that is to look at which tables or indexes actually have changes and only update those that have a fair amount of changes to them on that frequent basis. And then maybe on a regular basis of once a month or once a week you do an update on across all of them.

Randolph: Yeah, that’s a very good point. The amount of data that changes in a table if you’ve got automatic statistics enabled, we’re going to go a little bit of a segway now, by default automatic statistics are enabled. There are other settings that you can choose in 2016. There’s a whole bunch of new stuff per database which you can choose. Let’s talk about everything before 2016 because that’s what most people are still using. Statistics will only update themselves if a certain percentage of the table has changed. The value has changed across versions recently but the fact is that statistics will only update themselves if a certain percentage has been changed. So if you’ve got a really large table with tens of millions of records or rows and you have only one million that have changed recently statistics will not be updated automatically. So it does pay you to use one of the third party tools we’ve mentioned like MinionWare or Ola Hallengren’s maintenance solution where it will go and check, has any number changed and then you can update the statistics there and that will help your query optimizer as well.

Steve: The other side effect you can get with that is that if you do hit that threshold where it decides it’s time to rebuild those statistics that might be in the middle of your peak load during the day. And right when you’ve got a lot of traffic coming to your database and index or statistics rebuild occurred that can have impact on things too.

Randolph: That in fact is one of the recommendations for SharePoint style databases. There’s a lot of things I don’t like about SharePoint but the fact is a lot of people use it so one of their recommendations is to turn off automatic statistics updates on any SharePoint style database. That includes CRM, it includes GreatPlains, all of those even. What’s the one that I used? Whatever it is, there’s a whole bunch of them where you should turn off statistics updates automatically and then include statistics rebuilds in your maintenance plans. So it’s just to keep in mind each database is different.

Steve: It will probably apply to Navision and Dynamics as well.                     

Randolph: Yes that’s the one. Yeah, Dynamics is the one I’m trying to remember. Thank you! So any SharePoint style database they do recommend turning off statistics updates automatically and to do them in a separate maintenance window. So it pays to do your research to make sure that you’re doing the right kind of maintenance plans for your database and your instance.

Steve: Yup, very good point.

Carlos: So I guess a couple of different things we talked about all of these components. And I guess let’s now talk about some of the benefits right, so all of these things we have to go in, right, kind of the holistic approach, kind of having to know our data, getting more familiar with it. Ultimately to what end are we going to do that? I think, so we’ve talked a little bit about performance. I guess we should probably talk about how that performance gets into the system if you will or the mechanisms that cause the improvements?

Randolph: Ok, so when I do my performance session, what I say is, “You’ve got expensive items like your CPU. You’ve got less expensive items like your RAM, and you’ve got even less expensive items like your hard drives.” So your CPU is the most valuable thing to you because the SQL license is per CPU core and you want to make sure you’re using the most efficiencies of your CPU and memory as you can. What an index does is, we spoke about this before, it is a copy of the data so you want to keep your indexes as efficient as possible so that if you’ve got a large table you don’t want your index to be large as well. You want it to be smaller so that less of it is in memory because that’s what this game is about. SQL Server is all about being in memory as much data as possible in memory. So for Standard Edition up to 2016 even you’ve only got a certain amount of memory that you can access. 2016 Service Pack 1, the limit is still there but that’s your buffer pool that is in memory as opposed to everything. But the fact is that there are limits to the amount of memory you can use for SQL Server specifically with Standard Edition because not everybody can afford Enterprise. So you have to manage the amount of data that’s going into memory as much as you can and that is most effectively done by the right kind of indexes for your queries. And that’s also why you don’t want to have duplicate indexes because it will also be in memory. Also, you don’t want to have wide indexes because they will take up more memory than they need to. And that’s why included columns are very handy way to reduce the size of your indexes that’s why we have filtered indexes. All of these things to try and reduce the amount of data that is in memory so that we can do a lot more with what’s in the buffer pool.

Carlos: So then how do we go about or what’s the best ways to determine, we talked about looking at execution plan that kind of gives the recommended index there. So what are some good strategies to go about deciding which index is get included and which ones we need to ignore?

Steve: One approach I take on that is if I’m looking at an execution plan and it suggests an index. If it’s a development environment one of the things I’ll try initially is just create the index and see how it helps and I’ll drop it, and then go look and see is there a similar index. I mean, if the index that it suggested was beneficial I’ll go and see if there’s a similar index that could be modified or added to that would give the similar and a fact that the suggested index was doing. And sometimes that means you’re taking existing index and just add in an include to it or taking an existing index and add in another column into it.  

Carlos: Ok, so that’s kind of a dev environment, one query at a time.

Steve: Yup, and the other approach that I see there is I can even go and look at some of the missing index DMVs to go and figure out. Actually you look at DMVs for a handful of things, one is to figure out what are the missing indexes and figure out the overall cause associated with those missing indexes. And then come up with what are some good ones to add that are not going to be duplicates of other indexes and then see how that improves performance. You can also, using DMVs, go in and look, you can figure out what are your duplicate indexes. If you have duplicate indexes oftentimes you’ll see that one of them might be not being used at all and the other one is being used a lot. You can go in and drop one of those. However, you need to be careful when you’re looking at unique indexes. You don’t want to drop a unique index versus a clustered index or non-clustered index you want to look at and drop the right one there. Another thing to look at is big clustered indexes, that’s another thing that you track down is what are the clustered indexes that are really wide. And that really wide means it’s going to ripple through all the non-clustered indexes with that extra size. You can also track down unused indexes. What are the indexes that are on your system that are not being used? Now when you look at that you need to be very careful because that’s only going to be the ones that haven’t been used since the SQL Server instance restarted. Also, if you rebuild and index I believe it clears those index usage stats.

Randolph: Sometimes.

Steve: Sometimes, exactly. So it’s something that if you see there’s an index that is not being used you might want to track it over a few weeks to a month and confirm it, yeah it really isn’t being used and then go and take a look dropping those.

Randolph: That goes back to one of the best practices is to have a baseline. So know what your system is doing and track that overtime and then you can refer back to that baseline and say, “Well, this is exactly what’s happening. This is different.” And go from there. Yeah, that’s a good point.

Steve: Yup, and that reminds me of a project, Carlos, that you and I worked on. Were we setup a monitoring component that ran for a month and kept track of unused index details, and log them every few hours, and then we went back at the end of the month and reviewed the ones that over the entire month had no utilization and then suggest those as possible candidates to be dropped.

Randolph: That reminded me of Pinal’s comment as well that he’ll only look at the database that’s been running for a week. The one thing I wanted to mention is the DMVs that you’re referring to there’s one in particular, there’s a group of diagnostic queries that Glenn Berry, Glenn Alan Berry, from sqlskills, he uses and maintains them. He’s got a couple of, and so does Brent Ozar and a whole bunch of other people. But Brent Ozar’s one called as sp_BlitzIndex and Glenn Berry’s one is included in his diagnostic scripts. It’s a very similar query and what it does it waits your missing indexes as according to number of scans, number of seeks and something called Average User Impact. Now that Average User Impact number can be, I think it’s unit less, I’m not quite sure how does that number is calculated but if it’s really in a high impact like tens and hundreds of thousands then usually I will go and look at that first and say, “Well, how many times has this index been used in terms of number of seek and scans.” And if it’s a high usage index that is missing or a high impact then I will usually create that without too many modifications if it’s not too wide or doesn’t have too many include columns.

Carlos: I think it’s the number of times requested multiplied by the number of hits.

Steve: Yup, and just to jump in there I think I’d feel a little bit left out I didn’t have the chance to mention Database Health Monitor on that. In Database Health Monitor there are many indexing reports similar to what Randolph has described there.

Randolph: Certainly, from Steve Stedman’s solutions has a magnificent and free tool called Database Health Monitor which I have used. I don’t even get paid for this statement. It’s a good product. It’s free which is even more amazing. This is the great thing about the SQL Server community. There are a lot of free tools that are out there that are adding value all the time. And all that people asked is that you recognized them and I recognized Steve as a valuable member of our community. That ends the sponsorship message.              

Steve: Thank you for the shameless plug.

Carlos: So it’s interesting, I guess, and maybe I’m being a little bit paranoid but using, so when I’m looking at that impact and I will take that and also using the ones that I’m interested in adding. Of course I want to do what Steve mentioned looking for duplicates or kind of what’s there, right? Then am I going to get over my threshold of ten or whatever just kind of taking into consideration what additional load am I putting on this table by creating the index. And while you want to test that out mostly the environments that I worked on they just don’t have a good mechanism for creating the similar load as I have in production. So when I go to implement that index one of the first things I’ll start to look at is those usage stats, right. Because I want the usage stats in the index that I just created to be going up because I know that, “Ok well, wait a second what did I do? Is this still a really good candidate?”

Randolph: Yeah, that’s a very good point.

Carlos: So I guess, other that’s how are going to go out and creating them, couple of other things that we didn’t really get into some of the other new indexes like column store or even XML indexes. But I guess other thoughts about when to start looking at these other, we even try to approach that now. Maybe I feel like we should punt on some of those.    

Randolph: I can briefly mention about XML indexes that the way they work is they’re going   to be larger than your table or your XML column. The reason being is that it will create an internal table according to the structure the XML document or XML field that you’re indexing. So it actually expands out the XML data into an internal table and then indexes that so you could end up with a column that is say 100kb maximum. That’s a small one and you can end up with an index that is 500kb, or gigabyte, or tens of gigabyte because it’s creating an internal table under the covers. A system table that it is then indexing so be wary of XML columns in that if you’re going to index then make sure that there’s sensible indexes that they’re only indexing certain nodes inside that XML document and be aware of that. And also I’d like to add here that if you use the XML data type, the data going into that XML row or column is not going be the same as what you get out. It does modify the XML data going in for whatever reasons.

Carlos: Say that one more time.

Randolph: If you have XML data that you put in for auditing reasons for example. If you pull it out it’s going to have been modified somehow. Either the tags will be slightly different or the spacing will be slightly different so do not use an XML data type to store audited records of XML types. If you want to keep an absolute record of what your XML looked like as it came out of a web service for whatever reason store it in a varchar or nvarchar column instead because in it it is identical. If you put it into an XML data type it will modify the data. It is still the same data but it will slightly modify. The tags will be slight different or whatever.

Steve: So then as far as column store indexes I think that’s something we could probably dedicate an entire episode just to talk about.

Randolph: We definitely can.

Carlos: I think you’re right.

Randolph: A column store index is neither an index nor clustered so it’s so complicated. In fact, there is a series of post. I think it’s Nico who has done it over one hundred posts on how column store indexes work. We could talk for days on that. They are fascinating and completely subverts what you think you know about indexes, and data, and row level storage and all that kind of stuff. It’s fascinating stuff.

Steve: Yeah, and just a few notes on that I think that it’s one of those things that is great for data warehousing or OLAP type things. And may not always be the best option for your OLTP side.

Randolph: At the same time you could have reporting style queries in your database and with 2016’s optimizations for column store you could have which can change and can be clustered and all sorts of stuff. You could have some stuff that in your OLTP environment that could be OLAP style indexes. Yeah, there’s so much to think about there.

Carlos: We saw that in Episode 78 with Brian Carrig. They talked about using their column store index in a transactional environment.

Steve: Yup, and then one of the things that I’ve seen that is one of those misunderstandings around column stores is that if you, because column store does the compression on each column, is that people think of it as I don’t have to have any non-clustered indexes if I have a column store index. And that’s not true at all. And that if you’re looking at a bigger table and you’re using column store you may still need to have some non-clustered indexes on it as well.

Randolph: Oh yeah, so do you research. Read all one hundred and whatever post. Become an expert then implement them.

Carlos: So we’ve talked a little bit about, so implications, how do we know what to use. So space considerations we talked a little bit about more in the rebuild process, right? That we’re going to need additional space in our data file and our log file things like that. I think we’ve had a pretty good conversation there. I guess one last idea I’ll throw out that we can bat around a little bit. We talked a lot about having multiple TempDB files and so one of the thoughts that I’ve seen out there is if you’re potentially, I won’t say, so what’s the value there is that if you’re looking to spread out or move your indexes from where your data might reside. One way to do that is to create another data file potentially on a separate storage and then rebuild those indexes.

Randolph: Ok, there are two things that I can talk about there. The first thing I wanted to mention is if you do need to move your table into a different file or file group all you have to do is do a clustered index rebuild and target that file group as the new location for the clustered index because remember the clustered index is you data. So that’s a handy way if you need to move your table into a different file group that’s how you will do it. The other thing is by virtue of that definition there is that you could have your non-clustered indexes in a different file as your data and you might want that for improved I/O performance or if you’ve got your read-only stuff in a different file group or all sorts of reasons for that. It’s very handy for splitting your load on the I/O level. Less of a problem these days but it’s a nice thing for large tables to split your non-clustered indexes from your clustered index so that’s not reading from the same portion of the disk or the I/O subsystem or both.

Carlos: Right.

Steve: Just to note on that index rebuild for a clustered index to move it to a different file group although that will move the table and the entire index. One of the things that could be left behind when you do that is any of the varchar max or nvarchar max or other large items that are located outside of the data pages for that table.

Randolph: Yeah, off row data will be affected. Steve, I don’t know because I’ve never tried. What happens if you tell it to do the index rebuild with the log compaction enabled.

Steve: You know, that’s a good question. I have not tried that for a long time.

Randolph: I smell a blog post.

Steve: Yup.

Carlos: Ok, very good. Well, awesome. I think great conversation and obviously there’s a lot more that we could talk about indexes but I think this was a noble attempt at covering some of the basics and getting into some of the nitty gritty as well.

Randolph: Yeah, the fact is that indexes don’t stand by themselves. They are a very important part of everything including statistics and everything else so don’t think that once you became an index expert you become an index performance tuning expert because that’s not true. You have to have a very broad knowledge of how things work in a number of different fields used upon to get the best performance out of your system. And there’s nothing wrong with good enough. You don’t have to have 100% defrag indexes. You don’t have to have indexes rebuilt. You can have them reorganized. Don’t have to have them reorganized at all if your statistics are up to date in certain contexts. There are a lot of tradeoffs that you have to think about when doing your maintenance plans and indexes form just a small part of that.

Steve: Yup.  

Carlos: Great.

Steve: Very good point.

Carlos: So shall we do SQL Family?

Steve: So how did you first get started with SQL Server, Randolph?

Randolph: Steve, that’s a good question because when I originally got this question I had to think very hard about it. I have been fascinated from a personal level about organizing data. Back in my youth I would create lists and lists of things CDs, books all that kind of stuff and I would do them in the paper equivalent of Microsoft Excel. And then when Lotus 123 came along I started playing with that because my dad worked at the bank and he was a 123 expert. If I’d know about VisiCalc I probably would have used it. And then I got into Microsoft Access because it was there. It was a very old version of Access. I think it was version 2 that I started with and then I started working at a PeopleSoft implementation partner in South Africa where I am from. And that’s where I first work with SQL Server and Oracle at the same time, and I was not a fan of SQL Server. I have to be honest. At that time it was version 6.5 and version 7 that’s around the time I joined so there were issues because as some of your listeners may know SQL Server’s engine was rebuilt around 7 time, 6.5, 7 time. In fact, if you look on MSDN you can download 6.5 and you can download 2000 but you cannot download version 7. It is very difficult to come by. There are maybe good reasons for that. I got exposed to SQL Server that would have been in 1997, around there. So yeah, that’s where I first got exposed but then I didn’t really play with it until I got to the bank. I was more of an Oracle guy. Got to the bank in 2006 so there was a large gap of not playing in SQL Server and then I couldn’t figure out why DESCRIBE, the keyword DESCRIBE wasn’t working in SQL Server. For all of you Oracle office you notice there I prefer SQL Server now because as I said before a lot has changed. It is a much better product than it was in 1997. In fact, I think it has surpassed Oracle. I think it’s better than anything else as well and that’s because I’m trying to keep abreast of all the new stuff. I don’t want to be stuck in the past and have assumptions about the product. I want to play around with the new stuff so. That was a long way of saying 20 years or so.

Steve: Ok. Well I know that in the late 90’s Oracle was a much better product than SQL Server.

Randolph: It really was.

Steve: Yup, and that has changed. That has changed significantly in the last 17 years.

Randolph: Yeah, it has.

Carlos: If you could change one thing about SQL Server what would it be?

Randolph: I had a pot for this but I’m going to change my mind. The pot answer is in-memory processing on SQL Server, in-memory OLTP which was called Hackathon, that was the code name for it. There is a transaction log operation that happens even for stuff that you don’t want kept. It’s still being backed by disks somewhere. It may not be in the transaction log itself but it is still being backed by disk somewhere. I would like to have in-memory objects that do not have anything on disk backing them at all. They must just be there, and if the server goes down, oh well tough luck.

Carlos: Alright. Yes, I remember seeing you in the Bob Ward presentation at SQL PASS talking about the in-memory OLTP.

Randolph: There’s some very very cool stuff with in-memory OLTP that I’ve been playing with. I’ve been blogging about it as well on my blog bornsql.ca. That speaks about some in-memory ops and I found. Interestingly I found another issue that I’ve spoken to Jonathan Kehayias about where Temp tables are slower than Temp variables in certain cases. So I’m investigating that at the moment and it had to do with my in-memory investigations.

Steve: Oh interesting, I’d like to see what you find there.

Randolph: If I could change, that was my pot answer about SQL Server what I would change. What I would change is having a TempDB per user database as a configurable extra.

Steve: That would be nice.

Randolph: In other words it doesn’t have to be default but if I need a TempDB for a particular database I would like to have that as a separate entity so that I could manage it separately and keep track of the objects that are happening for that particular database. The architecture for that is going to be really difficult to do so I don’t know if Microsoft is going to do that but that’s what I would like.

Steve: Alright, so what’s the best piece of career advice that you’ve received.

Randolph: Oh, it’s going to be along answer as well, sorry. “Let your work speak for itself” is the first and foremost one. So it doesn’t matter what people say about you if your work can speak for itself then you don’t have to say anything. That’s the best piece of advice that I had. But the other is “Everything you say and everything you do is being recorded somewhere” so please treat people with respect. Treat people as you would have them treat you and don’t say things on voicemails that can be played back to you at meetings with the CEO and saying Randolph don’t do that.      

Steve: I like that too.

Carlos: We won’t ask for the personal experience that led you to that advice.

Randolph: I have at the end of the year or every now and then when it’s a quiet time on my blog I say career limiting moves and that was one of them. So there are a bunch of career limiting moves that I can recommend to you. Don’t do because I’ve done them not because it’s good advice but it’s because I’ve done them so don’t do that because I’ve done it, and I will be one of them.

Steve: Alright.

Carlos: Our last question for you today. If you could have one superhero power what would it be and why do you want it?

Randolph: I thought about this and I honestly don’t know. I liked Pinal’s when where he could touch a server and know what’s wrong with it. The problem is whenever I touch as server if it’s going to break it’s going to break then and there. Maybe I want the reverse of that superpower.

Steve: To just know without touching the server.

Randolph: Yeah, you know per cost of maintenance is one thing. But if I’m in a room and it is going to break it is going to break while I’m there which is good I guess in way because you then you know that we can figure it out then and there. But I’d like the reverse of that that would be cool.

Steve: Ok.

Carlos: Well, awesome. Randolph thanks so much for being with us today.

Randolph: Well, thanks for having me it’s a topic near and dear to my heart indexing and maintenance and stuff like that so I’m glad I got an opportunity to speak to you guys about it.