Episode 89: Lone DBA

One of the neat benefits of putting the podcast together is talking to a wide spectrum of folks with varied backgrounds. As we got more feedback from listeners, the idea of the Lone DBA emerged. The Lone DBA is someone who is the only person responsible for database administration at their company. These tasks may have been formally assigned, or they may simply have come with supporting an application. Steve and I thought we should explore this a bit, and to help give us some context, we invited Monica Rathbun to chat with us about some of the challenges she’s faced.

Besides the challenges, Monica shares tactics and techniques for juggling many tasks as a lone DBA. These include using social media to build connections with different experts, getting the right tools for the organization’s needs, and making your boss your supporter. Have your own perspective on this? Let us know in the show notes for the episode.

Episode Quotes

“To be a lone DBA, you’ve got to have 48 arms attached to you, doing 20 things at a time”
“It is important for the leaders in the community to mentor and pull people in”
“Making network connections gears you towards your future career”

Listen to Learn

  • Techniques, tactics and challenges of being a lone DBA
  • Using social media to build network connections
  • Managing a huge number of tasks as a lone DBA
  • Selecting the right tools for the organization

US News DBA Job Outlook
Monica on Twitter
Monica’s Blog

SQL in the news
Near-real-time replication from SQL Server to cloud

About Monica Rathbun

Monica is currently a Sr. Database Administrator/BI Architect at Massimo Zanetti Beverage USA in Norfolk, Virginia. She has been a Lone DBA for 15 years, working with all aspects of SQL Server and Oracle. She is co-leader of the Hampton Roads SQL Server User Group and can be found on Twitter daily. When she’s not busy with work, you will find her playing taxi for her two daughters, back and forth to dance classes.

Transcription: Lone DBA

Carlos: This is Carlos Chacon.

Steve: I’m Steve Stedman.

Sean: I’m Sean McCown.

Carlos: Sean, welcome to the program.

Sean: Thanks, guys.

Carlos: Yes, nice to have you back. Episode 41, you were with us, and since then we’ve made a couple of changes, added Steve to the program, and it’s nice of you to come back and chat with us.

Sean: Is this my second or third time on?

Carlos: Oh, I thought it was just the second time.

Sean: It’s possible.

Carlos: Yeah, you’re just a popular guy. You’re just everywhere, here and there, everywhere.

Sean: Oh, God, I wish.

Carlos: Well, so today, we’re going to go backwards a little bit and we’re going to be talking about a command and then we’re going to back into the why we want to be using it and some of the issues around it, so our conversation is going to focus on DBCC CheckDB.

Sean: Oh, one of my favorites.

Carlos: There you go.

Sean: An oldie but a goodie.

Carlos: That’s right, and the infamous, or maybe not what made him, but definitely what added to what Paul Randal is today. He frequently gets credit for building the CheckDB command, at least the original one; I think it has gone through some iterations since then.

Sean: I mean, he rebuilt it in 2005. He didn’t build the original one, that I’m aware of. He just did the major overhaul in SQL 2005, right?

Carlos: Oh, the way I remember him telling the story was that he and his team built it originally.

Steve: I think what I’d heard was closer to what Sean was saying, that he did a redo of it in 2005.

Sean: Yeah, because he was always talking about how it was really messy, and how a bunch of the error messages were kind of fubarred and didn’t really give much info, and how he went through and straightened all that stuff out in 2005.

Carlos: Got you. Ok, well then, there you go. So I stand corrected. So he made it a little bit better, and ultimately the question is: why would we want to run the DBCC CheckDB command?

Sean: Wow, that’s a big question. The general answer, the lightest answer, is because stuff happens when you’re dealing with media, and typically magnetic media. All kinds of things can go wrong in those writes, whether they come from the magnetic media itself or some kind of memory corruption; sometimes CPU errors can cause things. And you can’t physically see it, right? It’s not like sending out a tray of food where you can see that the order served is in disarray. You need something to tell you when things aren’t right, or you’re going to be making very bad decisions, and you need something that helps you fix them. I mean, people take that kind of thing for granted, but it’s just as important now as it ever was.
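For listeners following along at home, the basic check Sean is describing is a one-liner. The database name below is a placeholder; NO_INFOMSGS and ALL_ERRORMSGS are standard options that trim the output to the errors you actually care about:

```sql
-- Run a full integrity check; NO_INFOMSGS suppresses the informational
-- chatter so only allocation and consistency errors come back, and
-- ALL_ERRORMSGS reports every error per object rather than capping the list.
DBCC CHECKDB ('YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```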

Steve: So then with the CheckDB process when you say when things aren’t right, I mean, what type of things are going to pop up that aren’t right that you are going to know about because of CheckDB?

Sean: Did you say what are some of the problems that you wouldn’t know about because of it?

Steve: No, no. What are the things that you’re going to find out about when you run CheckDB if you have problems because you maybe hadn’t run it in the past?

Sean: Right, so you can have problems like out-of-range errors, where you get data that is out of range for a specific column. That used to be really common in the lower versions of SQL. I haven’t seen that kind of thing as much; that’s data purity. That used to be a really big thing when I was doing upgrades from like 7.0 to 2000, and from 2000 to 2005, and maybe even a little bit on 2005, but it was a lot better then. Where you did an upgrade and you start getting all these PK violations because they didn’t do such a good job of checking primary keys. And there are things just flat out like page linking problems. Same thing happens, right: two pages don’t get linked properly, and therefore when you go to run a query it can’t find the data it needs, and you get an error because the link that it has on the page doesn’t point anywhere. And of course there are the same problems with index pages as well. Those are clearly the easier ones to find. And then of course there are your disastrous ones, where you’ve got problems in the system pages for the database itself. There are different classes of those problems. I mean, Steve, you’re a big CheckDB guy so you know all about that kind of stuff.

Steve: Oh yeah. So I guess one of the things, I mean, what I see is that oftentimes people are either not running CheckDB at all or they’re not running it frequently enough. And I guess “frequently enough” is one of those sort of variable terms. But, I mean, how often, from your perspective, should people be checking their databases with CheckDB?

Sean: Well, you know, there’s actually a third one there. People don’t run it at all, they don’t run it often enough, or they run it and they don’t check the results. That’s a really big one. But I’m sure we’ll talk about that more later. As far as how often should they be running it? Man, that varies so much. I mean, what does often enough mean? I would say if you’ve got a small database where you’ve got plenty of resources and you’re really not in danger of affecting a workload, hell, run it two or three times a day. You’re not going to hurt anything, right? It’s not like re-indexing, where you could actually mess stuff up.

Steve: So then, let’s say, though, you’ve got a much larger database and it’s really not feasible to run it that frequently.

Sean: As often as you can.

Steve: I’ve had conversations sometimes where people say, “Well, we can only afford to run it once a month on each database because things are just too big.” One of the things I commonly see there is that they’re maybe running it once a month but they only have three weeks of full backups. If they’re doing something like that, they might not have backup history going far enough back to reach that last CheckDB. How would you address that?

Sean: Are you baiting me?

Steve: No, no, those just come up in conversations. I was just asking what you would do beyond that when somebody can’t run it. I mean, it’s so infrequent that they don’t have a backup retention period long enough to go back in the database.

Sean: I know that. I’m just asking if you’re baiting me, because that’s one of the big situations that I was adamant about handling in Minion CheckDB. I wondered if you were baiting me to see how long I could go without saying Minion CheckDB.

Carlos: So let me jump in here I guess just for a second as the knuckle dragger Neanderthal of the panel here. We’ve talked about some of these scenarios. I don’t think we explicitly called out corruption although I think all of the scenarios that you listed could be classified as a corruption type issue. But the constraint or the reason people don’t want to run it, you talked about impact, because it has to read everything don’t we kind of get into a memory constraint issue? Because generally our databases are going to be larger than the memory that we have available so we’re going to be causing pressure because we’re going to be flushing things out of the buffer cache.

Sean: You know, usually the pressure I’ve seen for CheckDB isn’t as much memory as it is disk, and some CPU as well, depending on how much CheckDB is going to multithread if you’re on Enterprise. But I would say that the bigger issue is probably going to be something like snapshot space for your disk. I haven’t personally ever run out of memory when running a CheckDB.

Carlos: Well, what I mean by that is that the reason I can’t run it is that I’m going to affect my applications, because now I’m flushing things out of the buffer cache that I need for my application because I’m running CheckDB.

Sean: No, because it’s going to take an internal snapshot, and it really runs against that internal snapshot. So really, when we say it’s going to affect an external application, it’s going to be through CPU and disk. One of the big scenarios, especially on a really busy system with a really big database, is that it’s going to run for quite some time. Let’s say it takes 5 or 6 hours to run a CheckDB, which not only isn’t all that uncommon, it’s really not all that long for a really big database. If you don’t have enough disk space, then you’re going to expand that snapshot. It’s going to keep on expanding until it fills up the disk, and then either it’s going to crash, or it’s going to keep your database from being able to expand when it needs to, and then you’re going to halt your processes and start rolling stuff back. So that, and some CPU, are the big ones. Maybe Steve knows something about this that I don’t, but I just don’t see flushing the buffer as that big of a deal, because it’s running against that snapshot.
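As a sketch of the tradeoff Sean describes: if the hidden snapshot is the problem, CheckDB can be told to skip it entirely with TABLOCK, at the cost of taking locks that block writers, so this is only reasonable inside a quiet maintenance window (database name is a placeholder):

```sql
-- TABLOCK makes CheckDB use table locks instead of the internal
-- database snapshot: no snapshot disk growth, but concurrent
-- writes will be blocked while the check runs.
DBCC CHECKDB ('YourDatabase') WITH TABLOCK, NO_INFOMSGS;
```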

Carlos: Ok, well then, I guess I need to ask another question, because obviously there’s something about this process that I didn’t understand. When I hear snapshot, I generally think: I’m making, not necessarily a copy, but as things change I’m going to keep the history of that, and the database continues, and my snapshot grows larger as more changes happen to the database.

Sean: Correct.

Carlos: This is what happens when I’m running DBCC CheckDB?

Sean: Yes. Only it’s an internal snapshot that SQL itself runs. It creates a snapshot behind the scenes for you.

Carlos: So depending on the frequency of change in my database then that snapshot is going to increase and that is what’s going to cause me the disk outage that you mentioned.

Sean: Correct, and Steve was trying to say something in the background that I’m really interested on what he’s going to say.

Steve: Right, so what I was going to get at there was around the whole squeeze on memory, like Carlos is talking about. The thing I’ve seen is when you’ve got a system where the database is significantly larger than what can fit in memory, and the queries being run are doing the best they can with the page life expectancy, so the right things are staying in memory for them to run. Oftentimes what I see with that is, if you chart the page life expectancy, you see the pattern where it grows and grows and grows until that CheckDB runs. And then that pushes everything out, because everything needs to be loaded in to be scanned, and the page life expectancy drops to zero when that CheckDB runs. I usually look at that as an indication that it would be good to add more memory to the system. However, that’s not always feasible in all environments.
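The page life expectancy pattern Steve describes is easy to watch for yourself; a query along these lines (the counter lives in the Buffer Manager object, or one Buffer Node object per NUMA node) returns the current value in seconds:

```sql
-- Current page life expectancy; charting this over time shows the
-- drop toward zero when CheckDB churns the buffer pool.
SELECT [object_name], counter_name, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';
```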

Carlos: Ok. So it could affect the three biggies, right? It could affect memory, CPU and disk. But disk especially, because we’re pulling that whole entire database off of the disk, and with the snapshotting process I’m getting reads and writes, so that’s probably the first concern.

Sean: I really don’t think it pulls in the entire database at once. It probably does it a piece at a time.

Carlos: Oh sure.

Steve: No, it will bring it in page by page, just as pages would normally be used. But as it’s doing that, it’s pushing out something that hasn’t been used for a little bit through the normal buffer pool process. And you end up pushing a lot of pages out in that process that might have been reused had CheckDB not been run.

Sean: Sure, sure but that doesn’t necessarily mean it’s going to lead to memory pressure.

Steve: No, and I guess, I wasn’t saying that it’s going to push it to the point you’re running out of memory. I’m just saying it’s pushing to the point that you’re causing a lot more I/O because things have been pushed out of memory to make room for what’s being brought in when CheckDB gets run.

Sean: Sure.

Carlos: Yeah, exactly. I mean, in a certain scenario, would it be fair to compare that to almost like a restart? Because all that stuff just gets pushed out. Maybe not that extreme.

Sean: Sure, let’s go with that.

Steve: Ok, so then one of the things that comes up is that a lot of people say, “I can’t run DBCC CheckDB on my whole database. It’s just not going to work.” I mean, what are the main reasons you see people say that, or look at it and decide they just can’t do it?

Sean: Right, so I’d say there are two big reasons I see for that. One of them is that disk filling up that we talked about, because their database is just too big and too busy, and it won’t finish in a short enough time that the snapshot doesn’t fill up. And that’s a really big problem. I’ve talked to lots of people who just flat out don’t run CheckDBs anymore because of that alone. And another one is that it just takes so long to run. Even if they can withstand the disk, it still takes so long that it’s a drain on resources, right? I mean, ultimately that is a drain on resources, and they just can’t have that resource drain for that long. Or, you know, they have a thing where they need to reboot now and then, or something, something, something, right? They just can’t have a process that runs for literally two days.

Steve: Yeah, you’d exceed your maintenance window, right? The shops say, okay, you can do all your maintenance in this window, and anything extra is taking away from the system, therefore you can’t do it.

Sean: Exactly.

Carlos: Okay, so earlier, one of the things you mentioned was that one of the problems is that people run CheckDB and don’t check the results. I know that if you kick off CheckDB in a SQL Server Management Studio window, the results are all going to show up red on your screen if you’re running it interactively. However, if you’ve got it running as a job, what are your best options for knowing the failure or success of that CheckDB process?

Sean: Right, so you’ve got to look at the job itself, right? The job could fail, and all this depends on how you’ve written your CheckDB routine. If you’re on the built-in maintenance plans, doesn’t it push it to the error log? I think you see those errors in the error log.

Carlos: That’s normally what I see: you run it as a job, and how I know it failed is that my job failed.

Sean: Right.

Steve: Right, and I know that whether it’s a maintenance plan, or a job, or just interactively running it, whenever CheckDB hits a problem it does write to the error log.
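One practical aside: on SQL Server 2016 SP2 and later you don’t have to scrape the error log to find out when a database last passed; DATABASEPROPERTYEX exposes it directly (database name is a placeholder):

```sql
-- Last time DBCC CHECKDB completed without errors for this database.
-- Returns 1900-01-01 if a clean check has never been recorded.
SELECT DATABASEPROPERTYEX('YourDatabase', 'LastGoodCheckDbTime') AS last_good_checkdb;
```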

Sean: Right, but.

Steve: There’s also going to be things that show up in your job history as a failure if it’s running as a job or a maintenance plan there.

Sean: Right, and now, I don’t know about you guys, but I come from a lot of big environments where we’ve had, you know, a thousand servers, two thousand servers, and tens and tens of thousands of databases. I don’t know about you guys, but I really, really want to search the log for something vaguely CheckDB-related every week, or every day, or every something. And then, not only that, but then go inside of there and parse the results out and try to figure out exactly what the problem was, right? So that’s what I see most. The guys who have a little bit more on the ball will run it with the table results flag, which is better because now you can at least get the results in a table, but you have to build a whole process around that too, because that whole table results thing has to be managed. And it didn’t help that for the longest time TABLERESULTS was undocumented. Like somebody was going to bring down the server with table results or something. I still don’t know why it was undocumented for so long. But the ones that are really on the ball will run with table results and at least get it in a table. Like I said, though, you have to build the whole process around that; it’s not as easy as “I’m going to run this with the table results flag and that’s it,” because I’ve got tens of thousands of databases now. And to me, any process that I develop has to work against one and two thousand servers with tens of thousands of databases, or it’s not going to do me any good for the five databases I have. Right? I mean, we’re being asked as DBAs to do more, and more, and more, and, you know, to spend my time going around and checking all of these things and then parsing the results? Who’s got time for that crap? You know?
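The flag Sean mentions looks like this; the rowset it returns can be captured with INSERT ... EXEC into a staging table (the column list is wide and varies by version, so the table definition is left out here):

```sql
-- Return CheckDB findings as a result set instead of messages,
-- so they can be captured and queried rather than parsed from a log.
DBCC CHECKDB ('YourDatabase') WITH TABLERESULTS, NO_INFOMSGS;
```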

Carlos: Yeah, and that’s really an interesting point, because people talk about, “What’s the DBA job going to be in the future?” And I think that’s something that’s always changing. But I look at the people who maybe haven’t changed in the last 15 years of SQL Server, and they’re the kind of people whose job consists of looking at error logs every day, scanning those error logs and determining whether there are any problems to deal with. Whereas the DBAs who are evolving and moving forward are the ones who are building the automated processes so you don’t have to do that manually, so you can focus on real issues rather than reading error logs.

Sean: Yeah, because, you know, there are a few aspects: there’s discovering the problem, reading the problem, and solving the problem. You want this stuff to be as easy as possible, especially with corruption. You want to know about it as soon as you possibly can, and you want to know the nature of the issue as soon as you possibly can as well. I think that sending CheckDB results to the log is ridiculous. I think sending deadlock information to the log is ridiculous. What the hell am I going to do with deadlock information in the log? I mean, I’ve got this XML graph kind of thing taking up 37 lines in my log. What am I going to do with that?

Carlos: Sure. It would be nice to be a little bit more proactive.

Sean: Not just proactive; Microsoft needs to do a better job at making the stuff that we need to monitor accessible. I mean, one of the worst things they did in Extended Events was make the whole thing XML driven. It was one of the worst things they’ve done, because the average DBA doesn’t really know anything at all about XML. And so it’s great that I can take that stuff and save it to a file. Now I’ve got all this XML in a file; what the hell am I supposed to do with it? Now DBAs have to go to a two-week course just to be able to query the XML data out of there. Thanks for that. I could at least import server-side trace and Profiler data into a table and query it, but here you’ve got to go through some major hoops to get it into the same format. They haven’t done a good job of taking the stuff that we absolutely have to have and making it really accessible to people who are managing on a large scale.

Carlos: So Steve, I’m sensing that we should have asked our SQL Family question little sooner.

Steve: Yeah, that’s good point. We’ll come back to that at the end then.

Carlos: That’s right.

Steve: Yup, so then I guess back to the question around “Are we baiting you?”, and this is the point where we might start baiting you. I mean, now that we’ve talked about CheckDB and some of the pros and cons and difficulties and challenges, you’ve come out with something here with your Minion CheckDB.

Carlos: Is there something better that we can run?

Sean: Yeah, I like to think it’s something better. I targeted Minion CheckDB at those large shops with DBAs who are so busy they don’t have time for searching through the logs or for building their own process with table results, and also at the super large databases where you can’t run CheckDB because your database is too big. It was specifically designed with those guys in mind.

Carlos: Well, so now, is it different from DBCC CheckDB? Or is it using the same command, just kind of slicing it up and giving us more options?

Sean: Yeah, no, it’s all about the process. Where the rubber hits the road, I’m still running CheckDB and CheckTable, but it’s all about the process behind it. For instance, let’s talk about one of the parts of the process that I’m particularly proud of. We’ll take the large database Steve mentioned, where they just don’t run CheckDB because the database is too big. One of the things you can do with Minion is set it up to do a rotating CheckTable instead. So you can say, I want to do 5 CheckTables a night, or 500 CheckTables a night, until all the tables are done, and then start over again. So you’re doing all of the objects. Because what is a CheckDB, really? Is it a CheckDB, or is it the equivalent operations?

Carlos: Right, all the pieces underneath.

Steve: Although it’s more than just CheckTable on all the tables.

Sean: Of course, it is.

Steve: I think CheckTable on all the tables gets you started on a big amount of the areas where you’re most likely to have issues.

Sean: Yeah, but you can also run the other, smaller operations as well. But if you want to check your objects and you just can’t because it takes three days, then do a hundred at a time, 500 at a time.
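For reference, the per-table check and the “smaller operations” that, taken together, roughly add up to a CheckDB are commands of their own (database and table names below are placeholders):

```sql
-- Check one table (or indexed view) at a time.
DBCC CHECKTABLE ('dbo.YourBigTable') WITH NO_INFOMSGS;

-- The pieces CheckDB runs beyond the per-table checks:
DBCC CHECKALLOC ('YourDatabase') WITH NO_INFOMSGS;  -- allocation structures
DBCC CHECKCATALOG ('YourDatabase');                 -- system catalog consistency
```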

Carlos: So how do you get around that? Let me use smaller numbers so I can wrap my head around it. I have 10 tables and I can only do 5 at a time. My maintenance window, let’s just say, allows me to do 5 tables once a week. So it’s going to take me two weekends to get through all ten tables.

Sean: Yes.

Carlos: So when the change happens: I do tables 1 through 5 on weekend #1, and on weekend #2 I’m doing 6 through 10, but corruption happens in table #1. I’m not going to find that until the next time around. I guess, going back to Steve’s point, I just take that into consideration and make sure that my backup retention exceeds the time it takes to get through the total database.

Sean: Correct.

Steve: Go ahead Sean.

Sean: Another scenario I handle there is that I allow you to automate remote CheckDBs. So you can set it up so that every single night it will take the latest backup, restore it to a remote box, run a CheckDB on it, and send the results back to the prod box.

Carlos: Got you, which will be awesome, particularly maybe not if you have thousands, but if you’re in the 20s or 50s of databases it would be a nice way to do it.

Sean: Well, you don’t do that for every database. I mean, every database isn’t that big. But you do it on the ones you can’t do locally, and the other ones you do local. And, you know, I give you three or four options. Three options, I think, for restoring your backup remotely, and I give you a number of options for the type of remote restore or the type of remote job you want to do. I mean, it’s incredibly configurable with just three, four, or five columns’ worth of data in a settings table, and then it just goes. I’ve got people doing it, and once they got the permissions worked out, which was the biggest thing, right, the permissions between the two servers, they found that the process itself just runs.

Steve: So then, Carlos, going back to your example of a small database with 10 tables where you can only do half of them each week, because that’s your maintenance window. If you were doing a full CheckDB previously, the likelihood is you were never actually going to run CheckDB at all. So even if you’re only finding out a week or two weeks behind that something has been encountered, corruption for instance, it’s better to know about it within two weeks than never knowing about it.

Carlos: Or when a user tells you, “Hey, I’m getting this error.”

Steve: Right, right, and you find out you had corruption 18 months ago and it just sort of crept through that amount of time.

Sean: Well, the cool thing is you can even set it up to run your important tables every night. So you could say, I want to run these guys on a schedule, but I also want to run my really important tables every single day because these are the guys that I have to have, right? And it’s really easy to exclude tables or exclude schemas or something like that, so you can really target it. And, really importantly, you can order the databases and order the tables as well. So in your maintenance window, even if you’re doing all of your tables at once, which is another scenario, right? We’ve got the concept of what I call, what the hell do I call it, dynamic snapshots, to get around that problem of your snapshot file filling up and just killing your CheckDB operation. You can put your own snapshot on there, so you can create it on your own disk, but if you don’t have a disk lying around that is big enough, you can switch to CheckTables. But even if you’re doing your own snapshot and CheckTables, you’re still going to fill that up. So with dynamic snapshots, what we do is allow you to say, you know what, every 30 minutes I want you to drop the snapshot and create a new one, and then continue doing the CheckTables. So if you’re going to do a thousand tables and it’s going to take you, say, 7 hours to get through it, every half hour it’s going to drop that snapshot, assuming that it’s between operations; if one table takes an hour and a half, there is nothing I can do about that. If it’s over that half hour, it will drop that snapshot, create a new one, and then carry on with the rest of the tables, and that’s going to keep you from filling up that snapshot space, that disk drive. That’s another scenario that we threw in there for this big database problem.
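Rolling your own snapshot, as Sean describes, uses plain database snapshot syntax; the file path, the snapshot name, and the source’s logical data file name below are all placeholders:

```sql
-- Create a snapshot on a drive you choose, check it, then drop it.
CREATE DATABASE YourDatabase_check ON
    (NAME = YourDatabase_data,  -- logical name of the source data file
     FILENAME = 'E:\CheckSnapshots\YourDatabase_check.ss')
AS SNAPSHOT OF YourDatabase;

DBCC CHECKDB ('YourDatabase_check') WITH NO_INFOMSGS;

DROP DATABASE YourDatabase_check;
```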

Steve: So one of the scenarios that I’ve come across a couple of times that has been challenging is when you have a larger database that takes a while to run CheckDB against, and so you’re trying to do CheckTables or other options. But in that one large database, although it’s got a few tables in it, there’s one table that really takes up like 95% of the entire database. So then you run CheckTable against that one really big table and it ends up taking almost as long as CheckDB would have taken to begin with. Are there any options, or anything that you see, for how you would deal with that differently? And then perhaps run it on another server?

Sean:  Well, in that case, I mean how important is that table?

Steve: Well, usually when it’s that size it’s something that’s pretty darn important, and it’s just that big because some software along the way wasn’t architected to correctly warehouse things or break it out into other tables.

Sean: Right, well, I mean, it could just be an archive table that’s sitting there and doesn’t get data but once a day, or once a week, or once a month, or something. It’s not really that important of a table functionally; the data just needs to be there. So there are two different scenarios that you’re talking about. You’re talking about the archive table that’s just sitting there getting bigger and bloating your CheckDBs, and the one that is an actually active table that they just refuse to archive and, you know, we’ve all seen it, right? So in that first scenario it’s really easy to just exclude that table from your CheckTable process and only run it, let’s say, once a month or once every two months or something. The second one is a little bit harder because that is an active table, so you’ve got a couple of things you can do there. You could put it in its own filegroup and run CheckFileGroup. I like that solution a lot less because I don’t think you really gain anything from it. But you could either run the CheckDB remotely, or you could run CheckTable against all of the other tables every day and only save that one for, like, the weekend, or once a month or something like that; if it’s going to take a really long time, then do that one by itself. Or, since it is 95%, run a full CheckDB or full CheckTable process on the weekend or once every two weeks or something. But for your dailies, or even better yet, you can give it, I want to say, a perfunctory check. You could run it with PHYSICAL_ONLY more often than doing a full data check. PHYSICAL_ONLY will just make sure that all the page structures and everything are proper; it won’t bother checking the deeper data stuff. So it’s a much faster check because it’s a lesser check, but at least you’ll know that the tables themselves have integrity, if not necessarily the data.
And I’ve got somebody doing that, and by the way, that’s really easy to do in Minion: say, on weekdays I want to run this with PHYSICAL_ONLY, and on Saturday I want to run a full CheckTable on it.
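The weekday/weekend split Sean describes maps to two invocations; PHYSICAL_ONLY limits the run to page and record-header structure, while DATA_PURITY forces the column-value range checks discussed earlier (database name is a placeholder):

```sql
-- Weekdays: fast structural pass only (pages, record headers, allocation).
DBCC CHECKDB ('YourDatabase') WITH PHYSICAL_ONLY, NO_INFOMSGS;

-- Weekend: full logical check, including out-of-range column values.
DBCC CHECKDB ('YourDatabase') WITH DATA_PURITY, NO_INFOMSGS;
```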

Steve: Yup, ok. So another area that’s come up in debates here and there, as I’ve done things with database corruption and whatnot, is around running DBCC CheckDB on TempDB. And that’s one where I’ve seen a lot of really intense arguments one way or the other, whether you should do it or whether you should not. I’m really just curious what your thoughts are on checking TempDB.

Sean: Of course you should do it. You know, TempDB is central to everything that you do. I mean, so much stuff uses TempDB these days. If you’re pushing all of your temp tables through there, and a lot of your spill results, and you’ve got corruption in there, or more importantly you’ve got it on a disk that is causing corruption in there, and you’re going to rely on the results from those operations, you need to know if TempDB is corrupt. I mean, what you do about it is up for debate depending on what’s going on. The easiest thing, really the only thing that you can do, is delete the files and restart SQL. You’ve got no choice; you can’t really run a repair on TempDB. But yeah, just because the data isn’t persistent doesn’t mean it can’t be creating corrupt results throughout your entire environment.
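Checking tempdb is the same command; one caveat from the documentation is that SQL Server can’t create a snapshot of tempdb, so the check takes shared table locks and skips the allocation and catalog checks:

```sql
-- Integrity check on tempdb: runs with shared table locks,
-- since no internal snapshot is possible for tempdb.
DBCC CHECKDB ('tempdb') WITH NO_INFOMSGS;
```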

Steve: Perfect. And although I tried to ask that question in a very middle-of-the-road way, trying not to weight my opinion one way or the other, I completely agree with your answer there.

Sean: You did a good job.

Steve: And I think that next time I’m in an argument over this with someone I’ll say refer back to Episode 90 of the SQL Data Partners podcast, where there’s a brief discussion on that.

Sean: You can tell them I said so, that’s right.

Carlos: I of course agree with you too Steve if that matters but I’m not sure I’ll get a vote.

Sean: You will get a vote we just may not count it.

Steve: Alright, so then I guess back on sort of the topic of the Minion CheckDB side of things. I mean, there’s a lot of people running CheckDB, there’s a lot of people running their own custom CheckDB scripts, there’s people that are running the Ola Hallengren’s database integrity script. So really what I’m trying to understand is who out there is going to get the most value out of the Minion CheckDB product versus one of these other ways of doing it.          

Sean: Dude, I hate to sound that way, but absolutely everybody. The amount of logging that we’ve thrown into this product is obscene. And I don’t mean obscene in a figurative sense. I mean it is obscene.

Carlos: So you mentioned some of that, and you kind of downplayed a little bit the role or the usefulness of going to the error log. So when you talk about reporting, what does that mean? I’m getting an email message, there’s a dashboard, what does that mean?

Sean: So, it means that we store everything in log tables instead of in files or in the log. So we make it really easy to get to.

Carlos: So you can then query it. I mean, is there built in processing that’s going to help notify me like, “Hey, your biggest table has corruption in it.”

Sean: So we don’t get into the alerting business at this level. We save that for our Enterprise product, because I don’t know how you want to alert. But I’ve made it as simple as possible. I mean, there’s even a flag that says to fail the job, because we handle all the errors the job won’t fail, so you have to tell us specifically, “I want to fail the job.” Some people don’t want the job to fail. Some people have a separate process that goes out there; there are network monitors that can consume SQL queries. Or maybe you want to add a separate step to the job that queries the table and says, “Email me if any errors occurred.” But the point is that we give you the errors in a very consumable state. We even tell you how many allocation and consistency errors there were. We tell you how long it took. We tell you what the last status was. We tell you whether it was a remote job or not. We tell you what server it ran on. We give you an obscene amount of information. There are probably 40 or 50 columns in our log details table; it logs absolutely everything. And I would say that’s the biggest advantage that we have over everybody else: again, it’s in the processing of the information and the jobs. Because we even allow you to do things like, let’s say that you’re using Ola and you’ve got 300 databases on a server, and they’re all relatively medium size, right? You have to do all those serially. Or you have to create different jobs and then put them in, 10 or 20 at a time, in the parameters.
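The “separate step that queries the table” approach might look something like this. The table and column names below are hypothetical placeholders (check the Minion CheckDB documentation for the actual log schema); the point is simply that results live in queryable tables rather than flat files or the error log:

```sql
-- Hypothetical sketch: object and column names are assumptions, not the
-- real Minion schema. Pull recent runs that reported any corruption.
SELECT TOP (50) *
FROM   Minion.CheckDBResult              -- placeholder table name
WHERE  AllocationErrors > 0              -- placeholder column names
   OR  ConsistencyErrors > 0
ORDER  BY ExecutionDateTime DESC;
```

A job step running a query like this, combined with sp_send_dbmail or an external monitor, gives you alerting on your own terms.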

Carlos: Do the leg work to carve it up yourself.

Sean: To split that up so that you can run those concurrently. You can just, I mean, Minion is already multithreaded so you just say run 10 at a time and it will split them up and it will run 10 at a time.

Carlos: That begs the question: when do you start seeing problems? Are you introducing problems by doing so many at the same time? Have you ever seen any?

Sean: Like resource wise?

Carlos: Yeah. When is too many too many?

Sean: That’s going to depend on your server and everything involved.

Carlos: That’s fair, touché.

Sean: I gave you a fully automatic machine gun. What you choose to shoot with it is your problem. One thing I despise is tools that try to be smarter than me. I may not ever use a multithreaded process because this one big server won’t support it most of the time. But then there’s one time when we’re down for our quarterly maintenance and they say, “You’ve got 2 hours to do anything you need while we do all of this other stuff on our app boxes that hit the SQL box.” And I’ll say, “Ok, I’m going to run 5 CheckDBs at once and get them all done in 2 hours.” You know, because I’ve got the box to myself. You may not use that normally, but you may use it three or four times a year when something like that comes up. And you know there’s just so much other stuff in there, right? You can do the same thing with that same rotation we talked about with CheckTables. You can do a set of tables, and then do another set of tables, and then do another set of tables. You can do CheckDBs that way too, so you could rotate those 300 databases: 10 a week or 10 a night. Or you can also run it based off of time. You can run CheckDBs for an hour a night and then just pick up where you left off with the next database tomorrow, so it will just roll through all those guys on a timed basis as well.
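The rotation idea can be hand-rolled in plain T-SQL to see what Minion is doing under the covers. This is a rough sketch, not the product’s implementation: check a different slice of ten databases each night of the week.

```sql
-- Hand-rolled sketch of a nightly rotation: 10 databases per night,
-- driven by the day of the week (covers up to 70 databases per cycle).
DECLARE @slice int = DATEPART(WEEKDAY, GETDATE());
DECLARE @name  sysname;

DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM  (SELECT name,
                  ROW_NUMBER() OVER (ORDER BY name) AS rn
           FROM   sys.databases
           WHERE  state_desc = 'ONLINE'
             AND  database_id > 4) AS d      -- skip system databases
    WHERE (rn - 1) / 10 = @slice - 1;        -- tonight's slice of 10

OPEN dbs;
FETCH NEXT FROM dbs INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC ('DBCC CHECKDB (' + QUOTENAME(@name) + ') WITH NO_INFOMSGS;');
    FETCH NEXT FROM dbs INTO @name;
END
CLOSE dbs;
DEALLOCATE dbs;
```

Minion adds the bookkeeping this sketch lacks: remembering where the last run left off, time limits, and multithreading.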

Carlos: In that scenario, is it keeping a snapshot? I’m getting a new snapshot, right? If it has to break the database up, I guess one hour a night and my process takes two hours, which means I need two nights to get through the whole database, I’m getting a new snapshot each night, right?

Sean: It depends.

Carlos: Oh, ok.

Sean: If you’re running timed CheckDBs, which only work off of a rotation, which makes sense if you think about it, then it’s only going to run the CheckDBs that it can complete in that time. You can’t CheckDB half a database.

Carlos: Got you, ok got you.

Sean: But if you’re running CheckTables then it will run through as many of the tables in an hour as it can. Then there are the snapshot settings, if you’re creating your own snapshot, that is. If it’s an internal snapshot you have no say there. But if you’re creating your own snapshot then you can tell it to leave the snapshot or to delete the snapshot when it’s done, and then you’ll just get another one tomorrow. So you’ve got options there.

Carlos: Another question I had, and this goes to something I want to say Grant Fritchey said, but it’s giving me a little bit of heartburn, and that is when you have corruption, his comment was, “You want to find out why it happened.” And this kind of goes back to the TempDB discussion where you mentioned the disks. So my question is, how do you normally go about identifying why corruption happens?

Sean: I would say that depends. There are a number of things that can cause corruption. In my experience, most of the time it’s going to be a spinning disk. But it can be caused by you copying a database over the wire and the bits getting corrupted over the wire. Or you’ve got memory that is starting to go bad, and that will do it. I would say disk and memory are the top two, but I see disk more often.

Carlos: I think I’ve also seen one scenario, I guess, where a client called me up. They were having some corruption while trying to do a big bulk import. I don’t know exactly what happened, but it sounded like somebody just pushed a button and powered down, you know, hard-stopped the machine. Something like that caused the problem as well.

Sean: Right, and that can usually be boiled down to probably something being written incorrectly to the disk.

Carlos: Right, so some kind of transaction error or something.

Sean: Right. So to answer your question directly, I kind of prefaced it there. Usually if I find, like, a corrupt table I’ll go fix it and just kind of move on, but watch it. Because things happen, stuff happens. But if I start seeing another corruption within a reasonable time, say another month, maybe two, then I’ll say, “Ok, we need to figure this out, because we’ve gotten two corruptions inside of a few weeks, and this is just too often.” So I’ll have one of the guys run a disk check, and then I’ll have one of them run a memory check. You need to specifically check the disk, and specifically check the memory, to make sure that they’re ok. Because you’re right, you do need to know where it is coming from. And I want to say most of the time that’s going to be disk. The thing is, I’ve had them come back before and say, “Well, the disk came back clean.” I’m like, “Well, that’s great, but you’ve corrupted three tables this week.” I had this conversation with somebody six months ago. “But you’ve corrupted three tables this week. The disk is not ok.” “We just ran a full disk check.” “Well, either run it again or run a different check, because you’re not ok.” And after going back a few rounds with them they would get the vendor, who would either show them a different way to do it or give them a different tool, and bam, the bad sectors show up.

Carlos: Sure. There was some flag or some option that they didn’t have before.

Sean: Check the controller. It could be the controller as well. But something there in the disk write process is not right. And you’ve got to stick to your guns. I’m saying this to the DBAs who are going to be listening, right? In this case, you’ve got to stick to your guns. Data doesn’t just corrupt on its own. It doesn’t just get tired, so you need to stick to your guns, and don’t let them come back and say, “Oh well, we didn’t find anything.” Because you know what, I have found quite often that when they say they did something, like these network guys saying, “Oh well, we didn’t find anything,” they actually didn’t look.

Steve: That brings up an interesting story. I mean, I worked for a client years ago where, yeah, there was no reason that they could see for the corruption they were getting; it just happened. And after looking into it a little bit more, well, it turned out that it “just happened” because the SQL Server was running on a desktop without any kind of UPS or battery backup or anything like that. And every time there was a lightning storm, or a power outage, or any kind of problem, it would basically just completely power the whole thing off. And surprise, surprise, you magically have corruption at that point.

Sean: I had that same problem. One of my first DBA gigs, true DBA gigs, was at Pilgrim’s Pride, and they had SQL boxes, and these were 6.5 boxes, but still. They had SQL boxes in the kill rooms, and so it was just a SQL box sitting on a milk crate with a piece of plastic over it, covered in splatters of blood.

Carlos: What?

Sean: And I was just like you’ve got to be kidding me.

Carlos: When you’re talking about kill rooms, you’re talking about they’re killing turkeys in this?

Sean: Pilgrim’s Pride, well they process chicken.      

Carlos: Chicken, oh my God.

Steve: Oh boy.

Sean: Enterprise wide they process back then about 600,000 chickens an hour, 24/7.

Steve: Oh my gosh! So the server was in the same room?

Sean: The server was in the same room because they have to weigh the chicken at every stage. They want to know what the weight of the feathers is, and what the weight of the guts is; they sift all that stuff out, right? What the weight of the feed is. They’ve got to get all that stuff out, so they’ve got to get the weight and find out exactly what’s what. Yeah, you’ve got to have a SQL box right there by the scales.

Carlos: There you go. So, because I feel like I’m going to be left out if I don’t tell a story: now, I can’t tie this directly back to corruption, however, being from Richmond, Virginia, Philip Morris is headquartered here, the headquarters for the tobacco industry. And it used to be, I think they’ve finally changed this policy, that because they were pro-tobacco people, you could smoke anywhere, even in the server rooms. They had pretty good ventilation systems supposedly, but that was always the fun story that people would tell: you could walk anywhere and smoke, and whatever tobacco product you had, you could use it anywhere in the building.

Sean: How gross. So to go back to your question a little bit, we kind of got sidetracked, and it’s just the nature of how I talk. I would say, back to who would benefit from using Minion: you know, out of the gate when you install it, it installs with the job and it installs with a schedule. So if you’re not a very good DBA, you just hit the installer and it does everything, and it will be up and running and you’ll have CheckDBs. But it’s also configurable, for, like, the world’s greatest masters to come in there and tweak all the bells and whistles and get absolutely everything out of it they want. And one of the big advantages you get, especially with all the logging, is you get a history at your fingertips of all of the corruption you’ve ever had on that server, in that database, on that table. And you can say, “Has this ever happened in this table before?” and go back and look. It gives you a huge advantage, something that you don’t get elsewhere. If you run the query to look at the last time the database was CheckDBed, SQL stores that information in a page. But it doesn’t store anything about CheckTables. So if you’ve got a database where you’re rotating CheckTables every month, and it takes you, like, 2 weeks or 4 weeks to get through all the tables, that database will show that it has never been CheckDBed even though it has. So having all of this here, you could easily query, “Oh well, these tables were done here.” And when you know all of the tables have been done, you can count that as a CheckDB. So you could write an SP that tells you the last time it had a full CheckDB: if all the tables got done last month, then that was the last time the tables got done. SQL won’t ever know it, but Minion will, and we’re going to do more with that as we go through the versions. I mean, this is just version 1, and it was a very ambitious version. I mean, we’ve got so many features in there. It’s ridiculous.
One of the things I like about it the most is how, and I hate to say something as corny as this, we crossed the finish line on a lot of these features. Say you’re doing a remote CheckDB and you’re restoring last night’s backup. On the prod box, I’ll show you in the log, in the status column, that it’s restoring to that remote server, and I’ll even give you the percent complete of the restore. And as it’s doing the CheckDB on that remote server, I’ll even give you the percent complete of that CheckDB on the remote server right here. So if you have 5 databases on Prod 1 and they are all being remotely CheckDBed on five different servers, you will see the status and get the results back on Prod 1. You don’t have to go around to all those different servers and check all that stuff manually. You get it back on Prod where it belongs, because that’s not a CheckDB of Dev 1 or QA 1. It’s a CheckDB of the Prod 1 database. It’s just being run over there. So the results belong back on Prod 1. And when I want to see status, that status belongs back on Prod 1. I shouldn’t have to go all over BFE to find this information. I think we’ve crossed the finish line really well on stuff like that. We really made it as user friendly and as complete as we possibly could.

Steve: Ok, I think that’s a very in-depth answer to the question I asked but definitely good stuff there. 

Sean: You know I never shut up about my product. You knew that. You knew I was a snake when you picked me up, right?

Steve: I think that wraps it up for all the questions we have to cover. Anything else you want to hit on before we move in to the SQL Family questions?

Sean: No, I better not. I’ll just go and go and go. I’m still so incredibly in love with CheckDB. Well, I’m in love with two things. I’m in love with CheckDB as a product and I’m in love with not having to be developing it anymore. It took me so long to write this and I’m just thrilled to be working on something else.

Carlos: I thought it was available earlier, but you mentioned shipping it out, like, last week or so.

Sean: February 1.

Carlos: February 1, yeah, so was that actually the first release?

Sean: Yes.

Carlos: Oh, got you. Ok, I guess I’ve just been seeing the stickers or whatever previous to it.

Sean: Yeah, we teased it a little bit. We had the stickers printed a long time ago because we knew it was going to be released; I just didn’t know when. It took me, like, a year, a year and a half to finish it, something like that. I mean, you know, when I first started it was like, “Hey, CheckDB. This will be easy because there’s not nearly as much in there as there was in backup. What could possibly go wrong?” But then you start adding multithreading, and mind you, this is T-SQL multithreading. I mean, I’m not breaking out into .NET or PowerShell or anything. This is all done in T-SQL. So we’re doing multithreading, and then the rotations, and then the remote stuff, and it adds up. I way overcomplicate everything, so.

Carlos: Well, I heard an interesting thought talking with one of the Microsoft guys about all this rapid release stuff. And his comment was, “Releasing is the feature.”

Sean: Yeah, I like that.

Carlos: I’m like, you know, yeah, because you can just get kind of trapped: “Oh, it’s not good enough, it’s not…” You know what, let’s just release it and we’ll fix it later.

Steve: Alright, so let’s go on to the SQL Family questions then.

Sean: Do it!

Steve: So how did you first get started with SQL Server?

Sean: Man, I’ll give you as short a version of the story as I can. So I was a French chef for many years and followed a French chef all around the world, and the last bistro that we were in closed.

Carlos: This is in France or this is in the States?          

Sean: This was in the States. We were back here at that point. And I was like, “Man, I don’t know what I’m going to do. I’m not really looking forward to moving again.” Because chefs have to move around if you want to get any good experience. And I was kind of tired of moving around, and I didn’t want to end up in a subpar hotel or something, opening cans on the line. So one of the waiters said, “Well, you know, my sister works for Stream.” And I was like, “Ok, great, Field & Stream. Great.” And he goes, “No, no, they do tech support.” I was like, “What the hell do I know about tech support? I’m a chef.” And he says, “You know, they’ll train you.” And I was like, “Oh, really?” So I had my own computer and ended up knowing a little bit more than I thought. And being in French kitchens, and being who I am, a linguist and all that, I ended up working for HP on the French line. So I was French tech support for Windows for HP. And I stayed there for a while, and then left there and went to this young startup called Health Web, and it was a SQL-based product for health care where you could put all your patient information up online. And they were, like, one of the first ones that did this, and now it’s everywhere. But they were one of the first ones who did this, and I was their tech support. Well, the application was written really well because the guy really knew what he was doing. And so most of the problems that we had ended up being SQL related, and this was, like, SQL 6.0 and 6.5. We had one customer at that time. They had hired me ahead of time, and so I had nothing to do. And there was this whole wall of shelves in the back room with all of these computer parts on there. So I built 4 or 5 boxes, put them under my desk, networked them, and just started learning SQL, and IIS, and Windows. And I didn’t really know that much, right, but I had a guy there who was willing to take the time to show me some stuff. He really mentored me.
So I just sat there all day long with the SQL training kits, you know, the official training kits they give for the cert exams back then, and just went through those things and worked all the examples and did them different ways. And every day when I came in I would break all my boxes down, all the way down from Windows, and reinstall Windows, reinstall SQL, create databases, load them with data, do the exercises, and do the same thing tomorrow, because repetition is king. I worked there for, like, a year and a half or so, and every problem we ever had was always SQL related. It was never the application. So they got sold, and they offered me a DBA job in Denver. And I was like, “I just bought a house 2 weeks ago. I’m not moving to Denver. I just bought a house. You could have told me that you were selling the company 2 weeks ago.” But it did plant a thought in my head, because when we found out about the sale we had been looking for help desk gigs, because I was a help desk guy. That’s what I was. But when they came a couple of weeks later and said, “You could move to Denver and be a DBA,” I couldn’t move, but then I was like, “Hey, why don’t I start looking for DBA gigs here?” And I ended up getting hired on at Pilgrim’s, a help desk guy with a lot of lab time under my belt, just a little bit of actual experience, no real production troubleshooting, no real production databases, just answering a question every now and then. And I was really pretty green if you ask me. I mean, looking back on it, I was a newb.

Carlos: We are all were.

Sean: We were all dumb in the mid 90s, right? But he hired me on the spot; he said that I was the best candidate they had ever seen. And I was like, “Really, me?” I guess because I read the book. And the rest is history. I kind of fell into databases. I wasn’t going to be a DBA. I wasn’t even going to be in computers. I was going to be a chef my whole life.

Steve: Very interesting, French chef to SQL DBA.

Sean: Right, so that’s my story.

Carlos: Now, we talked a little bit about this earlier and perhaps we can revisit it. If you could change one thing about SQL Server what would it be?

Sean: Oh jeez, man, I would say the thing that’s most important to me most of the time is they don’t make information as easy to get at as they think they do, or as they’d like, even when they do something to help somebody, like the new clone database thing. If you ask me, that thing was half-assed. I don’t like the way they do most of this stuff. But anyway, I would say make the information easier to get at. Like, there is no reason whatsoever why the last CheckDB date shouldn’t be in sys.databases. There is no reason why you can’t have table properties that say the last CheckTable date or something like that. Some of this information is just so incredibly hard to get at. Why is it so hard to get table size? Why is it so hard to find some of this information? Even the last time an SP was updated, or the last time a table was written to. They don’t make this stuff easy. You’ve got to jump through some major hoops to get some of this stuff. So why isn’t this stuff easier to get at? Why do we have to jump through so many hoops for basic information about the database, right? And I think CheckDB information, that being the theme here, is the best example of that. Why do I have to do this ridiculous DBCC page thing and parse, and all of that crap, to find out the last time a database was CheckDBed? You know when it was. Put it in sys.databases and move on. Another one is, why is it so hard to find out the last time a database was used? The last time somebody used the database, or used a table or something like that, to find out if the database is even still in use. SQL should be automatically keeping a list of all the applications that hit it so that you could easily query which applications have hit it and from what server. That stuff shouldn’t be a process that I have to write; SQL should just monitor that stuff and keep a history of it for you.
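The “ridiculous DBCC page thing” refers to digging the last clean CheckDB date out of the database boot page. A sketch of both the old route and the newer property (the database name is a placeholder; DBCC DBINFO is undocumented output, so treat it accordingly):

```sql
-- Older approach: read the boot page (undocumented) and look for
-- the dbi_dbccLastKnownGood field in the results.
DBCC DBINFO ('MyDatabase') WITH TABLERESULTS;

-- Newer versions (SQL Server 2016 SP2 and later) finally expose it directly:
SELECT DATABASEPROPERTYEX('MyDatabase', 'LastGoodCheckDbTime') AS LastGoodCheckDb;
```

Note that both only track full CheckDBs; as Sean says, a CheckTable rotation leaves no trace here.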

Steve: Yeah, definitely agree on that one.

Sean: And it should automatically have a firewall. You should automatically be able to say, I only want these applications to be able to query SQL. Be able to lock anybody who’s not a DBA out of every single access except through the intended application. I shouldn’t have to buy a third-party product or write something really thick to do that for me. You should just have a list of applications that have connected. Let’s say, click, click, click, add one: I want you to add this one to it, and then that’s it. Anybody who’s not a DBA, and DBAs can do whatever they want, but don’t let anybody hit it with Excel. Don’t let anybody hit it with Crystal Reports. That’s what the other boxes are for. This one is for this application and that’s it.
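A rough approximation of this application allow-list is possible today with a logon trigger, though it comes with caveats. This is a hedged sketch (the application name is a placeholder): APP_NAME() can be spoofed by a client, so it is a guardrail rather than a security boundary, and a buggy logon trigger can lock everyone out, so test it carefully with a DAC connection available.

```sql
-- Sketch: refuse non-sysadmin connections from unapproved applications.
-- CAUTION: a faulty logon trigger can lock out all logins; keep the
-- Dedicated Administrator Connection handy while testing.
CREATE TRIGGER AppAllowList
ON ALL SERVER
FOR LOGON
AS
BEGIN
    IF IS_SRVROLEMEMBER('sysadmin') = 0              -- DBAs can do whatever they want
       AND APP_NAME() NOT IN ('MyLineOfBusinessApp') -- placeholder app name
        ROLLBACK;                                    -- refuse the connection
END;
```

This blocks the Excel and Crystal Reports crowd from a production box while leaving sysadmins untouched.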

Steve: Alright, so we’re into our last SQL Family question.

Sean: So you get me riled up about things.

Steve: We’re not trying very hard either.

Sean: I know, right, it’s so easy.

Steve: Yup. If you could have one superhero power what would it be and why would you want it?

Sean: Oh, God, what did I say last time, Carlos? Do you remember?

Carlos: I’ve had to look it up.

Sean: You know, right now, I’m going to say it’s split between two, and I really can’t decide which one. I would either want to be able to read minds.

Carlos: Oh yes, I’m pretty sure that’s what you chose, all the dirty laundry that comes along with it.

Sean: Or I would want to be able to see the future, by the way. Not just the vague BS that you see in movies, but details, into the reasonable future. I don’t want to see vague shapes and colors and find out what they mean as I get there; that’s ridiculous. I want to be able to see a day or a week into the future, know what’s going to happen, and be able to be prepared. Imagine in our field how cool that would be: to know that a problem is coming, to have seen the fix already, and to be able to do something about it. You’d be a superstar in every sense of the word.

Carlos: Awesome, very rich stock picker as well. 

Sean: Yes exactly, that hadn’t crossed my mind.

Carlos: Warren Buffett, eat your heart out.

Sean: I know right.

Carlos: Well, awesome. Sean thanks so much for coming on the program today. It has been a blast as always.

Steve: Definitely been fun. Thanks, Sean.

Sean: I love this topic. I’d like to see you guys again for reindexing; I’m starting now. And we didn’t even get to talk about the new stuff in Minion Backup, but we’ll save that for another one.

Carlos: There you go. You guys got another 45 episodes or so.

Sean: Yeah, exactly. You can’t make this the Sean show. Thanks a lot guys!

Steve: Alright, thank you.

Carlos: Thanks again, Sean.

Episode 88: SharePoint

Why can’t we just get along? There can be a real love/hate relationship between those of us with data responsibilities and those who are responsible for SharePoint environments.  We want to provide good service to those environments, but there are SO MANY things we can’t do on a SharePoint database.  To see how the other half lives, we invited Todd Klindt onto the program to give us his thoughts on how we can all get along a bit better.

Because Todd suggests the “SQL and SharePoint communities should understand each other a little better”, we will try to tackle some of the obstacles the teams face so you can be better armed to support your SharePoint team.

 Episode Quotes

“SharePoint is kind of like that pretty girl in High School…”
“A SQL alias is like fire or gunpowder. I have the respect for it. It’s a good tool but don’t play with it.”
“Bad things happen, stuffs is unfair, and how you handle it is the key.”

Listen to Learn

  • Why the SQL Server and SharePoint teams can’t seem to get along
  • Dos and Don’ts with SharePoint Server
  • How you can still save the day
  • The vocabulary you might use to approach your SharePoint team
  • What an interesting sense of humor Todd has

Todd on Twitter
Todd’s Blog
SQL Server best practices for SharePoint
Todd’s SharePoint Netcast

SQL in the news
Mongo API for DocumentDB
Microsoft Data AMP Conference

About Todd Klindt

Todd has been a professional computer nerd for over 20 years, specializing in SharePoint for the last 14 years.  His love affair with SharePoint began one slow day at the office when he discovered SharePoint Team Services 1.0 on the Office XP CD that was holding up his coffee cup, and decided to install it. The rest is history. In 2006 he was honored to be awarded the MVP award from Microsoft for Windows SharePoint Services. He has had the pleasure of working with SharePoint farms both large and small. He has written several books and magazine articles on SharePoint. Todd has presented sessions on SharePoint at many major conferences in the United States as well as Europe and Asia, and does the user group circuit, SharePoint Saturday events, and the occasional children’s birthday party as well.

Transcription: SharePoint

Carlos: So Todd, welcome to the program.

Todd: Thanks for having me, Carlos.

Carlos: Yes, it’s a pleasure having you here, and when we reached out and chatted with you I knew we were going to have a very interesting conversation. Ultimately our conversation today is about SharePoint, and we wanted to talk about the interactions between SharePoint and SQL Server. And your response was, “Yes, I’d be happy to talk with you about the database lucky enough to host the SharePoint application.”

Todd: Yes, there are many database platforms out there but SQL is fortunate to be the one that gets to support SharePoint. It’s an honor above all honor. I hope SQL appreciates it.

Steve: Now, one thing you’ve said is that you’ve taught some classes in the past around this and around SQL for SharePoint admins and around the opposite of SharePoint for SQL DBAs.

Todd: Yeah.

Steve: And I think that’s one that would be very exciting to our listeners in that a lot of DBAs may not have the SharePoint experience.

Todd: Yeah, so the way that works is, we’ve all heard the phrase, “In the land of the blind the one-eyed man is king.” And so when it comes to SharePoint DBAs, or rather SharePoint Administrators, they need to know a bunch of things. They need to know SharePoint obviously, but they also need to know IIS, PowerShell, DNS, and SQL. And of all of those things, SQL is the most complicated, the least forgiving, and just the easiest to screw up. And so I came in with a little bit of SQL knowledge, just a little bit more than the next guy, and so I became one of those people. When speaking at TechEd and things like that, I would speak about the SQL part for SharePoint Administrators. And I would cover just the dumb stuff, the stuff that’s just embarrassing for SQL folks, but the SharePoint folks didn’t know it. And then I had somebody, I can’t remember what the event was, say, “You know, we’ve got some SQL folks; they would like it the other way around.” And I’m like, “Absolutely, this is easy.” And so I kind of twisted it on its head, and what it really ended up being was almost like a counseling session for SQL DBAs, because I was saying all these things that SharePoint was doing that they hated. And you know, like the ever-popular using GUIDs as primary keys. SQL DBAs love that. So it ended up being, here are all the ways that SharePoint is going to try to destroy your SQL Server, and all the things that it won’t go along with. When I would teach the class to SharePoint Administrators I was almost like a god, you know, because I just knew all these crazy things, like I knew what a transaction log was and how to keep it from filling up the hard drives. To the SQL guys it was just bad news. I was just like, “Here is another thing that SharePoint does that you’re going to hate. And here is another thing.” I think I only did that one once. I couldn’t take it.

Carlos: When they got the pitch forks out that’s when you knew.

Todd: Yeah, I knew; by halfway through the session they had turned on me. That’s when the pitchforks came out. Yeah, it was horrible.

Carlos: So ultimately I think we're going to get to a little bit of that, right? I have some questions about why SharePoint does things the way it does, and maybe you can help shed some light there — I guess another opinion. And then of course the flip side is we would like to get some feedback on what the SharePoint Admins wish we were doing a little bit better.

Todd: I have a long list of grievances. I think I can represent all the SharePoint Administrators when I tell you DBAs these things that you're doing wrong. So I hope somebody's got some pen and paper, yeah. I've got all that, so I'm happy to oblige you.

Steve: Well, it’s great to know you can represent all of the SharePoint Admins from that perspective.

Todd: We are one large homogeneous group, yes. They elected me as their spokesperson, so very likely, yeah.

Carlos: Ok, so the first thing I guess just to talk about — we're talking about SharePoint, we're talking about a couple of different databases, and logging is probably the one that is going to get the most attention, only because it's so easy for that thing to get so big so quickly.

Todd: Yeah, there are a lot of ways that SharePoint Administrators screw up SQL. If you guys had a list of all the ways that you can screw SQL up, SharePoint Administrators have done them all, and they've invented a couple of new ones, and the logging DB is one of them. And the logging DB came from a good place. But the underlying SQL structure for SharePoint is incredibly rigid and incredibly fragile, and so from day one, when SharePoint sat on top of SQL, Microsoft said, "Keep your hands out of SQL. Don't write to those databases. Don't read from those databases. Don't even look at those databases. Pretend they are not there." And that was because there is not a lot of referential integrity, and SharePoint assumes things are a certain way inside of SQL. And the administrators said, "That's great, but there's really good information in there that I would like to have. It's the only place it exists, and you'll yell at me if I read it from there, so let's figure something out." So Microsoft, I think in SharePoint 2010, introduced this logging database where they just put everything that anybody would ever want into that database, and they said, "Go wild. Read from it, write to it, delete it, add columns — whatever horrible things you can think of, you can do to this database and this database only." So that thing can get huge, and depending on what kind of recovery model you're using, the database can get big and the logs can get big. And most SharePoint Administrators, if they installed SQL once, they do that and forget about it, so they don't think to monitor those drives and look at the logs and all that. That database has taken down many a SharePoint farm because it fills up all the drives and just takes everything down. So you're absolutely right, that's one that the DBAs need to keep track of. That's one the SharePoint Administrators should keep track of too, but they don't.
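[Editor's note: since the failure mode here is usually a data or log file quietly growing until a drive fills, a simple instance-wide size check is one way a DBA can catch the logging database early. This is a generic sketch, not something from the episode; the logging database name varies by install (WSS_Logging is a common default).]

```sql
-- List every data and log file on the instance, largest first.
-- size is stored in 8 KB pages, so size * 8 / 1024 gives MB.
SELECT DB_NAME(database_id) AS database_name,
       name                 AS logical_file_name,
       type_desc            AS file_type,        -- ROWS or LOG
       size * 8 / 1024      AS size_mb
FROM sys.master_files
ORDER BY size DESC;
```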

Carlos: Sure.

Steve: So are there standards or best practices out there for how to go about cleaning that logging database up, or purging old data out of it, or anything?

Todd: I'm not sure if there are any official ones. Microsoft's stance on it is that you can do whatever you want. You can delete it, rename it, do whatever you want to it. They don't read from it for anything, so they don't care.

Carlos: Sure. Correct me if I'm wrong, but I believe the default is that there is no expiration, no kind of archiving there. So going in and setting that would be a good first step. Alright.

Todd: Yeah, and that's one of those things that Microsoft, at least the SharePoint team, has done — and I can't think of an exception to this, though I'm sure there are some — anytime there is a fork in the road, and one answer means that something might get removed and the other answer means it won't, they always go with "do no harm." So things like, you know, it would be great if we archived after 30 days because that would save disk space and all that, but they're like, "No, I don't want to delete anything. You never know when someone is going to need something from 31 days ago" — you know, when the most important spreadsheet ever got deleted or something. So they let that stuff run wild. So I think, yeah, that's an important first step to figure out. If you've got a data warehouse somewhere, you're going to keep these things in it, or you're going to set up your retention policies so you can't look back any further than 60 days or 90 days. That's definitely something the SharePoint administrators should figure out when they handle that database.

Carlos: And shame on us, you know, I guess, if we're the database administrator there, for having those drives fill up. I think it's part of the core of what the SQL Server database administrator's responsibilities are — beyond backups it's keeping the system up, and knowing how much drive space you have is kind of critical to that.

Todd: I couldn't agree more, shame on you DBAs. This is not the SharePoint administrator's fault whatsoever. We are the collateral damage. No — I think part of that comes from, you know, DBAs have an idea of what their role is, and you cover some of the basic stuff: backups, monitoring, things like that. But optimization — figuring out the way to get the best performance out of the databases and those kinds of things — that's just something that every DBA does, and they go into every situation expecting to provide that service. And then the SharePoint guys come along. For the longest time SharePoint was the sexy beast out there. The SharePoint guys walked in, they had a swagger about them, it was just obnoxious — and I say that as a SharePoint guy who has that swagger. So we come in and we're like, "SQL guys, that's adorable that you think you could help, but don't touch anything. Don't look at anything. Don't change anything. I know you think you're smart, but don't touch anything." And after that happens to the DBAs a couple of times they're like, "Fine. You told us not to touch anything, then we're not going to touch anything. I'm going to go back to playing Solitaire at my desk." And so that's kind of how this dynamic got worked out, and neither side really understood the other side very well. The SQL guys were told to keep their hands off, and a lot of them did. I do have one funny story with a customer. They had a SQL team and a SharePoint team, and SharePoint was running fine, and then all of a sudden things started breaking — and I can't remember exactly what the symptom was because this was 8-9 years ago. But after asking everybody, "Did you change anything?" "No, we didn't change anything," we talked to one SQL DBA who had kind of a stroke of honesty about him. He was like, "Yeah, I was playing with the public role a little bit and I took a bunch of things out. Is that bad? Is that wrong?" Well, yeah, it was.
And in his defense, they were in a very secure industry, and his job was just to make sure that nothing had permissions it didn't need to have, and he didn't know for sure. He was trying to do the right thing. And we were like, "Yeah, don't touch that one, DBA." But that kind of stuff happens all the time. So I agree the DBAs should keep their eye on that kind of stuff, but there are a lot of things going on and there is not always a DBA out there. You might have real SQL things to worry about, and that SharePoint might have got stood up in some place you didn't know about, kind of off the record, and that kind of stuff happens.

Steve: So as the DBA, oftentimes we are put in a position where we are responsible for that server, but then we're told, "Don't touch it, don't change it." And then things happen — like databases get big because of a lot of logging — and what do we do? I mean, we can't touch it, we can't change it, but we're running out of disk space. Do we just throw an unlimited amount of disk space at it, or more memory?

Todd: All of the above. I keep hearing how cheap disk space is — I hear that from people all the time. When I was a server admin it was like, "Why does it cost so much for server space? I can go to Best Buy and buy a drive." So, a couple of things. Microsoft has some documentation that kind of helps with that, and I can give you guys the links to it. The SharePoint product team understands the tough position DBAs are put in, so they have hard guidance on here are the things that you can do to secure SQL, here are the things that you can and can't touch in SQL, to give you guys a framework of things you can do. And I'll get the links for those to you listeners — in the shownotes these guys have, or on my blog, toddklindt.com, either way. So a couple of things about that. One of the things I would bring up is that when you create a SharePoint farm out of the box — I forget the number now, but there are like 19 databases that get created with just a regular run-of-the-mill SharePoint farm — SharePoint makes some really bad assumptions about how things should be configured. For instance, it doesn't understand putting databases on different drives; it doesn't do that automatically. It sets your recovery model to simple. All these kinds of things. So as a DBA you can't assume things were set up in a way that makes sense, because SharePoint doesn't always do that. One of the things you can do is go in and look at the recovery model and say, "OK, we don't need these set to the full recovery model." You probably don't need that unless you're also doing some other kinds of smart backups, truncating your logs, and doing all those things; SharePoint doesn't always do that correctly. It's going to put everything on the default data drive and the default log drive, and there are obviously some databases that you can move around where it makes sense to put them on different tiers of disk and different sizes. So there are some of those things.
Really looking at the documentation and finding out is the best way to do it. One of the things I talk about when I speak to SharePoint administrators is that I don't expect them to be SQL experts. There's just not enough time to be a SharePoint expert and a SQL expert. But they should know a few things — some vocabulary, some of the terms — so that when they come to you guys, or you guys come to them, you can speak to each other and be saying the same things. So if you come to somebody and say, "Hey, these drives are getting full, what can we do?" — there are situations where content databases can have huge amounts of white space in them, and then you need to have a conversation: should we shrink this database, should we not shrink this database, things like that. So it's kind of a two-way deal.
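[Editor's note: to make the recovery-model conversation concrete, here is a hedged sketch of the kind of check and change a DBA might make. The database name WSS_Content is a placeholder, and you would only switch to SIMPLE where nobody needs point-in-time restores of that database.]

```sql
-- See which recovery model each database on the instance is using.
SELECT name, recovery_model_desc
FROM sys.databases
ORDER BY name;

-- If no one is taking log backups, switching a content database out of
-- FULL stops the transaction log from growing without bound.
ALTER DATABASE [WSS_Content] SET RECOVERY SIMPLE;
```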

Steve: So even with that default setting, Todd — you talked about simple mode — but even back at least in, and this is dating me a little bit... We couldn't back up the databases and use that as a restore mechanism, right? It had to kind of go through the SharePoint console. Has that changed at all?

Todd: So that is kind of — let me go back and say that by default SharePoint creates the databases with the full recovery model, not simple, because full is the one that fills up the hard drive and makes everything crash. You can tweak your model database settings and play with that, but if you're a SharePoint guy and don't know enough, that can bite you. As for the databases: with the exception of one database, for the most part all of the databases inside of SharePoint are portable. It comes down to knowing what kind of database it is and what you're trying to do with it. The one that you really can't touch is the config database, and that is the one that has all of the farm-specific and machine-specific settings. You can't move that thing. You can't move it from farm to farm. You can't really even move it from location to location inside of SQL. It's just a very fussy thing. There are a lot of hard-coded things in there, and that's the one that really directs all the SharePoint servers — if something is wrong in that database, SharePoint can't start. Outside of that, though, your content databases, where all your documents and your lists and all that live, are completely portable, to the point where I tell folks, when I'm talking to SharePoint folks: if you do nothing else inside of SQL, get SQL-level backups of your content databases. Because if everything burns down — if your SharePoint Servers explode, your SQL Servers explode — if you've got copies of those databases, I can pretty much make sure you don't get fired, because that's where all the stuff's at. I've got this great story: I used to work at a company, and I left, and like 10 years later the guy that was my boss calls me on a Friday night at 6 o'clock. I'm like, "Hey, it's Todd, how's it going? Been a while." And he's like, "Well, we've got this big SharePoint environment and we're moving it to a hosting company.
We get everything set up, and we have all these VMs, all these hypervisors and all these SharePoint farms, and the hosting company did a reverse migration." Now, I'd never heard the term reverse migration, but I think I had a pretty good idea what had happened. When they were supposed to copy their local virtual machines and databases and all that to the hosting provider, the hosting provider did it backwards and copied the blank environment over top of the production environment.

Carlos: Oh man.

Steve: Wow.

Todd: Yeah, and it gets worse. So he's telling the story like, OK: "We talked to our SQL team, they had all the databases backed up — the SQL team rocks. Everything is good to go." He's like, "Go DBAs! Then I talked to the Windows team and said, 'OK, we need to restore all the VMs.' And the Windows team said, 'No, we don't take care of that, the virtualization team does.' So I went to the virtualization team and said, 'Hey, I need to restore all these Windows VMs.' And the virtualization team said, 'We don't do that, the Windows team does.'" So this guy was in a situation where he had, I think, 9 SharePoint farms and close to 30 SharePoint Servers, and all the servers, all the Windows images, were gone. All he had was the SQL images and databases. That's all he had. And so I was able to take nothing but the SQL images and SQL databases — the SQL team rebuilt the SQL Servers and mounted everything — and I brought all of that back up, full fidelity, everything he had the week before, because all the good stuff, all the important stuff, is in SQL. All of the data is in SQL. They didn't have SharePoint backups, but the fact that they had good SQL backups meant I was able to rebuild everything and bolt it all back together, and as far as I know it didn't go badly for him. Now, where does the SharePoint part come in? There are a bunch of SharePoint pieces that aren't in SQL, and if you don't do SharePoint backups those things don't get backed up. And there's also a unit of scale inside of SharePoint: a content database — which, again, has all your important documents and list items and calendars and all that — can have one or more site collections, which are kind of the unit of scale inside of SharePoint. SQL can't back up anything smaller than the database, but if you do SharePoint-level backups you can back up individual site collections, and that gives you some granularity there.
But the SQL backups are the very first thing I tell anybody to do.
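[Editor's note: as a minimal sketch of that first line of defense — a plain SQL-level full backup of a content database — something like this would do. The database name and backup path are placeholders.]

```sql
-- Full backup of a SharePoint content database.
-- CHECKSUM validates pages as they are read; INIT overwrites any prior
-- backup in the file; STATS = 10 reports progress every 10 percent.
BACKUP DATABASE [WSS_Content]
TO DISK = N'D:\Backups\WSS_Content_full.bak'
WITH CHECKSUM, INIT, STATS = 10;
```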

Carlos: There you go, compañeros. If you're getting pushed out of the SharePoint loop, make sure you're still taking your backups, so that one day you too can be the hero.

Todd: Yes, recover from that reverse migration.

Carlos: So one of the things I was going to ask about, in the database itself — there are some settings that kind of drive us bonkers, right? And one of the things SharePoint doesn't want us doing is creating statistics. And these statistics — again, you have these very wide tables for all these lists, right, column 1, column 2, and we look at it and we go crazy. And then it's like, now you're not going to let me have the database create statistics itself, and I'm not going to create any on my own — SharePoint is like, "No, no, don't do that." And where you get in trouble, particularly, if you do decide to go do it, is when you want to upgrade it will say, "You've touched my database, I'm not going to let you upgrade." So my question is: auto-creating statistics, is that still a no-no? Is that something we can turn on — again, assuming we know enough — and then just remove them before upgrade time? Or what's your take there? Which side do you come down on?

Todd: So I think that is covered in the best practices document that I promised I would link for you guys. On the upgrade stuff, I'm not sure what the last version of SharePoint you'd be upgrading from or to is, so I'm not sure if that particular one still gets caught or not. These days when you upgrade SharePoint, all you can do is a database attach — basically you build a new SharePoint farm, so you take your SharePoint 2013 databases over to your new SharePoint 2016 farm. There is a cmdlet that you can use to test the database, and it will tell you all the things you've done wrong and all the things that are right. But more importantly, it will tell you whether a particular thing will block the upgrade or not, and there are precious few things that actually block an upgrade. Now, playing with the database schema might be one of those things, but I don't remember if the statistics one just yells at you or actually blocks the upgrade anymore. I know going from SharePoint 2007 to 2010 it was super mean about those kinds of things; I don't know if that's still the case or not — best practices document, I think it's in there. But yeah, you're right, and that's why I feel for the DBAs: there are so many things that you guys do to keep things running, and then SharePoint comes along and says, don't do that. I'm looking at the best practices document right now, and it says do not enable auto-create statistics on SharePoint content databases — we will find you, we will pull your fingernails out, put cockroaches in your underwear. That seems unnecessary; I don't know why they did that. But yeah, don't do that particular thing — and that is the SharePoint 2013 best practices document.
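[Editor's note: for DBAs who want to verify their content databases actually comply with that guidance, a quick check looks something like this. WSS_Content is a placeholder name; per the discussion here the setting should stay off.]

```sql
-- Confirm auto-create statistics is disabled on the content databases.
SELECT name, is_auto_create_stats_on
FROM sys.databases
WHERE name LIKE N'WSS_Content%';

-- If someone has turned it on, this puts it back the way SharePoint expects.
ALTER DATABASE [WSS_Content] SET AUTO_CREATE_STATISTICS OFF;
```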

Carlos: So you would still follow that as a best practice then?

Todd: Yeah, I would. And I'm sure your listeners are used to talking about supportability with Microsoft — there is Support with a capital S and support with a lowercase s, and depending on what you're talking about, there are different levels of what you can and can't do. I've absolutely done things that I know Microsoft hates, but they can't necessarily catch me at it, and I've got plausible deniability. On this one, I don't know enough SQL to know whether, if you turned that on and then called Microsoft support and they noticed it, you could turn it off and undo the potential damage. I don't know that one well enough to comment on it.

Steve: So when we're looking at the SQL Server best practices, or the normal things we're doing for performance tuning, and you compare that to what SharePoint is going to allow you to do — what are the kinds of things we should avoid, or that we can get into trouble with if we do them? And what are the things we can do, get away with, and that work well?

Todd: Yeah, SharePoint is kind of like that pretty girl in high school: she's really pretty, and very demanding. SharePoint is kind of like that when it comes to SQL. So again, that best practices document that I'm going to link, and that you're going to link, will walk you through basically that question. As a well-seasoned DBA you've got a list of things in your head that you want to do, and here's the list of things you can do, which is very short, and here's the list of all the things you cannot do, and that really guides you. So, a couple of things we've talked about: you can't muck with things like indexes and fill factor and those kinds of things. There are some harder things you can't do. One of the ones we talked about: you can't auto-create statistics — that's a big no-no. A gigantic no-no is you can't mess with the max degree of parallelism, or MAXDOP to its friends. That has to be set to one and only one, or SharePoint 2013 and greater will just, depending on when it catches you changing it, fall down, throw up its hands, and refuse to serve anything. If you don't have MAXDOP set to one on your instance when you create your farm, SharePoint will refuse to create the farm. It just won't do it. There are different maintenance processes and things that check for it, so if you try to change it afterwards it will be very angry with you. And while that's obnoxious, it is less obnoxious than not checking, not getting angry with you, and corrupting your data because it thinks everything is running in one thread when it's not. I send that best practices document to SQL DBAs all the time; it normally brings them to tears. Microsoft has done a pretty good job there, because this question comes up all the time. SQL DBAs have been at this way longer than SharePoint administrators have — you guys have honed your craft a lot better than we have — so they had to provide you with this guidance.
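[Editor's note: the instance-level setting Todd is describing can be checked and set with a standard sp_configure call, sketched here.]

```sql
-- SharePoint 2013+ requires max degree of parallelism = 1 on the instance.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;

-- Verify the running value.
SELECT value_in_use
FROM sys.configurations
WHERE name = N'max degree of parallelism';
```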

Steve: Sure and we just don’t have that cool swagger that you mentioned earlier.

Todd: It's true. I mean, I've met some of the coolest SQL folks, and you've got nothing on some of the coolest SharePoint folks. True story.

Carlos: And here I was starting to warm up to you Todd.     

Todd: I’ll take care of that.

Carlos: A chill just entered the conversation here.

Todd: So, this was many years ago. I'm a Microsoft MVP, and I've had that for 10 or 11 years, and on one of those flights to the MVP Summit I was sitting next to Tom LaRock. I know you guys have dealt with him — and I do mean dealt with him, not met him; he is a special individual. So on that plane, you know, it's all nerds — that flight out to Seattle is just wall-to-wall nerds. I'm sitting next to Tom and he pulls out his laptop and starts doing SQL-y things, and I was like, "Alright, he's probably an MVP," and I start talking to him. And to this day — and this was probably 10 years ago — Tom and I still talk all the time, because when that conversation started up I'm like, "Hey, you're a SQL guy; I'm a SharePoint guy." And he kind of looked at me and gave me the stink eye. I'm like, "No, no, it's cool. I'm on your side. What can I do as a SharePoint guy to make SQL better? Help me understand the performance things that I can do." And we kind of bonded over that. So we can mix the chocolate and the peanut butter, and it works out.

Carlos: Now, getting to the reverse — because basically our hands are tied: don't stick your hands into the cookie jar, right, get out, don't do anything. So on the reverse side, how are we driving our SharePoint admins crazy, and how can we do a little bit more and get less yelling?

Todd: Yeah, well, I think there's a bunch of stuff you guys can do, and a bunch of stuff that SharePoint people don't understand. To SharePoint administrators, everything in SharePoint is the content — it's the Word docs, it's the Excel spreadsheets — and they are so laser focused on that that they can't see anything else. As a DBA, you understand that by the time a piece of data ends up in a content database it has probably gone through TempDB, it sat in a log for a while — there are a bunch of things that happen before it gets into that database. Whereas to a SharePoint admin, you know, it goes from my mom's computer, to the screen, directly into that SQL database, no stops in between. So one of the things you guys can do is, on the back end, think about where the databases go, which drives TempDB goes on, how many TempDB files to have, where to put the logs, where to put the databases, and things like that, to make SharePoint shine. That's one of the things — and educate those SharePoint folks on it, why you're doing it, and help them understand the value of it. That makes so much difference. Make sure that your SQL Servers have enough memory, enough RAM, to do the things they need to do. There are some guidelines out there, but again, it's almost more art than science at this point. You guys can feel those things out, and you understand the different metrics to look at to see where SharePoint is slowing down, and help guide those folks.

Steve: Yup. So on that point, around moving data and log files around — generally we see with SQL databases that if you can have your log files on a different drive than your data files, and you're having I/O contention, you usually can improve performance with that. And one of the things I've run into with SharePoint, or SharePoint admins, is that you make recommendations — say, this is what we're going to do, we put in a new drive, we're going to move the log files over here, we're going to move the data files, whatever — and there's a lot of pushback saying, "Oh, you can't do that with SharePoint."

Carlos: Yeah, you can’t touch it, you know.

Steve: That will invalidate the warranty and the entire system will blow up.

Todd: Yeah, I think that comes from two things. One is that SharePoint administrators are very sure of themselves, and they have heard that you shouldn't touch things in SQL, so they repeat that with great authority and feeling. And the other thing is that I think most SharePoint administrators are scared to death of SQL — just terrified of it. They're afraid that if you go back there and fiddle some bits, it might just kill SharePoint and SharePoint will never recover. So I think it's equal parts those two things. SharePoint absolutely is fine with the transaction logs and databases being on separate drives. It's absolutely fine with — if you've got a content database that has really fast-filling logs and you need to move those — taking that database offline in SharePoint, which is different than dropping it in SQL; the way SharePoint uses "offline" for databases is maddening. But it actually understands moving the logs, backing it up, all those things. You can actually do all of that.

Carlos: Just to clarify there. So I go to my admin and say, "We need to make a change here," and we've gotten to a point where they trust me and know I'm not going to break everything. Am I using SharePoint to take the database offline, or can I do that through SQL Server Management Studio — take the database offline and move the files around?

Todd: Yes — it depends on the database and what you're doing on the SQL back end. The content databases, and the content databases alone, have this idea of being offline and detached and things like that. With a content database in SharePoint, if you set it to offline, what that does not mean is that it's inaccessible to SharePoint — you know, the way "offline" means in every other part of the tech world, every time, ever, except this one instance. And to make it even more confusing, there are two screens that setting is on. In one place the options are "offline" and "ready" for the content database. In the other screen the options are "stopped" and "started."

Carlos: Oh, interesting. Okay.

Todd: And it is the same setting. It's been that way for 10 years. So as a DBA you hear that the database is offline and you're like, "Hey, the database is offline." What it means to SharePoint is "I can't create new site collections there." That's what it means. Content databases you can detach, and then SharePoint doesn't look for them anymore, and what you do in SQL doesn't matter — you can do whatever. But you can have different file groups and things like that for databases, and depending on what you're doing on the back end, you may be able to move those transaction logs, or create another log on a different drive — you know, SQL does its thing, it tries to balance things — and you can move things that way. As long as that database is accessible, SharePoint doesn't care.

Steve: Though we generally don't recommend creating additional log files — moving them would be the preference there.

Todd: Yeah, if you can do that. You can say, for this content database — because they're the ones that are going to get really big — detach it in SharePoint so it doesn't show up. You're going to have a little bit of downtime, but that's OK. Then move your transaction log, and attach the database back in. Yeah, SharePoint doesn't care.
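[Editor's note: on the SQL side, once the content database is detached or quiesced in SharePoint, the move Steve and Todd describe is a standard file relocation. A sketch, with placeholder database names and paths:]

```sql
-- Tell SQL Server where the log file will live after the move.
ALTER DATABASE [WSS_Content]
MODIFY FILE (NAME = N'WSS_Content_log',
             FILENAME = N'L:\SQLLogs\WSS_Content_log.ldf');

-- Take the database offline (in the SQL sense this time).
ALTER DATABASE [WSS_Content] SET OFFLINE WITH ROLLBACK IMMEDIATE;

-- Move the physical .ldf file to L:\SQLLogs\ at the OS level, then:
ALTER DATABASE [WSS_Content] SET ONLINE;
```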

Carlos: Well, they shouldn't be getting that big anymore, because now we have permission to go and take backups of the databases, and we'll help manage that size from that perspective.

Todd: Yeah. I don't know how many hours I've spent with SharePoint admins, when I was consulting, who call and say, "Hey, SharePoint is down, everything is ruined" — and invariably it would be because the transaction log had filled up the drive. My favorite one — and this was like 2008, so we're talking almost 10 years ago — a guy calls up all freaked out: "SharePoint is down. SharePoint is down." I imagine him running around his office waving his hands in the air, hair on fire, the whole ordeal. So I jump on there, and sure enough it's another one of those farms that a gigantic transaction log had taken down. And this transaction log was 350 GB, something like that. I don't know how big the transaction logs you guys see these days are, but back in 2008-2009 that was a pretty big file. So I walked in with my typical SharePoint swagger and I'm like, "Don't worry about this, man. I've got this. I clean this up all the time." And I went in, changed it from full to simple — and I know in the background what the database engine is going through, marking transactions and all that — so I did a little chit-chat with the guy, told the administrator I was going to try to shrink the transaction log, and told him what the consequences were going to be. We shrunk it by like 20 GB — which, 20 GB, that's huge — but this guy is seeing a 350 GB file become a 330 GB file and he's freaking out. It just goes to show that he didn't understand the back end of SQL; he understood SharePoint, but he hadn't thought through all that. And I'm like, "Good luck, the database engine is working as fast as it can." That kind of stuff happens all the time. Another fun one: I was working with a customer doing a test migration — and we talked earlier in this podcast about databases being portable and doing SQL database-level backups — so I jump on his SQL box to make backups of all his databases. And I look at the SQL box and it's got the amount of space I need times 1.5 free.
So he and I start running the backups, and all of a sudden it shuts down. And he's like, "Well, is the backup still going?" I'm like, "Well, that depends a lot on why, doesn't it? If your connection to the server went away, it's probably still going; if the server burst into flames, probably not." So he tries to connect, and he can't, and he's like, "I don't know what's going on." And then he gets a knock on his door: everything is down, email is down, the whole thing. And he's like, "Oh, sounds like our hypervisor went down." So he does his investigation. They had been thin provisioning their drives, and I had started these multi-gigabyte backups and filled up the drive on the hypervisor, and everything paused. OK, fair enough, so he does some stuff, and a couple of minutes later we're back up and going. We jump back on the server — of course the backups didn't finish because the machine shut down — and start the backups again. We get about 10% in. "You need to figure out some space for this. I can't keep taking your whole company down like this. I feel bad." And I remember back in those days, when I would speak at SharePoint conferences, I would tell people, "Don't virtualize SQL. Just don't do it. There's a performance penalty." Of course, today in 2017 everybody virtualizes everything everywhere, but that was one of those things back then. I'm like, "This is just another reason not to virtualize SQL — you've got to have that drive space, you've got to have good I/O."
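[Editor's note: the cleanup Todd describes — switching out of full recovery and shrinking the log back down — looks roughly like this in T-SQL. Names are placeholders, and this is only sensible when nobody needs point-in-time recovery of that database.]

```sql
-- Stop the log from accumulating: switch to simple recovery.
ALTER DATABASE [WSS_Content] SET RECOVERY SIMPLE;

-- Shrink the log file back toward a sane size (target is in MB).
-- A shrink can only reclaim space past the last active log record,
-- which is one reason a 350 GB log may shrink far less than expected
-- on the first pass.
DBCC SHRINKFILE (N'WSS_Content_log', 1024);
```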

Carlos: You know, that’s right. If you don’t have drive space you can’t do too much, especially if you want to add something to it. And I guess if you want it to be read-only, that’s one thing.

Todd: I keep thinking about it and I feel bad about it.

Steve: Alright, well so great information there I think for the SQL DBAs as well as the SharePoint people.

Todd: Yeah, one of those things that I want to bring up to DBAs, and this is the story of SQL aliases. Do you guys use SQL aliases much?

Steve: Not often.

Todd: Ok, that is a debate inside of the SharePoint community a lot. And the way that I feel, and again my information on this is dated, but I first started touching SQL aliases probably in 2008-2009, and it was like black magic to me. I remember the first time I pulled a hard drive and everything kept working; I’m like, “This is black magic, there is no way this works.” I felt the same way the first time I used SQL aliases. And once word got out that those were there, people were like, “We should use SQL aliases all the time.” But for the longest time, when I would get up in front of a room speaking to SharePoint folks, I would say, “Who here knows what a SQL alias is?” And like 3 people out of 50 or 100 would raise their hand. And I would say, this is why you shouldn’t use SQL aliases: because SQL aliases work so well that if you don’t know what you’re looking for, you’ll never find it. You’ll never know why things don’t work. So my guidance to folks has always been not to use a SQL alias unless you need it. Unless you can articulate the reason why you need a SQL alias, don’t bother with it. Because you’re going to go on with your career, hopefully get a big promotion or win the lottery, whatever, and the next guy might not be as smart as you and might not know about SQL aliases. You can drop one in anytime; there’s no penalty for skipping it when you build your SharePoint farm. So that’s one thing I tell folks not to do unless they need it.

Carlos: Yes, it’s one of those things, because people are migrating. They want to go from server A to server B and they’ll introduce that alias. But I kind of feel like, you know, just bite the bullet and get that SharePoint setting reset. Change your applications. Unless it’s really going to be that difficult, it’s worth not using the alias, because you’re kind of giving up control, if you will, of who owns that. Whereas with DNS there’s an owner; you know where that is coming from. You don’t quite have that with an alias, so I think that’s kind of the tricky part there.

Todd: Well, there is one very specific reason to use aliases with SharePoint. Well, there are two, but it’s the same idea. We talked earlier about that config database. SharePoint does not support moving that config database, so if you have to move servers or migrate or something, there is no supported way to change where that config database lives. If you want to move it you have to use an alias; there is just no way around it. The other thing is if you’re using any kind of non-standard ports, because you can’t do anything with non-standard ports through DNS; DNS is just a name-level thing. So those are the two reasons I tell people, and they’re very valid. If you’re doing one of those things, rock on, use an alias. But the thing I tell people is there is no benefit in using the alias when you build the farm; you can always drop it in later. Because still today I’d say half of SharePoint administrators don’t know what an alias is, or where to find it, or where to configure it, or that there are two of them, that they need to do the 32-bit one and the 64-bit one. So I tell people to be afraid of it. It’s like fire or gunpowder: I have a healthy respect for it. It’s a good tool, but don’t play with it.

Steve: Yup, very good advice. So shall we go on into the SQL Family questions now?

Todd: Sure.

Steve: Can you tell us how you first got started with SQL Server and/or SharePoint?

Todd: Yeah, so I got my first SQL Server cert in December of 1997, my administering SQL 6.5 certification back then. It was one of those things: I worked for a company that used a little piece of accounting software. Maybe you’ve heard of it, Great Plains Dynamics, and this was way before Microsoft purchased them. The company I worked for was a consulting company and they wanted to use Dynamics, but it was really expensive to buy. What they found was that if you’re a Great Plains reseller, you get to use it for free. And so they said, “Congratulations everybody, we’re going to become a Great Plains reseller.” To get that you have to have somebody who’s certified on the software, and because that stuff also ran on Microsoft, you had to be an MCSE and you had to be certified with SQL, and then you go to this class. I was already an MCSE, so they said, “Congratulations Todd, you’re about to become the SQL administrator. Go pass this test because you’re going next month.” That’s how I got started with SQL, 20 years ago now. And then SharePoint: my introduction to being a SharePoint administrator was very similar. I was a server admin at that time, this was like 2002, Windows and Exchange, that kind of stuff, and my boss told me to put a list of all of our servers worldwide up on a webpage. And I can’t program, I can’t design webpages, I can’t match my clothes, for god’s sake. And I look at this task and I’m like, “Well, huh, I’ve got my resume, so that’s good. I’m going to need that.” And then I was drinking some coffee and I had the Office XP CD, and it had a trial of SharePoint Services 2001. And I’m like, “I’ve never seen something like this.” I installed it, put this server list out there, and I was hailed as a hero, and it was all great. And then the company decided that we were going to write some software on top of SharePoint 2003 when it came out.
And my boss said, “SharePoint? Well, I’ve got a SharePoint guy.” And that was the day I became the SharePoint guy.

Carlos: That’s when the swagger entered, right?

Todd: Exactly. I stood up a little straighter, shoulders are a little back.

Steve: Wow, nice.

Carlos: Now, if you could change one thing about SQL Server what would it be?

Todd: Boy, there’s a lot. I will tell you that SQL has gotten better with a bunch of things. I haven’t played with the SQL 2016 PowerShell stuff as much, but I know with the previous versions the PowerShell was, shall we say, odd. As a guy that uses PowerShell for Windows and for SharePoint and for Office 365, the SQL PowerShell seems to dance to its own beat a little bit, so that’s one of the things. I think for the most part SQL is a really mature product. They’ve made a lot of mistakes, but they made them a long time ago. Some of the SharePoint stuff, maybe, is around database sizing, the recovery models and things like that. One thing that always gets me, for whatever reason the way I think, is that the security piece is backwards. In my head, if I want to give a user permission to a database, that should be a function of the database. But it’s not, it’s a function of the user: I have to go into security, find the user, and give them access to the database. In my head it should start with the database, the other way around. Does that make sense?

Carlos: Got you. Sure.

Todd: Little things like that, because there are a bunch of places in SharePoint where we have to do that. And then, because that security stuff is split between the master database and the user database, I can see that coming through with the backups. Just little things like that, but for the most part, as a SharePoint admin, SQL has treated me fairly well. Again, like fire and gunpowder, I have a healthy respect for it. It has been pretty good to me.

Steve: Ok, so what is the best piece of career advice that you’ve received?

Todd: “Professional football is not for you; maybe you should be a computer nerd.” I think that was probably pretty good.

Steve: Was professional football an actual option at some point?

Todd: No, not even a little. I was a nerd from the day I was born, and back in high school that wasn’t really cool. I spent a lot of time hiding in my locker, a lot of time wondering why girls wouldn’t talk to me, things like that. Turns out being a nerd worked out for me; as I look back now, it’s cool to be a nerd. But no, it was never a real option for me at all. I think I had a couple of real pieces of advice, though. One of them, something we talked about before we went on air, was that it’s all about people. It’s all about who you know, and not in a nepotism sort of way, but it’s all about whom you have relationships with, so never burn bridges. Always give the other person the benefit of the doubt. One thing that I found over the years is any time I have this thought of “who’s the idiot who thought of that?”, the answer to that question is usually a well-intentioned, smart person who just sees things you haven’t thought about yet. It humanizes some of these dumb decisions you see people make; they see other aspects of the problem that you haven’t seen. So give other people the benefit of the doubt. And another piece I got was about 5 years ago. I was part of an acquisition and I was a remote worker; I wasn’t in San Antonio, I wasn’t with everybody else. And it was my first review. I had my goals and I didn’t get them all done. And I talked to my boss and he’s like, “How come you didn’t get all this stuff done?” And I’m like, “Well, I’m not in the office, and this guy wouldn’t call me back, and I couldn’t get hold of this guy.” And he’s like, “Yeah, that’s the truth. Every bit of that is true. Figure it out.” I’m like, “What?” He’s just like, “Yeah, try harder. I don’t care about what they’re doing, I care about what you’re doing. Yup, all these obstacles are in your way; figure it out. Make it happen.” And I was like, “That’s kind of…” But it was actually the advice that I needed. You’ve just got to get through.
Bad things happen, stuff is unfair; how you handle it is the key. And this guy wasn’t very empathetic, he was kind of a jerk, but it was good advice.

Steve: Alright.

Carlos: Todd, our last question for you today. If you could have one superhero power, what would it be and why do you want it?

Todd: So it’s funny I was asked this question a few years ago and for my superhero power I wanted to be a SharePoint administrator and I got it.

Steve: Wow, I think you’re the first person on the show that’s had that aspiration.

Todd: And I think it’s self-explanatory why I wanted it. It gets me on shows like this. No, a superhero power, you know, I watch a lot of sci-fi, and I’ve found almost every superhero power comes with a really negative side effect. I don’t think I have the morals to have any of those powers and use them for good. So maybe the superpower would be that I’d always know whether or not to use it, so that I would know when to put it down. Maybe something like that, because I swear to God I would screw up any superhero power I had. I cannot be trusted with superhero powers.

Carlos: Not too well. Ok, so we can trust you with our SharePoint but we won’t trust you with powers.

Todd: Yeah, that is it.

Steve: Alright, well it has been great information, Todd. I’ve learned a lot during this episode.

Todd: I’m glad. I think there is not enough done to get our communities together. I think the SQL community and the SharePoint community need to understand each other a little better. I think that would be a good thing.

Carlos: Well, awesome. Thanks again for being with us today.

Todd: And if any of your SQL folks have questions or whatever I’ve got a blog toddklindt.com and I’m on Twitter @toddklindt. I’m not very imaginative with my names obviously. But yeah, tell your folks to reach out to me. I’m happy to hear from anybody.


Episode 87: Resource Governor

Why do some features get all the fun?  I would like to be a fly on the wall at some of the Microsoft feature development meetings. There are many very cool features that didn’t get quite enough love to get the adoption they deserve.  In this episode we are talking about Resource Governor and before you roll your eyes, you may be interested to know some of the side benefits of using the feature.  We are joined by Mark Wilkinson from ChannelAdvisor to give us an overview of the feature and how they use it to manage SQL resources and workloads.

One interesting story is how their team won the query challenge at the last PASS Summit using Resource Governor, so it is pretty powerful. Mark will also share his experiences using Resource Governor in different working environments. He will also be discussing the basics and tips about building the classifier function of Resource Governor. Furthermore, we will go through setting up I/O caps, resource pools and workloads, and NUMA node affinity.

Episode Quotes

“I mean, it’s just a really cool tool to use”
“Say what you mean and always ask for exactly what you want”
“Always make sure people know how you feel about stuff and always be honest and straightforward”

Listen to Learn

– Managing SQL Resources using Resource Governor
– Resource Governor Classifier function
– NUMA node affinity
– Setting up minimum and maximum limits using Resource Governor
– Resource pools and workload groups
– DMVs and I/O operations

Mark on Twitter
Mark on LinkedIn
Mark’s GitHub repository
Mark’s Posts on Resource Governor

About Mark Wilkinson

Mark is a Database Administrator at ChannelAdvisor solving interesting problems on large databases.

He is a father of 4, and lives in Raleigh North Carolina. In his spare time he likes to read, speak, and learn new things.


Transcription: Episode 87

Carlos: Ok, so Mark, welcome to the program.

Mark: Thanks for having me.

Carlos: Yes, it’s nice to have you back after a good amount of time since you were with us in Episode 14. Today we’re going to be talking about Resource Governor, and it always seems to me that you guys over at ChannelAdvisor are using all the cool toys.

Mark: Well, the cool toys, it’s kind of, Resource Governor is a little bit old but yeah.

Carlos: Oh yeah, well you know, I’m impressed by all the features that you’ve been able to implement with SQL Server. And Resource Governor is another one of those features, and that’s the topic of our discussion today.

Steve: Seems like with Resource Governor, it’s one of those things where people either know it, use it, and love it, or they don’t know an awful lot about it and have never tried it. There’s not a lot in between among the people I’ve talked with.

Mark: Well, it’s funny, actually. We were in Redmond for a week in Microsoft’s labs doing some testing. And part of the gig when you go there is that you talk to their SQL developers and tell them the things that you like, things that you don’t like, and stuff like that. When we listed the features that we use, there were some features on the list that we had to explain what they were. Like, “Yeah, we use CMS.” “What’s a CMS?” Central Management Server. People definitely knew what Resource Governor was, but yeah, we use a lot of features and there’s definitely some stuff in there that not everybody is aware of. I like to tell people about Resource Governor because we got a lot of use out of it, and some neat things we didn’t expect came up as well.

Carlos: Sure, well I think that’s the classic scenario, and ultimately I guess the definition of Resource Governor is that it allows you to manage SQL Server resources and workloads by putting limits on what logins, users, or applications can use and have access to, right?

Mark: Yeah, anything that’s basically related to the session. You know, like you said, the application name, the user. There are a lot of different things you can classify on, but pretty much anything related to the session itself can be used to determine which group it’s classified into and what limits are put on it.

Steve: Now can you use that for just limiting, or can you also use it for some type of monitoring?

Mark: Yeah, that’s where I mentioned there are some kind of neat side effects we didn’t expect. The monitoring capabilities, not really capabilities but the metrics you can get out of Resource Governor, are kind of amazing. It keeps track of everything that goes through each of the workload groups, from CPU time, to the amount of reads and writes, I/O operations issued, total number of requests, all kinds of things like that.

Carlos: And you don’t have to have limits on those. It’s just keeping track of them because it’s part of the session.

Mark: Exactly.

Carlos: Well then why did you want to implement Resource Governor?

Mark: Yes. It kind of came up because we were having issues where, actually, it might make sense to go a level higher and talk very vaguely about our architecture at ChannelAdvisor. It’s not going to seem alien; most people have probably experienced something like this. We’ve got a set of SQL Servers and then we’ve got a set of application servers, and basically any team can submit work to these application servers, and all those servers communicate with the database instances. That being said, we don’t have a lot of control over how many requests can be made from a given application server or set of application servers. So we were starting to run into situations where maybe one team is using all of the instance, and there weren’t really many resources left for anybody else to use. So we had to find a way to limit that, reduce that a little bit, and that’s when we started using Resource Governor.

Carlos: Ok, so these are actual teams then. So do you separate by team, or I guess by application, or by login?

Mark: We’re very strict in our application naming in the connection string when you connect to SQL. So any application that communicates with any of our instances should have, I say should, we would love to fully get there but, you know, sometimes things slip. They should all have an application name associated with them, and the team name is part of the application name. So, you know, Team A’s applications have Team A in the name, the Team A file processor or something like that. So we can then use that application name to push them into the Team A workload group.

Steve: Ok, so if somebody is listening and now they’re thinking, you know, I want to give Resource Governor a try. What do they need to know about SQL Server versions or editions or anything like that as far as trying it out?

Mark: So as long as you’re on 2008 or higher you can try it out. Now, the features you can try out are going to depend on which version you’re on. Obviously, 2016 has all the new bells and whistles, which is good to mention as well: it’s not one of these technologies that seems like it’s going away. I would imagine they probably use Resource Governor in Azure to separate workloads if you’re using something like Azure SQL Database. But they’ve been adding features to Resource Governor as of SQL 2016, so it’s definitely something they’re continuing to work on. At a minimum you’re going to be able to put some limits on CPU, memory, and concurrent requests. Concurrent requests are actually where we started. Like I said, we don’t really have a lot of control over the amount of work coming from our application servers and hitting the database; a lot of it is customer driven. So the way we originally went at it was to limit the concurrent requests per team so that everybody had some resources to use. But once you get into later versions you can start doing things like limiting I/O, so you can set a minimum and maximum amount of I/O. That’s in I/O operations per second, so you can’t limit throughput or anything like that, just the number of I/O operations per second.

Carlos: Right, and start creating additional bottlenecks, right? You’re like, “You can only return this many records.”

Mark: Yeah, one really neat thing you can do actually, I can’t remember, it was kind of a whirlwind, but you guys were at PASS, the Summit, in 2016, yeah?

Carlos: Steve was.

Steve: Yeah, I was there.

Mark: So SanDisk had a query challenge set up where you had to make this query run as fast as possible, but you couldn’t actually touch the query itself. You could only do things to the instance. You only had, I think, 5 minutes for your changes to be done, and then you had to run the query and see what your time was. We actually won that competition by using Resource Governor. One of the things that really helped performance in that case was rebuilding indexes, but there wasn’t really enough time to rebuild all of the indexes that you needed to. Using Resource Governor you can set NUMA node affinity, so we actually just dedicated a few CPU cores to the index rebuilds. So there are a lot of really weird and interesting things you can do with Resource Governor if you take the time to learn it all. NUMA node affinity is definitely one of those kind of edge cases that helped us there. I don’t know where you might use it in production, but it was kind of a neat application of Resource Governor.

Carlos: Yeah, so next year at PASS you can’t use that one.

Mark: And we might pull out some new tricks next year.

Steve: Ok, so then given that it’s been around since 2008 is that available on all editions of SQL Server or is it an Enterprise only feature, or who’s going to be able to use it there?

Mark: Unfortunately, it’s Enterprise only. I’m lucky enough to work in a shop where it’s Enterprise so that’s not a problem for me but that is an Enterprise only feature.

Carlos: That is interesting. Chatting with Jessica Moss, they were talking about 2016 Service Pack 1 and how that kind of opened up additional features. And taking a quick peek here, it doesn’t look like this one got opened up with Service Pack 1.

Steve: No, I think I can remember when that Service Pack 1 announcement came out there were some people griping and wishing that Resource Governor had been included in Standard Edition after that. But I think it’s not there yet but maybe one day, who knows?

Carlos: Yeah, we’ll see.

Mark: Yeah, I mean even if they could release a subset of the features I think it would be useful for a lot of people. Actually, this is a super important thing to mention about it, and another thing that I wasn’t aware of when we first implemented it, but it’s been kind of useful: when you implement Resource Governor you set up what are called resource pools, and then within each pool you set up workload groups. The workload group is where the actual requests get categorized into. But every pool that you create gets its own plan cache. So you can actually use Resource Governor as a way to partition your plan cache. A case where that might be really cool is if you’ve got, like we do, multiple teams executing procedures. They might be running the same procedures with completely different parameters and causing some weird parameter sniffing issues, but as soon as you partition the plan cache, those types of issues can go away. So that was another neat side effect of implementing this.

Steve: Interesting. Now, you say they can go away, but it seems like it might also hide them a little bit. If you have one plan that’s having parameter sniffing issues for only one group of users, that might be more challenging to track down.

Mark: Yeah, it all depends on your workload obviously. And if somebody messes up and changes their connection string or does something weird that stops their work from being classified properly, they might get dumped into the default workload group or something like that, and you can see some issues there as well. So definitely, when you’re using Resource Governor, make sure that whatever you’re using to classify your workloads is fairly stable.

Carlos: Well then, where did you guys start? I guess take us down that journey. What were the first steps, and how would you recommend people go about setting up Resource Governor?

Mark: So like I said, and you kind of mentioned, you don’t have to set up any limits when you’re using Resource Governor. So I think a really good first step you can take is to just set up your resource pools and your workload groups and create your classifier. The classifier is a function that lives in master, and it is executed for basically every request that comes into the box. So you do want to make sure it’s a pretty efficient function, because it’s going to be run a lot. So yeah, if you’re just getting started and want to see if this is something you want to use, you could just set up the pools and the groups and get a classifier that starts classifying work into the various groups.

Carlos: Now, you mentioned it should run efficiently, but I thought it was going to be something I just choose during setup. How am I going to choose an efficient one?

Mark: Oh no, yeah, yeah, there’s definitely nothing built-in. You’re going to have to write it from scratch.

Carlos: Oh, got you.

Mark: Yes, so when I say efficient, I just mean make sure you’re not doing any extra work. There are some limits on what you can do in a classifier function, like schema binding, and you can’t access objects outside of master.dbo. I think that’s actually there just to stop you from hurting yourself.

Steve: So then, just to back up a second, was that a stored procedure or a function you said you put in there?

Mark: Actually, it’s a function.

Steve: Ok.

Carlos: Got you.

Mark: The classifier is a function that you define.

Carlos: I feel like I just found the reason why more people aren’t using Resource Governor.

Mark: Yeah, it could be it.

Carlos: You know, if you have to build that function yourself to do that, I feel like that’s going to be a big hurdle to jump through.

Mark: So it’s actually not that hard. The function has one job, really. When a new request comes in, the Resource Governor classifier function looks at that request, looks at some of its different attributes. There are a few system functions available by default, so APP_NAME(), for example, is the one we use. In your classifier function you just look at the application name and set up a CASE statement; that’s what I like to do. You could do if/else or whatever you like there, but really all you’re doing is writing code that looks at that application name, figures out what group it belongs to, and returns the name of that group. That’s all the logic in your classifier function. It has one job: to return the workload group that this work should go into.
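A minimal classifier along the lines Mark describes might look like this; the team and group names are made up for illustration:

```sql
USE master;
GO
-- The classifier runs for every new session, so keep it simple and fast.
CREATE FUNCTION dbo.fnTeamClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN
        CASE
            WHEN APP_NAME() LIKE N'TeamA%' THEN N'TeamA_Group'
            WHEN APP_NAME() LIKE N'TeamB%' THEN N'TeamB_Group'
            ELSE N'default'   -- falls back to the default workload group
        END;
END;
```

Anything the function doesn’t recognize lands in the default group, which is exactly the “messed-up connection string” case Mark warns about earlier.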

Carlos: Ok, so when the session comes in it just passes in the app_name, and the function tells you what group you belong in.

Mark: Yeah, exactly, and like I said, application name is just one example. There are a lot of different things available to you. Username, for example, is another good one. So maybe all the stuff coming from your application you want to categorize into these groups and put limits on, but if something comes in from the web interface, maybe using a different username, you could classify that to be unlimited and not have limits at all.

Steve: So then you’ve got your classifier function set up and it’s sitting there in the master database. How do you go about using it through Resource Governor?

Mark: So, “configuring” is almost too big a word when you’re dealing with Resource Governor. When you’re initially setting it up, to enable it you just run ALTER RESOURCE GOVERNOR RECONFIGURE. That’s it, it’s running. It’s not classifying anything, but it’s running. Then you run another ALTER statement where you set the classifier function, and that’s it. Once you specify the classifier function, anything coming into the instance is going to run through that function.
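The two statements Mark mentions would look roughly like this (the function name here is illustrative, and the function must already exist in master):

```sql
-- Enable Resource Governor (no limits yet, just the default groups).
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Register a classifier function; from here on every new session
-- is routed through it.
ALTER RESOURCE GOVERNOR
    WITH (CLASSIFIER_FUNCTION = dbo.fnTeamClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```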

Carlos: We still don’t have limits at this point?

Mark: No, there are no limits here. What you have right now is basically the metrics on your various workload groups. There are a few DMVs you can look at for this information, and they’ll show you the total number of requests coming through, for instance. So if you’re just interested in seeing how much work all these various groups are doing, all of that is available to you.

Carlos: And I think that’s a great suggestion.

Mark: Oh yeah, definitely. It’s a good way to get some insight into what’s going on on your instance without really having to do tons of work. You could probably get this running in maybe, you know, 20-30 minutes.

Carlos: And it’s baselining, right? I want to baseline my system and then I’m going to start making some tweaks.

Mark: Exactly. One thing to mention, too, is those DMVs are cumulative, so as requests come in the numbers just go up. So if you want to monitor it, like I do, you’re going to have to set up a table and store the deltas from the last sample, that kind of thing.
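A quick look at those counters might be a query like the one below; the delta table Mark describes would just snapshot these columns on a schedule (the column list is a small subset of what the DMV exposes):

```sql
-- Cumulative per-group metrics since the counters were last cleared.
SELECT wg.name,
       wg.total_request_count,
       wg.total_cpu_usage_ms,
       wg.statistics_start_time  -- when the counters were last reset
FROM sys.dm_resource_governor_workload_groups AS wg
ORDER BY wg.name;

-- The counters can also be cleared without a restart:
-- ALTER RESOURCE GOVERNOR RESET STATISTICS;
```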

Carlos: Alright, can you purge those or reset them?

Mark: You can, well, I know you can reset them, obviously, like most things, when you reboot the instance or restart the services. I don’t know if there’s a DBCC function you could use to clear them out. One of the DMVs, though, which is kind of interesting because I don’t know of many DMVs that have this, will actually give you the date when things were last cleared, so you know how long it’s been gathering statistics.

Carlos: Right. Yeah, I can’t think of another DMV that does that for me.

Mark: Yeah.

Steve: No, I can’t think of one either.

Mark: It would be nice if they did, but yeah.

Steve: Right, right. Yeah, especially those index ones. There are problems where the index gets rebuilt and the counts and whatnot get cleared. You know, like, “What!”

Mark: It would be very nice.

Steve: Yeah.

Carlos: Ok, so can you then maybe elaborate a little bit more on the idea of resource pools versus workload groups, and how those relate?

Mark: Yeah, so this is kind of a piece of over-engineering, in my opinion.

Carlos: Interesting.

Mark: Yeah, so I like to think of this as kind of a hierarchy, right? You’ve got your SQL Server, underneath that you’ve got your resource pools, and underneath that you’ve got workload groups. The theory being that you could have a resource pool with multiple workload groups in it. I’ve never seen anybody implement it like that, and I don’t implement it like that either. To run with my example of teams, every team gets a resource pool and a workload group. The difference between the two is just what kinds of limits you can set at each level. At the resource pool level you can set limits on the minimum and maximum CPU that can be used, the amount of memory each request can use, your I/O caps, and your NUMA node affinity. Below that, at the workload group, you can set things like max CPU time in seconds, so not a percentage but actual time. You can also set the concurrent request limits at the workload group, and MAXDOP at the workload group. So there are different things you can set at each level, but like I said, I’ve never set it up where I’m not doing a single pool with a single workload group in it for each of the various types of work that I want to classify.
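Putting those pieces together, a pool/group pair like Mark describes might be created along these lines; the names and numbers are illustrative, and the IOPS settings require SQL Server 2014 or later:

```sql
-- Pool-level limits: CPU percentages, memory, IOPS caps.
CREATE RESOURCE POOL TeamA_Pool
WITH (
    MIN_CPU_PERCENT = 0,
    MAX_CPU_PERCENT = 50,
    MAX_MEMORY_PERCENT = 40,
    MIN_IOPS_PER_VOLUME = 0,
    MAX_IOPS_PER_VOLUME = 5000   -- operations per second, not throughput
);

-- Group-level limits: CPU time per request, concurrency, MAXDOP.
CREATE WORKLOAD GROUP TeamA_Group
WITH (
    REQUEST_MAX_CPU_TIME_SEC = 300,
    GROUP_MAX_REQUESTS = 20,
    MAX_DOP = 4
)
USING TeamA_Pool;

ALTER RESOURCE GOVERNOR RECONFIGURE;
```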

Steve: Yeah, ok. So then once you’ve done the classifier function and you have things assigned to the appropriate resource pool and workload group as the connections come in. Have you ever run to any issues if you limit something too much that it causes blocking to go bad or I mean something takes longer than it did before and now other things or bottleneck waiting on that?

Mark: Oh yeah, that’s always a big risk. We’ve had issues.

Carlos: And almost by definition you’re asking for that, because you’re now saying — in the example of this Reporting Services query or whatever — “Hey, I don’t want you taking as many resources.” So that means you’re going to have to take a little bit longer, right?

Mark: Oh yeah, definitely. And if you’re in the kind of place where you’ve got a lot of different applications hitting your instance, but none of them are really aware of each other, and you slow one of them down and it starts holding locks or something longer than it did before — yeah, it can cause havoc. So in my opinion it’s always best with Resource Governor to start with very high limits that you don’t think you’re really going to hit right away, and then kind of walk them down a little bit until you find a nice comfortable place. But yeah, it’s always a risk when you’re implementing something like this that you could wreak havoc for other people trying to use your server.

Carlos: So we’ve talked about implementing it in production. Do you have this in lower environments as well?

Mark: Yes, we do. If anything, it’s awesome to implement this on your development environment with no limits, because of all the reporting you get. One example that I like to give people, because it kind of helps it sink in: at any given time I can tell you exactly how much CPU time any of our application teams is using on any of our instances. And that’s all using just stuff built into SQL — it’s not some special monitoring at the application layer. Just with Resource Governor I can tell you what percentage of CPU time was spent by which team in our development environment. And that’s really powerful, because then you can go to that team and be like, “What are you guys doing here?”
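A rough version of that “CPU time per team” report, assuming one workload group per team: the DMV and its columns are real, and the counters are cumulative since `statistics_start_time`, so the percentage here is share-of-total since the last reset.

```sql
SELECT wg.name AS workload_group,
       wg.total_request_count,
       wg.total_cpu_usage_ms,
       CAST(100.0 * wg.total_cpu_usage_ms
            / NULLIF(SUM(wg.total_cpu_usage_ms) OVER (), 0)
            AS decimal(5, 2)) AS pct_of_cpu_time,
       wg.statistics_start_time        -- when these counters were last cleared
FROM sys.dm_resource_governor_workload_groups AS wg
ORDER BY wg.total_cpu_usage_ms DESC;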

Steve: Yeah, that’s really interesting, because in my experience development servers are usually a bit slower than production, and you end up with developers griping about why it’s so slow. And oftentimes it’s because there is just not enough hardware and horsepower there, basically, to service everything the developers are doing. But if you could come back and say, “Well, it’s slow because this team over here is using 90% of the CPU,” that might give some interesting insight into who should be paying for the upgrade on that SQL Server.

Mark: Exactly. Another good thing you can do with Resource Governor is slow down bursting workloads. So this also comes up in development environments, right — your QA tests, automated tests running at midnight every day or whatever. You can set limits on concurrent requests, for example, so that when those tests are running at midnight they’re not going to take everything else down with them. They’ll only be able to execute maybe three or four requests at a time, and everything else will be able to run as it was. It might not run as fast as before, but it will still be able to actually run.

Carlos: Right, and those normally aren’t performance tests anyway — it’s “can this execute, will it complete successfully?”

Mark: Yeah, does it work. So when you’re in those types of situations, yes, it’s really a great tool. And you can kind of get more out of your lower-resource instances, because the things that would hit them hard are restricted a little bit more.

Steve: Ok, so can you share with us maybe a little bit more around the monitoring of the stats or the DMVs to better understand like where someone would go to start learning that piece of it?

Mark: Yeah, sure. Not to self-promote too much here, but I’ve actually got a blog post on just this topic — m82labs.com/resource-governor. I’ve got a whole post on it, and you can even download the procedure I use to gather the statistics. But I can walk you guys through what I do — it might not be for everybody. What we do in our environment is, every five minutes, a procedure runs that grabs the latest data from the DMVs, inserts it into a table I’ve set up to store all the different stats I’m interested in, and also uses the existing data in that table to calculate the delta — the change in all these different statistics since the last time it ran. And then I push all that information into Elasticsearch, and from there we can graph it out in any number of tools. That’s our primary way of visualizing how things are performing on our instances.
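A simplified sketch of that snapshot-and-delta approach (the table and column names here are invented — Mark’s actual procedure is on his blog). A SQL Agent job would run the `INSERT` every five minutes:

```sql
CREATE TABLE dbo.RGStatsHistory (
    capture_time  datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    group_name    sysname   NOT NULL,
    total_cpu_ms  bigint    NOT NULL,   -- cumulative counter from the DMV
    cpu_ms_delta  bigint    NULL        -- change since the previous snapshot
);

INSERT dbo.RGStatsHistory (group_name, total_cpu_ms, cpu_ms_delta)
SELECT wg.name,
       wg.total_cpu_usage_ms,
       wg.total_cpu_usage_ms - prev.total_cpu_ms
FROM sys.dm_resource_governor_workload_groups AS wg
OUTER APPLY (SELECT TOP (1) h.total_cpu_ms      -- most recent prior snapshot
             FROM dbo.RGStatsHistory AS h
             WHERE h.group_name = wg.name
             ORDER BY h.capture_time DESC) AS prev;
```

The deltas are then what you would ship to Elasticsearch (or any other store) for graphing.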

Carlos: Interesting strategy there.

Carlos: Any other links that we reference here will be available on the show notes page at sqldatapartners.com/resourcegovernor.

Mark: One thing to mention too, and this was kind of surprising — not to brag on Microsoft or anything, but the documentation for Resource Governor on Microsoft’s page is amazing. They’ve got very in-depth documentation on really every aspect of configuration, what all the different DMVs are for, what the different fields mean. This is definitely one of those cases where I’d say head over there and take a look — there’s a wealth of information. They even give you some example scenarios where you might use it, so it’s very useful.

Carlos: Ok, so as far as other scenarios where you’ve seen people using Resource Governor — maybe a little bit of background story, or more information on why people are using it?

Mark: Honestly, I haven’t run into a whole lot of people that use it.

Steve: Ok.

Mark: Yeah, I mean, I would imagine our use cases are pretty standard — where you would see most people using it, outside of the monitoring piece. But yeah, I haven’t really seen a lot of people using it. I’ll see the occasional talk at a SQL Saturday, but I haven’t really met anybody else that uses it. That could also be because it’s an Enterprise feature, and not everybody is on Enterprise.

Carlos: Right. Now, you said in the classifier function that you were using the login name — is that how you identify who’s on which team?

Mark: Well, both actually. I use the app name and then in some cases we use the login name as well.

Steve: Ok, the app name and login name. So even if someone is using a shared login among multiple applications, you can use the app name to classify it appropriately, based on where that app is coming from.

Mark: Exactly.

Steve: Ok.

Mark: Yeah, and there’s nothing special about the classifier function in terms of how you write it. So you can do things like “where app name is like this,” and so on — you’ve got pretty much your full range of T-SQL that you can use within the function. Just make sure it’s not too expensive.
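For illustration, a classifier function might look like the sketch below — the group names and login are hypothetical. The function lives in master, must be schema-bound, returns the workload group name as a sysname, and runs for every new connection, which is why it needs to stay cheap:

```sql
CREATE FUNCTION dbo.fnClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @group sysname = N'default';

    IF APP_NAME() LIKE N'ReportServer%'      -- classify by application name
        SET @group = N'ReportingGroup';
    ELSE IF SUSER_SNAME() = N'etl_service'   -- or by login name
        SET @group = N'EtlGroup';

    RETURN @group;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```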

Steve: Right, right. So, any gotchas you’ve come across, or anything where you just tried it out and things went really bad, that people might want to know about before giving it a try?

Mark: Yes, definitely. So the “there be dragons” scenarios with Resource Governor: you want to make sure that you really, really read about the different limits you can set, because they may not behave how you expect. Max CPU is a perfect example. What would you expect max CPU to do? If you set max CPU to 50%, you would expect that whatever is in that group can only use 50% of the CPU, right? That’s not the case. How it’s defined is that in cases of CPU contention it can only use 50%. And how CPU contention is defined is kind of a mystery — I’ve never actually found a good definition. But it sounds like if there’s other workload running on the box, it will try to make sure the governed group can only use 50%. If a governed task comes through in the middle of the night when nothing else is going on on the box, it could still use 100% of the CPU, even with that limit set.

Carlos: So it’s almost like a tapering effect, right? My process is running 100 miles an hour, and then, oh, somebody else wants to come onto the track. Ok, well, you’ve got to slow down so this guy can speed up, kind of a thing.

Mark: Exactly. So it can definitely still be useful, but if you’re expecting it to enforce a hard ceiling of 50%, that’s not going to happen. That being said, in later versions they did add a CPU cap that does just that. So there’s actually minimum CPU, max CPU, and then a cap — three different values you can set as far as CPU is concerned.
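The three CPU knobs Mark mentions are all pool-level settings (CAP_CPU_PERCENT was added in SQL Server 2012); for example, assuming a pool named ReportingPool:

```sql
ALTER RESOURCE POOL ReportingPool
WITH (
    MIN_CPU_PERCENT = 10,  -- guaranteed share under contention
    MAX_CPU_PERCENT = 50,  -- soft limit: only enforced under CPU contention
    CAP_CPU_PERCENT = 50   -- hard ceiling, regardless of other load
);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```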

Steve: Ok, so then if there’s minimal load on the server and my connection gets assigned to a resource pool with a max of 50% CPU, but there is no other load, I might be able to use upwards of 100%. But then say I have a long-running job on that connection that runs for an hour — hopefully not, but let’s say it does. And someone else comes along and says, “I need 50% of the CPU,” connects, and starts using CPU. Is it then going to take my existing connection that was using around 100% and squeeze it down to 50%, or does that only apply to new connections?

Mark: That I’m not sure about. I could definitely see where it would have the opportunity to limit it, right? But I’m not sure what happens to that existing stuff that’s running. That’s another thing I would have to test.

Steve: Sounds like an interesting blog post there, Mark.

Mark: There could be a whole blog post just around the various behaviors of these things. Another interesting one is that you can set the minimum and maximum I/O — it will limit the number of I/O operations the pool can perform per second. But I have found in my testing that even if I set the max I/O to something impossible, like 10 million or 20 million operations per second, there are still I/O operations being throttled, based on what the DMVs are showing. That’s something I’m going to be looking into further, and it’s really why those metrics are important to watch. So even if you think you know what some of these limits are going to do, I highly recommend people set it up in a test environment and even run some load tests against it, if you’ve got that kind of setup, and just see if the DMVs are responding how you would expect — because the I/O thing I didn’t expect at all. The reason I started playing with it is that when you enable I/O throttling, even if your limits are completely nuts and something you’re never going to hit, it actually opens up another DMV that will show you the I/O operations per volume, per workload group. That’s really cool, because if you’ve got TempDB on a different volume than your data files, you can actually look at TempDB utilization per workload group using that DMV. But once I started seeing it limiting the I/O even when it shouldn’t, I got kind of nervous about that one. It’s going to take a little more research.
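For reference, the I/O limits are pool-level settings, and the per-volume statistics Mark describes surface through sys.dm_resource_governor_resource_pool_volumes (SQL Server 2014 and later; the pool name here is hypothetical):

```sql
-- Setting any IOPS value switches on I/O governance for the pool
ALTER RESOURCE POOL ReportingPool
WITH (
    MIN_IOPS_PER_VOLUME = 0,
    MAX_IOPS_PER_VOLUME = 1000
);
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Once governance is on, per-pool, per-volume I/O stats become visible:
SELECT * FROM sys.dm_resource_governor_resource_pool_volumes;
```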

Carlos: Yes, that sounds like a caution flag there.

Mark: Yeah, definitely, and that seems to be the case with a lot of these. There are just these weird little scenarios where you might want to do a little more research. Fortunately, though, like I mentioned before, Microsoft’s documentation actually covers most of these scenarios. MAXDOP would be another interesting one. You can set a MAXDOP for a workload group that’s higher than the MAXDOP for the instance, and it will go ahead and honor that. But if a request comes through that manually specifies a MAXDOP higher than what the group has, it will not honor that — it will only go up as high as what you have set at the group level. There are a lot of tweaks like that you can do, and you have to see how each one gets applied.
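The MAXDOP behavior he describes is configured at the workload group level; for example (group name hypothetical):

```sql
-- Caps parallelism for everything classified into this group; a query-level
-- MAXDOP hint can go lower than this, but not higher.
ALTER WORKLOAD GROUP ReportingGroup WITH (MAX_DOP = 4);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```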

Mark: If you’re using Developer Edition — since obviously this is an Enterprise feature, it should be available on Developer Edition as well. So even if you’re not going to be using it in production because you’re running Standard, if you can get it into your development environment it could have a lot of use there, even just for the reporting aspects, and also potentially for being able to throttle your workloads a little bit so that maybe you get a little more use out of those testing instances.

Steve: So shall we go ahead and do SQL Family then?

Mark: Yeah, let’s go ahead.

Carlos: So Mark, tell us how you got started with SQL Server.

Mark: I’m pretty much your standard accidental DBA. I was at a company and the DBA left. There was really nobody else to fill the shoes, so I kind of stepped up and started learning about SQL Server. I want to say we were running SQL Server 2000 at the time, and I’ve just kind of stuck with it ever since.

Steve: So if you could change just one thing about SQL Server, what would it be?

Mark: Actually, I’d be interested to see what listeners have to say about this one. We would have a use for this feature at ChannelAdvisor, but when we talked to Microsoft about it, I don’t know if they were entirely sold on the idea. We would desperately love to have a query hint called NOTINDEX. So instead of forcing an index, you could force a query to use any index but this one. We’ve got a lot of cases where we have an index that’s been created for a very specific purpose, but other queries will pick it up and start using it, and whenever they do, performance just tanks.

Carlos: Sure, because they’re grabbing locks on the index.

Mark: Well, not even that. It might just be that the index isn’t that great for this other subset of queries.

Carlos: Oh, really — but the optimizer is still picking it?

Mark: Yes. I don’t know why it’s picking it, but we do have cases where the optimizer will pick an index, we’ll hop on the box to see what the issue is, and it’s like, “Oh, this index again — great.” But you can’t get rid of it, because there are a lot of queries that actually benefit from it. So a NOTINDEX hint would be fantastic.

Steve: So basically, whatever index it would normally pick — exclude this one from your list.

Mark: Exactly.

Steve: Yup. I would be interested in that.

Mark: Yeah, we would love that.

Carlos: What’s the best piece of career advice you’ve ever received?

Mark: So I thought about this one, and I honestly haven’t received a whole lot of career advice directly. But indirectly, just from observing a manager that I had previously — I think what I took away from that experience with him was to always say what you mean and always ask for exactly what you want. Don’t get caught up in the political games that happen in the office, or wherever you work. Just always make sure people know how you feel about things, and always be honest and straightforward. Even if it doesn’t always work, I think in the long run it is definitely good advice to follow.

Steve: If you could have one superhero power what would it be and why would you want it?

Mark: I don’t remember which superhero this is — there might even be a few superheroes that have the ability to duplicate themselves. Like I may have mentioned before, I’ve got four kids, so it’s usually pretty hectic inside the house. If I could duplicate myself and get some more stuff done around the house, that would be fantastic. And then of course we’ve got SQL Saturdays every weekend, all kinds of conferences and all that, so yeah, duplicating myself would be very beneficial.

Carlos: How many SQL Saturdays could you go to if you could duplicate yourself several times?

Carlos: The mental strain, I guess, of concentrating on all of those different places. It’s also interesting, as I think about duplication — and I guess this is just from my limited experience — that the villains who have the duplication ability always seem to use it for nefarious purposes, right? So to have someone actually use it for noble purposes would be a change of pace there, I think.

Mark: There we go.

Carlos: Yeah. Well, Mark, thanks so much for being with us today.

Mark: Thanks for having me it’s been fun.

Steve: Thanks, Mark — definitely some good information about Resource Governor today, and thanks for being on the show.

Episode 86: TsqlT

Based on listener feedback we got from episode 72, a request was made to have a conversation about testing frameworks, and in this episode we are going to focus on tSQLt.  We are pleased to have Sebastian Meine, the creator of the tSQLt unit testing framework, on the podcast with us.  We talk about the basic principles of the tSQLt framework when it comes to unit testing. Sebastian also discusses the do’s and don’ts, as well as the different aspects to consider when using the framework. Sebastian makes the case for why you should take the time to create unit tests, and we discuss the challenges culture might play in the use of the framework.

 Episode Quote

“Anybody who earns a living with writing SQL modules should test those modules…and tSQLt is the best way to do that”

“For a contribution to be accepted it’s important that it is well covered with tests”

Listen to Learn

  • What the tSQLt framework is
  • The case for using unit testing in database projects
  • How the framework has been integrated into third party projects
  • How you can participate in building the framework

Sebastian on Twitter
tSQLt framework
tSQLt project on GitHub
Sebastian’s Website

About Sebastian Meine

tSQLtSebastian Meine, Ph.D. is the SQL Stylist behind sqlity.net llc. In 2010 he founded sqlity.net to fully engage in his passion for SQL Server.

Armed with a Ph.D. in Computer Science from the University of Linz in Austria and years of experience in architecting DBs for a global Fortune 50 company, Sebastian now provides training and consulting services for his clients, specializing in SQL Server Security, Test Driven Development and Performance. Sebastian is an experienced public speaker, published author, president of the Philly PASS Chapter, PASS Regional Mentor and coauthor & lead developer of tSQLt.

Transcription: TsqlT

Carlos: Well, Sebastian, welcome to the program.

Sebastian: Hello!

Carlos: Yes. I think we have some interesting things to talk about — not only the technology, but how you decided to put this together. Our subject is tSQLt, so why don’t you give us the nickel tour: how do you explain to somebody what the tSQLt framework is?

Sebastian: The tSQLt framework is a unit testing framework for SQL Server. What I mean by that is that unit testing means you go down to the smallest unit of code — in SQL Server that would be stored procedures, functions, or views — and you write tests for them in isolation. tSQLt, of course, provides a lot of functionality around that which makes testing in isolation in particular easier.

Carlos: Ok. I guess, can you give an example of a way that someone might create a unit test around a stored procedure, for example?

Sebastian: How detailed do you want me to get? It’s early on.

Steve: Maybe we could take a little bit different approach there of higher level of like around test driven development. One of the things that I’ve seen with that is the sort of strategy of you first create a failing unit test and then you write your code to pass, and then you write another test and you write the code pass that test. Is that what people are doing with tSQLt?

Sebastian: Test driven development — and I have to preface this: I love it. Test driven development has a worse reputation than it deserves. A lot of people think it’s scary and takes too much work; we can talk about that a little later. Yes, tSQLt allows you to do test driven development. In fact, tSQLt itself was developed — kind of bootstrapped using itself — completely test driven. What’s important in test driven development is that you take a really small piece of functionality, you write a test while that functionality isn’t there yet, then you implement the functionality and run the test again, and it hopefully shows you that the functionality is now there. That really does a few things. First, writing the test first and making sure that it fails shows you that the test actually tests something. It’s really easy in unit testing, in any language, to write a test that doesn’t really test anything — a test that can’t fail. So by writing the test first, you have the assurance that you have written your test correctly. And then, by implementing just the minimum amount of code required to make that test pass, you end up with very modular code, which in turn is really easy to maintain — and you also have the tests telling you everything is still working.

Carlos: Ok. And you’ll forgive me, right — being the bit of a knuckle-dragger that I am. I guess it makes sense to me that you could test that a piece of functionality is not there: let’s say I’m adding a parameter, or adding another column that gets returned, something like that. Ok, that seems fairly straightforward. Am I also using this to test the results — to interrogate what gets returned from a stored procedure or from a view — or is this just limited to the object itself?

Sebastian: Let’s talk about the view example. So I have a view and I want that view to return some aggregated data so I might have a count column and a sum column let’s say.

Carlos: Ok, right.

Sebastian: And so, tSQLt allows you to compare result sets. It actually compares tables, not result sets — the contents of tables. So what you do is insert your result set into one table, insert the expected result into another table, and then you use tSQLt.AssertEqualsTable to compare the two tables, and it will tell you if there is a difference or if they are equal.
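In code, that compare step looks roughly like this sketch — the view dbo.OrderSummary and its columns are invented for illustration, while AssertEqualsTable is the real tSQLt procedure:

```sql
-- Build the expected result by hand
CREATE TABLE #expected (CustomerId int, OrderCount int, TotalAmount decimal(10, 2));
INSERT #expected VALUES (1, 2, 25.00);

-- Capture the actual result from the module under test
SELECT CustomerId, OrderCount, TotalAmount
INTO #actual
FROM dbo.OrderSummary;

-- Fails the test with a row-by-row diff if the table contents differ
EXEC tSQLt.AssertEqualsTable '#expected', '#actual';
```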

Steve: Ok, so then in that case you would know, based on the contents of the table being inserted from the view, whether the view is producing the right results or the wrong results.

Sebastian: Correct.

Steve: Based off of comparing that to a known expected outcome.

Sebastian: Correct.

Steve: Yup, ok.

Sebastian: And when starting out, in general I recommend that you follow test case heuristics as you write your test cases. For testing result sets — for sets in general — you want to follow the zero-one-some heuristic. So you start out with zero rows in the table that the view is accessing, and make sure the view returns the right thing. Say you’re starting from scratch and haven’t written the view yet. The first thing you do is write a test that selects the column — say the sum column — from the view and puts it into a temp table (I tend to use temporary tables). Now, the view isn’t there. The tables that the view accesses we assume are there already, but the view isn’t, so when you compile the test stored procedure you get a warning, and if you run the test case you get an error, since the view isn’t there. That’s your first call to action: implement the view. What the test is asking for at this point is that you have one column called sum — probably something a little more expressive than “sum” — and the test case accesses the view by name, so the view has a name. Those are the two things required at this point, so you implement the view. You don’t put in any logic yet, because the test hasn’t asked for anything — you might just return a hard-coded empty result set.

Steve: Ok, so then once you’ve done that you’ve confirmed that the zero case that if there are zero rows return that you’re getting back the empty set that you’re expecting.

Sebastian: Ah, this particular example — I guess you shouldn’t make up examples on the fly. SUM is an aggregate, so you will get one row back. In this case we would expect a zero in the sum or count column when the input is empty.

Steve: Right.

Sebastian: That is, by the way, another thing test driven development is really good at: it makes you think through your requirements, so you catch something like this early on. Whereas often this particular question — do you want no row returned, or do you want a zero returned — goes unconsidered, and then the result is whatever it is, which might not match what the stakeholder actually requested.

Steve: Ok, so then once you’ve got the zero rows situation figured out I assume you move into the one row.

Sebastian: Correct. Zero, one, some — so the next test is the one-row case. You write a new test case, insert one row into the base table, and then you expect the sum column to return the value that is in that row, in the column we’re looking at.

Steve: Ok, and then you would have your — what was the assertion you mentioned that compares the tables?

Sebastian: tSQLt.AssertEqualsTable.

Steve: Ok, so AssertEqualsTable would make sure that your two tables are the same for your one row, and then I assume you just continue on with that for the more-than-one-row case — the “some.”

Sebastian: The “some” rows, yes. And “some” really means stick with a few — I usually recommend three or so. I rarely run into a case where you really need more than ten rows. And it’s important to note here that unit testing is not performance testing. You want to keep your test data as minimal as possible. You want your test cases to run as fast as possible — in fact, you want to run all your test cases every time you make a change, and the only way that is going to happen, because we are all impatient and lazy, is if executing a test case is easy and fast.
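Putting the zero case together as an actual tSQLt test might look like this sketch (the object names are invented; tSQLt.FakeTable is the real procedure that swaps the base table for an empty copy, isolating the test from existing data):

```sql
EXEC tSQLt.NewTestClass 'OrderSummaryTests';
GO
CREATE PROCEDURE OrderSummaryTests.[test sum is zero when base table is empty]
AS
BEGIN
    EXEC tSQLt.FakeTable 'dbo.Orders';      -- zero rows: the "zero" case

    SELECT TotalAmount
    INTO #actual
    FROM dbo.OrderSummary;                  -- the view under test

    SELECT * INTO #expected FROM #actual WHERE 1 = 0;
    INSERT #expected VALUES (0);            -- expect a single row containing zero

    EXEC tSQLt.AssertEqualsTable '#expected', '#actual';
END;
```

The one-row and some-rows tests follow the same shape, with one and then a few rows inserted after the FakeTable call.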

Steve: Right. We had the same recommendation in Episode 72 with Lin — we were talking about test automation for data warehouses. Her thing was: find a representative sample. You don’t need the whole database; find a couple of customers that you’ve worked with and start there. It sounds like you’re saying the same thing.

Sebastian: Absolutely.

Carlos: Ok, so then one of the next questions we had was around who is using this. We’re not really looking for specific company names — what we’re looking for is what type of people, what type of roles. Are they software developers, DBAs, testers? Who is really using the product and getting the most value out of it?

Sebastian: Anybody who is writing SQL modules should be using it. And I don’t have company names, so I couldn’t even answer that. Anybody who earns a living writing SQL modules should test those modules — and while I’m extremely biased, I would say tSQLt is the best way to do that.

Steve: You know, a good example: I was working on a stored procedure for a client recently where, after the 1st of January, it behaved differently than it did in the months near the end of the year. After seeing the problems we ran into there, because of somebody doing month addition or subtraction, it would have been really nice to have had a tSQLt test case for it, so we could have done extensive testing beyond what was originally done. That’s something I could see being a great use for it.

Sebastian: Yeah, that’s a boundary condition — crossing from one month into another. There is a heuristic for that too. You want to make sure that code behaves correctly at those boundaries, and following test driven development and the heuristics would most likely have caught that problem before it actually became a problem.

Carlos: Yup, ok. So if somebody is listening and thinks, “Oh, I want to give this a try,” where would you recommend they get started — steps to follow, or where to learn about using it? How do you jump in with tSQLt?

Sebastian: It’s called tSQLt, and the website for it is tSQLt.org. It is open source, so you can just go there, download it, and start playing with it. There is a small introduction manual, if you want to call it that, and also a complete functional manual — almost complete; there are a few functions I haven’t gotten around to documenting yet.

Carlos: Yup, ok. So to jump in a little further — you made the case that, gosh, if I had unit tested this I could have avoided problems down the line. But it’s almost a mentality shift, because, to use that example, if I have a stored procedure that needs a view, I’m going to be thinking, “Ok, let’s see what my view needs to have so that I can feed this procedure.” And it sounds like I need to reassess what I’m asking, what my requirements are. So how do you begin that mind shift — to say, instead of just jumping in and creating the view, let me think about how to set up these use cases, or what questions I need to ask myself to get them set up? Does that make sense?

Sebastian: Yeah, that makes sense. So automated unit testing is a complete mind shift. It is hard for people to get used to, but once you have done it a little bit regularly, it’s not actually hard. It’s scary at first, but you get used to it very quickly.

Carlos: Yes, I guess you can tell where I fall into my automated unit testing practice.

Sebastian: Yeah, just think about what this view needs to do for you, then pick the thing that seems easiest right now, write a test for that functionality, then implement the functionality, and then pick the next thing.

Steve: Ok. Now, at a couple of SQL Saturdays I’ve seen Steve Jones with Red Gate present on tSQLt and the Red Gate tools that integrate with it. Can you maybe share with us what parts of that are the Red Gate products versus what part is the tSQLt framework?

Sebastian: Red Gate provides the easy button. tSQLt is a bunch of SQL scripts, and to write your test cases you write stored procedures that follow a naming convention. Actually, first you need to create what I call a test class. The reason I call it a test class is that that’s the naming convention the JUnit framework — the first unit testing framework — put in place, and tSQLt follows the JUnit conventions. So a test class is really a schema that has been marked by the framework as a schema containing test cases, and to mark a stored procedure in that schema as a test case, you just need to start its name with “test.” That allows you to put other helper stored procedures in the same schema, which makes it very flexible even though the naming convention seems rigid at first. Now, to execute your test cases you run the stored procedure tSQLt.RunAll — that runs all the test cases in the database. tSQLt.Run takes parameters where you specify what you want to run. It can be a little cumbersome at times that you have to actually call that stored procedure. What Red Gate did is provide a plug-in for Management Studio that makes executing test cases easy: you just select the test case you want to run and click the run button, and there’s a nice GUI on top of tSQLt. And I have to say Red Gate isn’t alone anymore — there are now, I think, three other vendors that offer similar functionality, but Red Gate was the first.
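The conventions Sebastian describes, in a minimal sketch (the class and test names are made up):

```sql
-- A test class is just a schema marked by the framework
EXEC tSQLt.NewTestClass 'AccountingTests';
GO
-- Any procedure in the schema whose name starts with 'test' is a test case;
-- other procedures in the schema are treated as helpers.
CREATE PROCEDURE AccountingTests.[test invoice total is never negative]
AS
BEGIN
    EXEC tSQLt.Fail 'Not implemented yet';   -- a deliberately failing stub
END;
GO
EXEC tSQLt.Run 'AccountingTests';   -- run one class (or one test by name)
EXEC tSQLt.RunAll;                  -- run every test in the database
```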

Carlos: Interesting. Now, you’ve mentioned the documentation. Are there examples out there? I mean — again, knuckle-dragger that I am — it would be helpful for me to see some examples, like, “Here’s a view, here’s a procedure, let’s walk through it.” I know you mentioned you have the procedures there to kind of get going. Are there good examples people could reference to get a handle on this?

Sebastian: Yeah, so there’s the user guide which is really the manual of the procedures and the processes. There’s also a quick start guide on tSQLt.org which is a very basic introduction into how to use the framework and the unit testing in general.

Carlos: Very good.

Steve: So for the unit testing to work, what do people need to have? Like, if developers are sharing a database, do they each have to have their own individual instance or database they’re working with? Or can multiple developers work against a shared database using tSQLt?

Sebastian: As a general best practice, you want to have your own development environment. And with SQL Server Developer Edition now being free, there’s really no reason anymore not to have your own development environment.

Carlos: Very good point. Yeah, ok.

Sebastian: And the database code has to be, it’s not even an option anymore, it has to be in source control, and then it’s easy to install and run your own copy of the database.

Carlos: So when you say it has to be in source control does the tSQLt do something to integrate to pull things from source control?

Sebastian: No, tSQLt is only for testing. But Red Gate has a nice plug-in, SQL Source Control, that even manages things like migration scripts.

Carlos: Yup, ok. So with tSQLt being an open source project, is that something that you have other people contributing to, or is it currently just you? Where’s that at?

Sebastian: It started out with Dennis and me, and Dennis got lost somewhere along the way, alright, so now it’s only me. Every once in a while I get somebody asking if they can help out, but then they end up disappearing too for whatever reason. So yes, I would love to have help on the project, and the project is on GitHub; you can find it there if you search for tSQLt. I have been stretched a little thin over the last few months so there has not been a lot of progress on tSQLt. I’m always doing a little bit, but I haven’t actually put out a new version in a while.

Steve: Yeah, I’m seeing this; there are some open issues, and it looks like people are contributing some issues.

Sebastian: Yes, requests for additional functionality and for problems to be fixed.

Carlos: With that, part of the beauty of GitHub and being open source is that when people ask for things to be fixed they can also contribute.

Sebastian: Yes. And for a contribution to be accepted, it’s important that it is well covered with tests.

Carlos: That’s a great point since it is the SQL testing tool. It would be silly for someone to submit a change or a feature that isn’t well tested, right.

Sebastian: Correct, and you would be surprised what I sometimes get.

Steve: So one thing that you said as part of the story there, and I think you mentioned earlier, was that tSQLt, or unit testing, is not intended for performance work. But it seems like it’s a great way, when you’re doing the performance work, to know if you broke anything or if you’re getting the right results.

Sebastian: Correct. Anytime you have to change your code you want to be sure that you haven’t broken anything. And have you ever been in that situation where you change a little bit of code in a stored procedure because of a requirement change, and another feature that you thought was totally unrelated suddenly broke?

Steve: Oh no, that never happens.

Sebastian: Unit testing is really good at showing you those scenarios while you’re still in the code. So instead of hearing about it months later, “What did happen there? I don’t remember,” you get that feedback within a few minutes, while you still know what you just changed, and then you can hopefully easily make the connection and figure out how to actually make it work. And that’s really one of the things behind Agile development that we haven’t talked about yet; unit testing is part of that Agile development mantra. Agile development is all about reducing the length of your feedback loops. If something breaks I want to hear about it as soon as possible. The longer it takes, the more time passes, the more it costs, and that cost growth is actually exponential.

Carlos: The risk to your ability to quickly resolve it increases.

Sebastian: Yeah.

Steve: And it’s far cheaper to fix the bug that you find when you run your unit test than it is to fix the bug that a customer reports after you’ve released the code.

Sebastian: Correct, and that “far cheaper” has like three zeros in it.

Carlos: So a lot of people like to pay lip service to that, particularly organizations, you know. I mean, a lot of people say they’re running Agile and they really just kind of have a lot of meetings, right? What kind of numbers are you seeing from the tSQLt side, I mean, just from downloads? You’ve been working at this for a couple of years now, at least from the GitHub perspective.

Sebastian: It is still very low. I would like to add a zero to the number of people who downloaded it.

Carlos: So you need a few more people to get in there. And again just for me I think there’s still a pretty tall barrier to entry.

Sebastian: That’s the thing with unit testing in general, and test-driven development in particular: even if you just do unit testing, it makes it a lot easier to maintain your code in the long run. You spend about 80% of your time maintaining existing code and only 20% writing new code, so it makes sense to figure out a way to reduce the cost of maintenance. And if you have a good set of test cases it makes maintenance so much easier. It actually makes you more likely to implement a change, too, because you learn right away that it didn’t break anything; it reduces friction, too, a good thing there. But anyway, writing the code for the unit tests is a lot of work, so the actual writing of the code becomes more expensive. The problem there is that that cost is upfront. Now, it’s hard to get into developers’ minds that it is really worth investing the time now because it will save you later. But it’s also hard to get that into the management mind, because a feature that in the old way would have been, let’s call it done, tomorrow is now going to take three or four days to be done.

Sebastian: Yes, so management needs to buy in, and it’s also hard to get them to agree to it because it’s not going to take one day to finish the code, it’s now going to take four days. But the thing is, if you have the code done in one day and it breaks a month from now when it is installed at a customer, you cannot really call it done in the first place. From my perspective, code is done when it is proven to be correct. And that’s what you achieve with automated unit testing.

Carlos: Sure, and oftentimes when management buys in and says they want to be an Agile company or business unit, they don’t always buy in to the extended time that it may take to make some of those things happen, and that’s where the challenges come up.

Sebastian: That is correct, yeah. Management always has more than one first priority.

Carlos: That is so true.

Steve: That’s right.

Carlos: So shall we go ahead and do the SQL Family?

Sebastian: Sounds good.

Steve: Alright, so Sebastian can you share with us how you first got started with SQL Server?

Sebastian: Yeah, that was the same company. When I was done with my Ph.D. in Germany, the job market had just tanked, so I decided to try my luck in the US and ended up getting a job with a single phone call. That was luck. But the one thing that I told the guy in the interview was that I really like programming but I don’t want to have anything to do with databases.

Steve: Yeah, look at you now.

Sebastian: So he said, “Ok, that’s great.” And he ended up putting me in the database team.

Carlos: It must have been an accident, right, the accent, you know, it didn’t quite get across.

Sebastian: And to be fair, my first assignment was C++ code; I was working on the driver for those 128 stored procedures. But then at some point a team member said, maybe we should ask the Ph.D. to help us with these performance issues, and that’s how I got pulled into actually working on SQL Server. Somehow I came to like it really quickly.

Steve: Very cool. Now, if you could change one thing about SQL Server, what would it be?

Sebastian: Yeah, a lot. Something we really didn’t talk about today yet is testing in isolation. To test in isolation in other languages you use things like test doubles and mock objects, replacements for the actual object that behave in an expected way so that you can write your test around them. tSQLt provides that functionality with fake tables, spy procedures, and fake functions, and to make that work you have to go really deep into the internals of SQL Server and do things SQL Server was not written for. I would really love it if that could be easier. For example, the feature that’s most requested right now, which I haven’t been able to implement completely yet, is dealing with schema-bound objects.
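
As a sketch of what those test doubles look like in tSQLt (the table, procedure, and data here are hypothetical examples):

```sql
-- tSQLt.FakeTable swaps dbo.Orders for an empty copy without constraints,
-- so the test controls exactly what data the code under test sees.
EXEC tSQLt.FakeTable @TableName = 'dbo.Orders';
INSERT INTO dbo.Orders (OrderId, Amount) VALUES (1, 100);

-- tSQLt.SpyProcedure replaces dbo.SendAlert with a stub that records its
-- calls, so the test can assert whether (and how) it was invoked.
EXEC tSQLt.SpyProcedure @ProcedureName = 'dbo.SendAlert';
EXEC dbo.ProcessOrders;                        -- code under test
SELECT * FROM dbo.SendAlert_SpyProcedureLog;   -- one row per recorded call
```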

Carlos: Alright, yeah, I noticed that on the request list.

Sebastian: To get a schema-bound object replaced with a test double, I first need to replace that schema-bound object with a non-schema-bound one. For that I need to script out the object, and while we know SQL Server has that functionality built in, it doesn’t make it accessible to our code, so I have to hand-code that scripting logic myself. And while there are really not a lot of mistakes or bugs in tSQLt, the ones that keep sneaking in are things I forgot to account for, like the scale on the newer date and time data types, because I hadn’t used them.

Carlos: Got you.

Sebastian: And honestly I shouldn’t have to think about it, because SQL Server should provide that functionality. Another thing that would be great is a way to measure code coverage. There is one framework out there now that does something like code coverage based on Extended Events, but I would like to get a lot more fine-grained than that solution.

Carlos: If you would forgive me, what would you mean by code coverage?

Sebastian: Code coverage means if I have my code and I have a bunch of test cases. I want to be able to pin point the areas in the code that have not been covered by a test.

Steve: So for instance a chunk of dead code that’s part of an IF statement that’s never been hit, or something.

Sebastian: Part of an IF statement that likely will be hit in real life, but that I’ve forgotten to write a test for. If you follow test-driven development very closely that will not happen. Most people don’t, and then code coverage really becomes handy.

Steve: What is the best piece of career advice that you’ve received?

Sebastian: Keep learning.

Carlos: Yup, never stops.

Sebastian: Don’t stop learning, you know.

Carlos: Sebastian, our last question for you today, if you could have one superhero power, what would it be and why would you want it?

Sebastian: Yeah, so there was this movie once, a very long time ago. It was about this robot, Number 5.

Steve: Short Circuit.

Sebastian: That goes back to keep learning. I love to read and I suck at it. I would really like to be able to read that fast.

Carlos: Ok, speed reader, very good. Well, Sebastian thanks so much for being on the program today.

Episode 85: Speedbumps Upgrading to 2016

Have you done your migration to SQL Server 2016 yet? Don’t worry—you aren’t alone. With all the glory of new features and improved performance, not all upgrades are equal. In this episode we talk with Javier Villegas who returns from episode 2 to talk about some of the issues he ran into on the upgrade to 2016. He stumbled on a process that normally runs for 7 minutes prior to the upgrade and then 26 hours after the upgrade. While it all worked out, he shares the issues he had and gives us a reminder that testing has not been deprecated.

Episode Quote

Listen to Learn

  • Javier’s experience migrating to SQL Server 2016
  • Why using different production environments during migration is a good thing
  • Trace flags and cardinality estimators features and hurdles
  • Database settings and configurations in the new version

Javier on Twitter
Javier on LinkedIn
Javier’s Blog
Javier’s MVP Profile

About Javier Villegas

Javier is a Microsoft MVP for the Data Platform and a member of MSDN since July 15, 2008. He has also been a PASS member since 2008. He works at Mediterranean Shipping Company as a DBA Manager. Javier has been dealing with SQL Server since version 6.5. He is from General Pacheco, Argentina. Besides being a SQL Server expert, Javier is a soccer fan cheering for his team, Velez Sarsfield.

Transcription: 2016 Speedbumps

Carlos: Javier, welcome to the program.

Javier: Thank you! Thank you, Carlos. Thank you, Steve for hosting me one more time.

Carlos: Yes, that’s right. Special thanks to you for being one of the first, the guinea pigs if you will. When I had this idea way back in May of 2015, maybe even before, in April, I went down to Buenos Aires and said, “Hey, anybody want to record a podcast with me?” And Javier raised his hand, so we’re glad we could have you back.

Javier: I remember that time, yeah.

Steve: And I know since that point in time we’ve done quite a bit to improve the process and how it all works to get the show out, so we should have some good quality here. Also, didn’t you release re-mastered versions of some of those old episodes, Carlos?

Carlos: We did. We’ve re-mastered them, working with Julien, our editor, and you can take a peek at them. We took out all the “uhms” and whatnot that I’m so consistently doing and balanced some of the sound; particularly in the beginning, the sound balance wasn’t all that great. Well so, Javier, our ultimate topic today is the migration to 2016, some of your experiences with it and some of the road bumps that you ran into. But first, before we get into that, tell us a little background about your organization and what you’re doing with SQL Server.

Javier: Ok. I am the DBA manager of a group of 4 DBAs at Mediterranean Shipping Company. My team’s responsibilities are basically to have the OLTP environment running fast, running always, and running in good shape all around the world. Recently we are also taking care of some Analysis Services work, in particular the tabular model for Power BI, which is something that we are beginning to do. So those are the tasks that I’m doing these days.

Carlos: Very nice. And yeah, probably from that Episode 2 I remember the story you told me, that you had a data center in Chile. I don’t remember where you had to fail over to, but there was an earthquake there in the country and you ended up failing over. It was like 2 minutes, something like that, that you were able to fail over in, right?

Javier: Yeah, in a matter of a couple of minutes we made the failover from one location to another. Same in New Zealand for another earthquake, and at that time we were using database mirroring. Now the product has evolved to Always On availability groups and all the beauties that we know today.

Carlos: Sure. You must be quite the failover artist then to have done it again.

Javier: Yeah, you know, like every DBA we just try to avoid that happening.

Carlos: Yeah, that’s right. To have done it twice, in two different countries, I don’t think there are too many people that can say that.

Javier: I don’t know, but the key is to do a failover test once in a while. Have a bunch of power users who help you on a Saturday morning so you can do the failover and recheck that the whole failover process works smoothly; that’s the ultimate goal. And you know, in every one of these failover testing exercises we find stuff to improve, basically the speed of doing the switch.

Carlos: So ultimately our topic is that conversion to 2016, and you made the change but it wasn’t always a smooth transition; you ran into a few bumps. So why don’t you tell us about that and some of the hurdles you had.

Javier: Yeah. Well, our main OLTP platform around the world was a mixture of SQL 2012 and 2014, right, with failover cluster instances, Always On, and all these things for high availability and disaster recovery. My company was part of the TAP program, so we got involved with the bits very early in the process of the SQL 2016 release. We were going back and forth with the product team on all these features that they planned to release, and at some point we had to validate our core business application against the releases that Microsoft was giving us. We found some unexpected behavior. Just to give you a quick example, one of our most critical SQL jobs for the business runs every three hours, and it generates a sort of cache table used to produce some critical reports, and the tables are used by other processes, right. As I said, it runs every three hours and normally takes 7 minutes to complete. When we were in this testing process, suddenly those 7 minutes turned into one hour.

Carlos: Wow, so what’s the problem there?

Javier: You know, the story gets even juicier, because I said, “Ok, maybe I need to update statistics because this is a new environment; I just restored my databases.” So I took some time and updated statistics for all the tables, and instead of one hour the end result was 26 hours.

Steve: Wow. I’ve heard SQL Server 2016 just runs faster but apparently there are some edge cases that we need to work around. What did you end up doing with that?

Javier: Yeah. Well, you know, it was something that I discussed with the product managers at Microsoft, and we raised our hand as soon as we found this because it’s unacceptable for our business, right? One day, or even more than one day, just to build this cache table once. So we sent them all the code, we sent them a copy of our database system, and we started working together very thoroughly on this issue. Well, it turned out that our coding technique basically uses a lot of table functions, and we use table functions, not the inline table functions but the regular multi-statement ones. Of course we don’t use scalar functions at all; that’s a different discussion that we can have another time, but they’re kind of forbidden, we don’t use them at all. So we use table functions, and we use them like a chain, several layers of table functions, one calling another and another, maybe CROSS APPLYs and joins with other table functions, something like that, alright. The new cardinality estimator in 2016 was treating these calls to the table functions as if they returned immediately, so the estimation was unreal; actually it was not really doing an estimation at all. So the end result at that point was not to use the new cardinality estimator, just turn on a trace flag to skip the new CE. I mean, use all the other features of SQL 2016 that are great, but temporarily turn off the new CE.

Carlos: CE being cardinality estimator.

Javier: Yeah, definitely.

Carlos: So I guess the problem was that the optimizer, right, the cardinality estimator, was going in and looking at those table functions, and because they were kind of nested, if you will, it didn’t pick the best plan. Or it was being too aggressive there, perhaps.

Javier: Yeah. It’s just that when the CE reached a certain level in these layers of table functions, it just considered that they return immediately, and that was not the reality. That is why we went from 7 minutes to one hour, right.

Steve: So then when you switched the trace flag to go back to the old cardinality estimator. Did it go back to that 7-minute runtime right away or did you have to do anything else?

Javier: Yeah, no, no, it’s just that. Also, another possibility is just to keep the compatibility level for the user databases pinned, because actually this is not a new problem in 2016; it is something that also happens in 2014. So you keep the compatibility level for your user databases at 110. It’s like, you know, forcing the engine to use the old, legacy CE, and we got back our 7 minutes.
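
The fallbacks Javier describes can be sketched like this; the database and table names are hypothetical, and as with any CE change, test before touching production:

```sql
-- Option 1: pin the database to the pre-2014 compatibility level (110),
-- which keeps queries in that database on the legacy cardinality estimator.
ALTER DATABASE MyOltpDb SET COMPATIBILITY_LEVEL = 110;

-- Option 2: trace flag 9481 forces the legacy CE while keeping the newer
-- compatibility level; here it is enabled server-wide.
DBCC TRACEON (9481, -1);

-- Option 3: scope the fallback to a single statement with a hint.
SELECT COUNT(*) FROM dbo.Orders
OPTION (QUERYTRACEON 9481);
```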

Carlos: Ok. Yeah, interesting. I knew they had made some changes there, but I guess just like everything else you have to test it. Obviously they potentially have a use case for that, but nested table functions kind of make me go “woosh,” give me the heebie-jeebies a little bit. You need to go through your code and test it on a new version.

Javier: I remember a long time ago, like in 2008 or something, my boss and I were invited to the Microsoft campus to discuss new releases of SQL, etcetera. They gave us a presentation on table functions. Well, we fell in love with them and decided to switch all our T-SQL coding to that way of writing because, as I said, we loved it. It’s quite clean and clear, and we just loved it. We were fine for years until we found this kind of hiccup. But you know, Microsoft is working quite hard on this case and others to see us through now. I know that in vNext they are planning to make some other changes to the CE.

Carlos: Now, are you doing anything with Linux, or SQL Server on Linux? Or are you guys pure Windows?

Javier: In my production and QA environments we are using all Windows Server, now switching to Windows Server 2016 because there are some particular features that we need. Just to tell you in one sentence, we love the possibility of having Windows 2016 and SQL 2016 because we can do Always On with machines that are not in the same Windows domain, right; that’s the main one.

Carlos: Right, another feature.

Javier: Yeah. This is something that we were asking for for years. Now we have it and now we are using it. But in terms of Linux, yes, I have a bunch of labs using, I believe, almost all the distributions: Ubuntu, Red Hat. Now I have the version that runs on SUSE Linux, that is CTP 1.2, which was released a few days ago. I’m validating stuff against Linux and actually I love it.

Carlos: Very cool.

Steve: With SQL Server 2016, one of the other big features that was added was the Query Store. Now, is that something that you’re making use of?

Javier: Wow, man. This is my favorite new feature in SQL 2016. Before, I mean, in terms of being a DBA, troubleshooting certain cases was extremely challenging: you’d have a report that for months always runs in, let’s say, 5 minutes, and suddenly one day it starts running in 2-3 hours. And nobody made a change, nobody touched the code, etcetera. Those cases were extremely hard to troubleshoot. But now with Query Store it’s just a simple click to get back your previous, or your optimal, execution plan for that particular query. So yes, I’m using it; as I said, this is my favorite feature in 2016, Query Store.

Steve: So for people who are new to 2016 and haven’t tried that out yet, any tips or recommendations that you would give or hints for people?

Javier: Yes. This is something that, as I said, I’m using in my daily basic tasks, and this is like the first version of Query Store, right; SQL 2016 was the first time that we saw it. There are some things to improve, as with every feature. In this case the only thing, I mean, on the Connect site I made this recommendation, as have many other SQL professionals: right now Query Store is per database, so you have to enable it, maintain it, and query it per database, and you have all this performance information within the same database. The recommendation, or the comment, for Microsoft, and I know they are probably working on this, is to have Query Store per instance, right, being able to store this performance information in a separate user database. This is something that I would love to see in future SQL versions.

Carlos: Right, to be able to get it out there.

Javier: Yeah. I mean, besides that, the current implementation is great because, as I said, before it was like a blind task, and it used to take me a long time to do this kind of troubleshooting when performance just changed without any code changes, etcetera. Now I just, you know, go to the Query Store panel, check the query plans for a particular query, and force it to use the old one, right? It’s good.
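
For reference, the workflow Javier is describing looks roughly like this; the database name, query_id, and plan_id below are placeholders:

```sql
-- Enable Query Store for a database and let it record plans.
ALTER DATABASE MyOltpDb SET QUERY_STORE = ON;
ALTER DATABASE MyOltpDb SET QUERY_STORE (OPERATION_MODE = READ_WRITE);

-- Inspect captured queries and the plans recorded for them.
SELECT q.query_id, p.plan_id, t.query_sql_text
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS t ON t.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id;

-- Force the known-good plan for a regressed query, which is what the
-- "simple click" in the Management Studio reports does under the hood.
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;
```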

Carlos: I know when they first, of course, I was talking with Borko before it was actually released; 2016 was still kind of being baked, if you will, and they were working on some of the reporting. I’ll admit that I haven’t gotten into it as much as maybe I should have, but the reporting and some of the charting that they were going to make available was still kind of coming together. Have they continued to make improvements? I know in SP1, or Service Pack 1, the big focus was on the new features being made available in the lower editions. Did they continue to make changes to that reporting, the reporting features in the Query Store?

Javier: Well, yes. Honestly, I haven’t looked at that very deeply; I’ve also been focusing on the other kinds of new reports in Service Pack 1, like the ones for Always On. Those are new reports where you can do better troubleshooting, for example of latency. Because, you know, as I said, I believe Query Store was probably the first feature that was completed before the SQL 2016 RTM, and that was already a long time ago, so even as of SP1 it’s quite mature. Of course, as I said, you give this to the DBAs and we ask for more, right? But in Service Pack 1 the new reporting for Always On, I mean the dashboard for Always On availability groups, is really improved. Those reports provide you a better way of troubleshooting Always On; before, it was kind of a blind experience.

Carlos: Very cool; Borko will be very happy to hear that. He’s the PM; we keep in touch from time to time. He likes getting feedback, right, hearing that people are enjoying the feature.

Javier: Yeah, yeah. I mean, Service Pack 1 was a hit, something that I’m still trying to understand. It’s the first time that they’ve brought Enterprise features down to other editions, so, wow. And they have these new features, the new DMVs for troubleshooting; the improvement in Always On is really great.

Steve: So another thing that I know you mentioned when we were talking before the show was the new database settings, or database settings that are scoped per database, in 2016. What are you doing with those?

Javier: Yes. Well, database scoped configuration. This is one of the great features for DBAs, because now you can set a bunch of configurations or settings that before you had only for the whole instance or per session. Now you can do it per database, for example specifying the MAXDOP, the max degree of parallelism, per database. This is really powerful: you don’t have to use the MAXDOP hint or start playing with Resource Governor to specify a different MAXDOP. Also, there is another setting which is great, which is legacy cardinality estimation. This is connected with the problem that I mentioned before: you can now specify per database to use the legacy cardinality estimator without setting any trace flag or anything. You take your user database, go to these database scoped configuration settings, change legacy cardinality estimation to ON, and all the queries under that user database will use the old CE. Same for other settings; for example, you can now clear the procedure cache per user database, which is also good.

Steve: Ok.

Carlos: So those are all settings like inside the database properties if you will.

Javier: Yeah, from Management Studio you go to your user database, right click, “Properties”, and you’re going to see them there. You have the defaults, which are the ones configured by default for all user databases and which of course are off, just to try not to modify the default behavior. Also, another one that you have, which is also connected with legacy cardinality estimation, is query optimizer hotfixes. You know that Microsoft, with every service pack and every cumulative update, introduces changes to the cardinality estimator, which by default they keep disabled just to, you know, not change the behavior of something that is running within the same SQL version. Before this, or actually in parallel, you can turn on a trace flag and enable all those hotfixes for the CE. Now, with database scoped configuration, you can enable those CE hotfixes per database, which is great.
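
The settings Javier walks through map to the DATABASE SCOPED CONFIGURATION syntax introduced in SQL Server 2016; the database name and values here are illustrative:

```sql
USE MyOltpDb;  -- run in the context of the user database

ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

-- Per-database equivalent of the optimizer-hotfix trace flag (4199).
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;

-- Clear the plan cache for this database only.
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
```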

Steve: And so if you have an older database there or something that is not kind of going to work well with those changes you can then turn it off for that one and then turn it on for others and be in good shape.

Carlos: Yup, fine tuning.

Javier: Correct, yeah. Of course, I mean, all of these have to be carefully tested on, not in production, like that’s the advice. Don’t play with this in production but actually this is something that is easy to turn on and off.

Steve: Ok. Well, shall we move on to the SQL Family questions?

Javier: Yeah, I’m ready.

Steve: Ok, so how did you first get started using SQL Server?                       

Javier: Well, I mentioned this story in my previous episode, Episode #2. Like 20 years ago I started with the Microsoft courses, with Windows NT 3.51 or NT 4, and there were elective courses; there were like 3 of them. One was Exchange, one was Active Directory, and the final one was SQL Server 6.5 at that time. Everybody had to take the regular training plus one of these three. My first choice was Exchange, but it was full; there was no place for me. So then I said, “Ok, Active Directory seems interesting.” It was also full. So they said the SQL course had plenty of space, so that was the one that was there for me. I took it, and after a couple of days I said, “Oh man, this is so boring.” But then my professional career changed and I got the opportunity to focus on database engines in a previous job I had, which involved multiple vendors in terms of databases and also operating systems. I remember working with Sun and Oracle, and Linux with MySQL, and Windows with SQL; it was like a data center with that whole zoo of environments. At some point I put my career onto the Microsoft rail and focused everything on Microsoft, especially on SQL Server, and that is how I got involved with it.

Steve: Do you ever wonder how different your career may have been had that Exchange class not been full?

Javier: I haven’t thought about that too much.

Steve: Ok.

Javier: Actually, you know, I think that being a DBA is something that you have to love and you have to feel, because we are all a kind of particular sort of people.

Steve: Yup.

Javier: So you have to love it and at some point long time ago I said, “Ok, this is what I love.” So let’s go ahead and try to do this and do it well.

Carlos: If you could change one thing about SQL Server would it be?

Javier: Wow. Well, I mentioned a couple of things regarding Query Store. But something that I would love, I mean, first of all, if you had asked me this question like 6 months ago I would have said multi-platform, and we have that already. So now I will say: make the failover process on availability groups even easier, in terms of being able to replicate to the secondary the logins, the jobs, and all the things that are not inside the user database, in terms of settings, everything. Because right now, when you do an availability group failover for disaster recovery purposes, even if it is a test or if it is real, you have to do a lot of manual tasks to fix the logins, you know. In all the production environments that I have around the world I have state-of-the-art PowerShell scripts that try to take care of synchronizing the logins plus the jobs plus the SSIS packages, etcetera. I would love to hit the failover button and just forget about it.

Carlos: Alright. What is the best piece of career advice that you’ve received?

Javier: Well, a few years back I received this piece of advice, which I always remember. I have used it in my everyday tasks ever since I received it. And it is, “Try to be focused.” Just be focused. You know, there could be a bunch of things going on, but being focused on what you’re doing, I mean, ultimately that’s the thing that will allow you to accomplish a task well, especially when you’re under pressure. So being focused on what you’re doing is a great piece of advice that I got.

Steve: Very good advice.

Javier: Yeah.

Carlos: Javier, our last question for you today. If you could have one superhero power what would it be and why do you want it?

Javier: Wow, so this is something that I’ve been thinking about a lot, and probably it’s not going to be related, or at least not totally related, to technical things. And maybe it’s not a real power, but this is something that I would really love to have. First of all, an additional arm and hand. And let me complete it: also not having the necessity to sleep, I mean, if you don’t sleep for some reason, you don’t feel it. This is not because I have to work at night, etcetera. These two things, the additional arm and hand plus not having the necessity to sleep, are so I can handle my two little boys. Yeah, one is 1 year old and the other one is almost five, so that is why I need these superpowers.

Steve: Yes, having two sons myself, my wife and I can definitely understand that. Although mine are a little bit older, and it does get a little bit easier as they get older.

Javier: Really? That is good to know, because now I know that if I wait, it’s going to get easier.

Steve: Yup. But you’ll come across different things like the first time they dent your car.

Javier: Yeah. I’m far away from that.

Steve: Yes.                       

Carlos: Well, Javier, thank you so much for joining us again. Oh, sorry.

Javier: No, no, I just wanted to say that what I just said is not totally technical, but it’s real.

Carlos: That’s right. Well, Javier, thank you so much for joining us on the program today.

Javier: It’s always a pleasure to speak with you, Carlos and Steve, so thank you for having me. And I’ll look forward to being with you again in the future.