Episode 105: When is enough, enough?

How can you tell when a change is enough of a change?  How do you know you didn’t make too big of a change?  Steve and I discuss some thoughts around what you might look for when you make certain changes.  Some of the components we cover include Max degree of parallelism, cxpackets, virtual log files, the number of log files for a database, backup retention and memory.

Episode Quotes

“Then I think on the cost threshold for parallelism, again it’s going to take a little bit of knowing your system.”

“What it really depends on oftentimes in most businesses is how much money you are losing for every minute that that database is down.”

“If your system is performing great no reason to even talk about more memory unless it’s growing and you’re planning for the future.”

Listen to Learn

01:37 About the Companero Conference
02:54 Max degreee of parallelism, knowing the sweet spot
06:28 Max degree of paralellism , number of cores
10:08 CXPACKETS Waits
12:05 Cost threshold for parallelism
14:54 Virtual Log Files (VLF), growth and issues
18:53 What makes VLFs a bad thing?
23:23 How do you know if you have enough, not enough or too many?
29:00  Number of log files for a database
30:39 Backup retention periods and scenarios
37:35 More on memory
41:50  Page Life Expectancy

Transcription: When is enough, enough?

Carlos: Companeros welcome to Episode 105. It’s good to have you on the program again today. Steve, how is it going?

Steve: It’s good. Today’s topic is on when is enough enough. So Carlos, when is enough enough.

Carlos: Yes, that’s right, so this question was asked to me personally during a session as we were talking a little bit about parallelism and the change made to the cost resolve for parallelism. So the question was, ok now that you’ve made this change how do you know that you’ve raised it too high and so I thought we could talk a little bit about kind of extension of this idea from base lining which we’ve talked a little bit about in previous episodes, kind of best practices and now as we start putting in some of these things some of the sweet spots if you will and how we know that we’re in that sweet spot things to look at. So that’s kind of our topic for today.

Steve: Alright, so do we have any companero shout outs to mention?

Carlos: We do. We want to give a companero shout out to Daniel Temensken. He is a PFE at Microsoft. Daniel says, “Thanks for what you do.” He tries to plug-in when possible for us to keep up the good work, and so Daniel we’re thankful you Microsoft guys are keeping an eye on us and keeping us in line.

Steve: I don’t know about the keeping us in line I haven’t heard any feedback on that side yet. We’ll see.

Carlos: It’s only good and I don’t filter all of that to you, only the bad stuff. Ok, we’re still planning for the Companero Conference in October and one of our speakers, Mindy, she’s been on the podcast earlier and while we were chatting with her she gave us a little information about herself and what she’s going to be presenting at the conference.

Steve: Yes, I think one of the things I like about the conference is the whole single track concept and I’m going to get to see all of the presenters this time. Well, I’m not going to see me unless there is a mirror. But I’ll get to see all the sessions and I’m looking forward to seeing Mindy’s session as well.

Carlos: Yeah, that’s right. I think it’s going to be good. Again, just the ability to be able to review kind of see things from a different perspective. I think it’s going to set up a lot of questions for people to continue their learning. So let’s see as we get into the episode today. The show notes for today’s episode will be at sqldatapartners.com/enough.

Steve: Or at sqldatapartners.com/105 for the episode number.

Carlos: Yeah. I guess let’s go ahead and jump into this conversation. We have a couple of items or setting if you will that we’ll talk about. The first one, let’s go ahead and kind of tackle the max degree of parallelism. How do we know when we’re in a sweet spot?

Steve: Alright, time out for a second.

Carlos: Sure.

Steve: I’m getting a bunch of noise coming from somewhere. Hold on I need to close the door or something.

Carlos: Sure, sure.

Steve: Alright. It’s my neighbor has a leaf blower. Can you hear any background noise because of that?

Carlos: Yeah, I was going to say I could not hear it but I was talking at the same time, but nothing really stood out.

Steve: Ok, it sounds like he has moved around the other side of his house now. Sorry about that, so let’s jump back into the max degree of parallelism.

Carlos: Alright, I think this idea so I guess we’ll start with kind of the best practice and that is. What we’re looking at is. I guess we should start this conversation with all of these answers are going to start with it depends, right?

Steve: Yes, it does.

Carlos: And so we’re going to talk about some of these things in kind of lose general generic terms but that doesn’t mean you should necessarily adopt them as gospel. Obviously, your individual scenarios or your environments are going to
dictate some of this and so we’ll try to address a handful of them but we won’t be able to address every scenario.

Steve: Yup, but as far as max degree of parallelism, I don’t like to talk about that alone. I like to always talk about cost threshold for parallelism at the same time because I think the two of these goes so hand in hand. I think that there is a lot of confusion and just sort of downright bad defaults on both of this. And I guess both of these originated somewhere in the 90’s back SQL Server 65 or 7 or somewhere around there. They may have made a lot of sense 25 years ago. But the defaults you have on those today are not so great and I think that it comes down to the fact that hardware has evolved a whole lot since then.

Carlos: Yeah, Adam Machanic has his great line there where he says that these only make sense if you jump to a time machine and go back to guy who developed this at Microsoft, and use his computer to execute your queries.

Steve: Right, right. And unfortunately or maybe fortunately in this case we don’t have a time machine. So the default for the max degree of parallelism is zero which means unlimited parallelism. The default for the cost threshold for parallelism is five which means any query that has an estimated plan of greater than five it will be considered as a parallel query.

Carlos: Sure. The reason for max degree of parallelism is zero is because. As you begin to install SQL Server and you could think about it from their perspective, they don’t know how many CPUs you have, right? And I guess dynamically trying to figure that out and all the different scenarios that you might have as far as how much stuff you have on the box with SQL Server which I think is listening to a degree if you’re probably listening to the show. SQL Servers by itself, again there are going to be exceptions to that and so you’re going to see that conundrum that they are in there.

Steve: Yup, so the max degree of parallelism there is a lot of different opinions out there on what it should be set to. But most of them fortunately are kind of in the same range and typically what I see on this and what I like is close to, and I say close to because there is a lot of opinions there, but close to a max setting of 8. Or a max setting of the maximum numbers of cores you have or the max number of cores per numa node, and whichever number is smaller out of those.

Carlos: Right, so there is a little bit of Math involved there, right? It’s all the variations that are going to play into that. So taking into example that number 8, right, if you only have four CPUs, right you’re number is going to be four.

Steve: Right, but if you have four CPUs that are split amongst two numa nodes which usually numa is not that common with that few of CPUs. But if it was you would want to set it to two in that case. Sort of part of it around there, I mean usually with smaller number of CPUs like less than six or eight CPUs you’re probably going with the number of CPUs 99% of the time unless you have numa enabled. But when you get into SQL Servers with many CPUs this is where it can get really interesting. So if you’ve got a server with 50 CPUs, or 30, or 80 CPUs or even more than that. What could happen here is that a small query that probably shouldn’t go parallel could go parallel and could be run on all of those CPUs. I need to start saying this differently. They may be running all those cores because generally it’s not CPUs. It is multiple cores per CPU. So be run amongst all those cores so and that can be really inefficient. And generally what I see even on
servers with lots of cores is that 8 is a good setting. I mean it’s what a lot of people are recommending. It’s kind of the best practice to use there as a maximum because what you get into is when you have more than 8 there is a lot more work involved in bringing back all of the results from each of those different CPUs or different cores doing the work. There is a lot of work involved in sort of collating the results and waiting and if one of them takes a little bit longer there’s maybe a lot of waste going on there. You’ll see that show up because you may have a lot of CXPACKET Waits which maybe we should take a second to talk about that because CXPACKET is one of those wait types that is oftentimes there is a lot of misinformation out there around it. Some people say, “Yeah, I don’t have to worry about about CXPACKET Waits that’s just means that parallelism is occurring.” For the most part that’s probably true most of the time but you can get into a point where there are things that will cause excessive CXPACKET wait, one of them being incorrect settings on your parallelism. And when I say on your parallelism it’s really on your max degree of parallelism setting and your cost threshold for parallelism setting because those kind of go hand in hand there.

Carlos: I guess if we’re going to jump here so that question is we’ve made the change. How have you decided to make your change? You’ve now made it, how do you know, like how can you determine that you’re in a better place. I think there are a couple of components there. One I think the overall percentage of waits because CXPACKET will probably continue to be in there but you should see other waits to kind of float up. There should be a better mix if that’s the right word if for example CXPACKET waits was taken up 99% of all the waits. That should float down a little bit.

Steve: Yup, and usually what I look at there is as if more than maybe 50% of your waits is based on CXPACKET that maybe there is something there you should take a look at. If it’s more than 90 there’s probably there you should take a look at. If it’s up to 99% then there certainly something you should be looking at.

Carlos: Then I think on the cost threshold for parallelism, again it’s going to take a little bit of knowing your system but you just want to go and take a peek at, which is easier said than done but kind of getting in and knowing those queries what the big queries are and ensuring that they’re still parallelizing, right? And then do you feel like there are queries that are taking up lots of CPU that might benefit from parallelism and then again that’s kind of one of kind of decision as to whether try to tweak that or not.

Steve: Yup, and I think that there is kind of a balance there because you don’t want every query to go parallel. But the flip side of that is you don’t want every query to be single threaded either. And what the cost threshold for parallelism is really doing is it’s saying at what point, at how expensive of a cost are we going to decide that parallel processing in that query maybe better, and the default for that is 5. And if you look at most queries that you are running today I would assume that most people that are running queries or looking at queries for performance, they’re going to have a cost of more than 5. There sure there are some quick ones that don’t, and what that means is most of those queries are going to be processed in parallel. And that may not be a bad thing depending on your server but there are some queries that are low cost but they are greater than 5, so maybe they have a cost somewhere between 5 and 50 or 5 and 60, that they would probably run more efficiently if they weren’t ran in parallel.

Carlos: Well that’s the tricky thing I mean even though they exceed that threshold, parallelism is available to them but that doesn’t mean they will always run in parallel.

Steve: That’s right because you’ll end up with two plans. You will have a parallel plan and a non-parallel plan and with that the query optimizer will decide which one is better. I mean oftentimes from my experience it will go with the parallel plan at that point but not always.

Carlos: Got you. Yeah, again that’s another tough one. The answer there is that you’re
going to have to know a little more about your, watch your environment and see how the changed that you’ve made, how it’s affecting your numbers and then even drilling down to the query level if you will to determine if more changes are needed. So from there we’re going to jump into log files.

Steve: Alright, so the thing that usually comes up around the conversation of log files is virtual log files or VLF. And just a little bit of background before we jump into the enough is enough part of it but the way VLFs work is that as your log file grows, every time the log file grows additional VLFs are created or these are segments inside of that log file that are broken up that can be used individually one at a time. So when SQL Server is writing into the log and it always does this sequentially it will pick a next available VLF chunk that it can write to. It will fill that up and then move on to the next one, and the next one, and the next one after that. And those stay used, well if you’re in full recovery model, they will stay in used until you have a log backup. Now, if you’re in simple recovery model they will stay in used until all of the transactions that are in that like they are using that VLF until all the transactions have completed.

Carlos: And then that can be marked as “deleted” or being able to be over it.

Steve: Yeah, available for reuse basically.

Carlos: Available for reuse that’s a better word.

Steve: So what happens though and this part varies a little bit between different versions of SQL Server. SQL Server 2014 and 2016 did a little bit better job in how virtual log files grow but it used to be that when your log file grew if it was smaller than a certain size that growth would have four virtual log files associated with it. I think that size was around 256MB. Now, if it was between like 256MB and a gig you ended up with 8 virtual log files. And then if the growth was greater than a gig you ended up with 16 virtual log files. So with SQL Server trying to guess based off the size of that file growth how it could chunk that up appropriately so that you get sizes that would be more reasonable to use. And then with SQL Server 2016 and 2014 there was some changes around that so that when log files grow, with the smaller growth sizes you would oftentimes only get one VLF rather than several VLFs at that point. But the problem that you run into is that a lot of people or lot of databases have some growth settings initially and a lot of the defaults would either grow by 1MB or 1% with a starting size of 10MB. And as it grew you would end up with all kinds of really tiny virtual log files. And what that meant is that if you have a big transaction it would have to span multiple of these VLF files that were really tiny.

Carlos: Sure, and again kind of more work because it had to deal with multiple little files to get in and out.

Steve: Yup, I mean depending on how database was created it will oftentimes if you are an accidental DBA you don’t know about VLFs you might never have checked this but I’ve seen some that have had over 100,000 VLFs in a single log file. But why is that bad? Part of understanding when enough is enough is to know well what makes that a bad thing? I mean it’s just a lot of chunks in a big file. But what makes that bad is a couple of things. One is that when a transaction runs it needs to use multiple of those VLF chunks in order to be able to write to the log, and with big transactions you got to have multiple of them in use which may make it harder for the SQL engine to be able to write everything it needs to happen there. But the flip side of that and this is the real killer is that when you’ve got
that many and you try and restore your database SQL Server has to go through and recreate all of those VLF chunks when you do the restore. So part of restoring the database is allocates the data file, allocates the log file, and while it’s allocated in the log file it’s writing all those chunks. And I’ve seen restores of databases that took 8-10 hours because the VLF count, and then in the same database after reducing the VLF to a more reasonable number took an hour to do the restore.

Carlos: Wow! You know, obviously disk speed and all of that comes into play as well, right?

Steve: Oh yeah, and that’s just the same database in the same disk doing the comparison there. I’ve seen it be 8-10 times long because of a lot of VLFs.

Carlos: That’s an interesting point because normally we talk about performance, like we’re talking about application performance. Like my user is trying to get information out of the system, right, that scenario. But in this case one of the bigger performance killers is like an RTO perspective. I now need to restore that thing, “Well, I’m not going to be able to get the performance to meet the expectations of my users”, and that could be a problem.

Steve: Yup, and that’s one of those. I always try and think about that as if the system is down and you have upper management looking over your shoulder or calling you on the phone continuously saying, “Is it done yet?” And you’re just there twiddling your thumbs waiting for the log file allocate and not being able to tell them. I mean you’re thinking, “Oh, it’s got to be done soon.” But it could 6 or 7 or 8 hours before it’s done. I think that’s one of those things that misunderstanding of the VLFs there could lead to what people end up referring to as an RGE or a Resume Generating Event where you oftentimes, if you tell management you got a 2-hour recovery time but it turns out to be 10, that may be the end of your job depending on where you work.

Carlos: Sure. Now I hope that they will give them a little bit of leeway as long as they can get backupped. Now if they can’t get backupped that’s a different story. But yeah, that would be a rough place to be if they were that tight with the leash there.

Steve: And what it really depends on oftentimes in most businesses is how much money you are losing for every minute that that database is down.

Carlos: Exactly, that’s right.

Steve: I remember years ago when I worked at Amazon.com one of the things there that they measure for any outage was how much money did we lose. And if that money is a few hundred dollars it could be very different than if it was a few hundred dollars.
Carlos: Right. Yeah, I know that’s true. I think it helps put things in perspective. And again that kind of goes back to the culture of recognizing the value of things being up and then hopefully if that’s the window and you’re pricing things that way which I think again as administrators we could probably do a better job of saying, “Hey, what, you know what…” And I guess I’ll use the ChannelAdvisor guys those have access to selling things outside is a little bit easier. Customers purchasing products that’s easier to tally that downside or the cost there. But to be able to calculate that and say, “Hey, look you know what guys if we can’t do this then we’re going to lose X number of dollars. It’s going to cost Y to put it in.” The ROI does make sense at that point kind of a thing.

Steve: So then with virtual log files, how do you know if you have enough, not enough or too many?

Carlos: Yeah, that’s a good question. I was thinking about this and you put it on the list. I mean, so in my mind and luckily right, as knuckle dragging Neanderthal I haven’t had too many experiences having problem with this, Maybe at least that I have recognized. I think as long as I have a more consistent size that’s kind of where I feel better about things. What about you, Steve, when do you think enough is enough there?

Steve: Well, ok so really what it comes down to is how long does it take you to restore that database. I mean, that’s the key thing that I look at on VLFs. So if you have an environment where you’re doing a regular restore from production to a test server or a development server, that’s a great way to know. But if you don’t have that hopefully you’re in an environment where you test your backups. And if you have a backup that takes you let’s say an hour to run but 6 hours, assuming it’s similar hardware then that could be an indication that you might have a VLF problem there. However, that alone is not the only indicator. I have a script that I created and you can get to that on stevestedman.com/vlf, and it does the DBCC LOGINFO command and then puts that into a temp table and then does some sort of visualization with sort of a character based bar chart in the query output window.

Carlos: In the result, you could actually see the size of the individual files and kind of gives you an indicator as to how big they are.

Steve: Yup, yup, and with that, and again there’s a lot of opinions out there, it always comes to it depends. But my rule of thumb is that any time it’s over 1,000 VLF files in any one database or any one log file associated with that database that’s something that usually I want to deal with that right away. Anytime it’s over a couple of hundred or maybe 300 VLFs in a single log file that’s something that I like to deal with but it’s not super urgent. And just keeping it somewhere in that range. I think opinions will vary but I think most people who have experience with VLFs would agree that more than 1,000 VLFs can be an issue. And that many will also agree that more than 500 is something that wants attention as well.

Carlos: Right, and I think that’s a great point is that you have a metric to go back on and that is your restore time, right? So that can be your benchmark. That can be your feel good. You know now obviously there’s only so much performance you can squeeze out of that, right? It’s not going to go from 10 hours to 5 minutes I don’t think.

Steve: No. But it may go from 10 hours to 1 hour, and that could make the difference between staying up all night to deal with an issue or at least getting some sleep that night. Now the flip side of that is you don’t want to have too few of VLFs either. I mean, if you have too few like let’s say you only have 16 VLFs in your whole system. That would mean you only had 16 chunks of the log file, and when of them, depending on how things cross over between the VLFs that they would me marked as in use and they would stay as in use until all transactions touching them were complete or until they have been backupped. It’s kind of a balance there.

Carlos: And it also depends on the size, right? I mean if I have a new database, even like your DBA database that you have your own internal stuff in, 16 might be fine.

Steve: Oh yeah, absolutely.

Carlos: Then I have the production system with all my transactions on it that might be a different story. And again, I guess, I lean a little back on that because if I’m growing them in equal chunks, if I grow the log in equal chunks, then I guess I’m kind of trusting a little bit that they are going to grow in the best way, and the equal sizes they have equal number of VLFs so a little protection there and the database kind of indicating how many VLFs there will be for the size that I’ve specified there.

Steve: Right, but I think part of that comes back too to having your log size large enough that it doesn’t have to auto grow over time.

Carlos: Yeah, great point. That’s right.

Steve: The auto grow is kind there as in case of emergency this is what we want to grow but hopefully we’ve got that log file big enough that it’s not regularly growing. So do we want to talk about how we fix those?

Carlos: No, I think we’ve kind of mentioned them in different places or we can comeback on them, I mean, even in your performance tip. I think we’ll say that on different episode.

Steve: Ok.

Carlos: The only thing there on log files is so number of log files. We talk a little bit about sometimes dividing up or creating multiple data files. TempDB is kind of the classic example although it can be use in our other databases as well. But what do you think, number of log files?

Steve: It’s simple, one per database. The reasoning behind that is that log files are written to sequentially and whether you have one or you have 20 it’s still only ever be writing into one log file at a time.

Carlos: Yeah, exactly. So that idea of being able to use thread, parallelize, or how you want to think of that, being able to use each of those files without having to wait on something else won’t apply to the log.

Steve: Right, and I think the misconception there is that if I create two or three or four log files that it will be load balance between them as transactions are written but it doesn’t work that way. It just starts on one and uses until there’s no longer available space on it and we go to the second one if there is a second one. And it really just doesn’t make any sense especially if they are on the same drive. It really just doesn’t make any sense to have multiple files. I don’t believe I have ever come across a use case with SQL Server where I would have recommended multiple log files.

Carlos: Right, interesting.

Steve: So enough is enough, one is enough, two is too many on the number of log files for a database.

Carlos: That’s right. So another I want to go to and that is backup retention periods. We talked about, so ultimately we get into a scenario where our backups might be taking a little bit longer because we have two much, well I guess one history but I guess we’re using the word retention is how long we should keep them for?

Steve: Right, and then there is this sort of pack rat view of that which we keep them forever. And then there is the we’re tight on disk space view which means we keep them for as little as possible.

Carlos: Or just delete the ones that I don’t need so I can make room for the ones that I do need.

Steve: Right, right, so hopefully there is some balance in the middle there that can find. I think that from the DBA perspective, I mean I would like to have as many around as I possibly can because when something goes wrong that you’ll need a backup to recover from for instance corruption, oftentimes people don’t catch it immediately that day. It might be even a week or depending on your monitoring longer than that before you know that there is something wrong that you might need to go pull something out of the backup. Now, usually if you need to do a full recovery from a backup and you’re just going to abort the current database, usually you know those events pretty quick. But it’s the type of event where you realized something was missing, maybe some rows out of a table were deleted three weeks ago and you really need to restore that database and just pull those rows back in. It’s hard to know exactly what that retention should be.

Carlos: Right. I’ve worked for a couple of .coms and another scenario is not just the data but it’s either a data change or the table of table change, right? So a column gets added, a column gets removed, whatever and then all of a sudden because the reality is this change control process maybe got side step or whatever because it’s not good history. Nobody can remember exactly when the change got made and now all of a sudden it’s affecting things, so that ability to be able to go back and restore one from a couple of weeks ago and say, “Well, it was either this way or not this way as of this state.” I feel like been able to at least get things off of my plate by being able to provide that information.

Steve: I think one of the things that comes down to there on the backups and how many is enough is sort of a granularity based on how old the backups are. So for instance for me if I had unlimited space or a lot more space than I ever use for database backups, I probably wouldn’t keep every single backup. And most of the backup solutions that you have built in with SQL Server or other scripts available out there, they don’t really consider the granularity of backups over time, meaning you have a small retention window. After X number of days all backups are going to be deleted, or maybe after X number of days all your full backups are deleted, after different period your differential backups are deleted and after different period log backups are deleted. Hopefully you keep your full backups around the longest because you can’t use the other two types without those. But one of the things that I’ve come across when dealing with corruption is somebody is able to discover a corruption and it’s been in their system for months. And I’ll ask, “Well, do you have a backup from when before the corruption occurred?” And then the answer is oftentimes “No”. But sometimes somebody else will say, “I’ve got a backup from 18 months ago.” Will that help that all? And sometimes depending on how fast your data is growing or where the corruption occurs and 18 month old backup might be nice to at least see what was there or compare to what was there on older data on the database. So one of the things I like to do in a perfect scenario is maybe, like let’s say you’ve got enough space for one month of full backups but rather than keeping around a month of full backups taken let’s say every single day, which most people don’t have the space for that. But let’s say you did I’d rather keep around a week of full backups and then second week would have maybe backups from every other day and then the 2nd to 4th week or the 3rd and 4th week you might have backups once a week and then your retention period beyond that would be maybe a single full backup once a month that you keep around.

Carlos: Something like that, right.

Steve: Yeah, and I know that’s a lot harder to manage, and it might be as simple as you just have a job that once a month it takes one of your backups and copies it off to different network location.

Carlos: Right, we used to do that for auditing like the end of the year backup, you know, database for auditing purposes. So that auditors would come April, we have to keep this thing around for a time.

Steve: And my experience has been working with different areas of management is that oftentimes they don’t always understand exactly what it means when you talk about the backup retention, and when you explain to them that if we have a 2-week backup retention that means we have zero backups that are older than two weeks. Oftentimes that creates a bit of fear there that will lead to possible solutions.

Carlos: And of course one of the classic entrances into the cloud because nobody wants to go out and buy more disk space for just backups. The disk space has become very commoditized, very cheap in the cloud and so being able to store that information for a little bit longer. That’s kind of a lot of people’s first taste with getting a cloud technologies into their environment.

Steve: Yup, and I think that backups are great way to start with it.

Carlos: Well, let’s see we have a couple of others. I think I want to skip to TempDB. Do you want to touch CPU and memory? Or you want to call this an episode.

Steve: Yeah, I think it would be good. I mean you’re saying we want to skip TempDB. Yeah, because enough is enough on CPU and memory, I’d always take more.

Carlos: Yeah, that’s right. So ok, let’s hit on memory then just for a second. Yes, enough is enough, that’s a great question for that and there is this saying I hear somewhere that more memory will cover up a multitude of sins in the coding and all these issues that you could have in the database.

Steve: Yeah, a great way to fix I/O problems is with more memory.

Carlos: Yeah, exactly that’s right and the database just lives in the memory then no I/O problems.

Steve: So interesting on that I worked on a SQL Server on a few years ago where it was having some performance problems and it had, if I remember correctly, 64GB of RAM and they increased the memory from 64GB to 512GB of RAM.

Carlos: Oh, wow, so that’s substantial.

Steve: Yeah, very substantial.

Carlos: Exactly a license leap I think. I feel like 128GB is the.

Steve: We’re already on Enterprise Edition. Yeah, but that kind of a jump and at that point in time I think the cost for the memory was somewhere around $10,000 +/-. But that basically got rid of all the I/O issues that we were having there. Well, it still had a few I/O issues when you restarted the instance because nothing was cached in RAM. But once the instance is up and running for a bit it had so much stuff cached in RAM. I mean it was so much faster because it never had to go to disk to get anything. Of course it had to go to disk when there are rights but most of the performance issues around reads on this database, and it just took care of all the performance issues at that point. That was probably 3 or 4 years that continued to run without any significant performance issues simply by adding that much memory. Now, the side was effect was it also made it so that some of the developers not have to worry about performance tuning. So that when the database eventually grew to the point that that wasn’t enough memory for it, well they may have had more difficult performance issues to deal with at that point.

Carlos: Sure, and then it all comes down crashing down at that point. You’re probably talking about another system to go to the next level of memory at that point because computers are getting more robust. You know, once you start talking about terabytes of memory those are different systems.

Steve: Yup absolutely, so the question on how much enough is enough. Well, on that specific system where we set to 512GB of RAM. The thing I noticed that when it was normally running, for probably the first year that that was running it never exceeded about 400GB of memory used.

Carlos: Oh, that’s interesting because that was probably the size of the database.

Steve: Yeah, when on that one when everything was cached up and whatever the size of the database plus Temp tables and everything it was using, I mean it really didn’t exceed 400GB. But then a few years later it grew and it eventually got up and hit that limit but what that told me, when sort of it had that flat line or right around 400GB just sort of it, was that we perhaps bought too much memory and I kind of bite my tongue as I say that because it’s hard to say too much memory. But the fact that we never used more than 400GB indicated that if we had put 400GB of RAM in there that would have been enough.

Carlos: Sure, the application would have worked just as fine.

Steve: At that point in time.

Carlos: Yup, at that point. Yeah, so how do you know when enough is enough? I mean, obviously I think there is the indicator that people have lots of different thoughts about but the Page Life Expectancy (PLE) which I think has kind of been kicked around a little bit. I think for better or for worst somebody wrote a whitepaper at Microsoft to kind of came up with a recommendation that was quickly adapted as the standard. So in environments where we don’t have 512GB of memory, how do we know when enough is enough?

Steve: Yeah, and I think that comes down to sort of balancing the Page Life Expectancy with the page faults and knowing when something has to be bought in from disk versus when it’s able to be read from memory and looking at the overall performance. I mean if your system is performing great no reason to even talk about more memory at that point unless it’s growing and you’re planning for the future. But if you’re having performance issues there, whatever it is, whether you’re doubling from 4GB to 8GB, or doubling from 32GB to 64GB. Memory is oftentimes a cheap way compared to alternatives to improve that performance. So one of the things I like to look at because you brought the Page Life Expectancy was to watch how that Page Life Expectancy grows over time. One of the things that if you chart it over time and how it’s growing. If you’re running a regular CheckDB job, oftentimes that will skew your numbers because when CheckDB runs it of course has to bring the entire database in the memory bit by bit. And oftentimes when that happens you’ll end up pushing a lot of pages out of memory and it will skew the numbers on your chart there. But if you weren’t running CheckDB, how long does your Page Life Expectancy continue to grow? And if you chart that you could can sort of see pattern where it will grow and grow and grow and then at some event happens. And that event might be CheckDB or it might be nightly ETL process or it might be some job that runs that has to pull in a whole lot of data on a regular basis. But if it continues to grow and grow and grow throughout the day until you hit a few of these events that tells me that most of the time you’ve got enough memory and it’s only those certain events that you hit that would have prevented it from being read from disk if that data was cached in memory. And if those things are happening in the middle of the night or they are not impacting the actual business at that point. Yeah, no big deal, I wouldn’t worry about it. But if it’s the kind of thing where it’s an ETL that kicks off at midnight and it runs through until 10AM and it is impacting business in the morning well you may want to consider more memory to help with that. Let me just finish one thing on that first. So that being said though, you would want to understand that it’s definitely a memory constraint before throwing memory at it. Because I’ve seen people throw memory at long running ETL jobs and then they found out that it has no improvement because the bottleneck is not the memory, it’s something else in the system.

Carlos: I guess I’ll kind of go along with that a little bit in to take a peek at, obviously you don’t want to take a look at memory consumption, but the thing that I found out helpful is also just taking a look at the chattiness of the disk. You know, one of the things that I have found is that databases will kind of get grouped together and you have one database that’s kind of like the mission critical or very important. It’s rising in this rank, maybe not mission critical but it is becoming more and more important. But then you’ve talked on these other databases that are kind of one offs or not as “important” and then you found out that they are the ones that kind of being chatty, or they are taking up more space than they should. Those are the situations where for the system that you’re most interested in maybe you have enough and it’s just a matter of moving that lesser database of somewhere else so that the system that you care about is not impacted.

Steve: Yup, that’s a great example and that’s a great reason to have different instances running in different VMs, and that you could constraint some of those less important databases to not chew up the memory that way.

Carlos: Right. Ok, so I think that’s going to be our episode for today – Enough is Enough.

Steve: Enough is enough.

Carlos: Yes. Do you agree with our list? Give us your thoughts, let us know. You can leave you comments on social media or the website at sqldatapartners.com/enough or on sqldatapartners.com/105.

Steve: Oh, I just totally tripped there. Didn’t I?

Carlos: That’s fine.

Steve: Sorry, Julien. I’m making this hard for you to edit today.

Carlos: So let us know if you agree with our list. If you want to leave us a comment, you can do so on social media or you can do so at sqldatapartners.com/enough.

Steve: Sorry, I am just blowing it today because I was thinking about our LinkedIn and Twitter at the end. Let’s do one more take on that because I just spaced. Oh, sorry.

Carlos: No, that’s fine. I guess is there anything else we need to include before we end or do we want to wrap up there as well?

Steve: No, I think we could just wrap up there because we’ve kind of already included everything.

Carlos: Ok, sounds good. So that’s our list, let us know if you agree. If you have other comments you can reach out to us on social media or you can leave as a comment at sqldatapartners.com/enough.

Steve: Or at sqldatapartners.com/105.

Carlos: You can always connect with us on LinkedIn. We love hearing your comments and connecting with more of you. You can reach me I’m @carloslchacon.

Steve: And you can get me on LinkedIn @stevestedman, and we’ll see you on the SQL trail.

Episode 104: Keeping up with Technology

Do you have any experience with [Insert random technology]?  Your heart starts to race and your palms get a little sweaty.  You don’t want to say no–we’re tech folks–we know stuff, but there are so many new things to learn!  How are you supposed to keep up with it all? In this episode, we chat with Eugene Meidinger about his thoughts on keeping up and his ideas on the most important learning components.

Episode Quotes

“Keeping up with technology itself, like it’s impossible.”

“One of the important things is having awareness on what the problem is and what the challenges are.”

“One of the things that we’re afraid of is our skills decaying.”

Listen to Learn

01:08 How do you keep up with technology?
01:43 Eugene’s points on keeping up with technology
05:20 People who keep up with technology
06:13 How to stay relevant when it seems impossible to keep up with technology?
07:28 Generalization and specialization
13:03 Developing mastery and expertise
15:40 Steve’s experience in teaching a DBA class at a university
17:04 Generalization examples, job interview process
18:14 Rich mental model
20:25 Analogy of keeping up with technology as radioactive decay
23:00 Three things to have a longer “half life” with IT knowledge
26:30 Big Data or Pokémon site
29:20 Things that last: People Skills
30:31 The idea of having a periodic table of skills
31:30 Understanding theory, fundamentals and internals
35:03 Discussion summary
37:03 SQL Family questions

Compañero Conference
How the SQL CAT team monitors databases on Linux
Big Data or Pokémon?
Eugene on Twitter
Eugene on LinkedIn
Eugene’s Blog

About Eugene Meidinger

Starting out as an accidental DBA and developer, Eugene Meidinger now focuses primarily on BI consulting. He has been working with SQL Server for 5 years now, and is certified in Querying and Administering SQL Server 2012. He is a Pluralsight author on Power BI and also co-leads the Pittsburgh Power BI user group.


Transcript: How Do You Keep Up With Technology?

Carlos: Eugene, welcome to the program.

Eugene: Thank you! I’m very excited to be here.

Carlos: Yes, it’s good having you. You have been a guest perhaps unofficially before on a couple of our panels when we were up in Pittsburgh and then Baltimore. You contributed to the conversation, we’d met, started talking and we want to get you on the program so thanks for being here.

Eugene: Yeah, definitely.

Steve: I guess we should say, welcome back.

Eugene: Well, I’m happy to be playing a starring role. I’m still mad at myself the first time because you’re supposed to say your name whenever they gave you the mic and I forgot to do that, so I’m just like Guest Speaker 3 or something like that on the first one.

Steve: The unknown voice with no credit.

Carlos: Yes, so we apologize. But thank you for being here and chatting with us today. This is actually an interesting topic and we had this as a SQL Family question and I thought that you had an interesting take on this. So the question that we have is how do you keep up with technology? It’s all over the place and of course we’ve even introduced since then kind of SQL Server in the News segment and it’s just amazing all of the different things that continue to come out of Microsoft. Let alone all the other companies out there. So I’ll ask you the question, let’s get started here. How do you keep up with technology?

Eugene: I think you have to just not sleep ever and then you’ll be fine. But for everyone else, anyone who happens to have a family or a significant other, or kids, hobbies, or just regular human body you’re not going to do a very good job of keeping up with technology. I think in many ways it’s not a very well defined goal. I think it’s more of an emotional fear. I think we’re afraid of two things. I think we’re afraid of losing our job or becoming so irrelevant or obsolete that we can’t easily change jobs anymore, that’s the first thing. I think there is a large number of us who fear becoming that COBOL developer who’s never updated his resume, and maybe has a good job now but there is a recession or they want to work somewhere else and they’re out of luck. I think that’s a fear that’s driving a lot of us but then the other question or the other fear.

Carlos: And I think there’s one, maybe lesser fear but I feel it’s kind of out there is, you know whatever, a social situation, “Hey, what it is that you do?” “I do COBOL.” And tech setting and they know what that is and you’re going to get the, “You’re really old.”

Eugene: I can tell you something. I’m still technically at my 20s and I don’t put VB6 on my resume but I know how to write VB6 for similar reasons.

Steve: So are we putting VB6 then on the same category as COBOL now?

Eugene: I would say technologies that I want to avoid.

Eugene: No, Ruby is basically a VB6 with some prettier syntax. I mean you could make the argument. Yeah, no, it’s definitely look down upon for being behind so one main thing is you want to keep your job. But then also you want to keep your friends and family, right? Because I joked earlier that, ok well, you could spend all of your waking hours reading books and watching videos and doing all the stuff and you probably do a good job of keeping up with technology but for me personally 9:00-10:00 PM is sacrosanct. We do not mess with that time. That is our date hour. Me and my wife are hanging out no matter what.

Carlos: Very good.

Eugene: Yeah. It’s important and so there’s balance. I think really what people want to know is how do I keep my job? How do I do it in a way that doesn’t cause me all this grief and anxiety and frustration? Keeping up with technology itself, like it’s impossible. I mean, you follow all the different things that are coming out with Azure. They just talked about CosmosDB where they took DocumentDB and then they slapped on four other different no SQL database models, right? And you’ve got SQL Server 2017. I really hope we’re not switching to an annual model. But they put Python in there. They’ve got all these other changes going on. There’s just all these different stuffs and so you’ll look at all of the things and I just don’t[00:05:00] think, the way people to find it’s possible to keep up. There really is just too much stuff. Maybe 30 years ago you can keep up with SQL but today you can’t and if you count everything, if you count all these different changes.

Carlos: Yeah, this question perplexed me for a while, and I actually asked it when I was on SQL Cruise which is another reason why we’ve been inspired to do the Companero Conference because I was impressed and I felt like there were a couple of people that did a pretty good job of keeping up. But I’m not sure, and not to say that they are not keeping up, but the more that you follow them, the more that you kind of see some niching going on and the more that you see content sharing, right, so they’re kind of sharing what other people are doing. Similar to what we’re doing here. We don’t know all the technologies but we’re bringing people who do and can talk about it. So that’s one interesting facet that I’ve seen there. Sorry Steve, you’re going to say something?

Steve: I was just going to say given all these, I mean, it’s nearly impossible to keep with all technology or even all things in SQL Server. But you need to keep up but you need to keep your job as you said and keep your friends and family. So what do you do? How do you go about staying relevant at that point?

Eugene: I think one of the important things is having awareness on what the problem is and what the challenges are. I think there are a couple of different sources of where this is actually a challenge, so one of the things that we’re afraid of is our skills decaying. We’re afraid of being that COBOL developer and our knowledge becoming less and less relevant over time. That’s one challenge. There is a challenge where we’re worried about all these new technologies. I think the cliché example is JavaScript Frameworks. It seems like there is a new one every 6 months and you don’t know which is the right horse to bet on, which is the right choice to make. I think two really big things, just talking about generalization and specialization. In my mind, specialization is how you pay the bills. You have to pick a specialization and a degree of specialization. You need to figure out, “Ok, what do I want to go deep on?” And it doesn’t have to be Itzik Bengan deep. It doesn’t have to be David Klee deep where you’ve picked one singular thing and you are the “world’s expert”. But you have to pick something to go deep on and so that’s going to require focus. Focus on terms of what things are you not learning, what is your specialization, just setting aside time and that’s going to pay for the food today, that’s going to pay the bills today. But then the other piece that hole like, do I learn Angular kind of piece or in the data world, do I learn R, do I learn Python, do I learn Docker? That’s going to make sure that you get paid 10 years from now. Generalization makes sure that you put food on the table a decade from now. And that’s less about focus and that’s more about time. When you listen to podcast you get this exposure and you’re generalizing. You’re dealing with these unknown unknowns. I think the very first step is deciding do you have a problem where you don’t have enough specialization? Have you not gone deep enough or is the problem that you need to generalize more? Do you need to be more aware of what’s out there? I think for a lot of people they are scared of all the new stuff but really they still need to make sure that they know where they want to go and where they want to focus on for their career. I think the first thing you need to do is decide what’s my actual problem? Do I need to go deeper or do I need to go wider? And what am I doing to deal with that.

Steve: And to complicate it even more, I mean in some cases it might be do I need to do both – go deeper and wider. And that could be more subjective.

Carlos: When I think about it, I feel like at least going through the U.S. education system, right? The three of us have gone to college and that’s kind of the route that we took. You get some exposure there so that’s kind of the generalization if you will. You start in Information Technology you get your first tech job. From there, I[00:10:00] think the most important thing is to go deep. Pick a couple of areas and that could be in a couple of different ways so tech stack. But also even just like an application of stack. More and more we hear from the CIOs and some of the things they are looking for in addition to the tech is I want to know the business. So kind of understanding the pain points and how technology solves those things. And I think once you kind of get deep and again like you’ve mentioned, just one area then it will be easier because you understand the full gamut. It will be, “Ok, where do I want to go next?” How can I take what I know and then apply it to the next hurdle or the next specialization area?

Eugene: Yeah, I definitely agree with you there. I mean, I think for a lot of people like if you are out of college your mission is to get past working at helpdesk. Your job is not to be learning Docker right now. Your job is probably not to be learning PowerShell or Hadoop or whatever the cool new next thing is. You’re right, when you’re coming out of college your job is to get enough specialization that people want to pay you money to do something. But part of that going deep too like you said is that. You know, I do martial arts and there is definitely a big difference between no belt, and white belt, and green belt, and all these different things. And I’m a green belt right now so I’m halfway there at the school that I go to. Sometimes you have to learn how to develop mastery in something. If you’ve never become an expert in area, again I’m not talking like elite top 1% expert. To me expertsy starts whenever you first present to your local user group or you write a bunch blog post; anything where the stuff has to go in through your eyes and come back out your mouth that’s starting to develop expertise. It’s on the far end of it.

Carlos: I guess I’ll throw another option there because I’m a big fan of Lunch and Learns. I think unfortunately managers don’t buy into it. The culture is, “Oh yeah, Lunch and Learn, you go bring your own lunch and make some poor shmuck present on something.” I wish that they would just say, you know what, again it could be like small groups pay the whatever it is, bring in pizza whatever, right, so that you can come and learn this. But that would be another option to say, “Hey, co-workers or group, I’ve learned something.” In fact, Doug Parnell, who is going to be speaking at the Companero Conference. One of the criteria they have for where you can go to conferences or get other training is his ability to able to bring that back and then explain to the group what it is that he learned which is interesting. So that’s not deep specialization. It’s just I’ve listen to it, I have some comprehending, and now I’m going to get at least further enough along that I can now explain it to somebody else.

Eugene: Yeah. Anything that’s going to be testing your mental model or something is going to have you that. And like I’m saying, I think that when you learn how to develop a certain level of mastery that becomes repeatable. Like you said, when you come out of college you need to learn how to go deep and once you’ve done that successfully and you’ve actually gone truly deep somewhere then now when you switch over to Hadoop or something like that you can do that. For me, I get that with speaking where the first couple of presentations that I gave there was a lot of fear and anxiety, and a lot of work. And now I’m at the point where I understand kind of the backbone of a good presentation and so it’s a lot easier for me to say, “Oh, I need to give a presentation on Power Query in two weeks or something like that.” And start putting together that outline, putting together that structure because I know what goes into it. Just the same exact thing with understanding what goes in to actually developing mastery somewhere even if that’s a journey man level so to speak and not a true expert.

Steve: So interesting, with that really the key is on developing that first thing that you’ve mastered. It’s not mastering it. It’s figuring out the process of how to master it so that you can then translate that to the next thing you have to learn.

Eugene: Yeah, absolutely. I think a big part of that like we talked about is understanding the difference between, all these different learning things. Are they giving you exposure or are they giving you mastery? Are they helping you with those unknown unknowns, like “Oh, I didn’t know that Spark was a thing.” Or are they helping you develop more of a mental model of how that stuff works and I think[00:15:00] the big dividing line for that in a lot of cases is is it active learning? Is it something where you have to write or type or speak or code or something so that you can actually test that model that’s in your head. Because you can read all the books in the world or listen to all the blogs, or listen to all the podcasts but you need to have the rubber hit the road at some point, and that’s truly how you develop a sense of mastery and expertise somewhere. Again why I say that I think mastery starts with that first user group presentation or that first blog post because that’s something that really test your knowledge. Make sure you actually understand it at all.

Steve: Interesting. I can think of an example on that occurred in my experience was about 10 years ago I was asked to help teach a class at a local university, and it was just a DBA class and it was not the 70-461 but it would have been the equivalent of what the 70-461 exam was then. Then like right I was about to start doing it the person who’s going to help out bailed out on it so I was all on my own to go teach this 10-week class at the university. And for me that was an incredible learning experience because it pushed me beyond what I knew at that point and it made me learn at not to the point that I could just talk about those things but to the point that I can actually teach those things. And I think that was one of those things that jumping into it I never expected that to happen but I had to go deep on a whole lot of topics over that 10-week period. By the time I came out of it I was at a whole different level on what I knew about those kinds of things. I think your example of being able to take it as input and then give it as output through a presentation is a great way to learn at least in my experience.

Carlos: Then the next benefit. I have to think again kind of because now that you’ve mastered. You know, you’ve done that specialization as you go into the generalization components if you will so i.e. talking with others at a conference, listening to the podcast, talking to a vendor, talking to a co-worker, a potential employer and things like that. You can then pick up on how their topic whether that’s a technology, an idea, a process, how that overlaps into what you already know or how it doesn’t, and then be able to speak to that to help that conversation continue to flow. I guess I’m thinking more of a job interview process because that’s kind of what we were started with as job security, “I’m afraid, can I get a job?” And I can’t say that I’ve gotten all the jobs that I’ve ever applied for. That’s not true. But I feel that ability to be able to speak to the things that they have brought up has definitely been at least something that they had to consider in looking at different applicants.

Eugene: Talking about that job interview, even just talking with people, I think that by having a rich mental model, a rich understanding of something it gives you the capacity for analogy, even if it’s an awkward analogy or strained analogy, at least gives you that option. A good example is all this big data stuff. At some point I want to start learning about Hadoop, and Spark, and all these other technologies, and right now I’m still at that exposure phase. I don’t know pretty much anything but when I start looking into them. You know, I was joking with Kevin Feasel, one of your big podcasts cameos that wait a minute, Scala is just like Haskell but different, or F# but different. Or that Spark is basically a service bus but different, or Hadoop is kind of like whatever the SQL data warehouse project is, that appliance kind of thing that they sell. I forgot the exact name. It’s like Parallel Data Warehouse or that sort of thing. So whenever you have some area that you gone that richness with when someone talks about something in a completely different area you at least have the option to go, “Well, I don’t know anything about that but from what you’ve said it sounds a lot like X.” Or even something simple. When you understand how a transaction log works with SQL Server you’re going to be able to make some really good guesses about how it[00:20:00] probably works with MySQL, or PostGres, or Oracle, or something like that. There is a lot of those things that will translate. And even if it’s not a one-to-one translation at least now you have a jumping board whereas if you are a jack of all trades you don’t really have a good way to tell if that comparison, that analogy feels right or not.

Carlos: Yeah, interesting. Now, to jump back in here you kind of have an interesting analogy with keeping up with technology. You model it after radioactive decay.

Eugene: I do. Well, I think it’s a good way to think about it because again if we talk about the beginning and how keeping up with technology is this nebulous anxious sort of thing. It makes me think a lot about when we talk about the cloud. Which originally was just some guy going, “Oh, this internet thing is undefined I’m just going to draw a cloud.” And we decided that’s our branding, right? That’s our marketing plan. Keeping up with technology is whatever it makes me not feel so nervous at night when I go to bed that I’m going to lose my job. That is keeping up with technology. I wanted some mathematical way because I’m a giant nerd of thinking about this, of actually working through this. And to me radioactive decay makes a lot of sense because when you’re dealing with, let’s say you have a pound of Uranium. I’m no physicist but I learn some basics in school. You’ve got a pound of Uranium. That Uranium is going to have something called a half life which simply put is just how long to have half of it. You could apply that to bunch of things but radioactive materials are pretty consistent and that half life is stable. And so I think that IT knowledge also has a half life. Now, what you say it is can vary. Now, Allen White, he says that every 5 years you have to retool yourself. I remember one the first time I was on this podcast he said that and I said, “Well, I’ve been doing this for five years does that mean I have to start over?” But in college I would joke about the same thing. I’d say, “Half of what you know is useless in five years.” And that’s how it really feels. And maybe it’s 10 years or 20 but the idea remains, but let’s say it is five. Well, you can mathematically model that, right? You can say, “Ok, what percentage would I retain each year so that in five years I’ve only have half of that knowledge.” And it turns out that percentage is 87%. That means that if you know 100 things that are not COBOL. You know, 100 things that are still relevant today then if your half life, your IT half life is five years, that means that 13 of them either fell out of your head or no longer applicable, right? 13 of them are either VB6 or something you haven’t done so long you forget how to do words, or DTS or whatever.

Carlos: You kind of know it but you wouldn’t want to be asked to do it again.

Eugene: Right, and so that kind of gives you a way forward because if you think of it that way then we’ve got three knobs that we can twist to try and improve how much stuff we know so that we’ve got a longer half life ourselves, a longer shelf life, whatever you want to think of it as. The first option is that you just learn more stuff. You just shove as much stuff in as you can.

Carlos: So instead of 100, it’s 150.

Eugene: Right, exactly. If you need to learn 13 things a year just to tread water then if you can learn 20 or 40 or 50 or whatever then the total amount of relevant knowledge you have is going to increase. Do you want to go deeper into that right now or do you want to go through all of the three.

Carlos: Let’s go to the other three things. I think that would be good.

Eugene: Ok. The second knob that you have is you can learn more of the right things so that’s about having a better focus. That’s about having a plan. That’s about improving the signal to noise ratio because you can spend 160 hours in your entire week reading Twitter and Hacker News but you’re going to learn about local elections or Go Lang or Rust or some local startup or what Zuckerberg is up to this week. Even the technology things may not be relevant to where you want to go or what fits your knowledge or just there’s a lot of junk out there. There is a lot of low quality materials so if the first thing is learn more things. The second thing is to learn more of the right things. Learn more of the things that fit what you want.

Carlos: So staying away from bleeding edge stuff and away until you start to see some more adoption. Maybe early adopter is the phase. You’re like, “Ok, that’s what I[00:25:00] will jump on to because I’m seeing it more widely used.”

Eugene: Yeah, I think one of the strategies with dealing with the bleeding edge stuff is make a low investment with that. So that’s why stuff like this podcast is so great because you can spend an hour while you’re doing something else and get enough to be conversational at bleeding edge technology and then later on you can figure out, “Ok, this fit with my career. Now, I want to go deep.” So that’s the second thing is just learn the right things. The third know that we have is that radioactive decay, that how quickly does my knowledge become obsolete and that relates to what you just said as well is learn things that last. Learn things that last longer. So things that don’t last are stuff tied to a specific version. So the exact feature set that happens to be in SQL 2005 is perhaps not too useful to you. But understanding how to use some of those features that came in there or understanding some of those advance window functions that came with 2012. That is going to last longer. Certain types of technologies are just immature. Again, I joke about stuff like Angular where they’ve been breaking releases every 6 months but you have that big data space. It’s the hot new thing but I’ll tell you what there is a great site called like Big Data or Pokémon and it will give you a big… It’s true!

Carlos: Nah, I have to look it up.

Eugene: Go and look it up. So it will give you a name like Empowla, or Horsea. I forget some of the other ones. And they’ll say, “Is this a big data program or is this a Pokémon?” And then you’ll click on a button and it will tell you if you’re right or wrong. And you’re going to be wrong a lot of the time. It’s true. It’s great. It’s the best site ever.

Carlos: Ok, here we go. So I’m here, https://pixelastic.github.io/pokemonorbigdata/. We’ll put it up in the show notes. So the first name is Horsea. I happen to be a Pokémon player for the kids, for my children. I have 5 kids.

Eugene: Sure. Yeah, family bonding. I get it.

Carlos: That’s right. So Horsea, big data or Pokémon?

Eugene: Are you asking me?

Carlos: Yeah. I’m asking to the group.

Eugene: I’m pretty sure that one is a Pokémon.

Carlos: Yeah, I’m going Pokémon too. Steve?

Steve: Yeah, I’ll with the group on that one. I’ve never heard of that big data.

Carlos: Yeah, here we go. It is a Pokémon. Ok here we go, Vulpix.

Eugene: Ok, that’s definitely a Pokémon.

Carlos: Definitely a Pokémon.

Eugene: I had a try with it. I promise you.

Carlos: Here is a softball one, Hadoop.

Eugene: That is a big data.

Carlos: That’s definitely a big data. Here we go, it’s a native one that I’m not sure of anyway, Spoink.

Steve: I’m going to guess it’s a big data.

Eugene: Yeah, that’s sounds like something is going to make it for a big data company.
Steve: Sounds like a tech thing.

Carlos: Oh, it is a Pokémon. Look at that. Ok, that is funny. So I don’t know if I should thank you or send you a nasty email now that you’ve introduce me to the site because I’m going to have to go through and.

Eugene: It depends on how much time you waste.

Carlos: Exactly.

Eugene: So the point that I was making with that is that when you have so many of these big data technologies, even within Hadoop you’ve got all these goofy names. You got Pig, and Sqoop, and Flume, and Hive and HTFS and all that stuff. Because it’s immature you don’t want to make a huge time investment. These are things that are going to decay quickly because it’s going to be like some sort of ultimate battle and by the end of it one is going to standing with the crown. And you don’t know which one it is right now.

Carlos: Now, there’s a lot more players in it but it almost reminds me of, what was it? Blu-ray? And what was that technology?

Eugene: It was something like DVD, HD DVD or something.

Carlos: Yeah, DVD or something.

Eugene: Yeah, exactly or even going back VHS and betamax and all that kind of stuff. And so bleeding edge technologies are something that don’t last. But let’s talk about what things do last and we had to some of these things. But things you’re going to learn that last. One of the biggest one is people skills. People do not change or if they do it’s much much slower in terms of centuries than it is the years with technology.

Carlos: So decades, generation.

Eugene: Grammar doesn’t change that quickly. I can promise you. So if you’re going to learn how to write a good email or I have a blog post about how to write a good abstract, you know, that’s going to last the test of time along at the same time,[00:30:00] speaking – public speaking skills. You guys do consulting and I’ve learned myself that if you can stand up in front of 50 people and pretend like you know what you’re talking about you can do it too. Learning the trick of, “Well, I don’t know but I think it will work this way I’ll get back to you. I’ll give you an answer.” Those kind of soft skills are timeless, truthfully.

Carlos: The thing you’re intuiting there is we’re just making the stuff up, aren’t you?

Eugene: No. I think I implied it. I don’t know if I intuited but the distinction is lost on me.

Steve: So it would really be really interesting as we go through these different items if there was like a periodic table of skills that you could look and say, “Well, the half life on public speaking is 200 years. But the half life on big data is 9 months.” And try and do a comparison that way to figure out, “Ok, if you need to increase your skills overall.” What are the ones that you can either increase or going to last for a long time versus what can you learn quickly that might be a risk but it may pay off in the short term but you know what it’s going to be different 5 years from now.

Eugene: Yeah. I would say the people skills are definitely the noble gases of the skill world because they are not reactive. They last forever. But another thing that last long is I think, you know we talk about it going deep, understanding theory, fundamentals and internals. Going that one layer below and understanding how something actually works because it’s so much easier to tranche for that. But it also lets you make certain guesses and inferences. I’ll give you a perfect example. I have literally thanked Paul Randall twice for his transaction log course because it saved me so much for understanding that like for example dealing with availability groups. If you don’t know how the transaction log works on an internal level, availability groups are such a huge pain because you’re like, “Why I can’t sync this?” Or you say, “Do I have to take backups on both sides?” But if you understand how it actually works then you can intuit a lot of things. You can intuit, “Ok, if I’m taking a backup right now is the transaction log going to keep growing while I’m still doing the backup or will it stop?” That kind of stuff. So we talked about three different things: learn more things, learn the right things and then learn things that last. Things that last is going to come down to the deep stuff fundamentals, internals, some of the hands off stuff. And then it’s going to be those people skills. It’s how to write, how to read, it’s how to communicate, it’s how to learn in general, that kind of stuff. So those are I think the three different approaches you can take because the first two increase just your inputs, and then the last one decreases that radioactive decay. So if you know 100 things, if your half life, if you can shift that from 5 years to 6 years. If you can make that tiny little shift then still learning just 13 things a year, you’re going to end up knowing a 120 instead of 100. So slowing that decay you’re going to know more relevant stuff as a result over time.

Steve: Interesting. As you say that I think I’m really glad I’m not a JavaScript developer because I think the half life there would be…

Eugene: 6 months.

Steve: If even that maybe.

Eugene: Like I said, I know that Angular is coming out with like build number changes like full number changes. I think the plan is supposed to be every 6 months or something like that. And I’m still mad about SQL is coming out every 2 years so I don’t know how I will deal with that.

Carlos: Yeah, that’s right. Different worlds, right? You know the dev ops level on that side.

Eugene: It’s sneaking over the SQL world for sure all the dev ops.

Carlos: It’s well on its way. Well Eugene, great information and I guess we should note that if people wanted to extend the conversation a little bit or actually here you present this, your presentation at the GroupBy Conference would be available[00:35:00] and I’m sure it will be posted by the time this goes out actually.

Eugene: Yeah, we expect so.

Carlos: We’ll make sure that’s included in our show notes as well.

Steve: So I guess then just to kind of wrap it up at this point.

Eugene: Yeah.

Steve: Well, before going to SQL Family, just summarize a little bit of where we’re at.

Eugene: Oh sure. Yeah, ok, I can do that. Just to summarize everything you have to figure out, “Ok, what it is my real problem?” Is it that I need to go deeper with things or do I need to be learning more things? And then if I’m going deeper I need more focus. I need a plan. I need scheduled time because doing active learning is hard. It requires focus. That’s the fuel for deep learning. The fuel for generalization and broad learning is time. But you can listen to podcast while you’re exercising or doing the dishes or commuting. You can learn some of these things without giving it your full attention. And you don’t often want to give it your full attention because it’s so volatile. But really a lot of it comes down to three big things. If it’s like this radioactive decay where our knowledge is continually fading in relevancy, you can either learn more things which means putting in more time, more energy, or more money in some sort of way. You can learn the right things by say leaning on curation making sure you’re dealing stuff that’s good quality or having a plan and making sure that stuff fits in within your plan. Or you can learn things that are going to last longer; that are going to last more than five years and not become irrelevant, that aren’t just a hot new thing. Generally, that comes down to going truly deep and learning internals or fundamentals or theory. Or means learning people skills or business skills, things that haven’t changed nearly so rapidly over the past 10, 20, 30 years, things that sometimes don’t change for generations. So that would be my general advice with trying to keep up with technology. You may not be able to truly keep up with technology but you can find a way to keep your job and keep your friends without so much angst and so much anxiety.

Steve: Alright, very cool.

Carlos: Good stuff.

Eugene: Yeah.

Carlos: Shall we go ahead and do SQL Family?

Eugene: Sounds good to me.

Steve: Let’s do it. So how did you first get started with SQL Server.

Eugene: So it was largely by accident if I’m being honest. I took a database course in college and that was using MySQL as backend. I was a TA for that class later and so the different professor was using Access. And then later I did a non-credit intern project and did all the development work and that was using MySQL. Up until my first long term job, my current job, no experience with SQL, didn’t know it was a thing. And then I’m looking for a job after my first one and the job says .NET/SQL developer. And I’m like great, I always want to do software engineering, do a lot of programming, this would be perfect. Well, I thought it’s going to be 80% .NET and 20% SQL and it was flipped. Half of that was DBA stuff and I remember my first month looking, googling up the differences between a view, a stored procedure, and a function because I didn’t know any of that at the time. I could do my SELECT *, I could do my WHERE and that was about it. But I just learn on the job and I got involved and then I find out that, “Oh, user groups are a thing.” And I start going to local SQL user group in Pittsburgh and then I found out SQL Saturdays are a thing. I’ll tell everyone here. Don’t go to the after party because you’ll end up as a speaker. I got cornered by Gina Walters who was running the group and Rick Heiges who was a former PASS board member, and they’re like, “You should present.” And I said, “I’m not qualified.” And they said, “You should present anyway.” And so I gave my first presentation on execution plans. I was terrified but I loved it and I just kept going from there.

Steve: Alright, good stuff.

Carlos: Now, in all that time working with SQL Server, if there is one thing you could change about it what would it be?

Eugene: I know this had been said before but licensing. I would change licensing. If there was just one simple guy like I get, ok we got like Free, Express and we’ve got Standard, and Enterprise. Microsoft wants their money they see Oracle doing their thing I get it. But then you’re throwing stuff like, ok if you have a cold standby, that one is free. Well, in 2014 we change that now you have to have software assurance for it to be free but the moment you start taking backups, you’re doing production works so doesn’t count anymore and all these little nuisances are just really overwhelming. So licensing by far I would change.

Carlos: And then if you have that license you could take it to the cloud, but then you[00:40:00] have to

Eugene: Yeah, now you got hybrid.

Carlos: Failing over, and if you’re in the cloud for too long and that’s different licensing.

Steve: Yeah. That’s definitely one that would be worth straining out a little bit. So what’s the best piece of career advice that you have ever received?

Eugene: I’ll give you two because the best piece of career advice I know of I got out of a book so I don’t know if I’d count that receiving it but there’s really great book that I was given by a friend in my first job, and it’s How to Have Confidence and Power in Dealing with People which sounds really fancy but it’s a lot of common sense stuff of just how to work with people and talk with people and that kind of stuff. For someone who is this introverted nerd who didn’t know how to work with other people it was big. And the biggest thing out of that book, the best career advice that I’ve ever found in my career is “paraphrase what people say”. Repeat it back to them to make sure you’re on the same page. Just ask, “Hey, do you mind if I paraphrase that to make sure we’re on the same page.” And then just repeat back what you heard because there are so many times that you heard something different than they said and even of you got it right it lets them know, “Ok, he understood”, and they can relax a little bit so that’s been huge for me. As for received, probably definitely something that’s recent and sticks in my mind is from Erin Stellato where I talked to her about, “Hey, I want to get a job and big data or data analytics or something like that.” And she said, “Make the job that you want to have.” In the sense that instead of thinking, oh I’m going to have to find some other job. Well, I can look for opportunities to say, “Hey boss, I did a little bit of R with some of our internal metrics and here is what I’m able to find.” Or just something that shape the job that I’m already in to something more of what I wanted to be three years from now or something like that. That’s something huge.

Steve: Ok, great.

Carlos: And not to bang this drum companeros here, forgive me. But I think that idea is if you can tie the technology to a business scenario I would be willing to wage your 99% of the time you’re going to get to do that project. You know, assuming budgets and all of that stuff are all in order. But if you can prove value to the business by it, much easier scenario, much easier conversation than, “Hey, I want to do big data.” I have this problem I think I can solve it. Now having said all that our last question for you today, Eugene, if you could have one superhero power what would it be and why you want it?

Eugene: Yeah, I’m tear with this question because I’d want to be able to learn mildly useful things really quickly. Because I feel like most superpowers would be just way too obvious, way too intrusive like, Carlos, if you’re flying around the work or whatever people are going to notice and then you’ve got paparazzi and all this kind of stuff, right?

Carlos: Got you. There you go.

Eugene: Or if you’re some super genius that you can just touch a computer and tell what’s wrong then people are going, the FBI is just going to kidnap you and dissect you and figure out what’s going on. But there are all these minor little skills that I mentioned and that are useful but no one would go, “Hmm, I wonder what happened to him?” Like I want to learn lip reading someday or lock picking or something that my wife and I are learning right now is sign language. Like she is fully capable hearing, no problems there at all. Well, ok maybe sometimes she can’t hear me as well. But we’re learning sign language because one it’s just this cool thing. But two it legitimately is something useful in these occasional situations. So if you are in a loud concert or you’re 30 feet away from each other you can still communicate. And right now our repertoire is pretty limited. We mostly can say, “Hey, I’m going to the rest room.” “Oh, look at that cute child.” But we still get some value out of it right now. So my superpower would be learning all these like mildly useful little skills really easily but nothing that would attract notice by any authorities or other people.

Carlos: Lots of attention.

Eugene: Yeah, right.

Carlos: So I’ll second you there on the sign language. My wife and I took a class while we were in college together. It hasn’t been super useful outside of teaching our kids when they were growing up some sign language like terrible tooth time they can’t quite talk. They want to communicate that’s been the best thing there for it but yeah, super cool. Eugene, thank you so much for being on the program today.

Eugene: You’re very welcome. It was a pleasure.

Steve: Thanks, Eugene, really enjoyed it.

Episode 103: Plan Reuse

When we write our queries to the database, SQL Server has to go and figure out the best way to go and bring back the data you asked for. A query plan is created to help SQL Server remember how to get the data for this query. It takes time to make these queries, so the database wants to limit the number of times it has to create the plans so it will try to reuse the plan as much as possible.

Our topic for today’s episode is query plan reuse and the pros and cons with this concept. We will also touch on the concept of parameter sniffing–a technique SQL Server uses to try and figure out the best values to use for the execution plan with the hopes the plan will help the most queries. Special thanks to James Youkhanis for the suggestion.

 Episode Quote

“The concept behind this is it’s there to make things a little bit faster by reusing cache plans.”

“Parameter sniffing is a good thing because without it SQL Server wouldn’t be able to optimize your plan for any kind of parameter. But occasionally it goes wrong.”

“I think it kind of comes down again to kind of knowing your system and understanding the problem”

“Optimized for adhoc workloads is one of those parameters that we most of the time will recommend people turn on”

Listen to Learn

4:53  SQL Server in the News
5:00  Ola Hallengren scripts now on GitHub
6:45 What is plan cache?
7:48 Description of T-SQL and its execution plan
10:15  Scenario in regards to statistics and indexes, and data types
11:30  One-time use query plan cache
12:22  SQL Server and the importance of memory
12:50  A specific problem with one-time use query
12:55 Parameterization
17:30  Parameter sniffing
20:25  Stored procedure and plan cache, parameter sniffing issues
23:55  Options to solve parameter sniffing issues, recompiling
27:28  Controlling plan cache size
28:10  Plan cache and flash array
29:27  Idea of ad-hoc workloads
32:30  Needs parameter reports and examples
38:15  One-time use query reports
38:50  Instance level memory report
39:40  More about hints, recompiling and plan guides


Transcription: Plan Reuse

Carlos: So companeros, welcome to Episode 103. Thanks for tuning again to another great episode.

Steve: Yeah, Episode 103. Wow, so this episode is on cache plans and plan reuse which is one of my favorite topics in SQL Server. I know I’ve had a lot of lively debates over the years around this.

Carlos: Yeah, that’s right. I think it’s one of those things where from a performance perspective so indexing and some of the other objects and then you got to figure out to the internals of how SQL Server works. Yeah, it can a bit confusing a little.

Steve: And this topic came to us from James and he suggested that we talk about plan cache and plan reuse. I want to thank him for that suggestion.

Carlos: Yeah, we apologize. He was suggesting, gosh it’s been, I’m embarrassed to say how long have it’s been but it was during a time when we had a slew of interviews kind of lined up. It was kind of push to the back there but we’re glad to finally circle back around to it. We have a couple of shout outs this episode.

Steve: Yes, so one shout out came from sqlgambo on Twitter and this was in regard to a post that I had tweeted about Database Health Monitor, and he came back and said why I am building these tools of mine. I guess he hadn’t seen Database Health Monitor and then he and I actually chatted through private messages on Twitter for quite a bit and learn some stuff about what he is working on. And learn that he really likes with Database Health Monitor, so yeah, good to connect there.

Carlos: Yeah, very cool and it’s kind of interesting. We were talking before we started recording here about the history of database health monitor and how it kind of started from SQL Server reporting services reports. I was there, I was building my own home grown and came across Database Health Monitor. Lots of other tools out there, that’s one of the very nice things about the community and making those things available.

Steve: Yup, definitely.

Carlos: So Companero Conference coming up again on October – October 4th and 15th. Interestingly enough I was just at Nashville for a health conference this week trying to make some connections with some hospitals, and one of the things that stood up to me was the importance of unstructured time. So in that conference they were trying to because it had a mix of panels and speakers, and then vendor type sessions which weren’t horrible but they are still vendor sessions. So they were just trying to get through them. They kind of plow through it and they had this conference in a day and a half, a little bit more. I didn’t felt we had enough time just to talk, like let’s understand what it is. Are you being affected by this problem to be able to keep the conversation going? Anyways, so I thought give me some perspective on this idea of creating some structured content from over a session or from a panel. But they aren’t giving the unstructured time for people just to be able to talk, connect, right? Where are there similarities and commonalities? What I might want to pick up with this person after the conference. What conversations do I want to maybe talk about tomorrow or things like that? And so I found that again kind of single track conference, other way we’re going to do ours that I would have enjoyed a little bit more unstructured time with some folks.

Steve: Interesting. Yeah, I think that that unstructured time can be more valuable relationship building time that you have at a conference. And I think going to a conference and just getting the material, I mean, really you could do a lot of that on YouTube. But going there and making those contacts with people and being able to have time and talk about what it is you’re doing or when we do a session on performance tuning to have time to talk afterwards about issues you’ve ran into or problems you’re trying to work on can be incredibly valuable.

Carlos: I agree. I am looking forward to our conference and putting that on. We hope Companeros will join us October 4th and 5th in Norfolk, Virginia. You could take a look at companeroconfernce.com and we’ll make sure we have the link on the show notes page again as well.

Steve: And now on to SQL Server in the News.

Carlos: This has been up for a couple of weeks now but thought it was interesting. Many of you may be using Ola Hallengren scripts. He has decided to put them out on GitHub, kind of made them available. Obviously they are free to download but the now the difference being that you can actually suggest changes to his code and I know that a couple of people have done that already so it would be interesting to see what happens with those scripts as a result. Kind of going back to our community episode we talk a little bit about this, so it would be interesting to see what happens.

Steve: You know, we didn’t really talk much about Ola scripts on the community episode because it wasn’t really a community contribution project. It was sort of something he has built over time but now that it’s out on GitHub maybe it will become something more amazing than it is already based on community contribution.

Carlos: I believe it came out on GitHub after our episode so I wonder if we are not influencing people out there, Steve.

Steve: Yeah, who knows.

Carlos: Can we take credit for that, you know, SQL community you can thank us for Ola putting his stuff on GitHub.

Steve: Whether we were the cause or not.

Carlos: Yeah, sorry, butterfly effect or something. Ok, so today’s episode can be found at sqldatapartners.com/plancache.

Steve: Or at sqldatapartners.com/103 for our episode number.

Carlos: So again, ultimately what we are talking about is execution plans, plan cache, and plan reuse. So first, I guess let me back up and kind of from the 10,000 level view, what is it that we talk about when we are talking about plan cache.

Steve: Well from the high level it’s basically the plan cache is a piece of memory or junk of memory in SQL Server that keeps track of every query plan as it gets compiled. And if your queries are written in a way that it can reuse the plan it can then grab and reuse one of those existing plans rather than having to recompile that every time. And the concept behind this is it’s there to make things a little bit faster by reusing cache plans.

Carlos: Right, so allow me to go back up just a slight bit higher and take the idea that we use T-SQL to tell the database what it is that we want out of the database. So T-SQL has been described as the language of what is it that we want but it doesn’t tell the database how to go get it. So the database has to decide how best to do that and when the query comes in like as you mentioned, it’s going through a process to decide, “Ok, well this is what you want. How do I need to go and get that data?” And so as a result it’s going through a look and say, “Ok, well I think this is the best way and I want to create a plan.” I want to create an execution plan or a way to go and get this so that way every time you want it I will know how to go and get it. There could be many different ways almost like kind of going from Point A to Point B, lots of different ways to get there. And it has to figure out which way is the best or at least which is its going to use on a regular basis.

Steve: Right, and it can be a very expensive step in the process that compiling and figuring out how it’s going to go about getting the data.

Carlos: Well, it’s interesting. So they mentioned expensive and I guess kind of this but there is a cap, a couple of milliseconds, which all of a sudden I can’t remember how many it is, that it will what that is. And I thought, well gosh milliseconds isn’t sound all that long of a time but I think it’s all a matter of how busy your server is and then how many executions are coming to SQL Server as well.

Steve: Yup. And I guess to put it in perspective with milliseconds there. I mean last week I was working on a query with a client where we were trying to increase or decrease the run time from about 800 millisecond down to about 150 milliseconds. And milliseconds could make a big difference there, and this was a query that was being run continuously like through website traffic, web service
traffic and all kinds of things. It was being hit quite often so a difference between 150 milliseconds and 800 milliseconds meant a lot to the performance of the system.

Carlos: Sure, that’s one of the great point, right, is the frequency not just of all of the queries but of that specific query, because if it had to recompile every single time and you’re adding milliseconds on there then you’re just kind of piling everything back up and it’s going to go and redo a lot of that work every single time.

Steve: Yup, and the work that it’s doing there is it’s going out and it’s looking at statistics and what indexes are available, and what data are you looking for, and what are you filtering on. And it puts all those things together to figure out what is the best way that a SQL Server engine can you go and get your result set for you. And if it wasn’t so smart it will just say, I want to go out and I’m going to do a full table scan on every table you’re looking at and give you your results brought back together on your JOINS. But that just wouldn’t cut it in today’s world. Maybe databases 20, 30 years ago might have done that but today there is so much going on with indexes and big tables and different ways to get the data. There is a lot of options there to look at. I mean if you got a table with 30 indexes on it versus a table with 2 indexes there might be more work that has to happen there when it’s figuring out what is the best plan to use.

Carlos: Sure, and then we kind of get into data types, right? That plays a role as well. There a lot of things that it has to look at and consider.

Steve: Yup, so what happens after that plan gets compiled is it gets put into this memory location called the plan cache and those plans are kept around with the hope, SQL Server hoping, that they will be reused so it doesn’t have to do that work again but sometimes they never get reused. What you end up with is you end up with what could be called the one-time use query plan cache where if things are changing in the query and they are not identical you end up with all this one time use queries in the plan cache that can kind of clog things up and sometimes push other things out of the plan cache that would be useful to be reuse.

Carlos: Like you mentioned, going back to that idea, so the plan cache is a space of memory where your data has to store as well, SQL Server read everything from memory, right? So has to be able to read that stuff there and so if you’re not using it that mean that there are, memory is kind of a vital thing and a finite thing as well, that you are using those resources in a way that’s not helping your SQL Server go faster.

Steve: Right, so think of it this way. Let’s say we had a table called ‘podcast’. And in there we have the list of all of the podcasts that we know about and in there is a column called ‘host’. And you just said, SELECT * from podcasts WHERE host = ‘Carlos’ If you run that query it’s going to create, I mean by the first time you run it it’s going to create a cache plan and then if I come along and run that same query a moment or two later or a few minutes later and I run the exact same query SELECT * from podcasts WHERE host = ‘Carlos’ It’s going to not have to recompile that. It’s just going to use that already compiled plan and it’s going to save some time there. But then if we change it up and Carlos runs the query to say SELECT * from podcasts WHERE host = ‘Carlos’ and I say SELECT * from podcast WHERE host = ‘Steve’ that’s going to be two different plans because the queries are different. By different if you just look at the text of that entire query and if it’s not exactly identical meaning turns in the same place, space and the layout is exactly the same and the difference is we’ve changed the WHERE filter to say Carlos or Steve, that’s shows up two different plans in the cache. Now imagine if this was like, SELECT * from customers WHERE customer_name = ‘Fred’. Or customer_name = ‘Mary’, or customer_name = any of the 10 million customers you have in your website. You could end up with many many of these one-time use queries that may not or maybe they get used once or twice or three times while that customer is there but they end up chewing a whole lot of memory, and the way you get around that is you use parameterization. And the way that works is instead of saying, SELECT * from podcasts WHERE host = ‘Carlos’ you say SELECT * from podcast WHERE host =, a parameter. And that parameter is there and when that query gets run, whether you are running it through code you’ve written, website code, or reporting services or wherever. Instead of passing through the text string of Carlos or Steve as the parameter it passes through a separate parameter that says compile this without knowing what the parameter is necessarily. Just compile it and then we’ll fix up that parameter after the plan has been compiled.

Carlos: Then one of the ways that it goes and figures out but maybe parameter it should use as a default, is it well look at those statistics to say, “Ok well, I see Carlos is in here 100 times, Steve is in here 10 times.” Like, “Huh, Ok, I see there is a tilt towards Carlos so I’m going to assume that more people are going to query Carlos than Steve.” So I’m going to make some decisions potentially based on that distribution. This can be a good thing or this can be a bad thing.

Steve: Yup. An example of that, think of it as if you were looking up for customers by zip code. And now imagine that you are in a small town running a business and maybe that small town has one zip code, maybe two zip codes, and most of your customers are local so most of the time when you’re looking up customers you’re looking them up based off of those one or two local zip codes. Well, then you get a customer that is somewhere on the other side of the country in a different zip code. It might be that 99% of all of your customer are in one or two zip codes but then you have this 1% that are in other zip codes. What can happen with that is that the plan can assume that most of the time that’s being run with something that requires more work because it has to scan more customers based off that zip code distribution but then when you run it for that one customer without one zip code that doesn’t match your local zip code. It could get to that customer with less work but it doesn’t because it goes through the normal path of that pre-compiled plan to find the information there. There is a term for that and it’s called parameter sniffing. Where when a plan gets compiled the first time it gets compiled it looks what are the parameters that are being passed in and it figures out a good plan that’s going to work for those parameters.

Carlos: Based on everything that I know as of the data that’s all in these tables what is the highest probability of what’s going to come in and let me make it easiest for kind of the 80-20 rule if you will. That’s where I’m going to go and try to get that.

Steve: Yup, so then let’s say you have that customer example by zip code and you are looking it up by zip code and the very first time or when the plan gets compiled you use one of these odd zip codes that’s not a very common zip code in your system. It may then look at it and say, “Ok, there is only a very small percentage of the rows in our result set that use that zip code so what we’re going to do is we’re going to do an index seek right to that zip code location.”

Carlos: An anomaly in that sense. Like, “Oh, all the data must become an anomaly.”

Steve: Yes, but then if it was the first time you compile that plan and it was one of the common zip codes it may look at that in an example of instead doing a seek which we go right much quicker to where you’re going for the smaller set of data. It may say, well a majority of the rows in this table are in this zip code so instead of doing an index seek we may do an index scan or even a table scan because that’s going to be the most optimal way to bring back everything that you are looking for based off of that initial parameters. So what you end up with is that when that plan gets compiled and that’s compiled the first time you run it or if it gets pushed out of memory or somebody flags it to be recompiled that the next time it is run it gets recompiled. But if that time that it gets recompiled if you’ve got good parameters that represent a regular data set in your system you get really good performance out of your query. But then if it happens to get recompiled with one of those unusual parameters that causes your plan to do something different then what is optimal most of the time you could end up with a really inefficient plan, that ends up bogging down the system or really hurting the overall performance.

Carlos: Sure, and what gets a little bit weird and you may think I guess why would this affect me, this should be a problem. If you’ve ever seen scenarios where one time query runs pretty fast and then all of a sudden it doesn’t and then maybe later in the afternoon it runs again fast. That’s a common symptom if you will.

Steve: Yup. And I guess with that the common scenario that I see as a freelance consultant as Carlos and I are. I see that you will be working with a client and they’ll come and they’ll say, “Things have really slowed down. We haven’t changed anything on the database and something has really slowed down on this one specific process or job.” Or whatever it is that’s calling into to use this query. And then you’ll go and look at it and you’ll find here is the stored procedure that’s the root cause of the problem. You’ll copy it out. You’ll copy and paste it and put it into your Management Studio and change it around so you can run it, maybe not as a stored procedure but in line. And you run it and everything runs great because with the changes you’ve made to go run that in line it gets a different plan. And then you look and you think when you look at weight statistics and something like and you can see that like at noon today that’s when things just tipped and it went bad; and prior to that everything was running great. So what could have happen to cause that and what often happens is that you have a stored procedure that gets pushed out of the plan cache for some reason. The next time it is ran it is run with this unusual parameters sets which causes it to get a bad plan and then every other call into that starts using that bad plan. Or it was good plan for that one parameter but it’s a plan for the other parameters. And I’ve seen that take a stored procedure that normally runs in under a second and cause it to run like 7-8 minutes when it’s something more complex. Then everyone they hear that and they grip and say, “Oh, the database is horrible and everything broken.” And this is all really because of the parameters sniffing and parameter sniffing is a good thing because without it SQL Server wouldn’t be able to optimize your plan for any kind of parameter. But occasionally it goes wrong and you end up with the wrong plan based off of plan reuse on that stored procedure.

Carlos: And you may be thinking how big of a problem could this be. Now, I won’t say it’s the only reason but the new feature of the query store feature is basically trying to solve this very problem. And so it’s big enough of a problem that Microsoft has built a tool to help you combat it. And I think it’s one of those things that I think a lot of, at least I can remember going back and being frustrated and I think a lot of times it probably had to do with parameter sniffing.

Steve: Yup, and most of the time I ran into parameter sniffing issues it starts with customer service department who’s working with customers getting lots and lots of complaints that something has gone wrong or the system is down and not responsive. And then it leads to then eventually jumping into the database and finding, ok here is the problem, and then there is a lot of different things you can do to help mitigate the issue with that point. I mean one of them is people will put the option recompile on a query or the recompile hints on a stored procedure and that can cause more problems or different problems.

Carlos: Well, let me explain that. Let’s just go through that just for a little bit. So now I guess what we’re saying is to mitigate that you can tell as it goes through the process of actually creating the plan you can get, there is some we call hints, because a stronger word is plan guide but that is slight different. We’ll touch that in a minute but you can tell SQL Server, “Hey, SQL Server you should do it my way.” Or in the example of the recompile you’re saying, “Hey, every time you run this I want you to recompile.” I don’t want you to use the plan you have available to you. I want you to throw that plan away every time and figure out how to do it again. And so that’s an option that you have but there are risks if you will associated with that and you want to be careful about how you go about or when you go about implementing that option.

Steve: Yup, and one of the knee jerk reactions that I often see when I explain to people that, “Well, here is what happened. You got a parameter sniffing issue. The query got a bad plan, or the stored procedure got a bad plan. Here is what we did to fix it. We force it to recompile or we change the query around.” They often think, “Well, can we just force all of our stored procedures to recompile so we’ll never have this issue again.”

Carlos: Yeah, exactly. How do I eliminate the problem from never happening again.

Steve: Yup. And then the answer to that is, well you could but however by doing that you would eliminate any of the benefit that you get from the plan cache and be able to reuse plans. And depending on the system and the performance there, that could make things far worst over time depending on the load. So there are things you can do. You can go in and perhaps change the queries or change how things are working in the stored procedure or maybe have a stored procedure that calls another stored procedure that does something different. I mean there is a lot of different, there are probably 20 different ways you could go to figure out how to do this right. I guess we don’t have time to go through all of those now but sort of the knee jerk reaction is just make it so that everyone recompiles and that’s not a good thing to do.

Carlos: Well at least by default. I mean that may be an option that you pursue but don’t make that your first choice.

Steve: Oh yeah. And I’ve certainly used that for a single stored procedure or a couple of problematic stored procedures. Used the recompile option on them and every time they are run they get recompiled. And it’s just because of how they’re written and they are in positions where they could be rewritten but the overhead of recompiling those is cheaper than the time it would take to go rewrite those.

Carlos: Right, but I think it kind of comes down again to kind of knowing your system and understanding what the, so you understand the problem, “I believe I have a parameter sniffing issue.” What do I know about this either query, procedure, view, whatever it is and do I know any history about it? Can I go find some of that out to then understand what makes the most sense?

Steve: Yup. And we could probably go for hours on parameter sniffing but let’s shift back a little bit to sort of the generic plan cache topic now. So one of the things that often comes up with the plan cache is people say, “Well, how do I control the size of the plan cache?” And, you can’t. It’s something that’s dynamically sized by the SQL Server internally and it depends a lot on the server load and what memories are available. One way to control it is just put more memory one the server but that’s not a really good answer.

Carlos: Well, another feature that they added, I’m forgetting the edition, I want to say was it 2014? It seems like it was older than 2016 but maybe I’m remembering wrong. And that is when they added the ability to add flash, so if you have flash on the SQL Server you could actually expand the plan cache to use that flash array to give you more space when you have an issue like this. So to kind of indicate the gravity of problem Microsoft is putting solutions out there around the plan cache and its size. Even if they are not giving you the controls of like in Oracle to say this is what it should be.

Steve: Right. So the best way I’ve found to deal with the plan cache if you’ve got stuff that is getting pushed out or a lot on one-time use queries in there and things like that, it is to better understand what’s in there and then it might be and I’ve worked on systems that I have tens of thousands of different queries run against them and then it turns out there is a dozen queries that are really the big offenders in hogging up the plan cache was one-time use queries. And you can go in and work and optimized those dozen queries to use parameters or do whatever needs to be done there. And oftentimes with a small amount of work you can have a really big impact on the plan cache there.

Carlos: This is where the setting, is that the right word, for optimized for adhoc workloads comes in. So this idea of this adhoc is that, “Hey, I have a one-time use query. I have a whole bunch of those. What I’m going to do is instead capturing or keeping the full blown execution plan is I’m going to just keep a little stub. The first time it gets run and then when it get run the second time then I’ll keep the whole thing and make use of the being able to run it more frequently.

Steve: Yup and that optimized for adhoc workloads is one of those parameters that we most of the time will recommend people turn on.

Carlos: Yeah. I know we’ve talked about it before. I only ever heard of one person complaining about it which we actually talked about, like it was in Episode 99. Mindy brought it up, you were in the panel in Baltimore. I remember Wayne Sheffield talking that he’d seen some CPU spike but I think. Again, obviously you have to test in your environment, right? But it seems like it’s almost one of those standard features that you can enable now.

Steve: Yup, and that’s why I said we almost always recommend it. Not always but almost always. So then I guess, I mean as far as understanding what’s going on with your plan cache, and I know we talked about Database Health Monitor a little bit earlier but in the very beginning when I first created Database Health Monitor some of the very first reports that I built were around understanding the plan cache because I was working on an environment where it wasn’t well understood and I needed a way to built a show what’s going on with the plan cache.

Carlos: Sure. And I think at least in my first interactions with that are giving you the top queries because it will keep some statistics about the plans and their executions. You can go and start to interrogate that a little bit. Generally, from a performance perspective that was the first time I remember kind of going and taking a look. It’s like, well what are the top plans by CPU, or by memory or just how long it ran, something like that.

Steve: Yup. And there are four reports in there that I usually look at that are right around understanding the plan cache, and one of them is just called the plan cache report and it’s a pre database report. And what it will do is it will show you the first 500 largest plans in the plan cache and it will show you how big they are. So you can go and see, “Oh wow, we’ve got 12 plans that are very similar that are taking up 30K each. And you do the Math and you add it all up and realized, wow some of this add up real quick to taking up a lot of your cache. Another one that’s really handy is the needs parameters reports. And what it does it goes through it and analyzes the queries that are in the plan cache and it looks for things that could be parameterized and then it groups all of those together. So if you had 1,000 queries let’s say customer name in them that was hard coded in the query, it will go through and say that by fixing this one query it would reduce your plan cache from a thousand instances of that same or similar plan to be one reusable instance.

Carlos: Now let me ask you a question on that because I guess this is where I’m drawing a blank here, it was a gap, right because I thought that even. So we talked a little bit about stored procedures versus adhocs so views or inline queries. But I thought even though I was in line query and I’ve been written with like framework or something. If the SQL Server gets that query it’s still going to try parameterized it. Even those on store procedure I guess is what I’m saying. Obviously, the very reasons why I would have that but in that scenario what do you then go about doing to solve for them?

Steve: Well, take the example earlier where we’re querying the podcast tables, SELECT * from podcasts WHERE host = ‘Carlos’ or host = ‘Steve’. If you are running that code and it’s actually running that exact query hard coded from whatever application it is. But that’s what’s ending up in the plan. It’s hard coded with Carlos or Steve in there. That is taking up just for those two queries, two or sometimes four, plan cache entries. And let me just clarify that when I say it’s two, that’s the obvious, one for the query that is looking for Carlos one for the ones looking for Steve. But sometimes you will get a parallel and a non parallel version of that query in there so sometimes a single query will have two different plans in the cache. But to go back to what you’re looking for there. If the application is passing through hard coded strings like that each one that it’s passing through will get a different plan, so then that’s really all the needs parameters report does is it goes and finds the items in the plan cache that are very similar with everything besides the parameters.

Carlos: So I guess let me ask the question this way then.

Steve: I don’t think I answered that, did I?

Carlos: I think you did answer it. I think I asked the question wrong.

Steve: Ok.

Carlos: So I am misunderstanding and SQL Server will not in every instance try to parameterize your adhoc queries.

Steve: Yes, that is correct. And the way to tell that is to look at what’s in the plan cache. And if what’s in the plan cache contains those hard coded values for a name for instance, then they haven’t been parameterized. Or the other way to look at it is if you run it with two different parameter or two different names in that value do you get two copies of that in your plan cache? And if it is then it is not parameterizing it.

Carlos: Ok.

Steve: Now with that, I guess the thing I like to do is find the ones that are the most commonly ones that need to be parameterized. This only works if you have access to the code because if you’re running an off the shelf application where you can’t change any of the code you might not be able to do this. But if you are a development organization and you’re building an application, if you can go in and find that these are the queries that end up using the most out of the plan cache. They have big plans and they are called thousands of times and then you can figure out which one needs parameterization and go parameterized a couple of those you can often times have a big impact on the amount of those one-time or low use plans that are in the cache.

Carlos: Again, to connect the dots here. There is where we’re actually going to the code and instead of using that hard coding you are going to use like sp_execute or changing the way that you are making that call to the database.

Steve: Right, and I mean for instance if you are working in pretty much any programming language that I’ve ever seen or work with that allows parameterization, you usually pass through some kind of a parameter value like a variable name in place of what were you would be filtering on that hard coded string, and then in the code you say, here is the query patch it up with these parameters and then execute it.

Carlos: So the developers have to make one more step in putting that dynamic, I’m assuming it’s a dynamic query linking it all together. Before they send it to the database they need to make one more step to help us out of it.

Steve: Then we can have a whole other conversation for probably an hour for parameterization and the impacts that it has on preventing SQL injection or helping prevent SQL injection. I mean there is another benefits to parameterization just besides the performance. Maybe we’ll save that for another time. So another report, there are a couple of others that I look at is the one-time use query report. Did I already mentioned that one?

Carlos: I think we may have touched, we talked about it but this is just to show us how many queries have just been executed in one time.

Steve: Yup, and that’s a handy way to see how many of these are there. And if you look at your database and there’s 2 or 3 or a couple of dozen, you probably don’t have to worry about it. But if you find out that there are thousands of them there then maybe it’s something you need to look into.

Carlos: And then that’s where that optimized adhoc workload comes in.

Steve: And then the other report in Database Health Monitor that I really like to use and understand is the instance level memory report where you can go in and see how much memory is being used by each database, but it also shows you how much memory is being used by the plan cache. And it’s interesting, on some server the plan cache might be using more memory than some of your databases are. I mean it depends on the size of your database and performance and things, and the load, but it’s just good to understand how big it is. And I guess I said earlier you can’t really control the size of it but you can control the size of it by reducing the amount of one time used queries either through optimized adhoc workloads or by adding parameters.

Carlos: Influencing what’s in there.

Steve: Yup, yup.

Carlos: So another thing we will touch on, we talked about it earlier, as we talk about the ability to manipulate a little of this, right? So through using hints, recompiling, the other one is plan guides but we also want to say. Again, take it for what it’s worth but in my years as a database administrator I’ve only seen hints and guides used in two instances. And so I think sometimes particularly on the forms we kind of see that get rushed, again this idea of, “I want to make sure it never happens again so I’m going to put this very extreme process in place.” When maybe testing it out a little bit would be a better approach.

Steve: Yup. And I think in my experience I’ve seen that hints are used quite often but I’ve see that plan guides are used very infrequently. I just want to take a second to jump back to a previous podcasts where I talked about one of my things I would change around SQL Server I think was the term hints on plan guides. And that hints aren’t really hints, they are really commands. And plan guides aren’t really guides, they are commands that’s say, “You will perform this way.”

Carlos: Yeah, exactly.

Steve: I look at hints and plan guides oftentimes the first thing I’ll do on performance tuning is pull the hint out and see how it performs without it, and oftentimes things improve. But I think that they are kind of emergency band-aid type response that when you’re out of other options it maybe something to consider as a short term solution.

Carlos: Sure, and I don’t mean to say they shouldn’t be use. When they are appropriate they are appropriate. But I think, again kind of the whole rebooting the server, right? Like, “Oh it’s slow, let’s reboot it.” Stepping away from the knee jerk reaction. There are going to be instances where they are called for and where you’re going to be the hero for implementing them.

Steve: Yup, and I think plan guides are amazing, they are extremely awesome but they are extremely dangerous and I don’t want anyone to think of this podcast what we are talking about here is recommendation to say go try out plan guides for your performance tuning. If you’re hearing that you should translate to go learn all you can about plan guides before you ever try it because there are some negatives in there. If you apply a plan guide it may cause trouble when you try and recompile a stored procedure that’s being utilized or the plan guide is associated with.

Carlos: And again kind of going back to query store, that’s the flip side or other angle is going to help you understand is, “Hey, this query is running with a plan guide or a hint, just so you know.” It has been difficult. It’s something you really have to spend a time with so again, the knuckle dragging Neanderthal that I am to kind of understand, ok what’s going on here? How are these plans changing over time? So it does take some practice and just hanging in there. So you will get a little bit frustrated but hang in there and eventually it will come.

Steve: Yup. So I think just kind of a quick recap overall so basically the SQL Server plan cache is where all the compiled query plans are stored. They get compiled at the first time they are used or when someone indicates that it needs to be recompiled. And it’s kind of sized dynamically by SQL Server. You don’t really have a lot of control over that and there are some things you can do to adjust that like optimized for adhoc workloads and parameterization. And I think hints and plan guides can oftentimes cause trouble but they are kind of a last ditch attempt to try and fix queries.

Carlos: So again, ultimately we would like your feedback on this and one of the areas and kind of talking, reviewing this topic again is we would like to try to make Database Health Monitor better so we’re listening to your feedback. We’d love for you to take a peek at those reports. We’ll make sure that we put them on the show notes page and list them there. We’d like to get some feedback, so as you use them what do you like about them? What else do you want to see? How do you use them? Is there some other tool that you’re using to look at the plan cache. We would be interested in hearing from you about that and you can use any of the options of social media to leave us a comment or a thought there.

Steve: Or you can leave on the podcast show notes page as well.

Carlos: That’s right. So our episode URL today is sqldatapartners.com/plancache. Again thanks again for tuning in to this episode. If you want to connect with us on LinnkedIn I am @carloslchacon.

Steve: Or you can find me on LinkedIn @stevestedman.