Episode 105: When is enough, enough?

How can you tell when a change is enough of a change? How do you know you didn't make too big of a change? Steve and I discuss some thoughts around what you might look for when you make certain changes. Some of the components we cover include max degree of parallelism, CXPACKET waits, virtual log files, the number of log files for a database, backup retention, and memory.

Episode Quotes

“Then I think on the cost threshold for parallelism, again it’s going to take a little bit of knowing your system.”

“What it really depends on oftentimes in most businesses is how much money you are losing for every minute that that database is down.”

“If your system is performing great no reason to even talk about more memory unless it’s growing and you’re planning for the future.”

Listen to Learn

01:37 About the Companero Conference
02:54 Max degree of parallelism, knowing the sweet spot
06:28 Max degree of parallelism, number of cores
10:08 CXPACKET waits
12:05 Cost threshold for parallelism
14:54 Virtual Log Files (VLF), growth and issues
18:53 What makes VLFs a bad thing?
23:23 How do you know if you have enough, not enough or too many?
29:00 Number of log files for a database
30:39 Backup retention periods and scenarios
37:35 More on memory
41:50 Page Life Expectancy

Transcription: When is enough, enough?

Carlos: Companeros welcome to Episode 105. It’s good to have you on the program again today. Steve, how is it going?

Steve: It’s good. Today’s topic is on when is enough, enough. So Carlos, when is enough, enough?

Carlos: Yes, that’s right. This question was asked to me personally during a session as we were talking a little bit about parallelism and the change made to the cost threshold for parallelism. The question was, ok, now that you’ve made this change, how do you know that you’ve raised it too high? So I thought we could talk a little bit about this as an extension of the idea of baselining, which we’ve talked about in previous episodes, and of best practices. Now, as we start putting in some of these settings, what are the sweet spots, if you will, how do we know that we’re in that sweet spot, and what are the things to look at? So that’s kind of our topic for today.

Steve: Alright, so do we have any companero shout outs to mention?

Carlos: We do. We want to give a companero shout out to Daniel Temensken. He is a PFE at Microsoft. Daniel says, “Thanks for what you do.” He tries to plug in when possible and tells us to keep up the good work, and so Daniel, we’re thankful you Microsoft guys are keeping an eye on us and keeping us in line.

Steve: I don’t know about the keeping us in line; I haven’t heard any feedback on that side yet. We’ll see.

Carlos: It’s only good and I don’t filter all of that to you, only the bad stuff. Ok, we’re still planning for the Companero Conference in October and one of our speakers, Mindy, she’s been on the podcast earlier and while we were chatting with her she gave us a little information about herself and what she’s going to be presenting at the conference.

Steve: Yes, I think one of the things I like about the conference is the whole single track concept and I’m going to get to see all of the presenters this time. Well, I’m not going to see me unless there is a mirror. But I’ll get to see all the sessions and I’m looking forward to seeing Mindy’s session as well.

Carlos: Yeah, that’s right. I think it’s going to be good. Again, just the ability to be able to review kind of see things from a different perspective. I think it’s going to set up a lot of questions for people to continue their learning. So let’s see as we get into the episode today. The show notes for today’s episode will be at sqldatapartners.com/enough.

Steve: Or at sqldatapartners.com/105 for the episode number.

Carlos: Yeah. I guess let’s go ahead and jump into this conversation. We have a couple of items or setting if you will that we’ll talk about. The first one, let’s go ahead and kind of tackle the max degree of parallelism. How do we know when we’re in a sweet spot?

Steve: Alright, time out for a second.

Carlos: Sure.

Steve: I’m getting a bunch of noise coming from somewhere. Hold on I need to close the door or something.

Carlos: Sure, sure.

Steve: Alright. It’s my neighbor; he has a leaf blower. Can you hear any background noise because of that?

Carlos: Yeah, I was going to say I could not hear it but I was talking at the same time, but nothing really stood out.

Steve: Ok, it sounds like he has moved around the other side of his house now. Sorry about that, so let’s jump back into the max degree of parallelism.

Carlos: Alright, so I guess we’ll start with kind of the best practice. But I should say up front that all of these answers are going to start with “it depends,” right?

Steve: Yes, it does.

Carlos: And so we’re going to talk about some of these things in kind of loose, generic terms, but that doesn’t mean you should necessarily adopt them as gospel. Obviously, your individual scenarios or your environments are going to dictate some of this, and so we’ll try to address a handful of them, but we won’t be able to address every scenario.

Steve: Yup, but as far as max degree of parallelism, I don’t like to talk about that alone. I like to always talk about cost threshold for parallelism at the same time because I think the two of these go so hand in hand. I think that there is a lot of confusion and just sort of downright bad defaults on both of these. And I guess both of these originated somewhere in the 90’s, back in SQL Server 6.5 or 7 or somewhere around there. They may have made a lot of sense 25 years ago, but the defaults you have on those today are not so great, and I think that comes down to the fact that hardware has evolved a whole lot since then.

Carlos: Yeah, Adam Machanic has his great line there where he says that these defaults only make sense if you jump in a time machine and go back to the guy who developed this at Microsoft, and use his computer to execute your queries.

Steve: Right, right. And unfortunately, or maybe fortunately in this case, we don’t have a time machine. So the default for max degree of parallelism is zero, which means unlimited parallelism. The default for the cost threshold for parallelism is five, which means any query that has an estimated cost of greater than five will be considered for a parallel plan.

Carlos: Sure. The reason the max degree of parallelism default is zero is because, as you begin to install SQL Server, and you can think about it from Microsoft’s perspective, they don’t know how many CPUs you have, right? And dynamically trying to figure that out, with all the different scenarios you might have as far as how much other stuff you have on the box with SQL Server, which, if you’re listening to this show, is probably a box with SQL Server by itself, though again there are going to be exceptions to that, so you can see the conundrum that they’re in there.

Steve: Yup. So for the max degree of parallelism there are a lot of different opinions out there on what it should be set to, but fortunately most of them are kind of in the same range. Typically what I see on this, and what I like, is close to, and I say close to because there are a lot of opinions there, a max setting of 8, or a max setting of the maximum number of cores you have, or the max number of cores per NUMA node, whichever number is smaller out of those.

Carlos: Right, so there is a little bit of math involved there, right? All the variations are going to play into that. So taking that number 8 as an example, if you only have four CPUs, your number is going to be four.

Steve: Right, but if you have four CPUs that are split amongst two NUMA nodes, and usually NUMA is not that common with that few CPUs, but if it was, you would want to set it to two in that case. Usually with a smaller number of CPUs, like less than six or eight, you’re probably going with the number of CPUs 99% of the time unless you have NUMA enabled. But when you get into SQL Servers with many CPUs, this is where it can get really interesting. So if you’ve got a server with 30, or 50, or 80 CPUs, or even more than that, what could happen here is that a small query that probably shouldn’t go parallel could go parallel and could be run on all of those CPUs. I need to start saying this differently: it may be run on all of those cores, because generally it’s not CPUs, it’s multiple cores per CPU. It could be run amongst all those cores, and that can be really inefficient. And generally what I see, even on servers with lots of cores, is that 8 is a good setting. It’s what a lot of people are recommending; it’s kind of the best practice to use as a maximum, because when you have more than 8 there is a lot more work involved in bringing back all of the results from each of those different cores doing the work. There is a lot of work involved in sort of collating the results and waiting, and if one of them takes a little bit longer there may be a lot of waste going on there. You’ll see that show up because you may have a lot of CXPACKET waits, which maybe we should take a second to talk about, because CXPACKET is one of those wait types where there is a lot of misinformation out there around it. Some people say, “Yeah, I don’t have to worry about CXPACKET waits; that just means that parallelism is occurring.” For the most part that’s probably true most of the time, but you can get to a point where there are things that will cause excessive CXPACKET waits, one of them being incorrect settings on your parallelism. And when I say on your parallelism, it’s really on your max degree of parallelism setting and your cost threshold for parallelism setting, because those kind of go hand in hand there.
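For reference, both settings are instance-wide options exposed through sp_configure. Here is a minimal sketch of checking and changing them; the values of 8 and 50 below are illustrative placeholders only, not a recommendation for your server:

```sql
-- Both settings are advanced options, so enable those first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- View the current values.
EXEC sp_configure 'max degree of parallelism';
EXEC sp_configure 'cost threshold for parallelism';

-- Change them; 8 and 50 are illustrative only, test against your own workload.
EXEC sp_configure 'max degree of parallelism', 8;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```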

Carlos: I guess if we’re going to jump ahead here, the question is: we’ve made the change. However you decided to make your change, you’ve now made it; how do you know, how can you determine, that you’re in a better place? I think there are a couple of components there. One is the overall percentage of waits, because CXPACKET will probably continue to be in there, but you should see other waits kind of float up. There should be a better mix, if that’s the right word; if, for example, CXPACKET waits were taking up 99% of all the waits, that should float down a little bit.

Steve: Yup, and usually what I look at there is, if more than maybe 50% of your waits are CXPACKET, then maybe there is something there you should take a look at. If it’s more than 90%, there’s probably something there you should take a look at. If it’s up to 99%, then there is certainly something you should be looking at.
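As a rough way to see what share of the waits CXPACKET makes up, something like the following query against sys.dm_os_wait_stats works. Keep in mind the numbers are cumulative since the last restart (or since the stats were cleared), and in practice you would usually filter out benign background wait types before drawing conclusions:

```sql
-- Percentage of total wait time by wait type since the last restart.
SELECT wait_type,
       wait_time_ms,
       CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER ()
            AS DECIMAL(5, 2)) AS pct_of_total_waits
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```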

Carlos: Then I think on the cost threshold for parallelism, again it’s going to take a little bit of knowing your system. You just want to go and take a peek, which is easier said than done, but kind of get in and know those queries, know what the big queries are, and ensure that they’re still parallelizing, right? And then, do you feel like there are queries that are taking up lots of CPU that might benefit from parallelism? Again, that’s kind of the decision as to whether to try to tweak that or not.

Steve: Yup, and I think that there is kind of a balance there, because you don’t want every query to go parallel, but the flip side of that is you don’t want every query to be single threaded either. What the cost threshold for parallelism is really doing is saying at what point, at how expensive of a cost, are we going to decide that parallel processing of that query may be better, and the default for that is 5. And if you look at most queries that you are running today, I would assume that for most people who are running queries or looking at queries for performance, those queries are going to have a cost of more than 5. Sure, there are some quick ones that don’t, but what that means is most of those queries are going to be considered for parallel processing. And that may not be a bad thing depending on your server, but there are some queries that are low cost but greater than 5, maybe somewhere between 5 and 50 or 5 and 60, that would probably run more efficiently if they weren’t run in parallel.

Carlos: Well, that’s the tricky thing. I mean, even though they exceed that threshold, parallelism is available to them, but that doesn’t mean they will always run in parallel.

Steve: That’s right because you’ll end up with two plans. You will have a parallel plan and a non-parallel plan and with that the query optimizer will decide which one is better. I mean oftentimes from my experience it will go with the parallel plan at that point but not always.

Carlos: Got you. Yeah, again that’s another tough one. The answer there is that you’re going to have to know a little more about your environment, watch it and see how the change that you’ve made is affecting your numbers, and then even drill down to the query level, if you will, to determine if more changes are needed. So from there we’re going to jump into log files.

Steve: Alright, so the thing that usually comes up around the conversation of log files is virtual log files, or VLFs. Just a little bit of background before we jump into the enough is enough part of it: the way VLFs work is that every time the log file grows, additional VLFs are created; these are segments inside of that log file that it is broken up into, and they can be used individually, one at a time. So when SQL Server is writing into the log, and it always does this sequentially, it will pick the next available VLF chunk that it can write to. It will fill that up and then move on to the next one, and the next one, and the next one after that. And those stay in use; if you’re in full recovery model, they will stay in use until you have a log backup. Now, if you’re in simple recovery model, they will stay in use until all of the transactions that are using that VLF have completed.

Carlos: And then that can be marked as “deleted” or able to be written over.

Steve: Yeah, available for reuse basically.

Carlos: Available for reuse that’s a better word.

Steve: So what happens, though, and this part varies a little bit between different versions of SQL Server; SQL Server 2014 and 2016 did a little bit better job in how virtual log files grow. It used to be that when your log file grew, if the growth was smaller than a certain size, that growth would have four virtual log files associated with it; I think that size was around 256MB. Now, if it was between like 256MB and a gig, you ended up with 8 virtual log files. And then if the growth was greater than a gig, you ended up with 16 virtual log files. So SQL Server was trying to guess, based off the size of that file growth, how it could chunk that up appropriately so that you get sizes that would be more reasonable to use. And then with SQL Server 2016 and 2014 there were some changes around that, so that when log files grow, with the smaller growth sizes you would oftentimes only get one VLF rather than several VLFs. But the problem that you run into is that a lot of people, or a lot of databases, have some growth settings initially, and a lot of the defaults would either grow by 1MB or 1% with a starting size of 10MB. And as it grew you would end up with all kinds of really tiny virtual log files. And what that meant is that if you have a big transaction, it would have to span multiple of these VLFs that were really tiny.
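One way to avoid those tiny-growth defaults is to pre-size the log and give it a fixed growth increment. A minimal sketch, with hypothetical database and logical file names and sizes that you would adjust for your own workload:

```sql
-- Pre-size the log and use a fixed growth increment instead of 1MB or percentage growth.
-- MyDatabase and MyDatabase_log are placeholder names; pick sizes for your workload.
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = N'MyDatabase_log', SIZE = 8GB, FILEGROWTH = 1GB);
```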

Carlos: Sure, and again kind of more work because it had to deal with multiple little files to get in and out.

Steve: Yup. I mean, depending on how the database was created, and oftentimes if you are an accidental DBA and don’t know about VLFs you might never have checked this, but I’ve seen some that have had over 100,000 VLFs in a single log file. But why is that bad? Part of understanding when enough is enough is to know what makes that a bad thing. I mean, it’s just a lot of chunks in a big file. But what makes that bad is a couple of things. One is that when a transaction runs it needs to use multiple of those VLF chunks in order to be able to write to the log, and with big transactions you’ve got to have multiple of them in use, which may make it harder for the SQL engine to be able to write everything it needs to write there. But the flip side of that, and this is the real killer, is that when you’ve got that many and you try and restore your database, SQL Server has to go through and recreate all of those VLF chunks when you do the restore. So part of restoring the database is it allocates the data file, allocates the log file, and while it’s allocating the log file it’s writing all those chunks. And I’ve seen restores of databases that took 8-10 hours because of the VLF count, and then the same database, after reducing the VLFs to a more reasonable number, took an hour to do the restore.

Carlos: Wow! You know, obviously disk speed and all of that comes into play as well, right?

Steve: Oh yeah, and that’s just the same database on the same disk doing the comparison there. I’ve seen it be 8-10 times longer because of a lot of VLFs.

Carlos: That’s an interesting point, because normally when we talk about performance, we’re talking about application performance. Like, my user is trying to get information out of the system, that scenario. But in this case one of the bigger performance killers is from an RTO perspective. I now need to restore that thing: “Well, I’m not going to be able to get the performance to meet the expectations of my users,” and that could be a problem.

Steve: Yup, and that’s one of those. I always try and think about that as if the system is down and you have upper management looking over your shoulder or calling you on the phone continuously saying, “Is it done yet?” And you’re just there twiddling your thumbs waiting for the log file to allocate and not being able to tell them. I mean, you’re thinking, “Oh, it’s got to be done soon,” but it could be 6 or 7 or 8 hours before it’s done. I think that’s one of those things where misunderstanding the VLFs could lead to what people end up referring to as an RGE, or a Resume Generating Event, where, if you tell management you’ve got a 2-hour recovery time but it turns out to be 10, that may be the end of your job depending on where you work.

Carlos: Sure. Now I hope that they would give them a little bit of leeway as long as they can get back up. Now if they can’t get back up, that’s a different story. But yeah, that would be a rough place to be if they kept the leash that tight.

Steve: And what it really depends on oftentimes in most businesses is how much money you are losing for every minute that that database is down.

Carlos: Exactly, that’s right.

Steve: I remember years ago when I worked at Amazon.com, one of the things there that they measured for any outage was how much money did we lose. And if that money is a few hundred dollars, it gets treated very differently than if it is a much larger amount.

Carlos: Right. Yeah, I know that’s true. I think it helps put things in perspective. And again, that kind of goes back to the culture of recognizing the value of things being up, and hopefully, if that’s the window, you’re pricing things that way, which I think, again, as administrators we could probably do a better job of saying, “Hey, you know what…” And I guess I’ll use the ChannelAdvisor guys as an example; for those who are selling things to the outside it’s a little bit easier. Customers purchasing products, that’s easier to tally the downside or the cost there. But to be able to calculate that and say, “Hey, look, you know what, guys, if we can’t do this then we’re going to lose X number of dollars. It’s going to cost Y to put it in.” The ROI does make sense at that point, kind of a thing.

Steve: So then with virtual log files, how do you know if you have enough, not enough or too many?

Carlos: Yeah, that’s a good question. I was thinking about this when you put it on the list. I mean, in my mind, and luckily, right, as a knuckle-dragging Neanderthal, I haven’t had too many experiences having problems with this, at least that I have recognized. I think as long as I have a more consistent size, that’s kind of where I feel better about things. What about you, Steve, when do you think enough is enough there?

Steve: Well, ok, so really what it comes down to is how long does it take you to restore that database. I mean, that’s the key thing that I look at on VLFs. So if you have an environment where you’re doing a regular restore from production to a test server or a development server, that’s a great way to know. But if you don’t have that, hopefully you’re in an environment where you test your backups. And if you have a backup that takes you, let’s say, an hour to run but 6 hours to restore, assuming it’s similar hardware, then that could be an indication that you might have a VLF problem there. However, that alone is not the only indicator. I have a script that I created, and you can get to that at stevestedman.com/vlf; it does the DBCC LOGINFO command and then puts that into a temp table and then does a sort of visualization, with a character-based bar chart, in the query output window.

Carlos: In the result, you can actually see the size of the individual VLFs, and it kind of gives you an indicator as to how big they are.

Steve: Yup, yup, and with that, and again there are a lot of opinions out there, it always comes back to it depends. But my rule of thumb is that any time it’s over 1,000 VLFs in any one database, or any one log file associated with that database, that’s something that I usually want to deal with right away. Anytime it’s over a couple of hundred or maybe 300 VLFs in a single log file, that’s something that I like to deal with, but it’s not super urgent. And just keep it somewhere in that range. I think opinions will vary, but I think most people who have experience with VLFs would agree that more than 1,000 VLFs can be an issue, and many will also agree that more than 500 is something that wants attention as well.
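Steve's script gives the visualization, but if you just want a quick count, a minimal sketch on SQL Server 2016 SP2 and later can use sys.dm_db_log_info; on older versions, DBCC LOGINFO returns one row per VLF for the current database:

```sql
-- Count VLFs per database (requires SQL Server 2016 SP2 or later).
SELECT d.name AS database_name,
       COUNT(*) AS vlf_count
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_info(d.database_id) AS li
GROUP BY d.name
ORDER BY vlf_count DESC;

-- On older versions, run this in the database you care about and count the rows returned.
-- DBCC LOGINFO;
```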

Carlos: Right, and I think that’s a great point is that you have a metric to go back on and that is your restore time, right? So that can be your benchmark. That can be your feel good. You know now obviously there’s only so much performance you can squeeze out of that, right? It’s not going to go from 10 hours to 5 minutes I don’t think.

Steve: No. But it may go from 10 hours to 1 hour, and that could make the difference between staying up all night to deal with an issue or at least getting some sleep that night. Now, the flip side of that is you don’t want to have too few VLFs either. I mean, if you have too few, let’s say you only have 16 VLFs in your whole system, that would mean you only have 16 chunks of the log file, and depending on how things cross over between the VLFs, they would be marked as in use, and they would stay in use until all transactions touching them were complete or until the log has been backed up. It’s kind of a balance there.

Carlos: And it also depends on the size, right? I mean if I have a new database, even like your DBA database that you have your own internal stuff in, 16 might be fine.

Steve: Oh yeah, absolutely.

Carlos: Then if I have the production system with all my transactions on it, that might be a different story. And again, I guess I lean back a little bit on that, because if I grow the log in equal chunks, then I’m kind of trusting a little bit that it’s going to grow in the best way; equal-sized growths have equal numbers of VLFs, so there’s a little protection there, with the database kind of indicating how many VLFs there will be for the size that I’ve specified.

Steve: Right, but I think part of that comes back too to having your log size large enough that it doesn’t have to auto grow over time.

Carlos: Yeah, great point. That’s right.

Steve: The auto grow is kind of there as an “in case of emergency, this is how we want to grow” setting, but hopefully we’ve got that log file big enough that it’s not regularly growing. So do we want to talk about how we fix those?

Carlos: No, I think we’ve kind of mentioned them in different places, or we can come back to them, I mean, even in your performance tips. I think we’ll save that for a different episode.

Steve: Ok.

Carlos: The only other thing there on log files is the number of log files. We talked a little bit about sometimes dividing up or creating multiple data files; TempDB is kind of the classic example, although it can be used for our other databases as well. But what do you think, number of log files?

Steve: It’s simple: one per database. The reasoning behind that is that log files are written to sequentially, and whether you have one or you have 20, it’s still only ever going to be writing into one log file at a time.

Carlos: Yeah, exactly. So that idea of being able to use threads, to parallelize, or however you want to think of that, being able to use each of those files without having to wait on something else, won’t apply to the log.

Steve: Right, and I think the misconception there is that if I create two or three or four log files, the load will be balanced between them as transactions are written, but it doesn’t work that way. It just starts on one and uses it until there’s no longer available space on it, and then goes to the second one if there is a second one. And it really just doesn’t make any sense, especially if they are on the same drive; it really just doesn’t make any sense to have multiple files. I don’t believe I have ever come across a use case with SQL Server where I would have recommended multiple log files.

Carlos: Right, interesting.

Steve: So enough is enough, one is enough, two is too many on the number of log files for a database.

Carlos: That’s right. So another one I want to go to is backup retention periods. Ultimately we get into a scenario where our backups might be taking a little bit longer because we have too much, well, I guess one word is history, but the word we’re using is retention: how long should we keep them for?

Steve: Right, and then there is the sort of pack rat view of that, where we keep them forever. And then there is the we’re-tight-on-disk-space view, which means we keep them for as little time as possible.

Carlos: Or just delete the ones that I don’t need so I can make room for the ones that I do need.

Steve: Right, right, so hopefully there is some balance in the middle there that we can find. I think that from the DBA perspective, I would like to have as many around as I possibly can, because when something goes wrong that you’ll need a backup to recover from, for instance corruption, oftentimes people don’t catch it immediately that day. It might be even a week, or, depending on your monitoring, longer than that, before you know that there is something wrong and you might need to go pull something out of the backup. Now, usually if you need to do a full recovery from a backup and you’re just going to abort the current database, usually you know about those events pretty quickly. But it’s the type of event where you realize something was missing, maybe some rows out of a table were deleted three weeks ago, and you really need to restore that database and just pull those rows back in. It’s hard to know exactly what that retention should be.

Carlos: Right. I’ve worked for a couple of .coms, and another scenario is not just the data, but a data change or a table change, right? So a column gets added, a column gets removed, whatever, and then all of a sudden, because the reality is the change control process maybe got sidestepped, or whatever, there’s no good history. Nobody can remember exactly when the change got made, and now all of a sudden it’s affecting things. So that ability to be able to go back and restore one from a couple of weeks ago and say, “Well, it was either this way or not this way as of this date,” I feel like I’ve been able to at least get things off of my plate by being able to provide that information.

Steve: I think one of the things it comes down to there, on the backups and how many is enough, is sort of a granularity based on how old the backups are. So for instance, for me, if I had unlimited space, or a lot more space than I ever use for database backups, I probably wouldn’t keep every single backup. And most of the backup solutions that you have built in with SQL Server, or other scripts available out there, they don’t really consider the granularity of backups over time, meaning you have a small retention window: after X number of days all backups are going to be deleted, or maybe after X number of days all your full backups are deleted, after a different period your differential backups are deleted, and after a different period your log backups are deleted. Hopefully you keep your full backups around the longest because you can’t use the other two types without those. But one of the things that I’ve come across when dealing with corruption is somebody is able to discover corruption and it’s been in their system for months. And I’ll ask, “Well, do you have a backup from before the corruption occurred?” And the answer is oftentimes “No.” But sometimes somebody else will say, “I’ve got a backup from 18 months ago. Will that help at all?” And sometimes, depending on how fast your data is growing or where the corruption occurs, an 18-month-old backup might be nice to at least see what was there, or compare to what was there in the older data in the database. So one of the things I like to do in a perfect scenario: let’s say you’ve got enough space for one month of full backups. Rather than keeping around a month of full backups taken, let’s say, every single day, which most people don’t have the space for, but let’s say you did, I’d rather keep around a week of daily full backups, then the second week would have maybe backups from every other day, then the 2nd to 4th week, or the 3rd and 4th week, you might have backups once a week, and then your retention period beyond that would be maybe a single full backup once a month that you keep around.

Carlos: Something like that, right.

Steve: Yeah, and I know that’s a lot harder to manage, and it might be as simple as you just have a job that, once a month, takes one of your backups and copies it off to a different network location.
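As a minimal sketch of that idea, a SQL Agent job scheduled once a month could take a copy-only full backup straight to an archive share; the database name and path below are hypothetical placeholders:

```sql
-- Monthly archive backup; COPY_ONLY keeps it from disturbing the differential base.
-- MyDatabase and the UNC path are placeholders for your own names.
BACKUP DATABASE MyDatabase
TO DISK = N'\\archive-share\sql-backups\MyDatabase_monthly.bak'
WITH COPY_ONLY, COMPRESSION, CHECKSUM, INIT;
```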

Carlos: Right, we used to do that for auditing, like the end of the year backup of the database for auditing purposes, so that when the auditors would come in April, we’d have it. We had to keep this thing around for a time.

Steve: And my experience has been working with different areas of management is that oftentimes they don’t always understand exactly what it means when you talk about the backup retention, and when you explain to them that if we have a 2-week backup retention that means we have zero backups that are older than two weeks. Oftentimes that creates a bit of fear there that will lead to possible solutions.

Carlos: And of course that’s one of the classic entrances into the cloud, because nobody wants to go out and buy more disk space just for backups. Disk space has become very commoditized, very cheap in the cloud, and so you’re able to store that information for a little bit longer. That’s a lot of people’s first taste of getting cloud technologies into their environment.

Steve: Yup, and I think that backups are a great way to start with it.

Carlos: Well, let’s see, we have a couple of others. I think I want to skip TempDB. Do you want to touch CPU and memory? Or do you want to call this an episode?

Steve: Yeah, I think it would be good. I mean you’re saying we want to skip TempDB. Yeah, because enough is enough on CPU and memory, I’d always take more.

Carlos: Yeah, that’s right. So ok, let’s hit on memory then just for a second. Yes, when is enough, enough? That’s a great question for that, and there is this saying I heard somewhere that more memory will cover up a multitude of sins in the coding and all those issues that you could have in the database.

Steve: Yeah, a great way to fix I/O problems is with more memory.

Carlos: Yeah, exactly, that’s right; if the database just lives in memory then there are no I/O problems.

Steve: So, interesting on that, I worked on a SQL Server a few years ago where it was having some performance problems, and it had, if I remember correctly, 64GB of RAM, and they increased the memory from 64GB to 512GB of RAM.

Carlos: Oh, wow, so that’s substantial.

Steve: Yeah, very substantial.

Carlos: Exactly, a license leap I think. I feel like 128GB is the…

Steve: We were already on Enterprise Edition. Yeah, but that was quite a jump, and at that point in time I think the cost for the memory was somewhere around $10,000 +/-. But that basically got rid of all the I/O issues that we were having there. Well, it still had a few I/O issues when you restarted the instance because nothing was cached in RAM, but once the instance was up and running for a bit, it had so much stuff cached in RAM. I mean, it was so much faster because it never had to go to disk to get anything. Of course it had to go to disk when there were writes, but most of the performance issues on this database were around reads, and it just took care of all the performance issues at that point. That was probably 3 or 4 years that it continued to run without any significant performance issues, simply by adding that much memory. Now, the side effect was it also made it so that some of the developers did not have to worry about performance tuning. So when the database eventually grew to the point that that wasn’t enough memory for it, well, they may have had more difficult performance issues to deal with at that point.

Carlos: Sure, and then it all comes down crashing down at that point. You’re probably talking about another system to go to the next level of memory at that point because computers are getting more robust. You know, once you start talking about terabytes of memory those are different systems.

Steve: Yup, absolutely. So, the question of how much is enough: well, on that specific system where we went to 512GB of RAM, the thing I noticed was that when it was normally running, for probably the first year, it never exceeded about 400GB of memory used.

Carlos: Oh, that’s interesting because that was probably the size of the database.

Steve: Yeah, on that one, when everything was cached up, whatever the size of the database plus temp tables and everything it was using, it really didn’t exceed 400GB. But then a few years later it grew and eventually got up and hit that limit. What that told me, when it sort of had that flat line right around 400GB, was that we had perhaps bought too much memory, and I kind of bite my tongue as I say that because it’s hard to say “too much memory.” But the fact that we never used more than 400GB indicated that if we had put 400GB of RAM in there, that would have been enough.
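One hedged way to watch for that kind of plateau yourself is to sample the memory manager counters over time and see whether the memory SQL Server is actually using ever approaches the target it is allowed; a minimal sketch:

```sql
-- How much memory SQL Server is using now versus how much it would like to use.
SELECT RTRIM(counter_name) AS counter_name,
       cntr_value / 1024 AS value_mb
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Total Server Memory%'
   OR counter_name LIKE 'Target Server Memory%';
```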

Carlos: Sure, the application would have worked just as fine.

Steve: At that point in time.

Carlos: Yup, at that point. Yeah, so how do you know when enough is enough? I mean, obviously there is the indicator that people have lots of different thoughts about, Page Life Expectancy (PLE), which I think has been kicked around a little bit. I think, for better or for worse, somebody at Microsoft wrote a whitepaper that kind of came up with a recommendation that was quickly adopted as the standard. So in environments where we don’t have 512GB of memory, how do we know when enough is enough?

Steve: Yeah, and I think that comes down to sort of balancing the Page Life Expectancy with the page faults, knowing when something has to be brought in from disk versus when it’s able to be read from memory, and looking at the overall performance. I mean, if your system is performing great, there’s no reason to even talk about more memory at that point unless it’s growing and you’re planning for the future. But if you’re having performance issues there, whatever it is, whether you’re doubling from 4GB to 8GB or doubling from 32GB to 64GB, memory is oftentimes a cheap way, compared to the alternatives, to improve that performance. So one of the things I like to look at, because you brought up Page Life Expectancy, is to watch how that Page Life Expectancy grows over time; chart it over time and see how it’s growing. If you’re running a regular CheckDB job, oftentimes that will skew your numbers, because when CheckDB runs it of course has to bring the entire database into memory bit by bit, and when that happens you’ll end up pushing a lot of pages out of memory and it will skew the numbers on your chart. But if you weren’t running CheckDB, how long does your Page Life Expectancy continue to grow? If you chart that, you can sort of see a pattern where it will grow and grow and grow and then some event happens. That event might be CheckDB, or it might be a nightly ETL process, or it might be some job that runs that has to pull in a whole lot of data on a regular basis. But if it continues to grow and grow and grow throughout the day until you hit a few of these events, that tells me that most of the time you’ve got enough memory, and it’s only at those certain events that data has to come off disk because it isn’t cached in memory. And if those things are happening in the middle of the night, or they are not impacting the actual business at that point, yeah, no big deal, I wouldn’t worry about it. But if it’s the kind of thing where it’s an ETL that kicks off at midnight and runs through until 10AM and it is impacting business in the morning, well, you may want to consider more memory to help with that. Let me just finish one thing on that first. That being said, though, you would want to understand that it’s definitely a memory constraint before throwing memory at it, because I’ve seen people throw memory at long-running ETL jobs and then they found out that it had no improvement, because the bottleneck was not the memory, it was something else in the system.
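If you want to chart Page Life Expectancy the way Steve describes, it is exposed as a performance counter; here is a minimal sketch you could sample on a schedule and log to a table for trending:

```sql
-- Page Life Expectancy in seconds, overall (Buffer Manager) and per NUMA node (Buffer Node).
SELECT RTRIM([object_name]) AS [object_name],
       cntr_value AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy';
```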

Carlos: I guess I’ll kind of go along with that a little bit. Obviously you want to take a peek at memory consumption, but the thing that I have found helpful is also just taking a look at the chattiness of the disk. You know, one of the things that I have found is that databases will kind of get grouped together, and you have one database that’s kind of like the mission critical or very important one; it’s rising in the ranks, maybe not mission critical, but it is becoming more and more important. But then you’ve tacked on these other databases that are kind of one-offs or not as “important,” and then you find out that they are the ones that are being chatty, or they are taking up more space than they should. Those are the situations where, for the system that you’re most interested in, maybe you have enough, and it’s just a matter of moving that lesser database off somewhere else so that the system that you care about is not impacted.

Steve: Yup, that’s a great example, and that’s a great reason to have different instances running in different VMs, so that you could constrain some of those less important databases to not chew up the memory that way.

Carlos: Right. Ok, so I think that’s going to be our episode for today – Enough is Enough.

Steve: Enough is enough.

Carlos: Yes. Do you agree with our list? Give us your thoughts, let us know. You can leave your comments on social media or on the website at sqldatapartners.com/enough.


Steve: Or at sqldatapartners.com/105.

Carlos: You can always connect with us on LinkedIn. We love hearing your comments and connecting with more of you. You can reach me I’m @carloslchacon.

Steve: And you can get me on LinkedIn @stevestedman, and we’ll see you on the SQL trail.

Episode 104: Keeping up with Technology

Do you have any experience with [Insert random technology]? Your heart starts to race and your palms get a little sweaty. You don’t want to say no; we’re tech folks, we know stuff, but there are so many new things to learn! How are you supposed to keep up with it all? In this episode, we chat with Eugene Meidinger about his thoughts on keeping up and his ideas on the most important learning components.

Episode Quotes

“Keeping up with technology itself, like it’s impossible.”

“One of the important things is having awareness on what the problem is and what the challenges are.”

“One of the things that we’re afraid of is our skills decaying.”

Listen to Learn

01:08 How do you keep up with technology?
01:43 Eugene’s points on keeping up with technology
05:20 People who keep up with technology
06:13 How to stay relevant when it seems impossible to keep up with technology?
07:28 Generalization and specialization
13:03 Developing mastery and expertise
15:40 Steve’s experience in teaching a DBA class at a university
17:04 Generalization examples, job interview process
18:14 Rich mental model
20:25 Analogy of keeping up with technology as radioactive decay
23:00 Three things to have a longer “half life” with IT knowledge
26:30 Big Data or Pokémon site
29:20 Things that last: People Skills
30:31 The idea of having a periodic table of skills
31:30 Understanding theory, fundamentals and internals
35:03 Discussion summary
37:03 SQL Family questions

Compañero Conference
How the SQL CAT team monitors databases on Linux
Big Data or Pokémon?
Eugene on Twitter
Eugene on LinkedIn
Eugene’s Blog

About Eugene Meidinger

Starting out as an accidental DBA and developer, Eugene Meidinger now focuses primarily on BI consulting. He has been working with SQL Server for 5 years now, and is certified in Querying and Administering SQL Server 2012. He is a Pluralsight author on Power BI and also co-leads the Pittsburgh Power BI user group.

 

Transcript: How Do You Keep Up With Technology?

Carlos: Eugene, welcome to the program.

Eugene: Thank you! I’m very excited to be here.

Carlos: Yes, it’s good having you. You have been a guest perhaps unofficially before on a couple of our panels when we were up in Pittsburgh and then Baltimore. You contributed to the conversation, we’d met, started talking and we want to get you on the program so thanks for being here.

Eugene: Yeah, definitely.

Steve: I guess we should say, welcome back.

Eugene: Well, I’m happy to be playing a starring role. I’m still mad at myself the first time because you’re supposed to say your name whenever they gave you the mic and I forgot to do that, so I’m just like Guest Speaker 3 or something like that on the first one.

Steve: The unknown voice with no credit.

Carlos: Yes, so we apologize. But thank you for being here and chatting with us today. This is actually an interesting topic and we had this as a SQL Family question and I thought that you had an interesting take on this. So the question that we have is how do you keep up with technology? It’s all over the place and of course we’ve even introduced since then kind of SQL Server in the News segment and it’s just amazing all of the different things that continue to come out of Microsoft. Let alone all the other companies out there. So I’ll ask you the question, let’s get started here. How do you keep up with technology?

Eugene: I think you have to just not sleep ever and then you’ll be fine. But for everyone else, anyone who happens to have a family or a significant other, or kids, hobbies, or just a regular human body, you’re not going to do a very good job of keeping up with technology. I think in many ways it’s not a very well defined goal. I think it’s more of an emotional fear. I think we’re afraid of two things. I think we’re afraid of losing our job, or becoming so irrelevant or obsolete that we can’t easily change jobs anymore; that’s the first thing. I think there is a large number of us who fear becoming that COBOL developer who’s never updated his resume, and maybe has a good job now, but there is a recession or they want to work somewhere else and they’re out of luck. I think that’s a fear that’s driving a lot of us, but then there’s the other question, the other fear.

Carlos: And I think there’s another, maybe lesser, fear, but I feel it’s kind of out there: you know, whatever social situation, “Hey, what is it that you do?” “I do COBOL.” In a tech setting they know what that is, and you’re going to get the, “You’re really old.”

Eugene: I can tell you something. I’m still technically in my 20s and I don’t put VB6 on my resume, but I know how to write VB6 for similar reasons.

Steve: So are we putting VB6 then on the same category as COBOL now?

Eugene: I would say technologies that I want to avoid.

Eugene: No, Ruby is basically VB6 with some prettier syntax. I mean, you could make the argument. Yeah, no, it’s definitely looked down upon for being behind, so one main thing is you want to keep your job. But then also you want to keep your friends and family, right? Because I joked earlier that, ok, well, you could spend all of your waking hours reading books and watching videos and doing all that stuff, and you’d probably do a good job of keeping up with technology, but for me personally 9:00-10:00 PM is sacrosanct. We do not mess with that time. That is our date hour. Me and my wife are hanging out no matter what.

Carlos: Very good.

Eugene: Yeah. It’s important, and so there’s balance. I think really what people want to know is how do I keep my job? How do I do it in a way that doesn’t cause me all this grief and anxiety and frustration? Keeping up with technology itself, like, it’s impossible. I mean, you follow all the different things that are coming out with Azure. They just talked about CosmosDB, where they took DocumentDB and then they slapped on four other different NoSQL database models, right? And you’ve got SQL Server 2017; I really hope we’re not switching to an annual model. But they put Python in there. They’ve got all these other changes going on. There’s just all this different stuff, and so you look at all of the things and I just don’t think, the way people define it, that it’s possible to keep up. There really is just too much stuff. Maybe 30 years ago you could keep up with SQL, but today you can’t, not if you count everything, if you count all these different changes.

Carlos: Yeah, this question perplexed me for a while, and I actually asked it when I was on SQL Cruise which is another reason why we’ve been inspired to do the Companero Conference because I was impressed and I felt like there were a couple of people that did a pretty good job of keeping up. But I’m not sure, and not to say that they are not keeping up, but the more that you follow them, the more that you kind of see some niching going on and the more that you see content sharing, right, so they’re kind of sharing what other people are doing. Similar to what we’re doing here. We don’t know all the technologies but we’re bringing people who do and can talk about it. So that’s one interesting facet that I’ve seen there. Sorry Steve, you’re going to say something?

Steve: I was just going to say, given all that, I mean, it’s nearly impossible to keep up with all technology, or even all things in SQL Server. But you need to keep up, and you need to keep your job as you said, and keep your friends and family. So what do you do? How do you go about staying relevant at that point?

Eugene: I think one of the important things is having awareness of what the problem is and what the challenges are. I think there are a couple of different sources of where this is actually a challenge. One of the things that we’re afraid of is our skills decaying. We’re afraid of being that COBOL developer and our knowledge becoming less and less relevant over time. That’s one challenge. There is a challenge where we’re worried about all these new technologies; I think the cliché example is JavaScript frameworks. It seems like there is a new one every 6 months and you don’t know which is the right horse to bet on, which is the right choice to make. I think there are two really big things, just talking about generalization and specialization. In my mind, specialization is how you pay the bills. You have to pick a specialization and a degree of specialization. You need to figure out, “Ok, what do I want to go deep on?” And it doesn’t have to be Itzik Ben-Gan deep. It doesn’t have to be David Klee deep, where you’ve picked one singular thing and you are the “world’s expert.” But you have to pick something to go deep on, and so that’s going to require focus: focus in terms of what things you are not learning, what your specialization is, just setting aside time, and that’s going to pay for the food today, that’s going to pay the bills today. But then the other piece, that whole “do I learn Angular” kind of piece, or in the data world, do I learn R, do I learn Python, do I learn Docker, that’s going to make sure that you get paid 10 years from now. Generalization makes sure that you put food on the table a decade from now. And that’s less about focus and that’s more about time. When you listen to a podcast you get this exposure and you’re generalizing; you’re dealing with these unknown unknowns. I think the very first step is deciding: do you have a problem where you don’t have enough specialization? Have you not gone deep enough, or is the problem that you need to generalize more? Do you need to be more aware of what’s out there? I think for a lot of people, they are scared of all the new stuff, but really they still need to make sure that they know where they want to go and what they want to focus on for their career. I think the first thing you need to do is decide: what’s my actual problem? Do I need to go deeper or do I need to go wider? And what am I doing to deal with that?

Steve: And to complicate it even more, I mean in some cases it might be do I need to do both – go deeper and wider. And that could be more subjective.

Carlos: When I think about it, I feel like, at least going through the U.S. education system, right, the three of us have gone to college and that’s kind of the route that we took. You get some exposure there, so that’s kind of the generalization, if you will. You start in information technology, you get your first tech job. From there, I think the most important thing is to go deep. Pick a couple of areas, and that could be in a couple of different ways, so a tech stack, but also even just an application stack. More and more we hear from the CIOs that some of the things they are looking for, in addition to the tech, is “I want to know the business,” so kind of understanding the pain points and how technology solves those things. And I think once you get deep, and again, like you’ve mentioned, even just one area, then it will be easier, because you understand the full gamut. It will be, “Ok, where do I want to go next? How can I take what I know and then apply it to the next hurdle or the next specialization area?”

Eugene: Yeah, I definitely agree with you there. I mean, I think for a lot of people, if you are just out of college your mission is to get past working at the helpdesk. Your job is not to be learning Docker right now. Your job is probably not to be learning PowerShell or Hadoop or whatever the cool new next thing is. You’re right, when you’re coming out of college your job is to get enough specialization that people want to pay you money to do something. But part of that going deep, too, like you said, is that, you know, I do martial arts, and there is definitely a big difference between no belt, and white belt, and green belt, and all these different things. And I’m a green belt right now, so I’m halfway there at the school that I go to. Sometimes you have to learn how to develop mastery in something. If you’ve never become an expert in an area, and again, I’m not talking elite top 1% expert; to me expertise starts whenever you first present to your local user group or you write a bunch of blog posts, anything where the stuff has to go in through your eyes and come back out your mouth, that’s starting to develop expertise. It’s on the far end of it.

Carlos: I guess I’ll throw another option in there, because I’m a big fan of Lunch and Learns. I think, unfortunately, managers don’t buy into it. The culture is, “Oh yeah, Lunch and Learn, you go bring your own lunch and make some poor schmuck present on something.” I wish that they would just say, you know what, again, it could be like small groups, pay the whatever it is, bring in pizza, whatever, right, so that you can come and learn this. But that would be another option, to say, “Hey, co-workers or group, I’ve learned something.” In fact, Doug Parnell, who is going to be speaking at the Companero Conference: one of the criteria they have for whether you can go to conferences or get other training is his ability to bring that back and then explain to the group what it is that he learned, which is interesting. So that’s not deep specialization. It’s just, I’ve listened to it, I have some comprehension, and now I’m at least far enough along that I can explain it to somebody else.

Eugene: Yeah. Anything that’s going to be testing your mental model or something is going to have you that. And like I’m saying, I think that when you learn how to develop a certain level of mastery that becomes repeatable. Like you said, when you come out of college you need to learn how to go deep and once you’ve done that successfully and you’ve actually gone truly deep somewhere then now when you switch over to Hadoop or something like that you can do that. For me, I get that with speaking where the first couple of presentations that I gave there was a lot of fear and anxiety, and a lot of work. And now I’m at the point where I understand kind of the backbone of a good presentation and so it’s a lot easier for me to say, “Oh, I need to give a presentation on Power Query in two weeks or something like that.” And start putting together that outline, putting together that structure because I know what goes into it. Just the same exact thing with understanding what goes in to actually developing mastery somewhere even if that’s a journey man level so to speak and not a true expert.

Steve: So interesting, with that really the key is on developing that first thing that you’ve mastered. It’s not mastering it. It’s figuring out the process of how to master it so that you can then translate that to the next thing you have to learn.

Eugene: Yeah, absolutely. I think a big part of that, like we talked about, is understanding the difference between all these different learning things. Are they giving you exposure or are they giving you mastery? Are they helping you with those unknown unknowns, like, “Oh, I didn’t know that Spark was a thing,” or are they helping you develop more of a mental model of how that stuff works? And I think the big dividing line for that, in a lot of cases, is: is it active learning? Is it something where you have to write or type or speak or code or something so that you can actually test that model that’s in your head? Because you can read all the books in the world, or listen to all the blogs, or listen to all the podcasts, but you need to have the rubber hit the road at some point, and that’s truly how you develop a sense of mastery and expertise somewhere. Again, that’s why I say that I think mastery starts with that first user group presentation or that first blog post, because that’s something that really tests your knowledge and makes sure you actually understand it at all.

Steve: Interesting. An example of that that occurred in my experience was about 10 years ago when I was asked to help teach a class at a local university. It was just a DBA class, and it was not the 70-461, but it would have been the equivalent of what the 70-461 exam was then. Then right as I was about to start doing it, the person who was going to help out bailed on it, so I was all on my own to go teach this 10-week class at the university. And for me that was an incredible learning experience because it pushed me beyond what I knew at that point, and it made me learn not just to the point that I could talk about those things but to the point that I could actually teach those things. I think that was one of those things that, jumping into it, I never expected to happen, but I had to go deep on a whole lot of topics over that 10-week period. By the time I came out of it I was at a whole different level in what I knew about those kinds of things. I think your example of being able to take it as input and then give it as output through a presentation is a great way to learn, at least in my experience.

Carlos: Then the next benefit, I have to think, comes because now that you’ve mastered something, you’ve done that specialization, as you go into the generalization component, if you will, i.e. talking with others at a conference, listening to a podcast, talking to a vendor, a co-worker, a potential employer and things like that, you can then pick up on how their topic, whether that’s a technology, an idea, or a process, overlaps with what you already know or how it doesn’t, and then be able to speak to that to help the conversation continue to flow. I guess I’m thinking more of the job interview process because that’s kind of where we started, with job security, “I’m afraid, can I get a job?” Now, I can’t say that I’ve gotten all the jobs that I’ve ever applied for, that’s not true. But I feel that the ability to speak to the things that they have brought up has definitely been at least something that they had to consider in looking at different applicants.

Eugene: Talking about that job interview, or even just talking with people, I think that by having a rich mental model, a rich understanding of something, it gives you the capacity for analogy. Even if it’s an awkward analogy or a strained analogy, it at least gives you that option. A good example is all this big data stuff. At some point I want to start learning about Hadoop, and Spark, and all these other technologies, and right now I’m still at that exposure phase. I don’t know pretty much anything, but when I start looking into them… You know, I was joking with Kevin Feasel, one of your big podcast cameos, that wait a minute, Scala is just like Haskell but different, or F# but different. Or that Spark is basically a service bus but different, or Hadoop is kind of like whatever the SQL data warehouse project is, that appliance kind of thing that they sell. I forget the exact name. It’s like Parallel Data Warehouse or that sort of thing. So whenever you have some area where you’ve gained that richness, when someone talks about something in a completely different area you at least have the option to go, “Well, I don’t know anything about that, but from what you’ve said it sounds a lot like X.” Or even something simple. When you understand how the transaction log works in SQL Server, you’re going to be able to make some really good guesses about how it [00:20:00] probably works in MySQL, or Postgres, or Oracle, or something like that. There are a lot of those things that will translate. And even if it’s not a one-to-one translation, at least now you have a jumping board, whereas if you are a jack of all trades you don’t really have a good way to tell if that comparison, that analogy, feels right or not.

Carlos: Yeah, interesting. Now, to jump back in here, you have an interesting analogy for keeping up with technology. You model it after radioactive decay.

Eugene: I do. Well, I think it’s a good way to think about it because, again, at the beginning we talked about how keeping up with technology is this nebulous, anxious sort of thing. It makes me think a lot about when we talk about the cloud, which originally was just some guy going, “Oh, this internet thing is undefined, I’m just going to draw a cloud.” And we decided that’s our branding, right? That’s our marketing plan. Keeping up with technology is whatever makes me not feel so nervous at night when I go to bed that I’m going to lose my job. That is keeping up with technology. I wanted some mathematical way, because I’m a giant nerd, of thinking about this, of actually working through this. And to me radioactive decay makes a lot of sense, because when you’re dealing with, let’s say you have a pound of Uranium. I’m no physicist but I learned some basics in school. You’ve got a pound of Uranium. That Uranium is going to have something called a half life, which simply put is just how long it takes for half of it to decay. You could apply that to a bunch of things, but radioactive materials are pretty consistent and that half life is stable. And so I think that IT knowledge also has a half life. Now, what you say it is can vary. Allen White says that every 5 years you have to retool yourself. I remember the first time I was on this podcast he said that and I said, “Well, I’ve been doing this for five years, does that mean I have to start over?” But in college I would joke about the same thing. I’d say, “Half of what you know is useless in five years.” And that’s how it really feels. And maybe it’s 10 years or 20, but the idea remains, so let’s say it is five. Well, you can mathematically model that, right? You can say, “Ok, what percentage would I retain each year so that in five years I only have half of that knowledge?” And it turns out that percentage is 87%. That means that if you know 100 things that are not COBOL, you know, 100 things that are still relevant today, then if your half life, your IT half life, is five years, 13 of them either fell out of your head or are no longer applicable, right? 13 of them are either VB6, or something you haven’t done in so long you forget how to do it, or DTS or whatever.
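
As a rough sketch of the math Eugene describes here, assuming a five-year half life: the yearly retention rate r has to satisfy r^5 = 0.5, so r = 0.5^(1/5) ≈ 0.87. In other words, you keep roughly 87% of your still-relevant knowledge each year and lose about 13% of it, which is where the figure of 13 things out of 100 comes from.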

Carlos: You kind of know it but you wouldn’t want to be asked to do it again.

Eugene: Right, and so that kind of gives you a way forward because if you think of it that way then we’ve got three knobs that we can twist to try and improve how much stuff we know so that we’ve got a longer half life ourselves, a longer shelf life, whatever you want to think of it as. The first option is that you just learn more stuff. You just shove as much stuff in as you can.

Carlos: So instead of 100, it’s 150.

Eugene: Right, exactly. If you need to learn 13 things a year just to tread water, then if you can learn 20 or 40 or 50 or whatever, the total amount of relevant knowledge you have is going to increase. Do you want to go deeper into that right now or do you want to go through all three?

Carlos: Let’s go through the other ones. I think that would be good.

Eugene: Ok. The second knob that you have is you can learn more of the right things, so that’s about having a better focus. That’s about having a plan. That’s about improving the signal to noise ratio, because you can spend 160 hours of your week reading Twitter and Hacker News, but you’re going to learn about local elections or Golang or Rust or some local startup or what Zuckerberg is up to this week. Even the technology things may not be relevant to where you want to go or what fits your knowledge, or there’s just a lot of junk out there. There is a lot of low quality material. So if the first thing is learn more things, the second thing is to learn more of the right things. Learn more of the things that fit what you want.

Carlos: So staying away from the bleeding edge stuff until you start to see some more adoption. Maybe early adopter is the phase. You’re like, “Ok, that’s when I [00:25:00] will jump on it because I’m seeing it more widely used.”

Eugene: Yeah, I think one of the strategies for dealing with the bleeding edge stuff is to make a low investment in it. That’s why stuff like this podcast is so great, because you can spend an hour while you’re doing something else and get enough to be conversational about a bleeding edge technology, and then later on you can figure out, “Ok, this fits with my career. Now I want to go deep.” So that’s the second thing, just learn the right things. The third knob that we have is that radioactive decay, how quickly my knowledge becomes obsolete, and that relates to what you just said as well: learn things that last. Learn things that last longer. So things that don’t last are stuff tied to a specific version. The exact feature set that happens to be in SQL 2005 is perhaps not too useful to you. But understanding how to use some of those features that came in there, or understanding some of those advanced window functions that came with 2012, that is going to last longer. Certain types of technologies are just immature. Again, I joke about stuff like Angular, where they’ve been breaking releases every 6 months, but then you have that big data space. It’s the hot new thing, but I’ll tell you what, there is a great site called, like, Big Data or Pokémon, and it will give you a name… It’s true!

Carlos: Nah, I have to look it up.

Eugene: Go and look it up. So it will give you a name like Empowla, or Horsea. I forget some of the other ones. And they’ll say, “Is this a big data program or is this a Pokémon?” And then you’ll click on a button and it will tell you if you’re right or wrong. And you’re going to be wrong a lot of the time. It’s true. It’s great. It’s the best site ever.

Carlos: Ok, here we go. So I’m here, https://pixelastic.github.io/pokemonorbigdata/. We’ll put it up in the show notes. So the first name is Horsea. I happen to be a Pokémon player for the kids, for my children. I have 5 kids.

Eugene: Sure. Yeah, family bonding. I get it.

Carlos: That’s right. So Horsea, big data or Pokémon?

Eugene: Are you asking me?

Carlos: Yeah. I’m asking to the group.

Eugene: I’m pretty sure that one is a Pokémon.

Carlos: Yeah, I’m going Pokémon too. Steve?

Steve: Yeah, I’ll go with the group on that one. I’ve never heard of that as a big data thing.

Carlos: Yeah, here we go. It is a Pokémon. Ok here we go, Vulpix.

Eugene: Ok, that’s definitely a Pokémon.

Carlos: Definitely a Pokémon.

Eugene: I had a try with it. I promise you.

Carlos: Here is a softball one, Hadoop.

Eugene: That is a big data.

Carlos: That’s definitely a big data. Here we go, it’s a native one that I’m not sure of anyway, Spoink.

Steve: I’m going to guess it’s a big data.

Eugene: Yeah, that sounds like something somebody would make up for a big data company.

Steve: Sounds like a tech thing.

Carlos: Oh, it is a Pokémon. Look at that. Ok, that is funny. So I don’t know if I should thank you or send you a nasty email now that you’ve introduced me to the site, because I’m going to have to go through all of them.

Eugene: It depends on how much time you waste.

Carlos: Exactly.

Eugene: So the point that I was making with that is that you have so many of these big data technologies, and even within Hadoop you’ve got all these goofy names. You’ve got Pig, and Sqoop, and Flume, and Hive and HDFS and all that stuff. Because it’s immature you don’t want to make a huge time investment. These are things that are going to decay quickly, because it’s going to be like some sort of ultimate battle and by the end of it one is going to be standing with the crown. And you don’t know which one it is right now.

Carlos: Now, there’s a lot more players in it but it almost reminds me of, what was it? Blu-ray? And what was that technology?

Eugene: It was something like DVD, HD DVD or something.

Carlos: Yeah, DVD or something.

Eugene: Yeah, exactly, or even going back to VHS and Betamax and all that kind of stuff. And so bleeding edge technologies are things that don’t last. But let’s talk about what things do last, and we’ve alluded to some of these things. One of the biggest ones is people skills. People do not change, or if they do, it’s much, much slower, in terms of centuries rather than the years we see with technology.

Carlos: So decades, generation.

Eugene: Grammar doesn’t change that quickly, I can promise you. So if you’re going to learn how to write a good email, or, I have a blog post about how to write a good abstract, you know, that’s going to stand the test of time. Along the same lines, [00:30:00] speaking, public speaking skills. You guys do consulting, and I’ve learned myself that if you can stand up in front of 50 people and pretend like you know what you’re talking about, you can do that too. Learning the trick of, “Well, I don’t know, but I think it will work this way, I’ll get back to you. I’ll give you an answer.” Those kinds of soft skills are timeless, truthfully.

Carlos: The thing you’re intuiting there is that we’re just making this stuff up, aren’t you?

Eugene: No. I think I implied it. I don’t know if I intuited but the distinction is lost on me.

Steve: So it would be really interesting, as we go through these different items, if there was like a periodic table of skills that you could look at and say, “Well, the half life on public speaking is 200 years, but the half life on big data is 9 months,” and try to do a comparison that way to figure out, ok, if you need to increase your skills overall, which are the ones that you can either increase or that are going to last for a long time, versus what can you learn quickly that might be a risk, that may pay off in the short term, but you know is going to be different 5 years from now.

Eugene: Yeah. I would say the people skills are definitely the noble gases of the skill world because they are not reactive. They last forever. But another thing that lasts long is, I think, you know, we talked about going deep, understanding theory, fundamentals and internals. Going that one layer below and understanding how something actually works, because it’s so much easier to transfer that. But it also lets you make certain guesses and inferences. I’ll give you a perfect example. I have literally thanked Paul Randal twice for his transaction log course because it saved me so much, for example when dealing with availability groups. If you don’t know how the transaction log works on an internal level, availability groups are such a huge pain because you’re like, “Why can’t I sync this?” Or you say, “Do I have to take backups on both sides?” But if you understand how it actually works then you can intuit a lot of things. You can intuit, “Ok, if I’m taking a backup right now, is the transaction log going to keep growing while I’m still doing the backup or will it stop?” That kind of stuff. So we talked about three different things: learn more things, learn the right things, and then learn things that last. Things that last is going to come down to the deep stuff, fundamentals, internals, some of the hands-off stuff. And then it’s going to be those people skills. It’s how to write, how to read, how to communicate, how to learn in general, that kind of stuff. So those are, I think, the three different approaches you can take, because the first two increase your inputs, and then the last one slows that radioactive decay. So if you know 100 things, and you can shift that half life from 5 years to 6 years, if you can make that tiny little shift, then still learning just 13 things a year you’re going to end up knowing 120 instead of 100. So by slowing that decay you’re going to know more relevant stuff as a result over time.
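
Running the same back-of-the-envelope model with a six-year half life: the yearly retention rate becomes 0.5^(1/6) ≈ 0.89, so the loss drops from about 13% to about 11% per year. If you keep learning 13 relevant things a year, the steady state where new learning balances decay rises from roughly 13 / 0.13 = 100 things to roughly 13 / 0.11 ≈ 120 things, which is the jump Eugene mentions.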

Steve: Interesting. As you say that I think I’m really glad I’m not a JavaScript developer because I think the half life there would be…

Eugene: 6 months.

Steve: If even that maybe.

Eugene: Like I said, I know that Angular is coming out with, like, full version number changes. I think the plan is supposed to be every 6 months or something like that. And I’m still mad that SQL Server is coming out every 2 years, so I don’t know how I would deal with that.

Carlos: Yeah, that’s right. Different worlds, right? You know the dev ops level on that side.

Eugene: It’s sneaking over into the SQL world for sure, all the dev ops stuff.

Carlos: It’s well on its way. Well, Eugene, great information, and I guess we should note that if people want to extend the conversation a little bit, or actually hear you present this, your presentation at the GroupBy Conference will be available, [00:35:00] and I’m sure it will be posted by the time this goes out.

Eugene: Yeah, we expect so.

Carlos: We’ll make sure that’s included in our show notes as well.

Steve: So I guess then just to kind of wrap it up at this point.

Eugene: Yeah.

Steve: Well, before going to SQL Family, just summarize a little bit of where we’re at.

Eugene: Oh sure. Yeah, ok, I can do that. Just to summarize everything, you have to figure out, “Ok, what is my real problem?” Is it that I need to go deeper with things, or do I need to be learning more things? And then if I’m going deeper I need more focus. I need a plan. I need scheduled time, because doing active learning is hard. It requires focus. That’s the fuel for deep learning. The fuel for generalization and broad learning is time. You can listen to podcasts while you’re exercising or doing the dishes or commuting. You can learn some of these things without giving it your full attention, and you often don’t want to give it your full attention because it’s so volatile. But really a lot of it comes down to three big things. If it’s like this radioactive decay, where our knowledge is continually fading in relevancy, you can either learn more things, which means putting in more time, more energy, or more money in some sort of way. You can learn the right things by, say, leaning on curation, making sure you’re dealing with stuff that’s good quality, or having a plan and making sure that stuff fits within your plan. Or you can learn things that are going to last longer, that are going to last more than five years and not become irrelevant, that aren’t just the hot new thing. Generally, that comes down to going truly deep and learning internals or fundamentals or theory. Or it means learning people skills or business skills, things that haven’t changed nearly so rapidly over the past 10, 20, 30 years, things that sometimes don’t change for generations. So that would be my general advice for trying to keep up with technology. You may not be able to truly keep up with technology, but you can find a way to keep your job and keep your friends without so much angst and so much anxiety.

Steve: Alright, very cool.

Carlos: Good stuff.

Eugene: Yeah.

Carlos: Shall we go ahead and do SQL Family?

Eugene: Sounds good to me.

Steve: Let’s do it. So how did you first get started with SQL Server?

Eugene: So it was largely by accident, if I’m being honest. I took a database course in college and that was using MySQL as a backend. I was a TA for that class later with a different professor who was using Access. And then later I did a non-credit intern project and did all the development work, and that was using MySQL. Up until my first long term job, my current job, I had no experience with SQL Server, didn’t know it was a thing. And then I’m looking for a job after my first one and the job says .NET/SQL developer. And I’m like, great, I always wanted to do software engineering, do a lot of programming, this would be perfect. Well, I thought it was going to be 80% .NET and 20% SQL, and it was flipped. Half of it was DBA stuff, and I remember my first month googling the differences between a view, a stored procedure, and a function, because I didn’t know any of that at the time. I could do my SELECT *, I could do my WHERE, and that was about it. But I just learned on the job and I got involved, and then I found out that, “Oh, user groups are a thing.” And I started going to the local SQL user group in Pittsburgh, and then I found out SQL Saturdays are a thing. I’ll tell everyone here: don’t go to the after party, because you’ll end up as a speaker. I got cornered by Gina Walters, who was running the group, and Rick Heiges, who was a former PASS board member, and they’re like, “You should present.” And I said, “I’m not qualified.” And they said, “You should present anyway.” And so I gave my first presentation on execution plans. I was terrified, but I loved it and I just kept going from there.

Steve: Alright, good stuff.

Carlos: Now, in all that time working with SQL Server, if there is one thing you could change about it what would it be?

Eugene: I know this has been said before, but licensing. I would change licensing. If it was just one simple thing I’d get it, ok, we’ve got Free, Express, and we’ve got Standard and Enterprise. Microsoft wants their money, they see Oracle doing their thing, I get it. But then you’re throwing in stuff like, ok, if you have a cold standby, that one is free. Well, in 2014 we changed that, now you have to have Software Assurance for it to be free, but the moment you start taking backups you’re doing production work so it doesn’t count anymore, and all these little nuances are just really overwhelming. So licensing, by far, I would change.

Carlos: And then if you have that license you could take it to the cloud, but then you[00:40:00] have to

Eugene: Yeah, now you got hybrid.

Carlos: Failing over, and if you’re in the cloud for too long, that’s different licensing.

Steve: Yeah. That’s definitely one that would be worth straightening out a little bit. So what’s the best piece of career advice that you have ever received?

Eugene: I’ll give you two, because the best piece of career advice I know of I got out of a book, so I don’t know if I’d count that as receiving it, but there’s a really great book that I was given by a friend at my first job, and it’s How to Have Confidence and Power in Dealing with People, which sounds really fancy but it’s a lot of common sense stuff about just how to work with people and talk with people and that kind of thing. For someone who was this introverted nerd who didn’t know how to work with other people, it was big. And the biggest thing out of that book, the best career advice that I’ve ever found in my career, is “paraphrase what people say”. Repeat it back to them to make sure you’re on the same page. Just ask, “Hey, do you mind if I paraphrase that to make sure we’re on the same page?” And then just repeat back what you heard, because there are so many times that you heard something different than they said, and even if you got it right it lets them know, “Ok, he understood,” and they can relax a little bit, so that’s been huge for me. As for advice I actually received, something recent that sticks in my mind is from Erin Stellato, where I talked to her about, “Hey, I want to get a job in big data or data analytics or something like that.” And she said, “Make the job that you want to have.” In the sense that instead of thinking, oh, I’m going to have to find some other job, I can look for opportunities to say, “Hey boss, I did a little bit of R with some of our internal metrics and here is what I was able to find.” Or just something that shapes the job I’m already in into something more like what I want it to be three years from now. That’s been huge.

Steve: Ok, great.

Carlos: And not to bang this drum, companeros, forgive me, but I think the idea is, if you can tie the technology to a business scenario, I would be willing to wager 99% of the time you’re going to get to do that project. You know, assuming budgets and all of that stuff are in order. But if you can prove value to the business with it, it’s a much easier scenario, a much easier conversation than, “Hey, I want to do big data.” I have this problem, I think I can solve it. Now, having said all that, our last question for you today, Eugene: if you could have one superhero power, what would it be and why would you want it?

Eugene: Yeah, I’m torn with this question, because I’d want to be able to learn mildly useful things really quickly. Because I feel like most superpowers would be just way too obvious, way too intrusive. Like, Carlos, if you’re flying around the world or whatever, people are going to notice and then you’ve got paparazzi and all this kind of stuff, right?

Carlos: Got you. There you go.

Eugene: Or if you’re some super genius that can just touch a computer and tell what’s wrong, then the FBI is just going to kidnap you and dissect you and figure out what’s going on. But there are all these minor little skills that I mentioned that are useful, but no one would go, “Hmm, I wonder what happened to him?” Like, I want to learn lip reading someday, or lock picking, or, something that my wife and I are learning right now is sign language. She is fully capable of hearing, no problems there at all. Well, ok, maybe sometimes she can’t hear me as well. But we’re learning sign language because, one, it’s just this cool thing, but two, it legitimately is something useful in occasional situations. So if you are at a loud concert or you’re 30 feet away from each other, you can still communicate. And right now our repertoire is pretty limited. We mostly can say, “Hey, I’m going to the rest room,” or “Oh, look at that cute child.” But we still get some value out of it right now. So my superpower would be learning all these mildly useful little skills really easily, but nothing that would attract notice from any authorities or other people.

Carlos: Lots of attention.

Eugene: Yeah, right.

Carlos: So I’ll second you there on the sign language. My wife and I took a class while we were in college together. It hasn’t been super useful outside of teaching our kids some sign language when they were growing up, like at the terrible twos time when they can’t quite talk but they want to communicate. That’s been the best use for it, but yeah, super cool. Eugene, thank you so much for being on the program today.

Eugene: You’re very welcome. It was a pleasure.

Steve: Thanks, Eugene, really enjoyed it.

Episode 103: Plan Reuse

When we write our queries to the database, SQL Server has to figure out the best way to bring back the data you asked for. A query plan is created to help SQL Server remember how to get the data for that query. It takes time to create these plans, so the database wants to limit the number of times it has to create them and will try to reuse a plan as much as possible.

Our topic for today’s episode is query plan reuse and the pros and cons that come with this concept. We will also touch on the concept of parameter sniffing, a technique SQL Server uses to try and figure out the best values to use for the execution plan, with the hope the plan will help the most queries. Special thanks to James Youkhanis for the suggestion.

Episode Quotes

“The concept behind this is it’s there to make things a little bit faster by reusing cache plans.”

“Parameter sniffing is a good thing because without it SQL Server wouldn’t be able to optimize your plan for any kind of parameter. But occasionally it goes wrong.”

“I think it kind of comes down again to kind of knowing your system and understanding the problem”

“Optimize for ad hoc workloads is one of those parameters that we most of the time will recommend people turn on”

Listen to Learn

4:53  SQL Server in the News
5:00  Ola Hallengren scripts now on GitHub
6:45 What is plan cache?
7:48 Description of T-SQL and its execution plan
10:15  Scenario in regards to statistics and indexes, and data types
11:30  One-time use query plan cache
12:22  SQL Server and the importance of memory
12:50  A specific problem with one-time use query
12:55 Parameterization
17:30  Parameter sniffing
20:25  Stored procedure and plan cache, parameter sniffing issues
23:55  Options to solve parameter sniffing issues, recompiling
27:28  Controlling plan cache size
28:10  Plan cache and flash array
29:27  Idea of ad-hoc workloads
32:30  Needs parameter reports and examples
38:15  One-time use query reports
38:50  Instance level memory report
39:40  More about hints, recompiling and plan guides


Transcription: Plan Reuse

Carlos: So companeros, welcome to Episode 103. Thanks for tuning in again to another great episode.

Steve: Yeah, Episode 103. Wow, so this episode is on cached plans and plan reuse, which is one of my favorite topics in SQL Server. I know I’ve had a lot of lively debates over the years around this.

Carlos: Yeah, that’s right. I think it’s one of those things where, from a performance perspective, you start with indexing and some of the other objects, and then you’ve got to figure out the internals of how SQL Server works. Yeah, it can be a bit confusing.

Steve: And this topic came to us from James and he suggested that we talk about plan cache and plan reuse. I want to thank him for that suggestion.

Carlos: Yeah, and we apologize. He suggested it, gosh, it’s been, I’m embarrassed to say how long it’s been, but it was during a time when we had a slew of interviews lined up. It kind of got pushed to the back there, but we’re glad to finally circle back around to it. We have a couple of shout outs this episode.

Steve: Yes, so one shout out came from sqlgambo on Twitter, and this was in regard to a post that I had tweeted about Database Health Monitor. He came back asking why I am building these tools of mine. I guess he hadn’t seen Database Health Monitor, and then he and I actually chatted through private messages on Twitter for quite a bit. I learned some stuff about what he is working on, and learned that he really likes what’s in Database Health Monitor, so yeah, good to connect there.

Carlos: Yeah, very cool, and it’s kind of interesting. We were talking before we started recording here about the history of Database Health Monitor and how it started from SQL Server Reporting Services reports. I was there, I was building my own home-grown reports and came across Database Health Monitor. Lots of other tools out there; that’s one of the very nice things about the community, making those things available.

Steve: Yup, definitely.

Carlos: So, the Companero Conference coming up again in October, October 4th and 5th. Interestingly enough, I was just in Nashville for a health conference this week trying to make some connections with some hospitals, and one of the things that stood out to me was the importance of unstructured time. That conference had a mix of panels and speakers, and then vendor type sessions, which weren’t horrible but they are still vendor sessions, so they were just trying to get through them. They kind of plowed through it and held the conference in a day and a half, a little bit more. I didn’t feel we had enough time just to talk, like, let’s understand what this is, are you being affected by this problem, and keep the conversation going. Anyway, it gave me some perspective on this idea of creating structured content from a session or from a panel but not giving the unstructured time for people to just talk and connect, right? Where are there similarities and commonalities? What might I want to pick up with this person after the conference? What conversations do I want to maybe continue tomorrow, things like that? And so I found at that kind of single track conference, similar to the way we’re going to do ours, that I would have enjoyed a little bit more unstructured time with some folks.

Steve: Interesting. Yeah, I think that unstructured time can be the most valuable relationship building time that you have at a conference. And I think going to a conference and just getting the material, I mean, really you could do a lot of that on YouTube. But going there and making those contacts with people, and being able to have time to talk about what it is you’re doing, or when we do a session on performance tuning, having time to talk afterwards about issues you’ve run into or problems you’re trying to work on, can be incredibly valuable.

Carlos: I agree. I am looking forward to our conference and putting that on. We hope companeros will join us October 4th and 5th in Norfolk, Virginia. You can take a look at companeroconference.com and we’ll make sure we have the link on the show notes page again as well.

Steve: And now on to SQL Server in the News.

Carlos: This has been up for a couple of weeks now, but I thought it was interesting. Many of you may be using Ola Hallengren’s scripts. He has decided to put them out on GitHub, kind of made them available. Obviously they were already free to download, but now the difference is that you can actually suggest changes to his code, and I know that a couple of people have done that already, so it will be interesting to see what happens with those scripts as a result. Kind of going back to our community episode, we talked a little bit about this, so it will be interesting to see what happens.

Steve: You know, we didn’t really talk much about Ola scripts on the community episode because it wasn’t really a community contribution project. It was sort of something he has built over time but now that it’s out on GitHub maybe it will become something more amazing than it is already based on community contribution.

Carlos: I believe it came out on GitHub after our episode so I wonder if we are not influencing people out there, Steve.

Steve: Yeah, who knows.

Carlos: Can we take credit for that? You know, SQL community, you can thank us for Ola putting his stuff on GitHub.

Steve: Whether we were the cause or not.

Carlos: Yeah, sorry, butterfly effect or something. Ok, so today’s episode can be found at sqldatapartners.com/plancache.

Steve: Or at sqldatapartners.com/103 for our episode number.

Carlos: So again, ultimately what we are talking about is execution plans, the plan cache, and plan reuse. So first, I guess, let me back up and, from the 10,000 foot view, what is it that we’re talking about when we talk about the plan cache?

Steve: Well, from the high level, the plan cache is basically a piece of memory, or chunk of memory, in SQL Server that keeps track of every query plan as it gets compiled. And if your queries are written in a way that the plan can be reused, it can then grab and reuse one of those existing plans rather than having to recompile it every time. And the concept behind this is it’s there to make things a little bit faster by reusing cached plans.

Carlos: Right, so allow me to back up just a slight bit higher and take the idea that we use T-SQL to tell the database what it is that we want out of the database. T-SQL has been described as the language of what we want, but it doesn’t tell the database how to go get it. So the database has to decide how best to do that, and when the query comes in, like you mentioned, it goes through a process to decide, “Ok, well, this is what you want. How do I need to go and get that data?” And so as a result it takes a look and says, “Ok, well, I think this is the best way and I want to create a plan,” an execution plan, a way to go and get this so that every time you want it I will know how to go and get it. There could be many different ways, almost like going from Point A to Point B, lots of different ways to get there. And it has to figure out which way is the best, or at least which one it’s going to use on a regular basis.

Steve: Right, and it can be a very expensive step in the process that compiling and figuring out how it’s going to go about getting the data.

Carlos: Well, it’s interesting. So we mentioned expensive, and there is a cap, a couple of milliseconds, and all of a sudden I can’t remember exactly how many it is. And I thought, well, gosh, milliseconds doesn’t sound like all that long of a time, but I think it’s all a matter of how busy your server is and how many executions are coming into SQL Server as well.

Steve: Yup. And I guess to put it in perspective with milliseconds there, last week I was working on a query with a client where we were trying to decrease the run time from about 800 milliseconds down to about 150 milliseconds. And milliseconds can make a big difference there; this was a query that was being run continuously through website traffic, web service traffic and all kinds of things. It was being hit quite often, so the difference between 150 milliseconds and 800 milliseconds meant a lot to the performance of the system.

Carlos: Sure, and that’s a great point, right, the frequency not just of all the queries but of that specific query, because if it had to recompile every single time and you’re adding milliseconds on there, then you’re just piling everything back up and it’s going to go and redo a lot of that work every single time.

Steve: Yup, and the work that it’s doing there is it’s going out and looking at statistics, what indexes are available, what data you are looking for, and what you are filtering on. And it puts all those things together to figure out the best way that the SQL Server engine can go and get your result set for you. If it wasn’t so smart, it would just say, I’m going to do a full table scan on every table you’re looking at and bring your results back together on your JOINs. But that just wouldn’t cut it in today’s world. Maybe databases 20, 30 years ago might have done that, but today there is so much going on with indexes and big tables and different ways to get the data. There are a lot of options to look at. I mean, if you’ve got a table with 30 indexes on it versus a table with 2 indexes, there might be more work that has to happen when it’s figuring out what is the best plan to use.

Carlos: Sure, and then we kind of get into data types, right? That plays a role as well. There a lot of things that it has to look at and consider.

Steve: Yup, so what happens after that plan gets compiled is it gets put into this memory location called the plan cache, and those plans are kept around with the hope, SQL Server hoping, that they will be reused so it doesn’t have to do that work again. But sometimes they never get reused. What you end up with is what could be called the one-time use query plan cache, where if things are changing in the query and they are not identical, you end up with all these one-time use queries in the plan cache that can clog things up and sometimes push other things out of the plan cache that would be useful to reuse.
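
For listeners who want to see this on their own instance, here is a minimal sketch of a query against the plan cache DMVs that lists the largest single-use ad hoc plans; the column choices are just one way to slice it:

SELECT TOP (20)
    cp.usecounts,
    cp.size_in_bytes,
    st.text AS query_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype = 'Adhoc'
  AND cp.usecounts = 1
ORDER BY cp.size_in_bytes DESC;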

Carlos: Like you mentioned, going back to that idea, the plan cache is a space of memory, and your data has to be stored in memory as well; SQL Server reads everything from memory, right? So it has to be able to read that stuff there, and if you’re not reusing those plans, then, memory being kind of a vital thing and a finite thing, you are using those resources in a way that’s not helping your SQL Server go faster.

Steve: Right, so think of it this way. Let’s say we had a table called ‘podcasts’, and in there we have the list of all of the podcasts that we know about, and in there is a column called ‘host’. And you just said, SELECT * FROM podcasts WHERE host = ‘Carlos’. If you run that query, the first time you run it it’s going to create a cached plan, and then if I come along and run that exact same query a moment or two later, or a few minutes later, SELECT * FROM podcasts WHERE host = ‘Carlos’, it’s not going to have to recompile that. It’s just going to use that already compiled plan and it’s going to save some time there. But then if we change it up, and Carlos runs the query that says SELECT * FROM podcasts WHERE host = ‘Carlos’ and I say SELECT * FROM podcasts WHERE host = ‘Steve’, that’s going to be two different plans because the queries are different. By different, if you just look at the text of that entire query and it’s not exactly identical, meaning the terms are in the same place and the spacing and layout are exactly the same, and the only difference is we’ve changed the WHERE filter to say Carlos or Steve, that shows up as two different plans in the cache. Now imagine if this was, like, SELECT * FROM customers WHERE customer_name = ‘Fred’, or customer_name = ‘Mary’, or customer_name = any of the 10 million customers you have on your website. You could end up with many, many of these one-time use queries that may never get used again, or maybe they get used once or twice or three times while that customer is there, but they end up chewing up a whole lot of memory. The way you get around that is you use parameterization. And the way that works is instead of saying SELECT * FROM podcasts WHERE host = ‘Carlos’, you say SELECT * FROM podcasts WHERE host = a parameter. And when that query gets run, whether you are running it through code you’ve written, website code, or Reporting Services or wherever, instead of passing through the text string of Carlos or Steve, it passes through a separate parameter that says compile this without necessarily knowing what the parameter is. Just compile it and then we’ll fix up that parameter after the plan has been compiled.
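
To make Steve’s example concrete, here are the two ad hoc statements he describes; because the query text differs, each one gets its own entry in the plan cache (the podcasts table and host column are just the illustration used in this episode):

-- Two separate entries in the plan cache, one per distinct query text
SELECT * FROM podcasts WHERE host = 'Carlos';
SELECT * FROM podcasts WHERE host = 'Steve';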

Carlos: Then one of the ways that it figures out what parameter value it should use as a default is it will look at those statistics and say, “Ok well, I see Carlos is in here 100 times, Steve is in here 10 times. Huh, ok, I see there is a tilt towards Carlos, so I’m going to assume that more people are going to query Carlos than Steve.” So it’s going to make some decisions potentially based on that distribution. This can be a good thing or this can be a bad thing.

Steve: Yup. An example of that: think of it as if you were looking up customers by zip code. And now imagine that you are in a small town running a business, and maybe that small town has one zip code, maybe two zip codes, and most of your customers are local, so most of the time when you’re looking up customers you’re looking them up based off of those one or two local zip codes. Well, then you get a customer that is somewhere on the other side of the country in a different zip code. It might be that 99% of all of your customers are in one or two zip codes, but then you have this 1% that are in other zip codes. What can happen is that the plan can assume it’s mostly being run with something that requires more work, because it has to scan more customers based off that zip code distribution, but then when you run it for that one customer with a zip code that doesn’t match your local zip codes, it could get to that customer with less work, but it doesn’t, because it goes through the normal path of that pre-compiled plan to find the information. There is a term for that and it’s called parameter sniffing, where when a plan gets compiled, the first time it gets compiled it looks at the parameters that are being passed in and figures out a good plan that’s going to work for those parameters.

Carlos: Based on everything that I know about the data that’s in these tables, what is the highest probability of what’s going to come in, and let me make it easiest for kind of the 80-20 rule, if you will. That’s where I’m going to go and try to get that.

Steve: Yup, so then let’s say you have that customer example by zip code, and you are looking it up by zip code, and the very first time, when the plan gets compiled, you use one of these odd zip codes that’s not very common in your system. It may then look at it and say, “Ok, there is only a very small percentage of the rows in our result set that use that zip code, so what we’re going to do is an index seek right to that zip code location.”

Carlos: An anomaly in that sense. Like, “Oh, all the data must look like that anomaly.”

Steve: Yes, but if the first time you compiled that plan it was one of the common zip codes, it may look at that and, instead of doing a seek, which gets you right to the smaller set of data much quicker, it may say, well, a majority of the rows in this table are in this zip code, so instead of doing an index seek we may do an index scan, or even a table scan, because that’s going to be the most optimal way to bring back everything you are looking for based off of those initial parameters. So what you end up with is that the plan gets compiled the first time you run it, or if it gets pushed out of memory or somebody flags it to be recompiled, the next time it is run it gets recompiled. If at the time it gets recompiled you’ve got good parameters that represent a regular data set in your system, you get really good performance out of your query. But if it happens to get recompiled with one of those unusual parameters that causes your plan to do something different than what is optimal most of the time, you could end up with a really inefficient plan that ends up bogging down the system or really hurting the overall performance.

Carlos: Sure, and what gets a little bit weird, and you may think, why would this affect me, how could this be a problem: if you’ve ever seen scenarios where one time a query runs pretty fast, and then all of a sudden it doesn’t, and then maybe later in the afternoon it runs fast again, that’s a common symptom, if you will.

Steve: Yup. And I guess the common scenario that I see as a freelance consultant, as Carlos and I are, is that you will be working with a client and they’ll come and say, “Things have really slowed down. We haven’t changed anything on the database and something has really slowed down on this one specific process or job,” or whatever it is that’s calling in to use this query. And then you’ll go and look at it and you’ll find, here is the stored procedure that’s the root cause of the problem. You’ll copy it out. You’ll copy and paste it into your Management Studio and change it around so you can run it, maybe not as a stored procedure but inline. And you run it and everything runs great, because with the changes you’ve made to run it inline it gets a different plan. And then you look at wait statistics and things like that and you can see that, like, at noon today, that’s when things tipped and it went bad, and prior to that everything was running great. So what could have happened to cause that? What often happens is that you have a stored procedure that gets pushed out of the plan cache for some reason. The next time it is run, it is run with an unusual parameter set, which causes it to get a bad plan, and then every other call into it starts using that bad plan. Or it was a good plan for that one parameter but it’s a bad plan for the other parameters. And I’ve seen that take a stored procedure that normally runs in under a second and cause it to run for 7-8 minutes when it’s something more complex. Then everyone hears that and they gripe and say, “Oh, the database is horrible and everything is broken.” And this is all really because of parameter sniffing, and parameter sniffing is a good thing, because without it SQL Server wouldn’t be able to optimize your plan for any kind of parameter. But occasionally it goes wrong and you end up with the wrong plan based off of plan reuse on that stored procedure.

Carlos: And you may be thinking, how big of a problem could this be? Now, I won’t say it’s the only reason, but the new Query Store feature is basically trying to solve this very problem. It’s a big enough problem that Microsoft has built a tool to help you combat it. And I think it’s one of those things, at least I can remember going back and being frustrated, and I think a lot of times it probably had to do with parameter sniffing.

Steve: Yup, and most of the time when I run into parameter sniffing issues it starts with the customer service department, who is working with customers and getting lots and lots of complaints that something has gone wrong or the system is down and not responsive. And then it leads to eventually jumping into the database and finding, ok, here is the problem, and then there are a lot of different things you can do to help mitigate the issue at that point. One of them is people will put the recompile option on a query or the recompile hint on a stored procedure, and that can cause more problems or different problems.

Carlos: Well, let me explain that. Let’s just go through that for a little bit. So I guess what we’re saying is, to mitigate that, as SQL Server goes through the process of actually creating the plan, you can give it what we call hints, and a stronger option is a plan guide, but that is slightly different, we’ll touch on that in a minute. You can tell SQL Server, “Hey, SQL Server, you should do it my way.” Or, in the example of the recompile, you’re saying, “Hey, every time you run this I want you to recompile. I don’t want you to use the plan you have available to you. I want you to throw that plan away every time and figure out how to do it again.” So that’s an option that you have, but there are risks, if you will, associated with it, and you want to be careful about how and when you go about implementing that option.
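
A minimal sketch of the two recompile options mentioned here, reusing the podcasts example from earlier in the episode (the table, column, and procedure names are illustrative):

-- Query-level hint: the plan for this statement is thrown away and rebuilt on every execution
DECLARE @host NVARCHAR(100) = N'Carlos';
SELECT * FROM podcasts WHERE host = @host
OPTION (RECOMPILE);
GO

-- Procedure-level option: the whole procedure is recompiled each time it is called
CREATE PROCEDURE dbo.GetPodcastsByHost
    @host NVARCHAR(100)
WITH RECOMPILE
AS
    SELECT * FROM podcasts WHERE host = @host;
GO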

Steve: Yup, and one of the knee jerk reactions that I often see when I explain to people, “Well, here is what happened. You got a parameter sniffing issue. The query got a bad plan, or the stored procedure got a bad plan. Here is what we did to fix it. We forced it to recompile, or we changed the query around.” They often think, “Well, can we just force all of our stored procedures to recompile so we’ll never have this issue again?”

Carlos: Yeah, exactly. How do I keep the problem from ever happening again?

Steve: Yup. And the answer to that is, well, you could, but by doing that you would eliminate any of the benefit that you get from the plan cache and being able to reuse plans. And depending on the system and the performance there, that could make things far worse over time depending on the load. So there are things you can do. You can go in and perhaps change the queries, or change how things are working in the stored procedure, or maybe have a stored procedure that calls another stored procedure that does something different. There are probably 20 different ways you could go to figure out how to do this right. I guess we don’t have time to go through all of those now, but the knee jerk reaction is just to make everything recompile, and that’s not a good thing to do.

Carlos: Well at least by default. I mean that may be an option that you pursue but don’t make that your first choice.

Steve: Oh yeah. And I’ve certainly used that for a single stored procedure or a couple of problematic stored procedures. Used the recompile option on them, and every time they are run they get recompiled. And it’s just because of how they’re written; they are in positions where they could be rewritten, but the overhead of recompiling them is cheaper than the time it would take to go rewrite them.

Carlos: Right, but I think it comes down again to knowing your system and understanding the problem: “I believe I have a parameter sniffing issue.” What do I know about this query, procedure, view, whatever it is, and do I know any history about it? Can I go find some of that out to then understand what makes the most sense?

Steve: Yup. And we could probably go for hours on parameter sniffing, but let’s shift back a little bit to the generic plan cache topic now. So one of the things that often comes up with the plan cache is people say, “Well, how do I control the size of the plan cache?” And, you can’t. It’s something that’s sized dynamically by SQL Server internally, and it depends a lot on the server load and what memory is available. One way to control it is to just put more memory in the server, but that’s not a really good answer.

Carlos: Well, another feature that they added, I’m forgetting the version, I want to say it was 2014? It seems like it was older than 2016, but maybe I’m remembering wrong. That is when they added the ability to use flash, so if you have flash storage on the SQL Server you could actually extend the cache onto that flash array to give you more space when you have an issue like this. So, to indicate the gravity of the problem, Microsoft is putting solutions out there around the plan cache and its size, even if they are not giving you the controls, like in Oracle, to say this is what it should be.

Steve: Right. So the best way I’ve found to deal with the plan cache, if you’ve got stuff that is getting pushed out or a lot of one-time use queries in there and things like that, is to better understand what’s in there. I’ve worked on systems that have tens of thousands of different queries run against them, and it turns out there are a dozen queries that are really the big offenders hogging up the plan cache with one-time use queries. And you can go in and work on and optimize those dozen queries to use parameters, or do whatever needs to be done there. And oftentimes with a small amount of work you can have a really big impact on the plan cache.

Carlos: This is where the setting, is that the right word, for optimize for ad hoc workloads comes in. The idea of this ad hoc setting is, “Hey, I have a one-time use query, and I have a whole bunch of those. What I’m going to do, instead of capturing or keeping the full blown execution plan, is just keep a little stub the first time it gets run, and then when it gets run a second time I’ll keep the whole thing and make use of it being run more frequently.”

Steve: Yup, and that optimize for ad hoc workloads setting is one of those parameters that we, most of the time, will recommend people turn on.
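
For reference, turning the setting on is a server-level change; a minimal sketch is below (it is an advanced option, so ‘show advanced options’ has to be enabled first, and as discussed here, test it in your own environment):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;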

Carlos: Yeah. I know we’ve talked about it before. I’ve only ever heard of one person complaining about it, which we actually talked about in Episode 99. Mindy brought it up when you were on the panel in Baltimore, and I remember Wayne Sheffield mentioning that he’d seen some CPU spike. Again, obviously you have to test in your environment, right? But it seems like it’s almost one of those standard features that you can enable now.

Steve: Yup, and that’s why I said we almost always recommend it. Not always, but almost always. So then, I guess, as far as understanding what’s going on with your plan cache, and I know we talked about Database Health Monitor a little bit earlier, but in the very beginning when I first created Database Health Monitor some of the very first reports that I built were around understanding the plan cache, because I was working in an environment where it wasn’t well understood and I needed a way to show what was going on with the plan cache.

Carlos: Sure. And I think my first interactions with that were the reports giving you the top queries, because SQL Server will keep some statistics about the plans and their executions, and you can go and start to interrogate that a little bit. Generally, from a performance perspective, that was the first place I remember going to take a look. It’s like, well, what are the top plans by CPU, or by memory, or just by how long they ran, something like that.

Steve: Yup. And there are four reports in there that I usually look at that are right around understanding the plan cache. One of them is just called the plan cache report, and it’s a per-database report. What it will do is show you the 500 largest plans in the plan cache and how big they are. So you can go and see, “Oh wow, we’ve got 12 plans that are very similar that are taking up 30K each.” And you do the math and you add it all up and realize, wow, some of this adds up real quick to taking up a lot of your cache. Another one that’s really handy is the needs parameters report. What it does is go through and analyze the queries that are in the plan cache, look for things that could be parameterized, and then group all of those together. So if you had 1,000 queries with, let’s say, a customer name hard coded in the query, it will go through and say that by fixing this one query you would reduce your plan cache from a thousand instances of that same or similar plan to one reusable instance.

Carlos: Now let me ask you a question on that, because I guess this is where I’m drawing a blank, there’s a gap, right? We talked a little bit about stored procedures versus ad hoc, so views or inline queries. But I thought that even if it was an inline query written with a framework or something, when SQL Server gets that query it’s still going to try to parameterize it, even though it’s not a stored procedure, I guess is what I’m saying. Obviously there are reasons why I would have those queries, but in that scenario what do you then go about doing to solve for them?

Steve: Well, take the example earlier where we’re querying the podcast tables: SELECT * FROM podcasts WHERE host = ‘Carlos’ or host = ‘Steve’. If the application is running that exact query hard coded, that’s what ends up in the plan cache, hard coded with Carlos or Steve in there. That is taking up, just for those two queries, two or sometimes four plan cache entries. And let me clarify that when I say two, that’s the obvious case: one for the query that is looking for Carlos, one for the one looking for Steve. But sometimes you will get a parallel and a non-parallel version of that query in there, so sometimes a single query will have two different plans in the cache. But to go back to what you’re looking for there: if the application is passing through hard coded strings like that, each one it passes through will get a different plan, so that’s really all the needs parameters report does; it goes and finds the items in the plan cache that are very similar in everything besides the parameters.

Carlos: So I guess let me ask the question this way then.

Steve: I don’t think I answered that, did I?

Carlos: I think you did answer it. I think I asked the question wrong.

Steve: Ok.

Carlos: So I was misunderstanding, then, and SQL Server will not in every instance try to parameterize your ad hoc queries.

Steve: Yes, that is correct. And the way to tell is to look at what’s in the plan cache. If what’s in the plan cache contains those hard coded values, for a name for instance, then they haven’t been parameterized. Or the other way to look at it is, if you run it with two different parameters, or two different names in that value, do you get two copies of it in your plan cache? If you do, then it is not parameterizing it.
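
One quick way to check this yourself is to group the cached query statistics by query_hash, which stays the same for statements that differ only in their literal values; a sketch under that assumption:

-- Queries that appear many times in cache with only the literals changing
-- are good candidates for parameterization.
SELECT qs.query_hash,
       COUNT(*)                AS cached_copies,
       SUM(qs.execution_count) AS total_executions
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(*) > 10
ORDER BY cached_copies DESC;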

Carlos: Ok.

Steve: Now with that, I guess the thing I like to do is find the ones that most commonly need to be parameterized. This only works if you have access to the code, because if you’re running an off-the-shelf application where you can’t change any of the code, you might not be able to do this. But if you are a development organization and you’re building an application, and you can go in and find that these are the queries that end up using the most of the plan cache, they have big plans and they are called thousands of times, then you can figure out which ones need parameterization, and by going and parameterizing a couple of those you can oftentimes have a big impact on the amount of those one-time or low-use plans that are in the cache.

Carlos: Again, to connect the dots here, this is where we’re actually going into the code, and instead of using that hard coding you are going to use something like sp_executesql, or change the way that you are making that call to the database.

Steve: Right, and I mean, for instance, if you are working in pretty much any programming language that I’ve ever seen or worked with that allows parameterization, you usually pass through some kind of a parameter value, like a variable name, in place of where you would be filtering on that hard coded string, and then in the code you say, here is the query, match it up with these parameters, and then execute it.
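
In T-SQL terms that pattern ends up looking like sp_executesql with a parameter definition instead of a concatenated string; a hedged sketch using the hypothetical podcasts table from the earlier example:

-- Parameterized version of the earlier example; one plan gets reused
-- no matter which host name is passed in.
DECLARE @sql nvarchar(200) = N'SELECT * FROM dbo.podcasts WHERE host = @host;';
EXEC sp_executesql @sql, N'@host nvarchar(50)', @host = N'Carlos';
EXEC sp_executesql @sql, N'@host nvarchar(50)', @host = N'Steve';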

Carlos: So the developers have to take one more step in putting that together, I’m assuming it’s a dynamic query linking it all together. Before they send it to the database they need to take one more step to help us out with it.

Steve: Then we could have a whole other conversation, probably for an hour, about parameterization and the impact it has on preventing SQL injection, or helping prevent SQL injection. I mean, there are other benefits to parameterization besides just the performance. Maybe we’ll save that for another time. So another report, there are a couple of others that I look at, is the one-time use query report. Did I already mention that one?

Carlos: I think we may have touched on it, but this is just to show us how many queries have only been executed one time.

Steve: Yup, and that’s a handy way to see how many of these are there. And if you look at your database and there’s 2 or 3 or a couple of dozen, you probably don’t have to worry about it. But if you find out that there are thousands of them there then maybe it’s something you need to look into.
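
A rough count of those single-use ad hoc plans on your own system might look like this (a sketch, not the report’s exact query):

-- How many ad hoc plans in cache have only ever been used once,
-- and how much memory they are holding.
SELECT COUNT(*)                                         AS single_use_plans,
       SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS single_use_mb
FROM sys.dm_exec_cached_plans
WHERE objtype = 'Adhoc'
  AND usecounts = 1;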

Carlos: And then that’s where that optimize for ad hoc workloads setting comes in.

Steve: And then the other report in Database Health Monitor that I really like to use to understand this is the instance level memory report, where you can go in and see how much memory is being used by each database, but it also shows you how much memory is being used by the plan cache. And it’s interesting, on some servers the plan cache might be using more memory than some of your databases are. I mean, it depends on the size of your database and performance and things, and the load, but it’s just good to understand how big it is. And I guess I said earlier you can’t really control the size of it, but you can influence the size of it by reducing the amount of one-time use queries, either through optimize for ad hoc workloads or by adding parameters.
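
To get a feel for how big the plan cache is overall on an instance, a simple sketch that totals it up by plan type:

-- Total plan cache size broken out by plan type (ad hoc, prepared, proc, etc.).
SELECT objtype,
       COUNT(*)                                         AS plans,
       SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS total_mb
FROM sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY total_mb DESC;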

Carlos: Influencing what’s in there.

Steve: Yup, yup.

Carlos: So another thing we will touch on, we talked about it earlier, is the ability to manipulate a little of this, right? So through using hints, recompiling; the other one is plan guides. But we also want to say, again, take it for what it’s worth, but in my years as a database administrator I’ve only seen hints and guides used in two instances. And so I think sometimes, particularly on the forums, we kind of see people rush to that, again with this idea of, “I want to make sure it never happens again, so I’m going to put this very extreme process in place,” when maybe testing it out a little bit would be a better approach.

Steve: Yup. And I think in my experience I’ve seen that hints are used quite often, but I’ve seen that plan guides are used very infrequently. I just want to take a second to jump back to a previous podcast where I talked about one of the things I would change about SQL Server, which I think was the terms hints and plan guides. Hints aren’t really hints, they are really commands, and plan guides aren’t really guides, they are commands that say, “You will perform this way.”

Carlos: Yeah, exactly.

Steve: When I look at hints and plan guides, oftentimes the first thing I’ll do in performance tuning is pull the hint out and see how it performs without it, and oftentimes things improve. But I think that they are kind of an emergency band-aid type response that, when you’re out of other options, may be something to consider as a short-term solution.

Carlos: Sure, and I don’t mean to say they shouldn’t be used. When they are appropriate, they are appropriate. But I think, again, it’s kind of the whole rebooting-the-server thing, right? Like, “Oh, it’s slow, let’s reboot it.” Stepping away from the knee jerk reaction. There are going to be instances where they are called for and where you’re going to be the hero for implementing them.

Steve: Yup, and I think plan guides are amazing, they are extremely awesome, but they are also extremely dangerous, and I don’t want anyone to take from this podcast that what we are talking about here is a recommendation to go try out plan guides for your performance tuning. If you’re hearing that, you should translate it to: go learn all you can about plan guides before you ever try them, because there are some negatives in there. If you apply a plan guide, it may cause trouble when you try to recompile or alter the stored procedure that the plan guide is associated with.
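
For anyone who does go learn about them, a plan guide is created with sp_create_plan_guide; a minimal, purely illustrative sketch (the table, statement, and hint here are hypothetical, not a recommendation):

-- Attach a MAXDOP hint to a specific parameterized statement without touching the code.
EXEC sp_create_plan_guide
     @name            = N'Guide_Podcasts_Maxdop',
     @stmt            = N'SELECT * FROM dbo.podcasts WHERE host = @host;',
     @type            = N'SQL',
     @module_or_batch = NULL,
     @params          = N'@host nvarchar(50)',
     @hints           = N'OPTION (MAXDOP 1)';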

Carlos: And again, kind of going back to Query Store, that’s the flip side or the other angle that is going to help you understand: “Hey, this query is running with a plan guide or a hint, just so you know.” It has been difficult; it’s something you really have to spend time with, so again, knuckle-dragging Neanderthal that I am, to understand, ok, what’s going on here? How are these plans changing over time? So it does take some practice, and just hang in there. You will get a little bit frustrated, but hang in there and eventually it will come.

Steve: Yup. So I think, just as a quick recap overall: the SQL Server plan cache is where all the compiled query plans are stored. They get compiled the first time they are used or when someone indicates that they need to be recompiled. It’s sized dynamically by SQL Server; you don’t really have a lot of control over that, but there are some things you can do to influence it, like optimize for ad hoc workloads and parameterization. And I think hints and plan guides can oftentimes cause trouble, but they are kind of a last-ditch attempt to try and fix queries.

Carlos: So again, ultimately we would like your feedback on this, and one of the reasons for reviewing this topic again is that we would like to try to make Database Health Monitor better, so we’re listening to your feedback. We’d love for you to take a peek at those reports; we’ll make sure we put them on the show notes page and list them there. We’d like to get some feedback: as you use them, what do you like about them? What else do you want to see? How do you use them? Is there some other tool that you’re using to look at the plan cache? We would be interested in hearing from you about that, and you can use any of the social media options to leave us a comment or a thought.

Steve: Or you can leave on the podcast show notes page as well.

Carlos: That’s right. So our episode URL today is sqldatapartners.com/plancache. Again, thanks for tuning in to this episode. If you want to connect with us on LinkedIn, I am @carloslchacon.

Steve: Or you can find me on LinkedIn @stevestedman.

Episode 102: Monitoring Availability Groups

One of the newer features in SQL Server is availability groups, which can help solve a number of business problems.  As administrators, availability groups introduce some complexity as we are tasked to make sure the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) can be met for these servers.  The complexity comes because we have different instances that must work together, but they don’t always see eye to eye.  In this episode, we chat with Tracy Boggiano of ChannelAdvisor about how they go about monitoring their availability groups and the pros and cons of the out-of-the-box tools.  Our discussion touches on the availability group synchronization process, and Tracy has posted her scripts on her blog for you to use as you look at reviewing your environments.  I think you will enjoy this discussion.

 Episode Quote

“You just need to allocate the resources and play with the stuff in your staging environment and make sure you have resources”

“I much prefer having a query and using PowerShell and just running it on the multiple instances”

“We use a third party monitoring solution for our monitoring rather than getting a whole bunch of SQL agent alerts”

Listen to Learn

– Monitoring availability groups
– Data synchronization on availability groups
– Asynchronous and synchronous mode
– A review of RTO, RPO and SLA
– Errors and agent alerts

Tracy on Twitter
Tracy on LinkedIn
Tracy’s Blog

About Tracy Boggiano

Tracy is a Database Administrator for ChannelAdvisor. She has spent over 20 years in IT, has been using SQL Server since 1999, and is currently certified as an MCSE Data Platform. She also tinkered with databases in middle school to keep her sports card collection organized. She blogs at tracyboggiano.com. Her passion outside of SQL Server is volunteering with foster children as their advocate in court through casaforchildren.org.

Transcription: Monitoring Availability Groups

Carlos: Tracy, welcome to the program.

Tracy: Thank you for having me.

Carlos: Yeah, it’s great to have another companera here with us. You are going to join us on October for the Companero Conference and so we appreciate that, and for you being on the show today.

Tracy: I appreciate you having me.

Carlos: Yeah, and I’ve been telling Steve we’re going to get through all those ChannelAdvisor folks. Now, we get to cross another one off our list.

Steve: Are there any DBAs there that we haven’t talk to yet?

Tracy: There is a couple left. Yes.

Steve: Ok.

Carlos: In fact, one of them I actually met when I was down there. He is a big fan of PSSDiag, and so I talked with him about maybe coming on and seeing if he could convert me.

Steve: Oh, that would be interesting.

Carlos: Yeah. I’m like, it seems like a little. Anyway, that’s for another episode.

Carlos: Oh boy! Yeah, I’m not sure. Not today’s episode.

Steve: Speaking of which today’s episode is on monitoring availability groups.

Carlos: Yeah, that’s right. I think this is going to be near and dear to anybody who is trying to set up availability groups, and ultimately that idea that you have a DR scenario or situation and you want to make sure that the data is getting over to the secondary node, that you’re not going to lose that data, and that you’re meeting your SLAs. Being able to ensure that just makes a lot of sense. And so, Tracy, it might be helpful to take a minute and just review that data synchronization process as we talk about availability groups. What are the components and what are some of the pieces that we need to be looking at?

Tracy: Ok. Well, the first step is that the log has to flush to disk, and you have a log cache that the records are cached into that have to be sent over to your secondary server. It is stored in an area called the log capture area and sent across the network, and at some point you get an acknowledgment to commit; when that happens depends on whether you are in synchronous or asynchronous mode. Once the log is received on the other side it is stored in another cache and hardened to disk, and on that side you have a redo thread that sits there and replays the pages back to disk. You have performance counters that capture your log send queue size and the rate that it’s sent, and you also have the redo queue size and redo rates that are all captured in performance counters on both sides, so you can monitor those.
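
Those queue sizes and rates also surface in the sys.dm_hadr_database_replica_states DMV, so a quick sketch to eyeball them per availability database might look like this:

-- Log send queue and redo queue per availability database, with current rates.
SELECT database_id,
       log_send_queue_size,   -- KB waiting to be sent to the secondary
       log_send_rate,         -- KB/sec being sent
       redo_queue_size,       -- KB waiting to be redone on the secondary
       redo_rate               -- KB/sec being redone
FROM sys.dm_hadr_database_replica_states;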

Carlos: Right. Now, we talk about asynchronous versus synchronous, right? So when I send it over, it writes to that second node and then writes to the log, that hardening process. If it’s in synchronous mode, once it writes the log, is that when I get the acknowledgment back? Or does it actually go through the redo process before it will acknowledge?

Tracy: Once it hardens to the log on the secondary it is sent back as committed. And when you’re in async, once it hardens on your primary it’s considered committed.

Carlos: Ok.

Steve: So when we are working with that, what are the things that we really want to monitor or keep track of to make sure things are working well?

Tracy: One of the first things you want to make sure you don’t have is a lot of network latency. A lot of times with this technology, especially if you’re in async mode, you’re looking at a DR situation where you have your secondary site in a secondary location like AWS or a different data center, and you don’t want to have too much network latency. The other thing is that you want to make sure your secondary storage isn’t slow and unable to process the data as fast as you need it to. Some companies like to go skimpy on their secondary servers and not make them as powerful as their primary servers, and they can get behind just because they don’t have enough memory or CPU resources. So you want to keep an eye on those redo queues and make sure it’s actually able to keep up or not. And that’s where it’s important to keep up with your SLAs and make sure that you’re actually meeting those or not.

Carlos: Right. Now, this is a very interesting concept, because you think, “Ok, all that secondary server has got to do is process the log,” but on busy systems that can be quite a bit of work. And to keep up with that, and again, ChannelAdvisor, you guys get to play with all the cool toys, on some of the higher transaction rate systems you actually needed a pretty hefty secondary server just to be able to keep up, even though it wasn’t actually doing any “OLTP” workload.

Tracy: Yes, we had a system with 256GB of memory on the primary side where we still needed 64GB of memory on the secondary side just to process the redo log and keep it current. I mean, we were still able to take it down to 25%. You know, we tried to go less than that, because we were trying to run in the cloud, which can be a little bit more expensive, and we weren’t able to do it. We started off at 16GB and it just wasn’t performing.

Carlos: Now, was that synchronous or asynchronous?

Tracy: Asynchronous.

Steve: So then when you say it wasn’t performing there, was it just that the continuous load was so great that it couldn’t keep up? Or was it that there were big sets of transactions that were kicking off at some point in time that were causing it to backlog and not catch up?

Tracy: We just had too many transactions going. We were hitting 20,000-30,000 transactions per second and it was just backing up on the secondary side. Once we bumped up the memory, it was plainly a memory problem, just trying to read all those pages into the buffer pool so it could update them. Once we bumped it up to 64GB it was able to keep up.

Steve: So then with that secondary backing up, if it’s backing up to the point that it can never catch up, what is the outcome there in the availability group? Does it eventually crash because of that, or is it just very delayed?

Tracy: It’s just very delayed. It’s behind to the point that you are not able to fail over and still meet your SLAs. In this instance we were just testing AGs for ourselves; it wasn’t a production instance as far as wanting to fail over to it. It was us testing AGs on a production instance, but not for a disaster recovery situation, rather for us to see what we would need in order to set up a DR situation, and we discovered that, “Hey, we’re going to have to allocate some resources to it.”

Carlos: Alright, now that’s another interesting question because you are failing over and it was just your test, so when we talk about not keeping up what is your window there? Is it 1 minute, 5 minutes, an hour, a day?

Tracy: Our service level agreement is set to 4 hours, but we were finding that the server was up to a day behind.

Carlos: So again, just to put some of that in perspective, right? It wasn’t like you were asking for milliseconds or something like that. I mean, 4 hours is significant, right?

Tracy: Yeah. I say you just need to allocate the resources and play with the stuff in your staging environment and make sure you have resources. We played in staging but we don’t have the transactions per second on the current staging that we have in production.

Steve: So I know in your monitoring presentation there are a few acronyms that you bring up along the way, being RTO and RPO and SLA, and how those apply in the availability groups scenario. Can we maybe talk a little bit about those and cover what they mean to the availability group, the RPO and RTO and how those apply to SLAs?

Tracy: Yes, the RPO is your Recovery Point Objective, and that tells you how much data you are able to lose, so that measures how many megabytes or gigabytes of changed data you can lose. Your RTO, the Recovery Time Objective, is how much time you can lose, so that’s how long it is going to take to do that redo on the other side. If it says it’s a day behind, that’s a day you have to wait for that redo log to replay. Those two are measured differently: you could be a gig of data behind but it will only take an hour to replay it, or it could be the reverse, a day behind with only a gig of data to replay. It depends on how your system is set up. And those two combine to make your SLA, your Service Level Agreement: what your business has agreed to allow you to lose in data and how much time they are allowing you to recover that data.
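
A rough way to estimate the recovery-time side of that from the DMVs, assuming the redo rate stays steady and SQL Server 2014 or later for the is_primary_replica column, is queue size divided by rate; a hedged sketch:

-- Very rough RTO estimate: how many seconds of redo are queued up on a secondary.
SELECT database_id,
       redo_queue_size,                                  -- KB still to replay
       redo_rate,                                        -- KB/sec being replayed
       redo_queue_size / NULLIF(redo_rate, 0) AS est_redo_seconds
FROM sys.dm_hadr_database_replica_states
WHERE is_primary_replica = 0;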

Steve: Ok, great, so when we look at the built-in dashboard for availability groups. How well does that really do the job for you to be able to monitor and know what’s going on?

Tracy: When you are looking at a single AG it pretty much does the job for you. If you’re in our environment, where you have 100+ databases and 100+ servers and you’re looking to have AGs on all of those, not so well, because you have to connect to each one every day to see what’s going on.

Carlos: So kind of broad brush strokes, it’s cumbersome. You have to connect to every single one of them.

Tracy: Yeah, you have to connect to every server, drill down through the dashboard in Management Studio, and bring it up. That’s why I much prefer having a query and using PowerShell and just running it against the multiple instances, seeing which ones are in trouble, and going and checking those out. But overall, if you only have a couple to manage, the dashboard shows you what you need to know.
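
The query underneath that dashboard is essentially a join of the AG catalog views and the replica state DMV; a simplified sketch of that shape (not the dashboard’s exact query) that you could run per instance:

-- One row per availability database, with the replica it lives on and its sync state.
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       drs.database_id,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_database_replica_states AS drs
  ON drs.replica_id = ar.replica_id;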

Carlos: Ok, so it’s pretty good there. It kind of meets your needs from an individual availability group perspective, but once you start getting more than one, then you’re going to start rolling your own.

Tracy: Yes.

Carlos: So I guess talk us through some of the ways or some of the things that you’re looking at to accomplish that?

Tracy: Well, we’ve used the queries that the dashboard runs in the background. One thing we’ve created is some PowerShell command lines that we can run that return the results in a data grid, and then we can sort and filter. We can just run that against any of the AGs that we have and find the data. We also have someone on our team who loads a lot of stuff into Grafana for us so we can see the data.

Carlos: Yeah, you guys have somebody who likes to meddle, who likes to tinker.

Tracy: He has the performance monitor counters we were talking about before, our log send queues, our redo queues, and the rates, all logged into Grafana for us so we can go there and view them by instance if we need to. So we’ve got a couple of different solutions that we run with right now.

Steve: Yup, so when we talk about your PowerShell scripts that you’ve got there, are those something that’s internal and private, or is that something that’s available for other people to take a look at?

Tracy: I actually have a blog post on my website with a script that you can pretty much plug any SQL script you want into and run against multiple SQL Servers, and it returns the results in a grid, so you can run any troubleshooting script you want across any list of SQL Servers that you provide.

Steve: Ok, we’ll have to include that in our show notes, link to that.

Carlos: Well, another kind of interesting, we’ll call it a hack, because I like to call things hacks today. That little cheat, if you will: you pull up the manager, or the wizard, or the dashboard, whatever, and then you pull up Profiler, I’m assuming, or Extended Events if you so choose, and then you just figure out how it’s getting all its information, and then you use that down the line. I think that’s a great way to be able to grab that stuff and understand what the SQL Server team has put together to look at.

Tracy: Yeah, that’s what I did. First I pulled up the dashboard and saw it only comes with like 5 columns, and I thought, that’s not very useful. It has an add/remove columns option out to the side, so I started looking at the column names and I was like, “Hmm, these are the things that are useful to me,” and that’s what I went into the DMVs to find, and that’s what I added to my queries.

Carlos: Interesting. So I am curious, and maybe it’s different because I know you guys are on AWS. In Episode 76 we had Jimmy May on and he was talking about some of his testing experience and the improvements they’ve made, particularly in the redo log and whatnot. It just made me think, are there scenarios, real situations, where there is something you can do; let’s say the scenario is patching. I guess it’s not really a cluster, so patching may not be exactly what I’m looking for, but say you need to fail over. You need to move from one node to the other for whatever reason, and let’s say that you’ve gotten behind. What are you going to do, or what kind of things are you looking for, to help you get back up to speed? Does that make sense?

Tracy: Yeah. In our situation, the AGs we currently run in production are synchronous on-premises ones, because we don’t have our DR stuff set up for AGs currently; we have a different solution for DR at the moment. So those are synchronous and up to date. But if you were doing an async environment, and we have done this for migrations, to migrate from 2014 to 2016 we used AGs: we set up the AGs as async, and then when we got ready to fail over we set them to sync, like the day before, so that they got caught up, and then we did the failover. It would be the same with patching. You would go ahead and patch your server, set the replica to sync, which forces it to catch up, and once it’s caught up you’d fail over to the patched server.
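
The mode switch and failover described here are both ALTER AVAILABILITY GROUP statements; a hedged sketch with a hypothetical AG and replica name:

-- On the primary: flip the secondary to synchronous so it catches up.
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

-- Once the databases show as synchronized, run the failover on the secondary.
ALTER AVAILABILITY GROUP [MyAG] FAILOVER;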

Carlos: Got you, ok.

Steve: Ok, so then I guess one of the things that I’m interested in is that, with the monitoring that you’re doing, it sounds like you’re going beyond what the built-in dashboard does just so you can get across multiple servers all at once. But are there additional things that you’re monitoring that aren’t part of what you would see through the normal dashboard?

Tracy: The only thing I’d say I’m doing differently is the rates. The rates in the dashboard aren’t always up to date, because Microsoft only updates those when data is being sent across. If you’re not actively sending data the rates aren’t updated, so if you’ve got some monitoring around your rates it may not always be accurate. So I do the monitoring based on the performance counters DMV rather than the DMV that just stores the rates for availability groups. As part of the downloads for my presentation I have a script that captures the counters, captures them again, and actually calculates the rate, so you can see the difference between what the DMV for availability groups says and what the rate actually is, just in case you need to know the real rate that is actually occurring. But other than that it’s pretty much straight from the queries the dashboard is running.

Carlos: Ok, great.

Steve: Now, there is another feature or option that is available to us to be a little more proactive, if you will, as far as letting SQL Server notify us when it has a problem. A lot of us are going to be familiar with the alerts, I don’t know if they are called agent alerts, but these are the severity 19 through 25 alerts, and then specific errors like 832 and 824, etcetera, because of problems reading from disk and things like that. SQL Server can say, “Hey, I had this problem, just thought you should know.” We have a couple of those for availability groups, right?

Tracy: Yes, I think there are about 5 of them. There’s an error 1480 that lets you know if the server failed over. There is an alert for when the data movement has suspended for some reason; I’ve seen it suspend because the disk is out of space, for example. There is also one that will tell you when it resumes, so if it decides to resume by itself or somebody actually pushes a button and resumes it. It will tell you if the AG goes offline for some reason, and if you’re in synchronous mode it will tell you if the AG is not ready for an automatic failover, if you’ve got it set up for automatic failover. All those numbers are available; you can look those up online or download my slides.
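
Creating one of those agent alerts is a call to msdb.dbo.sp_add_alert; a minimal sketch for the failover message (the alert name is our own, and you would still add an operator notification separately):

-- Fire an agent alert whenever error 1480 (replica role change / failover) is raised.
EXEC msdb.dbo.sp_add_alert
     @name       = N'AG Role Change (1480)',
     @message_id = 1480,
     @severity   = 0,
     @enabled    = 1,
     @include_event_description_in = 1;  -- include the error text in the notification email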

Carlos: Sure, and I suppose with something like that, if it’s not keeping up, some tinkering will need to be involved to make sure that you have the right secondary formula, because I would imagine you don’t want to be getting a lot of those emails. You get a couple and it’s like, “Ok, more memory on the secondary,” or something, right?

Tracy: Yeah, that’s why you wouldn’t configure your secondary for automatic failover if you didn’t want to know if it wasn’t ready for automatic failover for example.

Carlos: Well, fair point. And I guess the automatic failover. Those are probably going to be a little more similar in size.

Tracy: Yeah. You definitely want to have those set up exactly the same. That way when it fails over you’ve got the exact same performance going on, because those failovers are hopefully going to happen at 3:00 in the morning. And you wouldn’t necessarily get paged when something fails over automatically either, because hopefully that’s what you wanted so you could sleep. That’s why we invented AGs, to sleep.

Steve: Yeah. Nice, so when you’re monitoring your availability groups, are there any wait statistics that you normally take a look at or keep an eye on just to see how things are going?

Tracy: There are a few that you can keep an eye on. The one to watch out for is HADR_SYNC_COMMIT, which tells you whether it’s taking a while to commit on your secondary or not. I’ve seen that one pop up when it’s just waiting to commit; we saw it a lot when we were low on memory. The other one is WRITELOG. That will typically occur if it’s taking a long time to write the log on your secondary. Other than that I haven’t seen any others in particular pop up in our environment.
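
Checking on those two waits is a quick pass over sys.dm_os_wait_stats; a sketch:

-- Cumulative waits for the AG commit and log write wait types since the last restart.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'HADR_SYNC_COMMIT', N'WRITELOG');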

Steve: Ok, so with the sync commit, is that only when it’s running in synchronous mode and doing the commit, and you wouldn’t see that if you’re running in async mode?

Tracy: I’ve seen it in async and sync. It’s just whenever it’s doing a commit.

Steve: Yup, got it. Alright. And then as far as any extended events, is there anything we should be aware of or keep an eye on there when we’re working with availability groups.

Tracy: Mostly, extended events creates the AlwaysOn_health session by default for Always On, and everything you need is pretty much in there as far as events, including all those agent alerts that we were talking about.

Carlos: Does it create a separate event session, a separate monitor, or does it just include them in the default extended event trace?

Tracy: It creates a separate one for you called AlwaysOn_health, and it includes all the events that you need to know about, including all those agent alerts we were talking about. As part of the demos that I do in the presentation I’ve got queries that pull those out, and as part of our internal monitoring we actually just query those directly rather than receive SQL Agent alerts. We use a third party monitoring solution for our monitoring rather than getting a whole bunch of SQL Agent alerts.
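
Reading those events back out of the AlwaysOn_health session’s files can be done with sys.fn_xe_file_target_read_file; a hedged sketch (the wildcard assumes the .xel files are in the instance’s default log directory):

-- Pull the raw event XML written by the built-in AlwaysOn_health extended events session.
SELECT CAST(event_data AS xml) AS event_xml
FROM sys.fn_xe_file_target_read_file(N'AlwaysOn_health*.xel', NULL, NULL, NULL);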

Carlos: Yeah, that’s right. I mean, once you get to so many instances you’re going to want kind of a complete solution there.

Tracy: We don’t want so many emails.

Carlos: Yeah, exactly. Well awesome, Tracy, great information on availability groups. We do appreciate it.

Tracy: No problem.

Carlos: Should we go ahead and do SQL Family?

Tracy: Sure thing.

Steve: So Tracy, how did you first get started with SQL Server?

Tracy: Well, I was a developer at a company that wanted some reports, and Access wasn’t good as far as querying stuff, so I imported some data into SQL Server. Well, I had to learn a lot of stuff, and I just kind of got hooked ever since. I took some training classes on it and the next thing I knew I was a DBA. I’m not a developer anymore.

Steve: Very nice. It’s interesting to see how often Access is the lead into SQL Server for people.

Carlos: Yup. Tracy, I know you guys are using a lot of features over there at ChannelAdvisor but if you could change one thing about SQL Server what would it be?

Tracy: Right now, I wish Hekaton would quit writing so much to my error logs. I’m getting gigabyte log files every night on my system drives and they’re driving me crazy.

Carlos: Oh wow! That is a lot of data.

Tracy: And it’s just informational messages.

Carlos: And there’s no flag or anything to turn that stuff off?

Tracy: Nope.

Tracy: I’ll stick with that one for right now.

Carlos: And so what do they say about that? Surely they know about it.

Tracy: I don’t know. Brian has mentioned it, though. I’m not sure what Microsoft has said about it yet.

Steve: Alright. What is the best piece of career advice that you’ve ever received?

Tracy: Just to always keep learning. Things change a lot in IT; what you know now is going to change. As you can tell with SQL Server, we’re getting releases like every year, and now we get to go learn a new operating system.

Carlos: Yeah, never stop learning. Tracy, our last and favorite question for you today, if you could have one superhero power what would it be and why do you want it?

Tracy: I want to be able to teleport.

Carlos: Teleportation, ok, and why is that?

Tracy: I spend a lot of time in my car doing volunteer work and it would save me a lot of time.

Carlos: There you go, very good, awesome. Tracy thanks again for being in the program with us today.

Tracy: Ok, thanks for having me.

Steve: Yup. Thanks, Tracy.

Episode 100: Role Reversal

Something a bit strange happened in episode 100, almost like something out of the twilight zone, but don’t take our word for it.  Check out our latest episode as we tell some of the stories that led up to today.

SQL Server Podcast

Transcription: Listener Q&A

Kevin: Hello friends, welcome to the SQL Data Partners podcast, the podcast dedicated to SQL Server related topics which is designed to help you become more familiar with what’s out there, how you might use those features or ideas, and how you might apply them in your environments. I’m your host Kevin Feasel. Today on the podcast we have two special interview guests, Carlos L. Chacon and Steve Stedman. Carlos and Steve, welcome to the program!

Carlos: Oh, thanks for having us, Kevin.

Steve: Thanks, Kevin, this is exciting.

Kevin: Absolutely, so we’re on Episode 100 of the SQL Data Partners podcast and oddly enough you also have a podcast. It’s weird how that works, huh.

Carlos: Yes, there is an interesting turn of events here.

Kevin: Carlos, what made you decide to start podcasting?

Carlos: Wow, that’s a great question, and I guess I will say up front that if I had known how much time and effort it was going to take I don’t think I would have started it. I knew that I wanted to engage other folks and start talking about SQL Server in kind of a long form way. I’d been doing a bit of blogging, ultimately looking to help my consulting practice, or really re-launch it, and so that kind of content marketing: taking the long view of having content available for people to interact with and find, search engine optimization, things like that. I’d been doing some blogging, and I tried to do some videos and just found that difficult. At that time there were only two SQL Server podcasts in iTunes, which is the main place where people go to find podcasts, and then there are lots of apps that will carry the podcasts that are in iTunes, and Google Play has come out since then. Ultimately, I thought, “Gosh! There are only these two.” And Greg’s SQL Down Under hadn’t had new episodes for a while, and so I had taken John Lee Dumas’s course on podcasting and thought, “Hey, you know what? Why not, right?” Let me jump in, let’s see what happens here. I figured I would try to do 10 episodes, so before I officially launched I would do 10 recordings and see if I liked it, see if I could actually get 10 people to sit down with me and talk. And what’s funny is I started the podcast when I was in Costa Rica. I took my family over there for two months, and while I did my first interview actually in Argentina at a SQL Saturday, I kind of officially started doing interviews in Costa Rica. So that’s kind of the long answer to why I started it: I thought there weren’t very many people doing podcasting at that time, and I thought I would give it a try and kind of see what happened, and I wanted to commit to doing it for one year.

Kevin: So when it comes to things that are time consuming, things that are kind of beneath the iceberg, what are the most time consuming parts of creating a podcast episode?

Carlos: Steve, I think you will attest to this. The first is just getting used to hearing your own voice.

Steve: Sure, and realizing when you’re doing that that you don’t always have to do it over again just because it doesn’t quite sound the way you were hoping it would.

Carlos: Right. You don’t have that ability to do the editing, right? In the written word you can edit it: “Oh, that doesn’t sound quite right, let me go back, let me tweak that.” With audio, just hearing yourself repeat the same thing over and over gets a little cumbersome, trying to remove all the “uhms” and “ahs” and whatnot. In the beginning I wasn’t using an editor; I am now. That started happening in Episode 29, so before that I would have to edit my own. So first it’s scheduling the guest, picking the topic, creating the agenda, actually having the interview, making sure that I had questions, so the prep work associated there, then editing it, writing show notes, getting links together for the show notes page; those are some of the pieces that are involved. But the biggest piece in the beginning, again, I had those 10 episodes and I had told people that August 2015 is when I would first start. So August came around and people started asking me, “Hey, have you launched that podcast yet?” So literally, again I was in Costa Rica, I spent that week getting everything ready and did a lot of editing. And that was really probably the biggest piece in the beginning that took so long: just listening to everything again trying to figure out, “Ok, is this ok to keep?” Again, you don’t know what people are expecting, and you don’t want to disappoint the people that you’ve interviewed, all those things. Those were the components I wanted to mention.

Kevin: Yeah, I remember really early on when we first got together, I think it was Episode 13. You had a little piece of paper where you were writing down, ok, this many minutes, this many seconds, that’s where somebody said something really bad, you’ve got to cut that word out.

Carlos: Right. No, exactly, yes, so I guess it’s interesting the way that process has changed a little bit. We’ve gotten some good feedback on the show, and now the process is, we actually just record it, you know how it is. I send it off for transcription, and then I get the transcription back and I edit the transcription, and then Julien, our editor, great guy, will actually edit out anything that I don’t want there, in addition to all the “uhms” and whatnot, which I think he does a great job of. So that’s how that process has changed a little bit. Because when I was doing it, yeah, I wanted to write that down because I wanted to try to speed that process up.

Steve: Just to add a little bit more on the time consuming parts of it. I mean, the one that Carlos does most of the time is the scheduling of the guests; I know that one takes up a lot of his time. But then once we have the guest scheduled it’s a couple of different recording sessions that we go through in order to get an episode out. We’ll have the session with the guest, which can be one or more guests, and that’s usually at least a week before the podcast airs, and sometimes as much as 3 or 4 weeks ahead when we have a lot in the queue. That’s usually about, I don’t know, a half hour to an hour of preparation time to be ready to talk about whatever the topic is, and then it’s usually about a half hour to an hour of actual recording time, and that gives us the section where we are talking with the guest. And then about a week before the podcast airs, usually the Thursday before, we do our intro and closing. That’s where we go in and talk about SQL Server in the News, we talk about any mentions that we’ve had out there, and then we sort of digest what we talked about with the guest at the end as well. I think that’s usually about an hour of time to put that together.

Carlos: Yeah, that’s true.

Steve: And then once we’ve done that part, or maybe Carlos you can jump in with any additional time but it’s kind of handed off to the process through the editor and through the assistant that we have in getting that all published.

Carlos: Right, yeah, I mean, I guess thinking back, we do have that process now which helps quite a bit, but each of those individual pieces still takes some time.

Kevin: Oh, I can imagine.

Steve: And then once it’s out, there’s promoting it. I don’t know that I always get around to doing it myself, but we try to do what we can to promote the podcast through Twitter, LinkedIn, or places like that so people know that there’s a new episode.

Kevin: Cool, so next question. I’ll start with you, Steve. What episode was your favorite?

Steve: Wow, well, if you had asked me a couple of weeks ago I would have had a different answer, but I think Episode 99 was one of the favorites that I’ve done. If you’d asked me before Episode 99, it would have been the indexing episode that we did with Randolph West. But the whole impostor syndrome conversation that we had with Mindy in Episode 99 was different than a lot of things we’ve talked about before, and I loved it.

Kevin: Yeah, I just listened to it yesterday. It was great. Well done, Mindy!

Carlos: Yes, she did a great job.

Kevin: Very much so. So Carlos what was your favorite episode?

Carlos: Gosh, you know that is a tough question.

Kevin: Choose among your children.

Carlos: Yeah, that’s right, exactly. Generally, because I am a more-the-merrier type of person, the ones that I have really enjoyed have been the ones where we’ve had kind of a panel type discussion. I think about Episode 59 where we had Andy Mallon and Mariano Kovo on, and I think about the episode when we had the panel from the dbatools folks on.

Steve: Oh, that’s was Episode 91.

Carlos: Yeah, 91.

Kevin: That one was a lot of fun too.

Carlos: Right, so those, and even the one that we did where, ironically enough, you and Jonathan had that great interchange that I didn’t get into the program, and the ones that we do with the SQL Saturdays where we have multiple people giving their input or thoughts. I mean, again, not that the individual interviews aren’t fun, but getting different perspectives just makes the conversation flow much easier. Different things come up that Steve and I haven’t talked about beforehand and it enables the conversation to go in different places.

Kevin: Nice. So Carlos, I’ll start with you this time. What has been the most pleasant surprise for you in the making of the show?

Carlos: I think probably the continued relationships that I have been able to have with the guests. Now, that’s not to say that all of the guests are now my best friends, because that’s not true. But for the most part, and I’m just looking at the list here, I have continued conversations with my former guests in some way, shape or form, so I’ve really enjoyed that. I think being able to connect with folks that I wouldn’t otherwise have been able to.

Kevin: How about for you, Steve?

Steve: I think it’s along a lot of the same lines as what Carlos said. But I would go a little bit further to say it’s not just the guests but it’s also the listeners, in that there have been a lot of listeners who have reached out to me and connected on LinkedIn. I mean, a lot of people follow us on Twitter, but it’s really nice when somebody connects and you make that personal connection, getting to know people and sort of extending the reach of who you know in the SQL community.

Carlos: I guess I will add one thing there, and that is there has been more than one guest I’ve reached out to and they’re like, “You want me to do what?” I guess I’ll point one out: in Episode 45, Wolf, up in Pittsburgh. He was a nervous wreck, and I say that lovingly. He did not think he had the chops, basically, which again is ironic for a guy like him. So it took me a while to convince him: “Hey, let’s do this. Let’s make it happen.” And then when he finally did, we got to see that boost in confidence, and it was well received. We had some comments on it, so that was very gratifying as well.

Kevin: Very nice, so let’s switch gears entirely away from podcasts. Want to talk a little bit about consulting, so both of you are now independent consultants? Yes?

Carlos: Yes.

Steve: Yes.

Kevin: How long have you guys been independent, on your own or together, independent together?

Carlos: Sure, so I’ll let you go first, Steve.

Steve: Ok, and it’s a complicated answer because it has changed as different things have happened. But I originally started as an independent consultant about 12-13 years ago. When I did that, I mean, it was going well, and then I ended up with one client that ended up taking all of my time. After about 2 years of being “independent” with only one client, they brought me on as a regular employee, and I was there for about 7 years. Then, about 2¼ years ago, that ended and I went back to being a true freelancer at that point. I said, “I don’t want to go and get a regular full time job because that’s not for me. I like the challenges of consulting and working with lots of different clients.” So I did that; I started my own company doing that, Stedman Solutions, and that’s been doing great. And then about a year ago Carlos asked me to join him on the podcast, not in any more of a business relationship than that, but I joined and started helping on the podcast, and then about six months ago, maybe 8 months ago, was when we decided that we would merge what the two of us do much more closely. Now, I still have some clients I work with under my old brand name, Stedman Solutions, but most of the new work that we are taking on is under the SQL Data Partners brand doing independent consulting there.

Carlos: Yeah, so for me this is my third attempt.

Kevin: Third time’s a charm.

Carlos: Yeah, that’s right, third time’s a charm. In fact, Steve and I were just talking about this earlier, and that is, one of the things that I wanted to do is make money in the way that I wanted to make money, which can be difficult. And so I kind of had fits and starts. I’ve told people before that I originally started consulting because I saw other consultants making very high hourly rates. And while lots of people do the hourly rate thing, and that’s all very nice and great and whatnot, just because you have a great understanding of SQL Server does not necessarily mean that you will make a great consultant or business owner or entrepreneur, and that’s really the most important key: to stop thinking of yourself as a database person and start thinking of yourself as an entrepreneur, because those things are different and they get attacked differently. That was part of my learning curve in these fits and starts.

Kevin: Ok, so let’s say we have somebody in the audience who says I’m ready to go independent. Any of my employers who are listening I’m not that person in the audience. But if somebody in the audience is saying, “I’m ready to go independent and hey you just told me that being an entrepreneur is a completely different story. Well, what types of things do I need to think about before I take the plunge?”

Carlos: Marketing. So what kind of problems are you going to solve? From the tech perspective, as a full time employee, people come to us with problems whether that’s a ticket, an alert, but the work comes to us. So now as a consultant the question is how are you going to find the work and what type of work are you going to respond to, and making sure that you understand what that work is and can describe it to other people.

Steve: Yup, I think I’ll echo the same thing there. And when I talk about how this is really my second time in independent consulting, where I had been doing it before and then it turned into a single client, part of the reason that happened was that at that point in time I didn’t know what I was doing and how to go out and make contact with new clients, how to meet the new customer. And I think that’s something you can practice and work on: who is in your network, or who do you know that you can make contact with that could be providing you work. It’s surprising, the people I have come across and ended up doing work with that I never would have necessarily considered as prospective clients in the past. But I think the other thing to think about for someone who wants to jump out and give it a try on their own is the security behind it.

Carlos: Or lack thereof.

Steve: Exactly, or lack thereof. Now, I think that when you have a regular full time job, most of the time there’s the illusion that it’s fairly secure. And I use the term “illusion” because whatever happens in people’s lives, full time jobs can come to an end at any point, whether it’s the company going out of business, or a layoff, or just someone not getting along with their manager; that job can come to an end. But you generally have a lot more protection legally in different ways as a full time employee, and you have much more security in that you know if things get slow for the company, odds are that you’re still going to be getting a paycheck 2-3 weeks from now. It’s never guaranteed, but with a full time position that’s pretty stable. You know that every so many days you get a paycheck and it’s generally for about the same amount. And I think that when you go into the consulting arena that changes significantly, because you run into what they call bench time, a point where you don’t have enough work for a while. And that comes back to finding your customers and marketing and reducing that bench time. But when you’ve got that bench time you’ve got to have, depending on how you’re paying yourself, because the customers pay your business and then you pay yourself out of your business, you’ve got to have a buffer there so that when you do have short times, either bench time or a period where it’s hard to get payments from clients, you can cover it. And I think it would be different for maybe a single person versus someone who is married with kids, but I know that with what I’m doing, if suddenly I stop having money to contribute to my family, my wife gets a bit worried about that. Alright, so part of what I do to help mitigate that is, one, you need to have a little bit of savings in place so that if you’ve got a 2-week period where all of the clients decided they’re going to be a little bit late on payments, you can weather that without a lot of financial pain. And then the other thing around that is you’ve got to be kind of really firm with the customers when they are late. And I know that’s challenging to do, but you have to be able to come back and say, “I can’t keep on working on this project if you’re not going to pay.” Fortunately, it doesn’t come to that often, but I think it comes down to being in a position of financial stability, and I like to use the number of having 6 months of the bare minimum cash that you need to survive in the bank before you start out doing consulting, because when you start out, you are going to make mistakes. You’re going to have more expenses than you need. There are going to be a lot of things that are challenging in that first 6 months, and a lot of them are going to come down to financial challenges.

Carlos: Yeah, and just to echo what Steve said there, talking about that transition from the tech space to the entrepreneur space, the soft skills become much more important. He mentioned dealing with client payments, but there’s that whole process of just interacting with people. Once you go independent you are no longer just interacting with technology; that idea is dead, right? Your clients are people and you have to satisfy their needs first, if you will.

Kevin: Right, so at what point do you guys engage the services of, say, a lawyer or an accountant?

Steve: Oh, great question. Do you want to take that or do you want me to jump in, Carlos?

Carlos: Yeah, so from the accounting perspective, from day one I wanted an accountant there to at least be able to handle some of those things. It kind of goes back to economics, if you’ve taken an economics course: one country makes coconuts really well and the other one does bananas, and they trade. That’s kind of the idea of hiring an accountant, unless accounting is your business. Get somebody to help you with some of those things, because the IRS does not mess around, at least in the United States, and I can only imagine for other countries, so you don’t want to get started off on a bad foot there.

Steve: And a little more on that, I mean, I don’t want to be an accountant; that’s why I work in SQL Server. If I really wanted to do accounting I probably would have taken accounting in college and gone that direction. Because of that, I mean, there are a lot of people out there who are great at what they do with accounting, and I would rather engage an accountant when it’s appropriate than try to learn all that on my own. Now, that being said, it doesn’t mean that I want to be completely illiterate on the accounting and financial side either. And I think that there are some tools out there, like QuickBooks Online, that make it so that a lot of the stuff you might normally need a bookkeeper for you can do yourself, and then you can engage an accountant when it comes to tax time and all the appropriate times that you need to use an accountant. Interesting story on this: when I first started back into freelance a couple of years ago I engaged an accountant that gave some really bad advice. It didn’t feel quite right at the time, but it came from my accountant so I believed it, and then later I found out it was bad advice, and it made my first year’s taxes very challenging to get done that year. Looking back, I don’t work with that accountant anymore, but I work with accountants and I do a little bit more checking of backgrounds and get a better understanding of who they are before working with them.

Carlos: From the legal side, generally, that’s just in the review process so it’s going to vary state by state and of course obviously country by country what the requirements are for setting up a business. Generally, so at least with me I had an attorney just kind of review some of those things or at least consult to make sure I was doing the right things. My accountant actually helps quite a bit with some of the legwork to help reduce some of that cost.

Steve: Yup, and I think the key is to use lawyers as needed. I think there are a lot of people who gripe at lawyers and what they do, but there comes a time when you really need a lawyer. I mean, again, I don’t want to be a lawyer myself; I don’t even want to attempt that. But it’s usually good money spent, because you’re in a position where you have to use a specific expertise that you don’t have.

Carlos: Yeah.

Kevin: Ok.

Carlos: If nothing else, again, it’s kind of those soft skills and relationships: you want to be on speaking terms with someone before you have a need for their services. You’ll want to shop around and get somebody you feel comfortable with rather than somebody you have to have because you have no other choice or alternative.

Steve: Yes, that’s a very good point.

Kevin: Cool, so let’s talk a little bit about Database Corruption Challenge. Steve, what made you come up with this idea?

Steve: Wow, alright, it was interesting, and I think there is a lot of detail that Carlos actually asked me about on Episode 12, when I was first on the podcast. It started out because I do a lot of blogging on SQL Server topics. I wanted to share some of my knowledge about database corruption and fixing it, and I started writing a blog post about how to fix corruption by pulling data in from non-clustered indexes to try and figure out what was missing. And I realized that anybody could write a post like that, so I thought, “Well, I'll change it up a little bit.” I would go and actually create a corrupt database and put that in the blog post as a training exercise, to see if people were interested in trying to solve it. That was a Saturday. I think I did that on a Saturday morning and threw it out there. I put it on Twitter and a few other places and said, “Ok, no big deal. Nobody found it interesting.” About 8 hours later, though, it got some traffic, and Brent Ozar picked it up and decided he was going to jump in and solve it, and he solved it pretty darn quick. I think his story was that he and his fiancée were trying to head out to dinner when he saw this, and he stopped what he was doing and fixed the corruption before going to dinner. That might have caused a little bit of trouble, maybe made them a little late for dinner, but he was the first to solve the first week of the corruption challenge, and then he tweeted about it. That sort of got the fire going a little bit, with more people being interested in it, because I think he has a little bit more reach on Twitter than I do.

Carlos: He can move the internet numbers that’s for sure.

Steve: Yup. After he solved it, a handful of other people jumped in to solve it, and at that point I realized, “Hey, this is really interesting. There is a lot of interest here. I'm going to do another one.” Then I kind of quickly made some rules and said, “Well, I could do this for 10 weeks.” That was my initial plan, 10 weeks, but it turned out to be 10 competitions, roughly one every 10 to 14 days rather than every single week, and it just kind of grew from there. There were about 60-70 people who actively participated week after week, and it just kind of evolved at that point. It wasn't that I ever sat down and thought, “Hmm, I'm going to build this Corruption Challenge.” It was just a blog post that evolved and became the Corruption Challenge.

Kevin: Yeah. I remember it being a big deal, and it's still really interesting to go back, because those corruption issues still happen today.

Steve: Yup, oh yeah, and I still get a lot of traffic today. If you go to stevestedman.com/corruption you can get to all the blog posts that I've done as well as all 10 weeks of the corruption challenge. Check it out there. Even though it's been 2 years, people are still learning from it, and I think almost everything I cover in the Corruption Challenge is still valid today, even in the latest versions of SQL Server.

Kevin: How much did you learn during that challenge? Obviously you knew how to build the first database; you put the example together. When we got to some of the later databases, did you already know all that stuff beforehand, or did you have to go research more reasons for corruption?

Steve: Oh, yeah, I certainly did not know all of that when I started. I knew a lot of it, but it's one thing to know about a type of corruption, and it's another level to know enough about it to be able to create it in a database that can then be backed up and distributed to people to try and fix themselves. And there were times when I thought, “Ok, here is something where I know what the corruption is,” but it took me 4-5 hours to go and actually build a test database that had that kind of corruption in it.

Carlos: Right, and then to make sure that, you know, can I fix this. Is this fixable, right?

Steve: Yup, and then I think the people who participated actively in the Corruption Challenge were incredible to learn from. The participants in the first few weeks were very helpful, but they were also critical in a positive, helping kind of way if anything I tried wasn't quite right. There were one or two weeks where I put out a corrupt database, somebody pointed out a flaw, and I had to go back and correct it in order to make it so it could actually be fixed by someone.

Kevin: So of the solutions that you got, what was the most unexpected and interesting solution?

Steve: The most interesting and unusual one that I came across was from Patrick Flynn, and I think he is from New Zealand. I think it was for week 4 or 5, somewhere around there in the competition. It was a particularly nasty corruption scenario, but what he did, and one of the reasons I loved it is because I like CTEs, and I actually wrote a book on Common Table Expressions a while ago, was use CTEs really creatively. It was one that I actually adapted and used in my presentation at PASS Summit last year on database corruption. What he did, using some temp tables and CTEs, was use the DBCC PAGE command to extract all of the data, in horrible binary format, into temporary tables. And then from there he used CTEs to manipulate and extract all the data out of those temporary tables and reconstitute it into INSERT statements to rebuild the table from scratch. I mean, if we had an hour I could walk you through the demo of how it works. There were a lot of really awesome solutions, but that's the one that jumps out at me as the most interesting and the one that I enjoyed working through the most. Part of the process when I did the challenge was that it was a competition; people would see who could be the first one to solve it, so I would throw the Corruption Challenge out there and then, usually after week 2, within about an hour I'd start getting people submitting solutions, and I would have to go through and confirm that their solution actually worked. That one probably took me the longest amount of time to confirm, because it was so interesting and I just wanted to dive in and totally understand every single thing it was doing. I love that example; that's my favorite out of all of them.
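
The general shape of that approach, as a rough sketch and not Patrick's actual code (the database name, file id, and page id below are placeholders), is to capture DBCC PAGE output into a temp table and then query it:

-- DBCC PAGE WITH TABLERESULTS returns ParentObject / Object / Field / VALUE rows.
CREATE TABLE #PageData
(
    ParentObject nvarchar(255),
    [Object]     nvarchar(255),
    Field        nvarchar(255),
    [VALUE]      nvarchar(max)
);

INSERT INTO #PageData
EXEC ('DBCC PAGE (''CorruptionChallengeDB'', 1, 288, 3) WITH TABLERESULTS;');

-- From here, CTEs over #PageData can pull out the column values for each slot
-- and build INSERT statements to rebuild the damaged rows.
SELECT ParentObject, Field, [VALUE]
FROM #PageData
WHERE ParentObject LIKE 'Slot %';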

Kevin: Very nice. Let’s switch gears again, we’re going to talk about a very nice conference. Carlos, why did you pick such a hard name to pronounce for Compa Con?

Carlos: Compa Con. Yes, well I didn’t consult you, number one. And then I guess have you tried finding a URL lately, right?

Kevin: This is true.

Carlos: Ultimately, this will be an extension of the podcast. This idea of bringing people together, talking about SQL Server in different ways, ways that people might be using it today or ways they haven't considered with new features. Just different ways to attack different problems, like Patrick's solution for the corruption challenge, and sharing that type of information. Actually, before I launched the podcast I wanted a name for the people who listen to it, to kind of create a sense of community, and that idea of companero came to mind. I put a little video about this together on YouTube. Companero is a Spanish word for companion, and as a missionary for my church I had a companion, so we were companeros. We worked together 24 hours a day, and this was for a 2-year commitment. Having good companions along the road just helps things go smoother, so that was the idea for the podcast: we want to get people together to talk about helping you get from one path to the other. Steve and I are both actually big scouters, which we didn't find out until after we started talking, and so there's that idea of being on the trail, right? Known paths versus unknown paths, and if you have a guide, just how much simpler that makes everything. That's ultimately where the idea of the Companero Conference came from, and we've been developing that idea with the hope that people will come. You get access to folks that maybe you don't know, but we've... Now, I hate to use the word vetting, it's not like, you know.

Kevin: Extreme vetting.

Carlos: Yeah, everyone's records, IRS, background checks, all that stuff, no. These are people that we felt comfortable inviting, I guess is the word, because we knew they would be willing to share some of their experiences and do so in a way that would be positive for those who came. We hope that people will come, get some shared experiences, get some help, and be able to ask questions about things that they haven't yet faced. But also, when they get to a trail or a scenario that they haven't experienced before, they'll be able to reach out and ask more than just Google.

Kevin: So, Steve, what are you looking forward to with Compa Con?

Steve: The biggest thing I'm looking forward to there is being able to meet more of the people that we interact with on the podcast and meet them in person, whether it's the speakers that are going to be there or the attendees as well. I'm excited about the business venture of doing the conference, of course, but really what it comes down to is getting to know the people. That's it for me there.

Carlos: Alright, I will say one other thing, and that is I remember being a full time employee and not using my training budget, normally because the budget was not high enough to go to some of these other conferences like PASS Summit that required travel across the country and such. So we wanted to see if we could create something that people could afford within the budgets that they have, and still come to something that's not just somebody opening up a book and reading it to you. That's not helpful, I mean. So that was another element to it: through the listeners, they were getting value out of the podcast, and we thought, “Ok, what value can they get when we get together, and can they leverage some of those budgets in a way that will get approved, that meets the criteria of a conference, and also allows them to expand their network a bit.”

Steve: Another thing to add to that that I'm really excited about with the conference is the office hours concept. Quite often you go to a conference, you sit in a session for an hour or half a day or whatever it may be with the speaker, and then when that's over, it's over. You go back to work a couple of days later and you try to use some of the things you've learned. Whereas with this, near the end of the conference we have an office hours slot where you will be able to meet with any of the speakers that are there to discuss, or talk, or find out more about the topics covered in their presentations. And to me that seems like a lot of fun.

Carlos: Yeah, and because of the way the setup is, we're going to sprinkle that in with a little bit of hands-on learning. So that will be a slightly different take, because I think it will be more authentic. I hate to use the word “canned”, but we'll have some scenarios where people can walk through things individually. We are hoping that most of this is kind of organic, in the sense of, “Hey, you know what, Kevin, I know you are talking about security. I'd like you to show me this security thing. Can you walk through this with me?” And then people just start talking, conversations ensue, and you're getting, “Yeah, let's take a look at that. Here is how you do this.” So still kind of hands-on, but organic.

Kevin: So the conference itself will be October 4th and 5th in Norfolk, Virginia. I hear there is something involved with a boat?

Carlos: Yes, we're going to have an evening cruise. All of a sudden I can't remember the name of the river, but we are very close to the Chesapeake Bay, on one of the rivers that shoots off of the bay, and of course Norfolk is a big naval yard and there is lots of traffic in that area, so it will be very pleasant. It will be in the evening, the sun will be going down, and we'll get to go out for two hours on the boat. We will actually eat dinner there as well and have a little bit of fun. There will be an open-air top deck where you can go out and just hang out, again have some conversation, or there will be dancing. So there are three levels: on the second level we will have food and dancing, and the third level is just kind of for relaxing, you know, enjoying the weather.

Steve: And you are welcome to come along even if you don’t want to be part of the dancing.

Carlos: Yes, that's right. We want to be very introvert friendly, and while we can't get that third section just to ourselves, if it's everyone's intention we can definitely go over and push everybody else outside.

Kevin: I’m claiming the nice spot against the wall. So sounds it’s going to be a blast. How about we talk about SQL Family now?

Carlos: Let’s do it.

Kevin: Ok, so how did you first get started with SQL Server? I’m going to start with Carlos for this one.

Carlos: I think I have the atypical answer; the accidental DBA kind of fits. I wanted to be in networking. Networking is what I wanted to do. I did an internship for Cisco Systems; the company that I worked for was purchased by Cisco Systems, and so I wanted to do networking. That's what I wanted to do. I went to college, I wanted to get my CCNA, all that stuff. My first job was working for a small consulting firm, kind of doing their internal IT. It was 15 consultants, so I was doing things like email, networking, security, and then setting up environments for the consultants so they could test things and whatnot, and SQL Server kind of came along with that as they were doing some of the applications. One of the consultants leaves and goes to work for the State, and he calls me a couple of months later and he's like, “Hey, they have this database administrator position. I think you should apply.” And I'm harking back to my college days: I took two database courses and I hated both of them. It was adjunct faculty, and I was like, “No way. You're crazy, right?” And he called me back and he's like, “Hey, we are having a hard time filling the slot. I think you should consider it.” I was like, “I don't even know how to be a DBA. I don't really know anything about it.” And he's like, “Well, this is what they pay.” And I was like, “Oh, interesting.” Again, I was at a job right out of college. I graduated in 2002, right at the end of the .com bubble, so I actually felt fortunate to have an entry-level job. And it was a significant jump from where I was, so I said, “Ok, I'll do it.” They had SQL Server and Oracle there, so they had an Oracle DBA. I applied and got the job, and so I basically went to the Oracle DBA and said, “Hey, how do you do this?”, and he showed me. And then I had to go figure out how to do it in SQL Server. That's kind of how that started.

Kevin: Interesting, so how about you, Steve?

Steve: Well, just to echo the same thing Carlos said about databases and classes in college: I had a couple of database classes in college and I hated them. I could not stand database work the way it was taught in the university at that point in time. But while I was in college I ended up getting a 9-month internship at Microsoft, and this was in 1990 when Windows 3.0 had just been released, just to set the timeframe there. Everyone they hired was in computer science, all from the local universities, and they were brought in to work in tech support for Windows 3.0 right after it was released. I learned a lot there, but I didn't want to work in tech support; I wanted to be a programmer. So I did everything I could to try and move from that position, and I ended up working with a couple of other people on an internal project to create some tools that the tech support team needed. And lo and behold, there was this database thing that Microsoft had just started selling that they suggested we use, and I had never heard of it. They said, “Well, what we need to do to get you up to speed on this is send you to Microsoft University,” which was an internal training program they had then, for a week-long class on how to use this thing called Transact-SQL. So on December 12th of 1990, I received a certificate that said I'm qualified to use T-SQL.

Kevin: For the record, I do not have that certificate. I got qualified.

Steve: Yes, and so that's sort of an Easter egg that I put on my blog. My parents found this certificate in their house 20+ years later and gave it to me a couple of years ago, and I scanned it in and put a copy on my blog as a blog entry from 1990, even though blogs didn't exist in 1990. So if you check out stevestedman.com, you can scroll back in time and find it there, if you're looking for something that's maybe a bit funny to look at. But anyway, that was a 9-month gig at Microsoft, and then I went back to school, did another internship, went back to school again, and on to other jobs and all that. It seemed like every job I ended up at, I ended up needing to do something with SQL Server. It just sort of evolved into more and more database work, and I finally realized I didn't want to be a programmer; I wanted to do the database side of things. I mean, I still do programming, but it is all database-related programming now. It just evolved into the DBA role, and I had other jobs along the way, like I ended up as a CTO at one point and realized I didn't really like that as much; I wanted to go back and do more database work. It all started at Microsoft in 1990 and just kind of evolved from there.

Kevin: Interesting. So sticking with you, Steve, if you could change one thing about SQL Server what would it be?

Steve: The way that the CHECKDB command works. Meaning, when it runs it goes out and scans your entire database to make sure there are no integrity issues, no corruption. The problem is a lot of people don't run it because it takes too long to run. If there was a way to say, I want to run CHECKDB but run it for an hour and check as much as you can possibly check, keep track of that, and then tomorrow night kick it off for another hour and continue the check process, that would be a really cool change that would probably help a lot with the way people do their database checks. I know there are ways to sort of simulate that by saying I'm going to check some of the tables, but if you get to the point where you've got a database with just one gigantic table, a way to run it for a certain amount of time and then pick up later would be pretty awesome.
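
As a rough sketch of the workaround Steve alludes to, checking a subset of tables within a time budget, something along these lines could run DBCC CHECKTABLE table by table until an hour is used up. The one-hour budget and the ordering are assumptions for illustration, and this version does not remember where it left off for the next night:

DECLARE @stop datetime2 = DATEADD(HOUR, 1, SYSDATETIME());
DECLARE @tbl  nvarchar(300), @sql nvarchar(500);

DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT s.name + N'.' + t.name
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    ORDER BY t.name;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @tbl;
WHILE @@FETCH_STATUS = 0 AND SYSDATETIME() < @stop
BEGIN
    -- Check one table at a time so we can stop when the time budget runs out.
    SET @sql = N'DBCC CHECKTABLE (''' + @tbl + N''') WITH NO_INFOMSGS;';
    EXEC (@sql);
    FETCH NEXT FROM table_cursor INTO @tbl;
END
CLOSE table_cursor;
DEALLOCATE table_cursor;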

Kevin: Makes sense. Carlos, if you could change one thing about SQL Server what would it be, and why would it be expanding PolyBase?

Carlos: Yeah, you took the words right out of my mouth there, Kevin. You know, it's funny, I was thinking a little bit about this. I answered some of this in Episode 0, but we've changed the SQL Family questions since then, so it's not something I've had to address recently, and of course we've talked about some of the big things before, like SQL Server setup, even in Episode 98, the first things you change, lots of things in there. As I was thinking about this, and Steve and I were talking, I'm not a user of it yet, so it makes me nervous, and I'm not sure it's quite a change yet, but it is something I hope they do. With the introduction of services and languages like R and PolyBase, and who knows what's coming, I hope they give me, the administrator, the knuckle-dragging Neanderthal that I am, who is not a great programmer and is trying not to drown in PowerShell, good tools so that I can understand and be able to react when other people are using those languages in the database. I realize that's kind of a tall order, but help me help other people, because I'm a bit nervous about some of that adoption as it increases.

Kevin: Ok, so sticking with you. What is the best piece of career advice that you have received?

Carlos: I'm not sure if it's the best, but the one I often come back to is, “The money will come.” When I graduated in 2002, at that first job I was making roughly 25% less than I thought I would be making coming out of college. I was a bit frustrated, right? Even when I moved after a couple of years, in fact at that job I took as a database administrator position, they actually lowered the pay grade because they couldn't increase my salary by more than a certain percentage to fill the position. Anyway, in that initial job my wages were lower than I wanted them to be and I was expressing some frustration, and the comment was, “The money will come.” Do the best that you can and invest in yourself, kind of harkening back to an episode that hasn't been released yet, Episode 104, where we talk with Eugene about that idea of going deep: go deep and get to be good at something. Get to be good at solving a problem, become that go-to person in your organization for certain problems, build trust, and then good things will happen. You know, I'm not a millionaire, there is a limit there. However, we were talking about family; I have 5 children, my oldest just turned 15 and my youngest is 2, so some of this risk and some of these other things I have to consider as well. But as you continue to plod along, as you continue to keep your eye on the ball, whatever cliché you want to use there, good things will happen, and I think that has probably been the best piece of career advice.

Kevin: Got you, so how about you, Steve? What is the best career advice that you have ever received?

Steve: I think the best advice kind of comes down to two things. The first one is, “There is no such thing as can't.” When somebody tells you that they can't do something, or that you can't do something because technically it can't be done or whatever, that's just an excuse to go and figure out how to do it. Now, maybe there is an exception to that: if there are personnel rules or things like that that say you can't do these things, yeah, you should follow those. But when it comes to technology, when people tell you that something can't be done, I've always looked at it as a challenge to figure out, “Ok, well how can I do that?” The other piece of career advice comes from Yoda, from one of the earlier Star Wars movies: “There is no try, there is only do.” I don't like to try things. I mean, I'll try a new flavor of ice cream or I'll try something new on the menu, but I like to do things. To say that you're going to try something, like “I'll try and do that for you” or “I'll try and get that job done,” is often kind of an excuse to say, “Well, I tried but I can't do it,” and that leads back to the can't. There is no such thing as can't, and there is no such thing as try, there is only do.

Kevin: No can’t and no try. Alright, so Steve, if you could have one superhero power what would it be and why?

Steve: Oh gosh! I answered this on Episode 12 and I don't remember what my answer was, but I'm going to go with time travel on this one. I don't think I'd be interested in going forward in time necessarily, though I guess you would have to come forward to get back to where you started. But if you could go back in time and learn from mistakes that have been made, either by yourself, or by others, or even mistakes made hundreds of years ago, just to experience and see what people have done, I think that would be an amazing superhero power.

Kevin: Carlos, can you top that?

Carlos: Yeah, top that. Well, I don't know that I can top it, but mine would definitely be different. In Episode 70, Travis Wright kind of brought this up. He said, “You know, everybody always talks about kind of the supernatural, some ability that they would possess.” And he said, “Well, I would think the ability to control technology would be very powerful, because then you could get it to do all the stuff.” And you wouldn't have to worry about flaming suits, or hitting somebody when you go back in time, your matter dimensionally smashing together, whatever. So as I thought about that superhero power of being able to control technology, I keep thinking of a scene from a movie. I want to say it's Terminator, but that doesn't seem right. I think they're actually sitting in the car and getting the ATM to spit out money. I feel like there is some movie out there where they walk up and get the ATM to just start spitting out money. Something like that, although obviously I would do everything ethically, right, nothing immoral like that.

Kevin: Especially on the record.

Carlos: Especially on the record, that's right. I think that would be mine, because then also, if I could control technology, I don't know, I could get some big drone or something, since my previous answer was flying. I figure I could get technology to zoom me around the place pretty quickly.

Kevin: That’s fair, so thank you very much for coming over to the podcast tonight, Steve Stedman and Carlos “skynet” Chacon.

Steve: And thank you for hosting. This has been great.

Carlos: Yes, Kevin, this has been fun. Thanks for having us!

Kevin: Alright, thanks everybody! Take care now. So that was Carlos and Steve today. It was a pleasure having them on and hopefully you enjoyed it. If you want to see more, go to sqldatapartners.com/100 and please follow us on social media. We're @sqldatapartners on Twitter, /sqldatapartners on Facebook, and also on LinkedIn. Review us on your favorite podcast platform like iTunes or Stitcher, and we'll see you on the SQL trail.

Episode 101: Inspecting a new Database

Listener Cody Ford wrote in and asked if we could share some thoughts on getting familiar with an unfamiliar database.  While we have done episodes in the past on best practices, this episode takes the approach of what we should look for on a server that is new to us–the components we should document and then review for potential updates.

Do you agree with our list?  Let us know by leaving a comment on the show notes page.

Episode Quotes

“The foremost one there that I usually look at is backups because things happen and there is going to come a time that you need to use your backup”
“We have to make sure that the mail profile is set up so that email can flow out of the system.”
“At the end of the day they can do anything they want and that could be good or bad. So we just want to make sure that… they need to get clearance from us.”
“I think as my experience has been the database diagrams are only as helpful as the culture of your environment.”

Listen to Learn

In this episode, we break down the sections or components of how we approach a database or instance that is new to us in the following ways:

1. System Availability
2. Admin Setup
3. Security
4. Dependencies
5. Performance Stats

Transcription: Inspecting a new Database

Carlos: Companeros, welcome to the next hundred. This is Episode 101. It's good to be back with you guys. We've got another great episode. Today's topic comes from Cody Ford.

Steve: Yes, today's topic is understanding an unfamiliar database. His comment was, “How about a podcast on tips and techniques to understand an unfamiliar database, such as viewing and understanding dependencies of tables and viewing or creating database diagrams, and any other tips to come up to speed on an unfamiliar database besides knowledge transfer from other employees, because you don't always have that available.”

Carlos: Exactly, so that's going to be our topic today, so thanks Cody for that suggestion. One of the things that we will point out now is that this is going to be on the investigative side versus the best practices side. So it's more of “I just need to get the information so I can figure out what, if anything, to change or be aware of,” versus “here is what I should be changing.” Does that make sense?

Steve: Yup.

Carlos: We do have a couple of companero shout outs. First we want to thank Kevin Feasel for all of his help with Episode 100. I know we enjoyed it. I hope you enjoyed it, and so thanks Kevin for taking a little bit of time to chat with us again.

Steve: So we phrase that as “Thanks Kevin for having us on your podcast.”

Carlos: Yes. We want our podcast back.

Steve: Yes, that was a lot of fun.

Carlos: It was a lot of fun so we appreciate that, and of course all the user comments. Those who contributed questions we are appreciative for that. We do have a couple of other shout outs that came to us via LinkedIn.

Steve: Yup, and I think some of these things came from Episode 99 when we started asking people LinkedIn in addition to Twitter. The first one came from Jack Rose and he said, “After hungrily devouring your wonderful SQL Data Partners podcasts for months, Episode 99 finally inspired me to reach out to you as instructed there. Not sure you intended listeners to simply follow you or actually connect personally so I’m trying both. Inspiring work.”

Carlos: Yes, so thanks Jack for listening. We appreciate you reaching out. You know, it's funny, that combination was purely by accident, the combination of asking people to reach out on LinkedIn and the impostor syndrome episode. But there was a very high correlation in the number of people who mentioned that episode and said, “Well, I've thought about reaching out before, but now after that episode I'll do it.”

Steve: Yup, and we have another comment from LinkedIn from Chris Albert.

Carlos: Yes, so he says, “I’ve been listening to the podcast for the last couple of months and love the show. Sad to say I’ve already gone through the whole back catalog of episodes during my commute.” But he’ll keep tuning in, so thanks Chris for tuning in. It is unfortunate that you’ve managed to get through the back catalogs so quickly. Thanks for hanging in there. I know particularly in the beginning they are a little bit rough.

Steve: I know. When I joined the podcast on Episode 50 I had not listened to all of the previous episodes, and I went back and listened to all of them as well. And I know it was a little bit sad when I got to the end too. But we've got new ones every week, so.

Carlos: That's right, we're going to keep chugging along here, and we've made a commitment for another year, so we're glad that you're sticking around. Ok, so the URL for today's episode is going to be sqldatapartners.com/newdb.

Steve: Or sqldatapartners.com/101.

Carlos: Ok, so ultimately, again, the topic for today is becoming familiar with an unfamiliar database. There are really a couple of tacks or ways to go about this, and I know Cody specifically mentioned an individual database. We're going to attack this first from an instance level, so this is the scenario where you're either new to a new job, or maybe for whatever reason that database was stood up by another department and now all of a sudden you're asked to take it over, or you're consulting, what have you. So instead of focusing just on the database, we're going to take a look first at a couple of things at the instance level and then we'll circle back to any database-specific things.

Steve: Yup, so the first area that we usually take a look at is system availability, because if you're suddenly responsible for this server, those are the kinds of things you could lose your job over.

Carlos: That’s right. Or get calls in the night about, all those things. And you’re like, “Ok well, let me nip stuff in the bud in the beginning.”

Steve: Yup, so the foremost one there that I usually look at is backups, because things happen and there is going to come a time that you need to use your backup. And there are a lot of different ways that people attempt to do backups that aren't always the greatest when you come to restore time.

Carlos: Exactly, and I guess we should say, kind of jumping off here as well, there are a lot of ways to script a lot of these things out, and we're going to be talking about Database Health Monitor and how it can help with a lot of this as well. But first we're going to dig into the why around what we are doing here. And so with the backups, some of the things that we want to know are: one, are they happening, and then, where are they going?

Steve: Right. And where they are going can be a pretty key thing, because sometimes you take a look and say, “Backups are running every day, but they are going to the C drive.” Not always a good thing to do. Other times they are going to a network share that gets purged every night, or moved somewhere that, as a DBA, you don't have access to.
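
A minimal sketch of that first check, are backups happening and when did the last full backup finish, might use the msdb backup history tables (msdb.dbo.backupmediafamily holds the physical_device_name if you also want to see where each backup went):

SELECT d.name                     AS database_name,
       MAX(b.backup_finish_date)  AS last_full_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
      AND b.type = 'D'            -- 'D' = full backup; 'L' = log, 'I' = differential
GROUP BY d.name
ORDER BY last_full_backup;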

Carlos: A lot of times that location also determines how long we can keep those backups around, so we may have to do cleanup because of a disk size issue when in reality we need to keep those backups around a little bit longer.

Steve: So part of the why behind keeping them around longer is that there are a lot of things that can happen that you may not catch or notice for a few days to a week, or even a couple of weeks. And if you've only got 3-4 days of backups and you find a problem where you need to pull some data out of a backup from 2 weeks ago, but you don't know about it until it's already past that retention window, you could be in hot water there.

Carlos: Yeah, you would be in trouble. And the last component we want to talk about there, which goes along with our next component, is recovery options: what types of backups are we doing here? When we think about the recovery model we are thinking about full or simple, or bulk-logged, though I don't see a lot of people using that long term; it's more of a short-term thing. So, full or simple, and what am I doing with the backups for those databases based on the recovery model setting?

Steve: Yes, and the thing I see is that oftentimes, if you just install the database with an application or something like that, you're not getting it in the right recovery model to be able to meet your expectations on the recovery time or recovery point objectives. And when we talk about recovery options, you can have your opinion on whether the simple or full recovery model is better, but it really comes down to what recovery time and recovery point objectives you're trying to meet, and whether the options it's currently configured for meet what you're looking for.

Carlos: Exactly. A lot of times we'll see databases in the full recovery model, but then the transaction log backup only happens once a day, for example. And so again, this is information collecting. We are going to get this information, we are going to figure out, “Oh, ok, this sounds like an issue,” and we will go and have the discussion to figure out what it needs to be. But this is the first area that you want to check.
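
A rough sketch of that comparison, each database's recovery model against its most recent log backup, to spot full-recovery databases whose log is rarely or never backed up:

SELECT d.name,
       d.recovery_model_desc,
       MAX(b.backup_finish_date) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
      AND b.type = 'L'            -- 'L' = transaction log backup
GROUP BY d.name, d.recovery_model_desc
ORDER BY d.name;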

Steve: Yeah, an interesting story there that I came across a few years ago, on some confusion over the recovery model option, was that a company I worked for several years ago had a database that was basically working as a queue. Data would get thrown into it, and then another process would grab the data out of it and move it somewhere else. It was one of those things that never got backed up because it never had any data in it for a long period of time. And that was true, because data would very rarely reside in that database for longer than 5 minutes.

Carlos: Oh, ok got you.

Steve: And it was a SQL Express database, and it was set to the full recovery model. It had been running for about 4 years without ever having a backup on it, and eventually we ran into a problem. Do you want to guess what the problem was, Carlos?

Carlos: Probably ran out of disk space.

Steve: We ran out of disk space. And this was a database that I never knew was in use at this company prior to it running out of disk space. They called me and said, “Well, it's out of disk space. Why is SQL Server being such a pig?” It's because it wasn't configured correctly. So that was one where we had to flip it over to the simple recovery model and then shrink the log files down. After that they had plenty of disk space; we left it in the simple recovery model and never had a problem again. That was a scenario where the simple recovery model was the right way to do it, and it was also an environment where it didn't really need backups because data was never there longer than 5 minutes.

Carlos: There you go. The other components of system availability are going to be disk space, which we kind of just talked about, right? How much do I have left? And then the DBCC CHECKDBs: are they being run on a regular basis? There are lots of ways to go about checking that, but again, that is just information that we want to grab.
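
One of those ways, as a small sketch, is to read the dbi_dbccLastKnownGood value that DBCC DBINFO reports (the database name below is a placeholder):

CREATE TABLE #dbinfo
(
    ParentObject nvarchar(255),
    [Object]     nvarchar(255),
    Field        nvarchar(255),
    [VALUE]      nvarchar(255)
);

INSERT INTO #dbinfo
EXEC ('DBCC DBINFO (''YourDatabase'') WITH TABLERESULTS;');

-- The last time CHECKDB completed without finding corruption.
SELECT [VALUE] AS last_known_good_checkdb
FROM #dbinfo
WHERE Field = 'dbi_dbccLastKnownGood';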

Steve: So why is it important to run CHECKDB regularly, Carlos?

Carlos: So we could devote multiple podcasts episodes to this very question, Steve.

Steve: You know, in fact, we have haven’t we?

Carlos: Yeah, we have devoted a couple, that's right. But ultimately we are looking for corruption there, and I'm just going to leave it at that, because if we go any further, Steve, we'll take a whole episode on just that one.

Steve: Ok, ok.

Carlos: So the next section then is admin setup, so how things are set up from an administrative perspective. Some of the things that I'm looking at there are: what are the agent jobs, who are the owners, and who are the operators on the system? Do we have mail set up? Am I going to get notifications for the jobs? So a couple of things there. We have to make sure that the mail profile is set up so that email can flow out of the system. I'm looking at those jobs and asking what notifications, if any, I need on them. Does it matter? Again, these are just questions. I'm collecting this information, and then I want to be able to go back to the business and say, look, this is what needs to get changed here.

Steve: Yup. Another one that I check, beyond just the job owners, is the SQL Server services for SQL Server and SQL Server Agent: who are those running as? That's one where sometimes you look and realize, “Oh, that SQL Server is running as the domain user account of an employee that left 6 months ago.” And recently somebody in the IT department deleted that domain account, and the next time you restart SQL Server it will not restart with that user.
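
A quick sketch of a few of these admin checks, who owns the Agent jobs, whether any operators are defined, and which accounts the services run under:

-- Agent job owners (watch for jobs owned by individual employee accounts).
SELECT j.name AS job_name, SUSER_SNAME(j.owner_sid) AS job_owner, j.enabled
FROM msdb.dbo.sysjobs AS j
ORDER BY job_owner, job_name;

-- Operators that can receive notifications.
SELECT name AS operator_name, email_address, enabled
FROM msdb.dbo.sysoperators;

-- Service accounts (SQL Server 2008 R2 SP1 and later).
SELECT servicename, service_account, startup_type_desc
FROM sys.dm_server_services;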

Carlos: Right. You know, it is funny, now that I think about it, how many instances I've seen where the service account is actually an employee account. For whatever reason, that happens. The other one in admin setup is server defaults. So again, we are talking about SQL Server settings here, something that you'd see in the advanced configuration. What they should be is going to be another question, but I just want to know what has been changed and compare it to either internal documentation or the expectations of the business.

Steve: So that's where you find out that they've turned on auto shrink and auto close and those kinds of features.

Carlos: That's right, and I guess we should point out that this is a scenario where, particularly in the newer versions of SQL Server, you can actually set some of that stuff at the database level, so there will be a little bit of difference between instance and database. For the most part, in the older versions of SQL Server a lot of those settings are at the instance level, but there are things like the shrinking or the closing of connections that can still be at the database level.
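
A small sketch of reviewing a few of those instance-level settings, plus the per-database options like auto shrink and auto close (the handful of configuration names listed here are just examples, not an exhaustive checklist):

-- Instance-level settings worth comparing against documentation or expectations.
SELECT name, value_in_use, description
FROM sys.configurations
WHERE name IN ('max degree of parallelism',
               'cost threshold for parallelism',
               'max server memory (MB)',
               'optimize for ad hoc workloads');

-- Database-level options that often get flipped on by accident.
SELECT name, is_auto_shrink_on, is_auto_close_on
FROM sys.databases
WHERE is_auto_shrink_on = 1 OR is_auto_close_on = 1;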

Steve: Yup. Ok, on to security then. I mean, the biggest thing that usually comes up on security is sysadmin privileges.

Carlos: And again, this is from the instance perspective: who else has sysadmin privileges on there? Because at the end of the day they can do anything they want, and that could be good or bad. So we just want to make sure that, if we are the person tied to that server, they need to get clearance from us, and we need to be ok with that, or at least not push back on it. I think this is very common particularly with third party applications, where the answer is, “Oh, just give it sysadmin,” which I know can be a pain sometimes. But as much griping or complaining as we like to do about some of the security, Microsoft has gone to great pains to try to make some of these roles and permissions available. A lot of times we see, for example with monitoring tools, that they ask for dbo or sysadmin rights when VIEW SERVER STATE would do. We'll get them what they need.

Steve: Right. But I think it’s far easier when asked what permissions are needed for specific application for someone to respond, “Oh, they just need sys admin privileges.”

Carlos: That’s right because they know you won’t go wrong. There won’t be any problems at that point.

Steve: Yup. But it's not the greatest thing to do. So I like to look at it as: how many sysadmin logins do we really have? Is every single user a sysadmin? And then the flip side of that is, do we have just one SA login, and is that the only login on the entire instance? I know I've seen that a few times: we don't need users, we just use the SA login for everyone.
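
A minimal sketch of answering that first question, who actually holds sysadmin on the instance:

SELECT p.name, p.type_desc, p.is_disabled
FROM sys.server_role_members AS rm
JOIN sys.server_principals AS r ON r.principal_id = rm.role_principal_id
JOIN sys.server_principals AS p ON p.principal_id = rm.member_principal_id
WHERE r.name = 'sysadmin'
ORDER BY p.name;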

Carlos: Yeah. Again, those are the things that we're going to want to document, and then potentially defend or say we need to make a change here.

Steve: Yup, and I know that changing the login process, especially if everyone is sharing an account, can be very challenging to do, because sometimes it requires code changes, or configuration file changes, or things like that to straighten out eventually. But at least getting an understanding of what it is is a good spot to be in, because then you know who can hurt the server.

Carlos: Right. Well, even then, let's just say in that scenario where the application is using the one account. A password change might be a little bit cumbersome, though maybe not as cumbersome as changing all of the application connection information, but at least create additional accounts for the people that connect using SQL Server Management Studio and things like that. That way you can take the password for that application account and put it somewhere where not everybody has it, so it's a little harder to get to, that type of thing.

Steve: Yup. And the place that it's harder to get to shouldn't be in the source code that all the developers have access to.

Carlos: That’s also true.

Steve: Yeah. I remember going through a DOD security audit at a company I was working for a couple of years ago. It took months just to straighten out all of the login privileges, because prior to that nobody there really cared about things being secure on that server. It was simply one SA login, and that was good enough for the owners.

Carlos: It’s good enough for me, right?

Steve: Yup.

Carlos: Ok, so the next area we want to take a peek at is dependencies. These are objects that the database or the application is going to use, and again, we just want to know a bit about them. The first one is user objects in system databases. A lot of times we as administrators can be the most guilty of this.

Steve: I know I've done that. I've connected to an instance with the intention of using a specific database but then accidentally created a stored procedure in the master database. One way I help prevent that is to not use the master database as your default database.

Carlos: If you don't have another database, and on a new server, that can be difficult. I'm even thinking about some of the community scripts, so sp_whoisactive, the Blitz scripts. They're going to default to the master database, so if you don't have another one, that might be something to think about as you start investigating, or at least be aware of. Now, it's a little bit easier when you know you're creating them, because you can clean them up, but if there are other objects in there we want to know why, right? This is all in the name of recovery. If that database is dependent on an object in the master database that doesn't get restored when we have to move it to a new server, then all of a sudden I have problems and may not know why.
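
A quick sketch of spotting user-created objects that have landed in master (the same check can be repeated against model and msdb):

SELECT name, type_desc, create_date
FROM master.sys.objects
WHERE is_ms_shipped = 0
ORDER BY create_date;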

Steve: Yup. The next thing to look at there oftentimes are triggers. What are the tables that have lots of triggers or complex triggers on them?

Carlos: Yeah, exactly. And again, this is just information, so that I know that when I insert into a table, or I do X, then something else is going to happen; there's another component involved there. A lot of times I see the admin audit; it seems like there is a pretty popular trigger out there so that when administrators do things like create tables, or create databases, or change users, that information gets audited and put into another table. But I have seen instances where all of a sudden that audit table is not available, the trigger starts failing, and then you have issues. So again, you just want to be aware of what triggers are around.

Steve: Right. So it seems like we really have three layers of triggers to think about there. One is your traditional triggers that fire on the tables when things are inserted, updated, or deleted. Then you have the DDL-type triggers, like you just mentioned, for administrators adding a table or changing a table. But then you also have the logon triggers that can be set up at the server level. That's one I came across not too long ago, where I was wondering why logins were so slow. Well, every time somebody logged in, a trigger was inserting a row into a table, and that had been running for more than 10 years. The login table had never been purged. So I asked, do you really care who logged into the database 5 years ago, or 2 years ago? We ended up getting rid of the trigger eventually, but the short-term fix was just to truncate the table, because they didn't care about it, and things were suddenly fast again.

Carlos: You know what, that's another great example, because I feel like most of the time when we get in trouble with triggers it's because we as administrators are trying to outthink ourselves a little bit. It's that scenario of, “Oh, let me put a trigger on login because for whatever reason that was a requirement. I thought about it, I learned about it, let me implement this because I think it's going to help me.” And then it can come back to bite us, because if we forget about it, the database changes hands or ownership. That's another thing to take a peek at.
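
A small sketch covering those three layers, DML triggers on tables, database-level DDL triggers, and server-level triggers (which include logon triggers):

-- DML triggers on tables in the current database.
SELECT OBJECT_NAME(parent_id) AS table_name, name AS trigger_name, is_disabled
FROM sys.triggers
WHERE parent_class_desc = 'OBJECT_OR_COLUMN';

-- Database-level DDL triggers.
SELECT name AS trigger_name, is_disabled
FROM sys.triggers
WHERE parent_class_desc = 'DATABASE';

-- Server-level triggers, including logon triggers.
SELECT name AS trigger_name, is_disabled
FROM sys.server_triggers;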

Steve: And some of those aren’t entirely obvious until you run into a problem.

Carlos: Oh, exactly, exactly. Another one, and this is I guess more at the database level, is going through each of those databases and looking for disabled indexes. We talked a little bit about the bulk-logged scenario, so cases where I have large imports, data warehouses. Sometimes the indexes get disabled until the import finishes, and then they get re-enabled or rebuilt so the index can be used. Every once in a while, however, the job fails or for whatever reason the indexes aren't re-enabled. Those are other things that you're going to take a peek at, because basically we have index definitions sitting there that SQL Server can't use anymore, and do we want to use them? Do we want to get rid of them? Those are the kinds of questions we want to start asking about disabled indexes.

Steve: Yup, it’s just overhead that’s not used for anything at that point. And maybe you should be using it or maybe you should be getting rid of it. It really depends.
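
A minimal sketch of finding disabled indexes in the current database:

SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       i.type_desc
FROM sys.indexes AS i
WHERE i.is_disabled = 1;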

Carlos: Exactly.

Steve: Yup. So then we also want to take a look at non-trusted foreign keys and constraints. Those can cause some interesting trouble, both around data integrity and around performance.

Carlos: Exactly. We have that foreign key and we think, “Ok, we are straight.” And again, foreign keys are going to help us. They hurt us a little bit on the insert, because of that lookup, so we get why they got disabled; again, lots of bulk inserts and things like that. But then when we are trying to do our queries, particularly the big queries where we are joining a lot of tables, even though the foreign key is there, SQL Server is going to go about giving you that query differently because the foreign key is no longer trusted. And you're going to get different results based on that, and so again that's information we want to know about.

Steve: Yes, and sometimes that different result can be a change in the query plan, going from a full table scan on a table to not even touching that table, based off of those trusted foreign keys.
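
A small sketch of finding foreign keys and check constraints that are enabled but no longer trusted:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name AS constraint_name
FROM sys.foreign_keys
WHERE is_not_trusted = 1 AND is_disabled = 0

UNION ALL

SELECT OBJECT_NAME(parent_object_id), name
FROM sys.check_constraints
WHERE is_not_trusted = 1 AND is_disabled = 0;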

Carlos: Exactly. And then the last one we want to be aware of is plan guides. SQL Server is going to look at those plan guides and implement them in the query plans, so we just want to be aware: is there anything kind of overriding the optimizer and indicating, “No, no, you should do it my way”?

Steve: Yup, but the other thing to consider with plan guides is whether somebody went gangbusters and added too many of them. There are some issues around plan guides and being able to change stored procedure code for stored procedures that have plan guides attached to them or created for them. So if you have plan guides that have been added, for instance, on a vendor-supplied database, and you're doing an update from the vendor of that database, you may need to go through and disable your plan guides prior to doing the update and then put them back in afterward. So plan guides, I kind of think of them as the in-case-of-emergency thing that you only use sparingly. But you'd better make sure that everyone knows about them, because they do have other ramifications that might not be entirely apparent when you use them.
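
A quick sketch of listing whatever plan guides exist in the current database:

SELECT name, scope_type_desc, is_disabled, create_date
FROM sys.plan_guides;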

Carlos: And I think this is going to increase, because in the newer versions of SQL Server, with Query Store, the whole premise is: I'm using a plan that's less efficient, and I want to go back and use this other plan. I'm oversimplifying, right? But basically we're putting in, I guess not really a plan guide, but some information to say, “Hey, use this instead of what you think is right.” And again, those are going to be things that we want to know about. We may have done them, we may have executed them, or somebody else may have, but we want to be aware that a change was made for whatever reason.

Steve: Yup, and you know, it sort of brings me back to one of my gripes about SQL Server, and I probably should have answered this as what I would change in SQL Server if I could, but there are two terms that I really despise. One is the term plan guides, and the other is the term query hints. They should be called plan commands. It's not a guide, like, “Oh, we recommend you do this.” It's a command that says, “You will always do it this way.” And the thing with query hints is they're not hints, like, “Oh, you might want to try this.” They are commands that say you will do this. Anyway, sorry, side rant on nomenclature, but these guides are not guides. They are commands that say, “It will happen this way.” So just keep that in mind when you're looking at plan guides.

Carlos: Yeah, and I think we are going to start seeing more of those in newer versions. Then the last area that we are going to touch on is where we can start taking a peek at some of the performance-related things. Now, we did mention that non-trusted foreign keys can affect performance and whatnot, but here we are actually going to start collecting some of the metrics around performance, and I think ultimately what we are talking about is establishing baselines.

Steve: Yup, and this can apply instance-wide or it can apply to a single database, depending on what it is you are looking at. I mean, with query stats, being able to have a baseline to understand which queries are causing the most waits, which indexes are being used or not being used, which files for that database or instance are being overloaded or have very little I/O on them. And I think that understanding that can have a huge impact on the way you look at things when you go to troubleshoot that server later.

Carlos: Exactly, and I think the file stats were one of those eye-opening things for me. Let's say you have an instance and you introduce a new database. Obviously the file stats on day one should be zero or near zero. Well, then you turn that thing on, you take a peek at it over time, and you may find out that all of a sudden this thing is very chatty. It's going to be a big hog, and now you're going to have storage problems. So again, establish what those things are. We mentioned a couple: the query stats, the index stats, and the file stats, and I guess we should include wait stats in there as well, as in where am I now, and then what am I doing in a week, in a month, in a quarter, that kind of thing. And of course this is again where the third party tools come into play for capturing that information. For most of this we don't need to reinvent the wheel. There are things out there already that will do it, and Database Health Monitor is one of them.
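
A rough sketch of two of those baseline numbers, cumulative I/O per database file and the top waits on the instance; both are cumulative since the last restart, so the idea is to capture them on a schedule and compare the differences over time:

-- File-level I/O, one row per database file.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads, vfs.num_of_writes,
       vfs.io_stall_read_ms, vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
     ON mf.database_id = vfs.database_id
    AND mf.file_id = vfs.file_id;

-- Top waits on the instance by total wait time.
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;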

Carlos: One additional item that I guess we should talk about when it comes to investigating databases, and that's different from the instance level, is actually taking a peek at the database diagrams. This can be kind of hit or miss. I think the diagrams are helpful, but the tools that we have natively aren't great at helping us with this.

Steve: Right, and I like to stay away from the database diagrams in SQL Server Management Studio. I think we talked about this many episodes ago, but the reason I don't like them is that people like to use them, kind of like Visio, to make some changes, diagram things, and print them out. But so many times I've seen that people don't realize that when you go into a database diagram and start drawing lines, that's actually making changes to the database by adding foreign keys.

Carlos: Right, live editor.

Steve: Yup, a live editor, and I've seen that cause problems so many times. I generally say stay away from that. What I like to do if I want to look at a database diagram is use Visio. And Visio, on certain versions, I don't remember the exact one, Visio Enterprise or...

Carlos: Yeah, it went through a weird transition where it wasn't there for a little while. I guess you could still reverse engineer it, but not some of the database objects. I think it was 2012. This goes back to our episode on SQL Server Data Tools; during that transition this was also affected.

Steve: Right, so with Visio I like to go in and, on a smaller database, take all the tables and import them into Visio so you can see where all the foreign keys are, or if it's a larger database, just bring in a subset of the tables around the specific areas you're looking at. That way you can visualize it, and if you have access to a large-size plotter it's really handy to be able to print it out and have a poster-size diagram of the database on your wall. I'm not really big on wasting paper, but that's a really useful way to work, so that when people come to talk to you about the database you can just stand up, go to the diagram on the wall, and make sure you're both discussing the same thing.

Carlos: I love that idea. I think this also harkens back to the question of whether this is a third party app or something we are developing in-house. There might still be some use for diagrams with a third party app, but it's definitely valuable for in-house development. I don't know how many conversations I've settled just because we popped up the database diagram and said, "Ok, this is what I'm talking about. Is that what you're talking about?" "Oh no, I'm talking about this thing." "Ok, well, there you go." Now, I have never had access to a plotter. We always had to go to Kinko's or whatever, and they're never quite big enough, or we're piecing them together from legal paper and taping it together, so I've migrated beyond that. My experience has been that database diagrams are only as helpful as the culture of your environment. Meaning, if you're willing to use them, look to them, and adopt them as part of a change management process, they become much more helpful. If it's something you don't care about, and you're making changes to the database without the diagram first being updated, then they become less helpful, right? Almost like a source control type idea. Because of that I've moved on even past Visio and I like the Embarcadero tool. It's less expensive. It's like erwin; erwin was kind of the cream of the crop and Embarcadero came in underneath, but with a lot of the same functionality. One of the things you suffer from in Visio is that once you get beyond about 10 tables it becomes very hard to see everything in one document, so the ability to section tables off into different areas or group them becomes very, very helpful. But again, where it's most helpful is, one, data types, relationships, and then understanding where things go, so again that idea of diagramming. This is my table, this is an object. What's going to go in this object? "Oh, ok, you're describing something that doesn't quite match that description. Maybe it should go over in this different table, or we need to think about that differently."

Steve: Yup, oh yeah, I can think of an interesting story. I was working with a client a few years ago on some reporting queries, trying to figure out how to do some tuning and how to get the data they needed. I thought, "Well ok, I don't understand this database. It's a third party vendor that provided it." So the first thing I did was open it up and start looking at foreign keys, and I realized that there was not a single foreign key across the 200+ tables in the entire database.

Carlos: And you’re like, “Ok, yes!”

Steve: So it's like, ok, well the only thing we have at that point to guess what is intended to be treated as a foreign key is how things are being used in queries. And I found there were a lot of places where people were joining on values that shouldn't have been joined on and wondering why they didn't work. If there had just been foreign keys in the database, it would have made a lot more sense to someone trying to understand what you can join on.
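
A quick way to run the kind of check Steve describes; this is only a sketch using the standard catalog views, nothing vendor specific:

-- How many tables versus how many declared foreign keys?
SELECT (SELECT COUNT(*) FROM sys.tables)       AS table_count,
       (SELECT COUNT(*) FROM sys.foreign_keys) AS foreign_key_count;

-- List whatever relationships do exist, as a starting point for a diagram.
SELECT fk.name                              AS foreign_key_name,
       OBJECT_NAME(fk.parent_object_id)     AS child_table,
       OBJECT_NAME(fk.referenced_object_id) AS parent_table
FROM   sys.foreign_keys AS fk
ORDER BY child_table, parent_table;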

Carlos: Yeah, exactly. Ok, so Cody, hopefully that's helpful. To recap a little bit, if we go back and put in some of those groupings: first is system availability, then we have admin setup, we take a look at security, dependencies, and then performance. I guess we should add a sixth category in there, and that is database diagram type of information. We've shared a couple of different stories there about how to do that, but again a lot of it just depends on your culture and whatnot.

Steve: Yeah, I want to say thanks Cody for the great question that gave us a good topic for this episode. And Carlos, I don't think we have the next hundred episodes entirely booked out yet. Do we?

Carlos: Not yet. We are shy by just, I don’t know, 99 or so.

Steve: Yes, the reason I bring that up is that if anybody has any questions, or ideas, or topics that you would like us to cover on the podcast, let us know, just like Cody did here. We'll be happy to consider it for a future episode.

Carlos: That's right. So, speaking of the future, Tracy Boggiano, who is going to be speaking at the Companero Conference, will be on next week. And then we have Eugene Meidinger–I'm pretty sure I'm saying that last name right. I told myself I would say it right the next time, and I actually did. He's going to be on the program as well.

Steve: Speaking of the conference.

Carlos: Yeah, we have just over a hundred days to go and we're looking forward to it. You can still register at companeroconference.com. October 4th and 5th we'll be down in Norfolk and we hope to see you there.

Steve: Yup, I’m certainly looking forward to it.

Carlos: Yes, so again if you want to check out the show notes in today’s episode that will be at sqldatapartners.com/newdb.
Steve: Or sqldatapartners.com/101.

Carlos: And of course we really enjoy you reaching out to us on social media. You can do that on Twitter, or at LinkedIn. I’m @carloslchacon.

Steve: I’m on LinkedIn @stevestedman, and on Twitter @sqlemt. We’ll see you on the SQL trail.

Episode 99: Imposter Syndrome

Impostor syndrome (also known as impostor phenomenon or fraud syndrome) is a term coined in 1978 by clinical psychologists Dr. Pauline R. Clance and Suzanne A. Imes referring to high-achieving individuals marked by an inability to internalize their accomplishments and a persistent fear of being exposed as a “fraud”.  In this episode of the SQL Data Partners podcast, we take on a topic that is not technical; however, might play a very important role in some of the opportunities we take–or miss out on.  Today’s guest is Mindy Curnutt, a 3x MVP and a real chance taker as she recently decided to become independently employed.

In this episode we talk through the idea of imposter syndrome and give some examples of how it affects us, but we also try to tackle ways we can identify it and steps to help us overcome it.  I think you are going to find this episode very interesting and we hope you enjoy it.

 

 Episode Quote

“I mean SQL Server, I’m so passionate about it that it doesn’t feel like work to me.”

“Coding involves regular failure.”

“It’s ok to be wrong. You’re not expected to be perfect.” 

“Don’t let that impostor syndrome stop you from taking advantage of opportunities.”

Listen to Learn

– Description of impostor syndrome
– Effects of impostor syndrome in one’s career
– Symptoms of impostor syndrome
– Impostor syndrome in IT professionals
– Tips to overcome impostor syndrome

Mindy on Twitter
Mindy on LinkedIn

About Mindy Curnutt

Mindy Curnutt is a 3-time Microsoft MVP holder (SQL Server 2014 & 2015 and Data Platform 2016) and has worked with relational databases since 1995 and SQL Server since version 6.5. She has been involved in the development of the following systems: transportation management & maintenance (TMS), JIT manufacturing (MRP), sales/accounting (ERP), customer relations management (CRM), medical billing & audit, and US Govt / IRS Taxation  (forgive me). Mindy is the Lead Partner at Mindy Curnutt & Associates Consulting, which specializes in Microsoft Data Platform & SQL Server architectural guidance, performance tuning, training and Remote DBA Services.

Transcription: Imposter Syndrome

Carlos: Mindy, welcome to the program!

Mindy: Thank you. Thank you so much. It’s my pleasure.

Carlos: Yes. It's good to have you, and being from a nice state like Texas, you're right in the center, so Steve and I have a little coast rivalry here, and with you here we can play nicely.

Steve: Yeah. We can’t claim that Mindy is an East Coaster or a West Coaster on this one.

Carlos: Yeah, that's right. So thanks for coming on. It's great to have you, and ultimately today we're going to be talking about impostor syndrome, what it is and how it might affect us in terms of what we're able to do in our careers, whether we're willing to reach out and take risks, and then even get into how widespread it is, how many people it can affect. So I guess, just to get the conversation going, why don't you give us a definition or some insight into what impostor syndrome is.

Mindy: Well, impostor syndrome, and I only learned about this three years ago, although the term was actually coined way back in the late 70s, so it's been around for quite some time. I learned about it at CodeMash outside of Cleveland. About three years ago there was a woman who did a presentation on impostor syndrome and it just hit me right between the eyes. It's one of those things that's so obvious when somebody explains it to you, but you really never gave it any thought, and then once you realize that it's actually a thing, for me at least personally, it was a game changer as far as my confidence level, because it undermines your confidence. And it's your own thoughts that are doing it, which is what's so bizarre. So impostor syndrome is actually only seen in high achieving individuals, which is very interesting. And it's basically the inability to internalize or recognize your own accomplishments as the accomplishments that they truly are, and having this persistent fear that you're going to be exposed as a fraud or that you don't really deserve the position or the respect that people are giving you. You know, you're being asked to be a speaker, you're being asked to write a book, you're being asked to participate as a manager in some process at work and you feel like, "Wow, this is great. I'm glad that they all think that I can do this. If they really knew me they'd know that I'm not really capable of all this and I'm really not that special." And then to find out how many other people feel that way, which in the tech field is the majority of people, secretly. It's very interesting.

Carlos: Sure. In fact, you actually have a little bit of survey data around this.

Mindy: Yes, I did. Interestingly enough, when I decided I was going to do this presentation I went out and did a survey. I sent a definition of impostor syndrome out to the MVP list, the Data Platform MVPs, and I said here is a definition of impostor syndrome, I have a survey, completely anonymous, and would all of you please, if you have time, go to the survey. It was simply a YES or NO: have you felt this way anytime in the last 5 years. And what was it, 86%. I had 42 Data Platform MVPs respond and 86% of them admitted that they had felt that way in the last 5 years, so that's recently. That's not just at the beginning of their career or whatever. And then when I went internally at my own company and asked 11 IT professionals that were in managerial positions or higher if they had felt the same way within the last five years, I got an 81% yes. And the stuff that I've read on the Internet says those are pretty realistic numbers, that it's very, very common. And people either don't recognize that everyone else feels that way, or they didn't know there was a word for it, or, like myself, they just thought it was them.

Steve: You know, that's really interesting because, honestly, I'd never heard of impostor syndrome prior to Carlos scheduling this podcast recording. I went and did a little bit of research on it and looked at your slides from SQL Saturday and all that, and I thought, wow, I can't believe I have never heard of this, and it's something that I know I've seen and probably experienced at some level here and there.

Carlos: Sure, and I think in some situations it's easier to feel than in others, right? And that's not to say that it couldn't affect a single individual in all different kinds of places. Obviously the more comfortable you feel in a situation, the less bad it might be, but I think anytime you get to the place where you start comparing yourself to others, and what you've done to the outcomes of others, it's very easy to fall into this trap. Or telling yourself, "Hey, you know, it's not worth it. Just throw it in the can now. They are never going to take you seriously."

Mindy: Yeah, well, and it's interesting, so statistically it's most commonly found in technical fields, so not only in IT but it's very common in the legal profession. It's very common for physicians, and especially specialists, to have this "I don't really deserve to be a brain surgeon" type of thing. You know, "I know all these people think I can do all this great stuff." But I think it's really exacerbated in IT because of the pace. You don't just learn something once. It's a constant, constant swim against the riptide to try to keep current, and I don't know if it's just me getting older or if the pace is going faster and faster and faster, or maybe it's both.

Carlos: Yeah, obviously, we’re seeing SQL Server release cycles now, one year, right?

Mindy: Right. Well, I've spent 20 years becoming super, super deep and narrow, with my specialty being SQL Server scalability and performance, being able to get to the root cause of slowness, whether it is hardware related, maintenance related, the architectural design of the database, or the application and how the application is approaching it, and that took years to develop. And now all of a sudden I've got all these people popping up around me going, "Hey, Document DB is now Cosmos DB." And I'm like, "What!" While I still know my stuff, it makes me feel shaky. I think it's a very unique field to be in. I don't think that's happening in the legal area, is it? I mean, maybe with all the laws changing recently, but.

Steve: But what's interesting with that is, you talked about where you're at and what you know how to do. That's basically a 22-year education since you first started using SQL Server. And it's continued education, perhaps more education than someone who has a law degree or someone who is a doctor may have had along the way. A lot of people don't look at it that way, but every year you are out there doing new stuff it's more training, more education. You're always learning more.

Carlos: So let’s kind of bend this towards, most of our listeners, the companeros out there, they are working, they are in the workplace, how is this most likely affecting them?

Mindy: Well, one of the things that can be bad about it is that it basically prevents you from getting ahead in your career. You're less inclined to apply for those advanced positions because you're listening to these voices that say, "You're not ready," or "Maybe just a couple more years," or "You shouldn't raise your hand for that, you're not as good as they think you are," or "I know they are suggesting you should apply for that Director role, but really, come on, you're not ready for it. You need more experience still, because you could fall down and it could be a huge failure and it would be totally embarrassing." These voices, it's like you want to turn around and look inside your brain and go, "Shut up!" Right? You could be underpaid. You didn't go in asking for as high a salary, and then you find out that other people got the salary you didn't, because you didn't come in aggressive enough, because you didn't feel like you were worth that, right? And the community suffers too. You're not the only one suffering; the community suffers because people don't raise their hands. You know, right now, today is I think the last day of the call for speakers for the PASS Summit. How many people didn't even submit to the call for speakers because they didn't feel they were quite ready for it?

Carlos: Me, I’m raising my hand.

Mindy: It's like, how many SQL Saturdays does it take, right? How many times do you have to actually go out, and when will you be ready, like next year is going to be the year? And realistically, if you ask other people if they think you're ready, they are probably all going to raise their hands and go, "Yeah, you're ready." And you are the one who is like, "Yeah, maybe not." Maybe you're not blogging, not putting stuff down in writing. That's frightening because it's in writing for everybody to see, and even if you go back and fix it later, the Wayback Machine is going to have it out there forever and ever that you said that. You might not be participating on forums because, again, sometimes people can be kind of snarky on those.

Carlos: Yeah, that’s rough.

Mindy: There are people asking questions on Twitter, looking for help, and you might not want to volunteer an answer because you don't feel like you're 100% sure, and you don't want somebody to point out that maybe you weren't exactly right. But you still think you could help the person and you don't speak up, so then these people aren't getting help. So it doesn't just hurt you, it hurts the community as a whole. It hurts other people that you could be mentoring, because we have good things to share and I think people would benefit. Nobody is perfect, right? I mean, coding involves constant failure. Expecting that you're going to be perfect is unrealistic.

Steve: Ok, so now you have mentioned that there was some story about the Air Force Academy. Could you share that with us?

Mindy: Yeah, so this is early on, and this is my example, looking back with hindsight, of how impostor syndrome really impacted my life. When I was 18, so all through junior high and high school, if you asked me, as a child, "What do you want to be when you grow up?", I wanted to be an astronaut. The Space Shuttle was big back then and we had Sally Ride. I remember I was in high school when it blew up, right? That was a big deal; we all watched it in school. But I wanted to be an astronaut, and on the way to being an astronaut I wanted to fly jets, and the only branch of the military in 1987 that let women fly jets was the Air Force. In all the other branches you got to fly the transport planes. And I actually had my solo pilot license that I got through my boss. He was very supportive and he had a plane, and I would go wash it and wax it all the time, get all the oil off the back of it and buff it all out, so we could take it up. And so I actually ended up getting a nomination to the Air Force Academy, and part of getting into the Air Force Academy once you get the nomination is you have to pass a health test and a physical fitness test, and a whole bunch of stuff, you know, they dilate your eyes and they listen to your knees. So I knew I had to pass this physical fitness test, and there were four parts to it, and one of the parts was pull ups, which are different than chin ups. Chin ups are where your hands are facing you, and they're actually easier. A pull up is where your hands are facing away from you, and they would take a broom handle and put it 6 inches in back of your calf and another broom handle 6 inches in front of your calf. If you hit either one of those broom handles while you were doing the pull up, it didn't count, and I had to do 5 of them. And I knew this was going to be a problem. My math teacher was letting me out of class like half the time, and I would go with this boy I was training with to a gym and use that machine where you pull the bar down. I got closer, but I could never pull my full body weight with that machine, and we worked on it for 6 months. It's not like I was heavy or anything. I would just get to where I was hanging and my elbows were perpendicular to my forearms, like an L, and I would just shake and I could not do that last little "uhh" to get my chin above the bar. I would just shake and shake and shake for like 40 seconds, right? Uhhh!

Carlos: I’m surprised you lasted that long. I think at 3 seconds it would be like, I’m done.

Mindy: Oh man! Well, for that last part there is some muscle across where your collar bone is, and that is what does that last piece, and I just don't have much muscle there. So I failed the test, and they were like, "Yeah, you know, I know you have the nomination from the senator and those are really hard to get. There are only five open spots, but you didn't pass it, so too bad for you." And I went away from that, and I didn't go into it with impostor syndrome, but as I walked away from being rejected I started developing this huge impostor syndrome: "You know, I was the only girl there and it would have been bad if I had been accepted, because I would have felt so out of place and I probably wouldn't have made it anyway. I think they might just have picked me because I was a girl." I psyched myself into believing that I would have been exposed as a fraud if I had gone there, that it would have been a horrible experience and I had been saved from a terrible mistake. Thank goodness they rejected me, because that's the best thing that could have happened; boy, that could have been a mistake. And then about two weeks later the senator calls up and says, "I would like to offer you a nomination to Annapolis." And I had so freaked myself out with impostor syndrome that I was like, "Are you kidding me? I'm so glad I didn't get to go to the Air Force Academy because it would have been such a mistake. I would have been humiliated. I don't want to go to Annapolis," and I turned it down.

Steve: Wow!

Mindy: Oh no, right?

Carlos: And he's the one calling you. It wasn't like you had to go ask; he's reaching out saying, "Hey, I got this. Do you want it?"

Mindy: Yeah.

Steve: Now, you mentioned that it was only a few years ago that you heard about impostor syndrome, but there were many years in between the Air Force Academy opportunity and learning about impostor syndrome. When you learned about it, was it just something that hit you at that point, "Wow, that's what happened"? Or did it take a while to come around and realize that?

Mindy: So with the Air Force Academy and Annapolis, I didn't have some epiphany immediately that that's what had happened in that particular situation, but as the woman was doing the presentation on impostor syndrome, I couldn't think of the exact times or which meeting I was in or who it was with, but I definitely thought, "Wow, I have felt that." So many times in meetings I really wanted to contribute something and I didn't interject, because I didn't want to sound dumb in case I wasn't absolutely 100% right. Or maybe the people in the meeting had used a couple of acronyms that I wasn't familiar with, so I started feeling like, "Ok, I'm glad they all wanted to include me in this meeting. I'm not really sure why I'm here because I don't have much to contribute, so I'm just going to sit here and be quiet." Oh my god! I know I have felt like that before.

Steve: That's always interesting when you talk about acronyms and feeling left out, because a lot of companies will have their own acronyms that may be different from what people elsewhere use. It's almost like being in their secret circle to be able to understand what they are talking about. And I think it can be very challenging and very difficult to feel part of the group when you don't always understand what their acronyms are at first.

Mindy: I can totally relate to that. I mean, I work in trucking and transportation software. It took a few years before I actually knew what they were talking about when they started talking about "backhaul" or a "dead head" or a "cross dock", where I was like, "What!" I had no idea what they were talking about.

Carlos: Or “smokey”.

Mindy: There was a funny one. I was at this seminar once and I was trying to make small talk with another customer at the table and I said, "So what do you guys haul?" And he looked at me and he said, "Reefer." I almost spit my food on the table. I'm like, "Pweaa! You haul what?" And he's like, "Reefer." And then somebody looked at me, my eyes were as big as saucers, and they started laughing and they're like, "Refrigerated stuff." I'm like, "Oh my god!"

Steve: Wow!

Mindy: Yeah. I guess that’s the term.

Carlos: You know, I have experienced a little bit of impostor syndrome myself. I bring up this podcast a lot and my experiences with it, and it has happened a couple of times, but one that was poignant, that kind of stuck out to me, is when I interviewed Brent Ozar. You know, from a name recognition perspective, he's right at the top. I had reached out to him and gone through my standard process to reach out to people; it's not quite like it is today, but back in the day I had this very simple process. He agreed, we started talking, and anyway the episode went fine. I thought it was fine, but I asked for some feedback and he said, "You know, the agenda basically wasn't well set, and I didn't like where we went with some of these things. It wasn't what I was expecting. I wish you would have asked me; we could have fortified that agenda a little bit better." And I was like, "Oh my gosh," right?

Mindy: Yeah, you’re like, “Pewww”.

Carlos: And the reason I didn't do that is because I was like, "It's Brent." Like, who am I to tell him what the agenda needs to be, right? So I kind of left it soft and mushy, and he would have benefited from a little bit more structure, from me going through a process and giving some feedback as to what we would talk about. So going to that point about the community at large, or the people that you're involved with: it's not just you. Others that you're working with can also be affected by you not wanting to participate, or not thinking it's your place to participate.

Mindy: Yes. Yeah.

Carlos: And I guess we've talked about the survey a little bit and why it is destructive, so what can we do to help ourselves? If we find ourselves in that funk, what's the remedy?

Mindy: Well, simply knowing that it's a thing, for me, was 80% of the fix. For some reason it was almost as if someone had snipped the strings that had been holding my wings back. I don't know how else to explain it. I was actually able to recognize when it was happening and go, "Ok, I know what that is. I know I'm not the only one who feels that way, so stop it." Now, it doesn't completely keep the thing from trying to rear its head, but knowing that it's a voice that shouldn't be speaking to me means I can ask it to be silent, instead of just listening to something without realizing it's destructive. Just knowing that it was something others experienced, and seeing how many other people felt that way, was hugely freeing, so that's the biggest thing. The other suggestion is to forgive yourself for any negative stuff, and that's really hard to do, but like I said, "Coding involves regular failure." Being occasionally wrong is normal, and like last night I tweeted something that was wrong.

Carlos: Oh boy!

Mindy: Yeah, regarding the PASS Summit and what the definition of an abstract versus the description is. It caused a couple of people to get confused, and then I got an email from PASS HQ and they said, "Actually, that's not correct." And I went right back on Twitter and said, "Oops, I was wrong. This is actually how it really works." Before I knew that impostor syndrome was a thing, I would have just stuck my head in the sand: "Look at all the people who just saw me say something that was wrong," right? It's ok to be wrong. You're not expected to be perfect, and other people don't expect you to be perfect. There are some other suggestions. Maybe print out your resume if you have a copy of it. Take your name off of it. Put somebody else's name on there. Maybe mail it to yourself or something. Look at what you've actually done, what your qualifications are, where you've worked, what you've accomplished, where you've spoken, what you've blogged about, what tests you've passed. If you saw all of that and it was somebody else, how would you feel about that person? And is it different than how you are judging yourself?

Steve: That’s interesting.

Mindy: Yeah.

Steve: I have to go back and look at my resume now and see how it looks or how I look at myself.

Mindy: One person said that she keeps a diary. Well, it wasn't really a diary. When somebody said something nice about her or she got a compliment, she would write it down on a piece of paper, tear it off, and shove it in her purse. And then when she came home at night, she'd find those papers in her purse and put them in a little money box that she had in her cabinet. And anytime she started feeling down about herself, or feeling this impostor syndrome type of feeling, "I'm not worthy," whatever, she goes and opens up what she calls her Happy Box and goes through all these things, with the date on them and who said it: "Wow, you just saved me" whatever amount of time, or "You just did this good thing," or "Wow! I'm so glad we called you," or "Oh my gosh, that query is so much faster." Because as the days go by, boom boom boom, they become a blur. You can't remember all the details of all the things that happened to you; they start to turn into a smear.

Carlos: That's an interesting concept, right? Because I think it's easier to hang on to the negative comments and a little harder to hold on to the positive ones sometimes, so that idea of keeping track of them, noting them, measuring them, building a catalog you can refer to when you need to, that's an interesting thought.

Mindy: Yes.

Steve: So one of the things I noticed in your presentation was a slide when you talked about killing your heroes. What do you mean by that?

Mindy: This is where I can talk about Jimmy. Well, I didn't kill Jimmy.

Carlos: Thank goodness.

Mindy: Yeah, so killing your heroes. What I mean by that is don't put people on a pedestal. They are not putting themselves on the pedestal. Brent didn't put himself on the pedestal. Jimmy didn't put himself on a pedestal. I did that. And she didn't put herself on a pedestal either; I know her now. I put her on a pedestal many years ago. I did that, she didn't do that. So when I say kill your heroes, what I really mean is don't make people into something that's bigger than life, because they are just people. Most of them are really nice people, and their mouths would fall open if they knew what kind of pedestal people put them on. So my example is from 2010, well, before 2010, because my focus has always been performance. I was doing a session back at that time. This was back when we had spinning disks and separating the log file from the data files was still critical, right? So I was doing a presentation called "SQL Server I/O Uh-oh" and it was all about how to really maximize your I/O and try to stay away from the pitfalls of the I/O being so slow when everything else had moved ahead so much quicker, which for a long time was a huge problem. Jimmy May had written this whitepaper when he was at Microsoft that talked about the stripe size to use, fixing the partition misalignment from Windows 2003, and using the correct allocation unit or block size, and if you do the trifecta of these three things, how you can get 30% or better performance off of your disk just by setting SQL up the way that SQL works. I went over and over that whitepaper when I was building out my presentation, and I was trying to make my presentation fun, so I summed it up with a song. I basically took Patsy Cline's I Fall to Pieces and changed the words, so I brought my guitar to the PASS Summit and played it, like, "Don't do your I/O in pieces." I made this whole parody about doing your I/O in pieces and it was recorded and I thought it was really fun. And I wanted to tell Jimmy that I had written this song based on his whitepaper, but I was terrified of him. I was so intimidated by him because he is Jimmy May, and I had put him up on this pedestal. If I was in the same room, I just didn't want to talk to him. It was just like, "Oh my god, I'm not worthy." I actually sent him a private message on Twitter, and I have this in my revised presentation now. I said, "Hey, I did this SQL I/O presentation and I summarized your whitepaper in a song. It was recorded and here is a link to the YouTube video. I would love your thoughts." And he responded with, "Unfortunately, I'm buried. I'm in Vancouver over the weekend. No internet." And I was like, "Oh, I shouldn't even have asked him." And now Jimmy and I are really good friends. He just posted something on my Facebook page last night. I mean, we talk all the time. So I did that, he did not do that. I did that. If one of my other friends had said that, I wouldn't have thought anything of it, right? But because I had this weird impostor syndrome it was, "I'm the fraud and he is the master and I don't… Oh my gosh."

Carlos: So I have to ask, does that YouTube video still exist?

Mindy: Yes, it's out there. I think there is one out there now of me doing it. It may not be the PASS Summit one; I think it's me speaking at TransForum, TMW's TransForum event, but it is out there. Yes, me singing that song.

Steve: So perhaps we should wrap up the end of the episode with that song.

Carlos: It sounds like a good idea. Oh, let’s see here we go. I’m looking; I can see you holding a guitar. We’ll find it and we’ll put it up in the …

Mindy: I think my name is spelled with an “e”, whoever posted it they put Mendy Curnutt.

Mindy: Well, basically the core of it is that impostor syndrome is very, very common in tech. Eighty-some percent of the Data Platform MVPs admitted that they had felt it recently. It can prevent you from getting to where you want to go in your career, and just recognizing that it's a thing is a huge silencer of it, an enabler of getting around it. Man, don't let that impostor syndrome stop you from taking advantage of opportunities.

Steve: Ok, great advice. Alright, shall we move on to the SQL Family then, Carlos?

Carlos: Let’s do it.

Steve: So Mindy, how did you first get started with SQL Server?

Mindy: Wait, that was a long time ago. I moved to Seattle a few years after college. I was playing music as a hobby; I play guitar, obviously, you saw that in the video I talked about. I play the guitar and sing, and in my 20s I was always out singing at open mics and stuff. I moved to Seattle because that was during the grunge period and I just wanted to play music all the time. So when I moved to Seattle I had to get a job. I ended up getting a job at this water jet cutting company as a project manager. My degree was in Economics; I didn't know where else I was going to get a job. And I was trying to do my job and it was very, very difficult because they had computers, but they just had WordPerfect and Lotus 1-2-3. There was no actual just-in-time manufacturing application. Everything was on paper: trying to get something through the shop, figuring out what we had in inventory, how long the vendor lead times were, and what bids they had given me. It was a nightmare. They had just bought Microsoft Office, they had Access 2.0, and I went to some college classes at night and learned VBA, and I ended up writing myself a program to manage my job, which turned into a just-in-time manufacturing application. It got ported to SQL Server, so it was just out of necessity. My mind wants to make processes and relate things to things. I wish I had known that I had such an affinity for that, but I didn't really have any computers when I was a kid. We didn't have them, and in college you had to reserve time; nobody had a computer in my dorm. So I had to actually get into an environment where computers were present to realize that it was something I was naturally drawn to and good at.

Steve: Wow, very interesting.

Carlos: If you could change one thing about SQL Server what would it be?

Mindy: If I could change one thing about SQL Server. I should be looking at these questions before you ask me them. Where did that thing go? If I could change one thing about SQL Server what would it be? Man, that’s a good question. Can we come back to that? Let me think about that a little bit.

Carlos: Sure.

Steve: Sure, we can do that. So what’s the best piece of career advice that you have received?

Mindy: Oh, that's an easy one. Yay! Ok, the best piece of career advice I ever received was "What's the worst that can happen?" Right when I got out of college, I worked for a guy in the wine industry. Actually, I wasn't out of college yet. I had to take a break in the middle of college because we were having some financial troubles paying for it, so I stepped back during my sophomore year, moved home, worked full time, and went to school at night at Sonoma State University instead of UC Santa Barbara. And one day the guy I was working for asked me why I wasn't going to the University of California anymore. And I said, "Well, it's financial." It was too much money, basically, and my parents didn't want to take on another student loan, and I couldn't get any grants because they had too much land, and we were just in a pickle. He helped me. He helped me figure out how to get my parents to re-file their income taxes for the last two years and get me off their income taxes, and not to take "NO" for an answer. He's like, "Ok, that's a problem. How do we get around the problem?" Right? Here is why you're being stopped, so how do we back up and reverse, and what other approach can we take? That was huge. And then I was able to actually get the student loans and the grants and things like that, because I was personally dirt poor. But when I went to go back to the university they said, "No, sorry. You've been gone for a year, so you're going to need to re-apply." And I was just devastated, and when I came back he said, "Well, what's the problem?" And I said, "Well, I have to re-apply and then I'm not going to be able to get in until next year." He said, "Well, who makes those rules up?" And I was like, "What do you mean?" And he said, "Do you think the Dean can get around the rules? Why don't you go down there and talk to the Dean and see if the Dean will let you back in. What's the worst that could happen?" Well, you'd be right exactly where you are now, right?

Steve: Yeah. I guess they could have just said NO, right?

Mindy: Right, so there has been a lot of that. I took that advice throughout a lot of my career: don't take NO for an answer if the NO is just because the rules are kind of dumb. How can you get around it while still not breaking the law or anything? If something doesn't make a lot of sense, is there a way to rightfully get around it? And don't be afraid to ask. What's the worst that could happen? You're exactly where you are right now.

Steve: So did it work out when you talked to the Dean?

Mindy: Yeah, she let me right back in.

Carlos: There you go.

Steve: Awesome.

Mindy: Yeah, huge lesson.

Carlos: Yeah, yeah. Do you want to circle back to the SQL Server question or should we continue on.

Mindy: That one thing? I think we can continue on. Yeah, because I'm not sure what my answer to that is. Maybe I'll email you afterwards with my answer.

Carlos: That's fine. Our last question for you today, Mindy: if you could have one superhero power, what would it be and why do you want it?

Mindy: If I could have one superhero power, the first thing I wanted to say, because I had a friend who passed away from ALS, was that I would want to be able to put my hands on somebody with ALS and just cure them. But then I thought, wow, then I would be flying all over the world constantly and I'd never get to sleep, because everybody would want me to cure them.

Carlos: Ok, so the analytical brain is starting to kick in there.

Mindy: There you go. I wish I could just wave a wand and cure everyone who had ALS. Just make that go away; it was really terrible.

Carlos: Well, Mindy, thanks so much for being with us today.

Mindy: Thank you! Thank you so much.

Steve: This has been great.

Carlos: Yeah, good information and we do appreciate you taking some time to chat with us.

Mindy: Thanks!

Steve: And I look forward to seeing you when we meet up at the conference, The Companero Conference.

Carlos: That’s right!

Mindy: Great, I’m looking forward to that a lot.

Episode 98: The First Change

At a recent SQLSaturday conference, I walked into the speakers room and asked the question–What is the first thing you change after you install SQL Server?  It was interesting to get their take on server setup and I think you will enjoy the conversation.  There are various answers on this one, but some of the speakers mentioned things like auto-grow all files, SQL Prompt, the SqlParameter class and max memory, among others.  I would like to thank Kevin Feasel, Jonathan Stewart, Eugene Meidinger, Raymond Kim, Tracy Boggiano, Mindy Curnutt, Thomas Grohser, and Vladimir Oselsky for their suggestions.

 Episode Quote

“I would say that now I’m basically a broken person without SQL Prompt.”

“One of the things that I recommend all of our customers… is an administrative failed logging attempt alert system.”

Listen to Learn

What people say is the first thing to change about a SQL Server installation.

It should be noted that the suggestion on the auto boost was said in jest.  🙂

Transcript: The First Change

Brian Carrig: 226, which is the first thing I would change from the default. The second thing I would change from the default is to set auto-grow all files for user databases. In SQL Server 2016, the default behavior is single file growth rather than auto-grow all files. Previously, everybody would have enabled a trace flag for that behavior; that's ignored now, so you have to set a setting that says auto-grow all files.
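
For reference, a minimal sketch of the setting Brian describes, assuming SQL Server 2016 or later and a placeholder database name; the old trace flag behavior he mentions is replaced by this per-filegroup option:

-- Grow all files in the filegroup together (replaces the old trace flag 1117 behavior).
ALTER DATABASE YourDatabase
    MODIFY FILEGROUP [PRIMARY] AUTOGROW_ALL_FILES;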

Mindy: I'm staying away from the totally obvious ones so… let me come up with something.

Mindy: My name is Mindy Curnutt.  One of the things that I recommend all of our customers put on their servers, and that I've scripted up and give to them as a complimentary script, is an administrative failed login attempt alert system. It goes out and queries the error log to look for failed login attempts, error 18456… what is it, event class, severity, whatever. I used to know it off the top of my head but I've gotten too administrative. Anyway, I have it set so that if someone is trying to get in with an account that has administrative level rights, like SA for example, and they have tried to get in with that account with the wrong password more than x amount of times, configurable, within a configurable time window, it will send an email to the administrator. So you know someone is trying to guess your admin password, you've got their IP address, and you immediately know it's happening.
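
Mindy's actual script isn't shown here, but a bare-bones sketch of the same idea could read the error log for failed logins (error 18456) and alert past a threshold; the threshold, mail profile and recipient below are placeholders:

-- Count recent failed logins recorded in the current error log.
DECLARE @failed TABLE (LogDate datetime, ProcessInfo nvarchar(64), LogText nvarchar(max));

INSERT INTO @failed
EXEC sys.sp_readerrorlog 0, 1, N'Login failed';

IF (SELECT COUNT(*) FROM @failed WHERE LogDate > DATEADD(MINUTE, -30, GETDATE())) > 5
BEGIN
    -- Placeholder Database Mail profile and recipient; wire this up to your own alerting.
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = N'DBA Mail',
         @recipients   = N'dba@example.com',
         @subject      = N'Repeated failed logins detected',
         @body         = N'More than 5 failed logins in the last 30 minutes. Check the SQL Server error log.';
END;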

Carlos: What's a little scary there is I'm constantly surprised at the number of SQL Server error logs I see where I'm just constantly seeing failed SA logins, back to back to back. And normally it's some machine that just forgot to change its password, but it's constantly banging on the server and you're like, "Hmm, that seems a little weird." Anybody else? Here we go.

Alex: I would recommend customers take a closer look at what developers are using in their code to dynamically execute strings. The problem I fix is forcing C# developers to be more specific, especially with the SqlParameter class; they use the default, and the default is nvarchar. In most of our databases we have varchar, which triggers an implicit convert. Nobody can see it, but it causes lots of performance problems. So number one, I force developers to specify the type, it's called SqlDbType, as varchar, because by default it's nvarchar. To avoid such problems it's better to use stored procedures, which are much more manageable, or to keep an eye on your data access code, which is more on the C# developers, and be specific with data types.
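
To see the effect Alex is describing from the SQL Server side, here is a small sketch with a made-up table: an nvarchar parameter compared to a varchar column forces an implicit conversion on the column, which can keep the optimizer from using an index efficiently.

-- Hypothetical table with a varchar key column:
-- CREATE TABLE dbo.Customers (CustomerCode varchar(20) PRIMARY KEY, Name varchar(100));

-- What a default ADO.NET parameter (no SqlDbType specified) effectively sends: nvarchar.
DECLARE @code_n nvarchar(20) = N'ACME01';
SELECT Name FROM dbo.Customers WHERE CustomerCode = @code_n;  -- implicit convert on the column

-- What you get when the parameter type matches the column (SqlDbType.VarChar, or a typed proc parameter).
DECLARE @code_v varchar(20) = 'ACME01';
SELECT Name FROM dbo.Customers WHERE CustomerCode = @code_v;  -- clean comparison, index-friendly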

Carlos: So that’s kind of an interesting take, right, so like setting a policy before you get going?

Alex: The interesting part is what they say when I'm done with presentations about it. They say it's the best practice to specify the data type in the request code. I wrote the code for them: guys, that's what you should do, that's how it has to be. So sometimes we're teaching .NET developers something else.

Carlos: Sure. Here we go.

Jeremy Fry: My name is Jeremy Fry, and I agree with the whole room on the changes people are indicating are best practices and the reasons why they would change them. Mine is max memory, and the reason is because I am a BI guy, so other tools need to run on the server: Analysis Services, SSIS. In a best case scenario I would like to break those components apart, segregating, say, the Analysis Services for a warehouse onto its own server or instance, rather than on the production server where your transactional information is held and your day to day business is occurring. But in the real world that doesn't always happen, and with that being said, Analysis Services starts to cut things off at about 80% memory utilization. So if you have a high level of activity on your transactional system that's utilizing a lot of memory, you start to see a bottleneck down the line in other tools as well.

Woman: Ok, nobody said it, so I'll put it out there. When I install a new instance of SQL Server, and it depends upon the resources in the actual operating system for the SQL Server, and of course also what other instances of SQL might be on that server, so there is no tried and true setting that I use, but I always go in and look at the cost threshold for parallelism and the max degree of parallelism, because I don't think the settings they currently default to are realistic for today's hardware. Microsoft, of course, likes to always be backward compatible, so those values are sitting at levels that are no longer appropriate for either today's hardware or the platform anymore. Now, what you set them at can vary, but you need to look at that, I think. Too much parallelism is like too many cookies, right?

Carlos: There you go. That's an analogy I haven't heard before – cookies to parallelism.
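
Sketching what looking at and adjusting those two settings looks like; the numbers below are placeholders, not a recommendation, since, as she says, it depends on the system:

-- Review the current values first.
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'cost threshold for parallelism';
EXEC sys.sp_configure 'max degree of parallelism';

-- Example values only: raise the cost threshold from the default of 5 and cap MAXDOP.
EXEC sys.sp_configure 'cost threshold for parallelism', 50;
EXEC sys.sp_configure 'max degree of parallelism', 8;
RECONFIGURE;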

Alex: [inaudible – 7:58] package, but you're just killing parallelism. It's not right for every single one; there should be a balance. I can't find a formula for the number of CPUs and…

Woman: It depends on the code.

Man: Yeah, I start with 25 and I go up as far as a hundred.

Alex: They started it from 80 and went up to 300. I told them, guys, I'm the only one fighting a whole team, including the manager. I'm the opposition. I barely convinced the manager to skip the 2014 upgrade and go to 2016 directly, because 2014 didn't do it for the application. Absolutely, CPU utilization, that's true. Other than that, 0, just jump. I was about to be fired because our manager doesn't like any opposition. Somehow I got backing from my Pennsylvania team; he was my manager in Ireland, and the Pennsylvania team backed me up, that's why I'm still here. He doesn't like any opposition.
Tracy: Yeah, definitely Query Store then. Change the size; it only keeps 100MB of data. Change it up to like 2GB or something, and tell it how many days you want to keep, because it keeps up to a year. And get it off of the primary filegroup. There is a Connect item for that. I tweeted it out this morning so you can find it and upvote it.
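
A sketch of Tracy's Query Store suggestions; the database name, size and retention are examples only (the filegroup placement she mentions is not configurable, hence the Connect item):

ALTER DATABASE YourDatabase
SET QUERY_STORE
(
    OPERATION_MODE = READ_WRITE,
    MAX_STORAGE_SIZE_MB = 2048,                         -- up from the small default
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 90)  -- how many days of history to keep
);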

Woman: Ok, here is one. How about optimize for ad hoc workloads? Somebody is going to respond to this, but I've read that no one has ever, ever, ever seen anything negative or bad happen from turning that on, ever. There you go, one more.

Carlos: So I do know that [word unclear – 12:29] experienced it once, that when he turned it on he actually saw a CPU spike that he couldn't explain. He could toggle it, right? Turn it back off, and the CPU would drop. I can't remember what the spike was, but I think it may have been negligible. Basically he could flip it and he would see CPU differences. Now, I don't know if he eventually attributed that to something else, and it was just luck that somebody happened to be doing something else while he was toggling it. That was the one freaky thing, which I need to follow up on with him. I don't know how he resolved that, but I know that's what [name unclear – 13:09] was complaining about.
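
For reference, turning the setting on (or back off, as in the toggling experiment described above) is a single server-level option:

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'optimize for ad hoc workloads', 1;  -- use 0 to turn it back off
RECONFIGURE;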

Man: We had one application that ran the same query plan exactly twice for every single request. Really efficient, if it always comes in exactly two times.

Woman: It was a query that checks to see if the records were there before it actually runs the query. It seems like …

Man: It could be.

Woman: Really, [inaudible – 13:31]. It's like you go to the grocery store to see if they are open, and then you go back home and get your car.

Man: I see you’ve met my developers.

Man: This is like one really [term unclear – 13:54] kind of thing, but for SSRS I would advise: there are logs for when stuff gets used, and the default retention, I forget, I think it's like a month, maybe two or three, but you can set it all the way up to like 6 months. Those logs are really useful when you need to go back and ask whether anyone is still using this report, because usually the answer is no. I mean, I know we've got a bunch of reports that just aren't being used, and at some point we need to go back through, and it's really nice to have 6 months of data that says the only time it was run was when you ran it to see if it still works. So that's a nice change for SSRS, just get that log, because at some point you're going to have to do cleanup because you just get report bloat. It's a problem.
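
A sketch of how to look at that usage data, assuming the default ReportServer catalog database name; the 6-month window matches the longest retention mentioned above:

-- Which reports have actually been run in the last 6 months, and when they were last used.
SELECT  ItemPath,
        COUNT(*)       AS executions,
        MAX(TimeStart) AS last_run
FROM    ReportServer.dbo.ExecutionLog3
WHERE   TimeStart >= DATEADD(MONTH, -6, GETDATE())
GROUP BY ItemPath
ORDER BY last_run;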

Mindy: This is Mindy again. So that makes me think of having a SQL Agent job that, once a week or something, cycles your error logs. Otherwise, when something is actually not going well and you want to go into the error log and take a look, and your server has been up for a year, yeah, good luck with that, right? You're waiting, and waiting, and waiting, and like two hours later the thing might come up. So awful.
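
The weekly job step itself is tiny; a sketch of what it would run (old logs then roll off according to the instance's configured number of error log files):

EXEC sys.sp_cycle_errorlog;              -- close the current SQL Server error log and start a new one
EXEC msdb.dbo.sp_cycle_agent_errorlog;   -- optionally cycle the SQL Agent error log as well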

Tracy: Hi, Tracy Boggiano again. Can we make the Hekaton noise in the error logs go away, please?

Man: Oh my god, error logs, gigabytes.

Tracy: Gigabytes in a day. System [inaudible – 15:30] points getting full [inaudible – 15:31] gigabyte log file so we only keep 30 days.

Man: All the messages are really scary. Couldn't identify this file, couldn't do a checkpoint. All this horrible stuff in there, and Microsoft is like…

Woman: Oh, your favorite settings. First is SSMS, if you like to have it do certain things, and you want to see sp_helpindex, and you want to see sp_whatever, right?

Woman: Don't you go and set up hot keys? You don't have to sit there and do that. You have a certain way of working.

Man: I’m not smart enough to do that. I don’t have any ITs that… I know what I like to do, that’s why.

Woman: I know there was one version of SSMS where they got rid of Ctrl+[inaudible – 16:28]. I couldn't even work because I kept hitting that [inaudible – 16:30] out of some kind of rote thing. They put it back; I think there were so many people like, "Arhaha".

Kevin: This is Kevin Feasel. I must confess I’ve been lying the whole time. The first thing I do is install/configure PolyBase. The second thing I do is install R. The third thing I do is change the log growth.

Brian: This is Brian again, while we’re doing confessions, I would say that now I’m basically a broken person without SQL Prompt.

Carlos: We've got a couple of concurrences over here. Yeah, SQL confessions, oh boy!

Man: Just consider it the [inaudible – 17:24] of being a DBA: you've got to buy SQL Prompt, just get it over with, alright?

Carlos: Yeah, I guess I'm interested in whether there is a competitor for SQL Prompt. I know there's Compare.

Man: I don’t use SQL Prompts.

Carlos: Oh, what are you using?

Man: I use DevArt SQL Complete.

Carlos: Oh, DevArt, ok. Yes, that's true, they are out there. I feel like I see a lot more publicity on the other tools, maybe because of Compare and all, you know.

Man: [term unclear – 17:54] also has a SQL completion tool. I think it is also named SQL Complete.

Episode 97: SQL Server Replication

If you have ever been responsible for a replication environment, you know what a pain it can be.  Many guests of the show have pointed out their frustration with replication and many are loath to use it–myself included; however, there are a few brave souls out in the world who do use replication regularly–and even use it well.  Chuck Lathrope is our guest today and he talks us through some of the components of replication and why we might not need to be so skittish about this long-time feature.

We cover all the basics including an overview of Transactional Replication, Merge Replication, Peer-to-Peer replication and Snapshot Replication, the concept of publisher, distributor and subscriber in relation to replication.  The most important piece is Chuck gives us some good business cases for why we might use replication and I think you will find the discussion interesting.  Thanks for joining us on the SQL trail!

 Episode Quote

“I like to talk about it to waylay fears and give out some best practice knowledge”

“Stick to people who know what they are doing”

“The real ultimate goal of replication is to offload work from your main source of truth database server”

Listen to Learn

– Why you might use SQL server replication
– Different types of replication
– Use cases of transactional replication
– Replication and trace token
– Troubleshooting tips
– Where to get good information

https://gallery.technet.microsoft.com/scriptcenter/SQL-Server-Transactional-e34ed1e8/

https://www.brentozar.com/archive/2014/07/performance-tuning-sql-server-transactional-replication-checklist/

https://www.sqlskills.com/blogs/kimberly/8-steps-to-better-transaction-log-throughput/

http://github.com/SQLGuyChuck/

http://tribalsql.com/

http://download.red-gate.com/ebooks/SQL/fundamentals-of-sql-server-2012-replication.pdf/

About Chuck Lathrope

Chuck Lathrope is a Data Architect for Limeade, a SaaS health and wellness platform. He is a two-time SQL Saturday event speaker on replication and was a Top 5 nominee for the Red Gate Exceptional DBA Award in 2009. With over 20 years of experience in IT, he has used many different operating systems and worked with many different applications. He was also a Windows System Administrator for 12 years.

Transcription: Transactional Replication

Carlos: Chuck, welcome to the show.

Chuck: Thank you very much guys, glad to be here.

Steve: Yeah, it’s great to have you on the show. I know we’ve kind of been talking about this for a few weeks, trying to get together, and we’re happy to have you here as a guest.

Chuck: Yeah, I know. Glad to be here guys.

Carlos: Yeah, even if you are a west coaster you’re still accepted here.

Chuck: Oh, great!

Steve: Anyhow, Carlos, there are more west coasters than there are east coasters right now.

Carlos: Oh man.

Chuck: Yeah, we’re winning.

Carlos: Well, what I lack in numbers I make up for ego apparently.

Steve: Alright, so today’s topic is SQL Server replication. I know this is a topic I first heard you talk about at a SQL Saturday in 2012, and I found your presentation there really interesting, though I didn’t pick it up very soon after that. But when it came around to a time that I needed to use replication, I leaned back on some of those things I learned from you back then, so it’s great to have you on the show to be able to talk about replication today.

Chuck: Yeah, great, thank you! Yeah, I love talking about SQL server replication at SQL Saturday events. I try to do it as often as I can because I have all the battle wounds from having a very large environment with the transactional replication and so. Whenever I can I like to talk about it to waylay fears and give out some best practice knowledge.

Carlos: Well, that’s interesting, because I think there is a lot of fear around replication. We brought it up in Episode 56, and yeah, nobody wants to touch it with a 10-foot pole; and here you said you have a large environment with replication, so I know I’m interested to get in here and talk about some of your experience and why maybe we have so much fear around it.

Chuck: Yeah. I think the difficult part about replication is that it does a great job of retrying even though there is a failure. So if you’re not monitoring it correctly it will go sideways, and if you’re not monitoring appropriately it can get all kinds of bad, and then you get stuck in situations like: now I’ve got this gigantic log file or distribution database, how am I going to fix it? Then you go on the net and google stuff, and there is so much bad advice out there that would basically destroy your entire environment, and you will get a lot of wounds from trying to follow some of it. It seems to me like most of the advice out there is for people doing it in a dev environment, not a real production environment, so it’s not great. I usually stick to people like Hillary; anyway, there are some great people out there, from the Microsoft side and from the consulting side, who give great recommendations for replication. I mean, stick to people who know what they are doing, not random posts on the internet.

Carlos: Well, so I guess shall we go ahead and kind of let’s tee up some of the components around replication and let’s talk about some of the moving pieces.

Chuck: So there are three types of replication; actually four, but one has been deprecated, updatable subscriptions. There is Transactional Replication, which is always running, always sending data out to your subscribers. Then there is Merge Replication, which allows you to do bidirectional updates, then there is Peer-to-Peer Replication, and then there is Snapshot Replication. Snapshot you really do on a less frequent basis than all the time like transactional, so in a data warehouse scenario I could do it every 6 hours, or every 12, or every 24 hours and get that over to my data warehouse; that’s what snapshot does. And my expertise happens to be in transactional replication.

Steve: Ok, so then what are some of the business cases or real use cases around transactional replication?

Chuck: Yeah, so the real ultimate goal of replication is to offload work from your main source-of-truth database server. So you’ve got multiple databases, and they can all be part of different publications that are sent out to subscribers, which are your clients that receive that data. The main goal there is just to offload work off of that main production server; that’s its goal in life.

Steve: So occasionally in high availability or disaster recovery conversations somebody will bring up replication as a possible solution and I’ve never had a good feeling about that and I’m just curious what your thoughts are.

Chuck: Yeah, well, just like you guys were talking about in Episode 59 on high availability solutions, replication wasn’t really designed to be a high availability solution. It was really designed to replicate some data out to other servers so they can do processing of that same data for whatever purpose you may need it for. Because what you can do with replication is publish out tables, or if you have Enterprise you can publish out a view, and you can even publish out your stored procedures and functions and whatnot, and your indexes. But you don’t have the security that kind of goes along with that, and there is no automatic failover or anything like that. Your best option there is to stick with Always On availability groups or log shipping or database mirroring for high availability.

Carlos: Ok, so if I really want to just move some data off to another server for reporting purposes that’s where transactional replication or snapshot replication might be a good solution but not for the HADR.

Chuck: Yes, and a nice thing about replication is it’s available on all editions and versions of SQL Server, so I can have Standard Edition of SQL out there and use that; I can even use that on my distributor server. So, to give the bigger picture of transactional replication: you have the publication server, and that’s your main source-of-truth data; then you have the distribution server that holds the distribution database, which then sends that information to the subscribers. So there is always a three-way kind of path: it goes from publisher to distributor to subscriber, and you can have many subscribers. The distribution server could actually be on the publisher or the subscriber side; you don’t have to have a dedicated one. But when you start getting into bigger environments you definitely want a dedicated distribution server.
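
To put some T-SQL around that publisher, distributor, subscriber picture, here is a minimal, hedged sketch of wiring up a dedicated distributor. The server names, password, and database name are placeholders rather than anything from the episode, and a real setup has many more options.

-- Run at the server that will act as the distributor (all names are hypothetical).
USE master;
GO
EXEC sp_adddistributor
    @distributor = N'DISTSERVER',            -- the dedicated distribution server
    @password    = N'StrongPasswordHere!';   -- distributor_admin password

EXEC sp_adddistributiondb
    @database = N'distribution';             -- creates the distribution database

EXEC sp_adddistpublisher
    @publisher       = N'PUBSERVER',         -- the source-of-truth publisher
    @distribution_db = N'distribution';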

Steve: Ok. Now, I know you said it works with all of the latest versions of SQL Server; when was replication first introduced?

Chuck: It’s been around forever. I think it even goes back to the Sybase days. It’s basically just pulling data from your log file, looking to see which of those transactions are marked for replication, and then sending that on down the path, so it’s been around for ages.

Steve: Yup, so with the replication going from a publisher to the distributor to the subscriber does the subscriber have to be on the same version of SQL server as the publisher?

Chuck: No, it doesn’t. You can have a mix. Your distributor just needs to be at least at the level of your publisher, so you could have a 2008 R2 publisher, a 2014 distributor, and a 2016 subscriber. The distributor just needs to be at a level matching or above your publication, and you can have a higher level on your subscriber, but your distribution server can’t be a lower version than your publication server.

Carlos: So we’re talking about some of these terms, right: publisher, distributor, subscriber. The original documentation release that I saw mentions this newspaper hierarchy, which some of our millennials may not even know what that is. And one of the things that was brought up is just some of the overhead of maintaining some of these. Where has some of the love been? Because it’s been around so long, where has some of the love from Microsoft been with replication?

Chuck: Yeah, good question. So the latest kind of feature set for replication is the support for Always On availability groups, such that your primary server can move to any of the other secondary servers and replication will go along with it; that’s the main recent support. And that same team also does change data capture and change tracking, so some of those features have been receiving a lot of love in the past few editions of SQL Server. But SQL replication has been around for such a long period of time that there are not a lot more features they can add that I could really think of, other than maybe helping out with the supportability of replication, so most of the love has gone that route.

Carlos: Got you. And I think that’s where I would have to agree: in a setup like you mentioned, with all of the bad advice on fixing or troubleshooting, I would like to see a little bit more love there, because I feel like a lot of the fixes are just “start over again.” Being able to tap into that a little bit easier and feel better about what’s going on would be super helpful.

Chuck: Yeah, it would, especially for people just getting into it. It can be overwhelming. I mean, I was passionate enough about it back in the day to help author a chapter of a SQL book from Redgate just on replication: here is how you can monitor it and here are some best practices for it, just to get more information out there, because there have been relatively few books on it. I can give a link to that. And then also Sebastian Meine, who I think you guys had on a recent podcast, created Fundamentals of SQL Server Replication. It’s a free download from Redgate’s books too and a great starting point for getting into replication.

Steve: Yup. Now, on the front of troubleshooting replication, don’t you have some scripts that you built, available maybe on GitHub, that can help with that? I know I’ve used them a couple of times; I just forget where exactly they were.

Chuck: Yes, so my SQLGuyChuck repository on GitHub has some replication monitoring scripts for live use. It’s kind of my lessons learned in replication. Replication gives you the ability to create what are called tracer tokens. Basically, you insert this tracer command into your log file; it gets stuck at the very end, and it watches it go through the entire process to your subscribers so you can see the latency involved. You can do that with the Replication Monitor tool, which is the GUI tool you would use to monitor and maintain replication. But the trouble with the tracer token is that if you’ve got a busy environment that’s kind of overwhelmed currently, that tracer token might take ages to get through, and you mainly do it manually, or you can script it automatically. The Microsoft IT guys actually have, on one of the free Script Center sites, a tool for monitoring replication; it basically just continuously inserts tracer tokens so you can see the health with it. But I found that a little bit limiting, so my trick to tracking replication is to look at the distribution database and figure out how many commands are waiting to be distributed out to the subscribers, and then I monitor that on a frequent basis, like every 15 minutes or every half an hour, and it sends me an email if it’s gone above some sort of threshold. That threshold is a variable amount I can put in as a parameter, because in some environments being 5,000 records behind is a big deal, and in other environments it could be hundreds of thousands of records behind if you have a really gigantic installation. So that’s what my monitoring does: it just monitors that queue that’s basically up there in the distributor, and then I also monitor for the kind of pain points that replication has.
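
As a rough sketch of the kind of check Chuck describes, the distribution database exposes sp_replmonitorsubscriptionpendingcmds, which reports how many commands are still waiting for a given subscription. Everything below (server, database, and publication names, the threshold, the mail recipient) is a placeholder to swap for your own, not something from his scripts.

USE distribution;
GO
DECLARE @pending TABLE (pendingcmdcount int, estimatedprocesstime int);

INSERT INTO @pending (pendingcmdcount, estimatedprocesstime)
EXEC sp_replmonitorsubscriptionpendingcmds
    @publisher         = N'PUBSERVER',
    @publisher_db      = N'SalesDB',
    @publication       = N'SalesPub',
    @subscriber        = N'SUBSERVER',
    @subscriber_db     = N'SalesDB_Reporting',
    @subscription_type = 0;                  -- 0 = push, 1 = pull

IF (SELECT pendingcmdcount FROM @pending) > 50000   -- threshold depends on your environment
    EXEC msdb.dbo.sp_send_dbmail
        @recipients = N'dba-team@example.com',
        @subject    = N'Replication latency warning',
        @body       = N'Pending replication commands are above the alert threshold.';

Scheduled as an Agent job every 15 or 30 minutes, this gives you the trend line he talks about later.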

Carlos: I apologize, before you get into that I want to stop for a second and make sure everybody knows that all the scripts and things we talk about today will be available at sqldatapartners.com/replication, so we’ll make sure to put all those things up there. But you had talked about tracer tokens, and all of a sudden I’m thinking of a trace, and that’s not Profiler, but something like that came to mind. This is new to me, so it almost sounds like this is an object that I want to create in SQL Server that’s going to start capturing commands, which again kind of sounds like Profiler or Extended Events. Is it just something that is specific to replication? I guess help me define what the tracer token is.

Chuck: Yeah, it’s very specific to replication. You basically stick in this special command. You can use T-SQL to create this little tracer token, or you can use the Replication Monitor tool to insert it and just watch it in the GUI. It shows you the time from your publication to the distributor and then the time from the distributor to the subscriber; it basically goes through that path. You don’t see it, it just happens behind the scenes, a kind of little hidden thing, but you can programmatically watch that information and see the results of it.
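
For reference, here is a hedged sketch of doing that from T-SQL rather than Replication Monitor, using the tracer token procedures; the publication name is a placeholder.

-- Run at the publisher, in the published database.
DECLARE @token_id int;

EXEC sys.sp_posttracertoken
    @publication     = N'SalesPub',
    @tracer_token_id = @token_id OUTPUT;     -- drops the "dye" into the log

-- After giving it time to flow publisher -> distributor -> subscriber:
EXEC sys.sp_helptracertokenhistory
    @publication = N'SalesPub',
    @tracer_id   = @token_id;                -- reports distributor and subscriber latency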

Carlos: Ok, so I don’t need to create it or modify it. It’s already there. It’s just what I’m using to collect the information from.

Chuck: Yeah, you have to basically inject it. It’s like putting dye into a stream: you put a little drop in, and you can just do one at a time. So you manually put that dye drop in the stream and you watch it go the entire path to the ocean, I guess.

Steve: But in this example, that dye drop, as you refer to it, is really a record being injected into the transaction log. And when it’s complete, that means it’s been through the part that reads the transaction log and moves it to the distributor, and then it actually ends up on the subscriber side. Conceptually it would be almost like if you had a table and you inserted a row into it, and then you knew on the other side that that row actually arrived.

Carlos: Because that’s the one you’re looking at.

Steve: Yup.

Chuck: Some people even use what I call a canary table. They create a generic table and then update the date in it, so it will be just a table with one date column holding the current date. You replicate that out and use it kind of like a canary table, so they can look at the subscriber and see how latent they basically are, because it should be as close to the current date as possible. That’s another method you could use for monitoring.
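
A minimal sketch of that canary-table idea, with illustrative names (nothing here is from Chuck’s actual scripts):

-- On the publisher: a one-row, one-column table that is included in the publication.
CREATE TABLE dbo.ReplicationCanary (LastUpdate datetime2 NOT NULL);
INSERT INTO dbo.ReplicationCanary (LastUpdate) VALUES (SYSDATETIME());

-- Refreshed on the publisher on a schedule (for example, an Agent job every minute):
UPDATE dbo.ReplicationCanary SET LastUpdate = SYSDATETIME();

-- Checked on the subscriber: the gap is roughly how far behind replication is.
SELECT DATEDIFF(SECOND, LastUpdate, SYSDATETIME()) AS SecondsBehind
FROM dbo.ReplicationCanary;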

Carlos: Interesting. Yeah, I haven’t heard of that one.

Steve: So I know with the tracer token, in my experience of using Replication Monitor to go through and inject tracer tokens on a few dozen different publishers, it can be painful clicking through all the screens, and it sounds like you can do the same thing with T-SQL to just inject the tracer token. That’s one I wasn’t aware of, and it seems like it could really speed up the process of troubleshooting what’s working and what’s not working with replication.

Chuck: Yeah, I mean, the scripts that I have are pretty useful. They don’t typically go down to the subscriber level, though, so I have the typical monitoring of jobs and failures. The standard best practices that all DBAs do for all production databases, you need to do for all the components in replication, including your distribution database and your subscribers. So apply the known best practices on job failures, because all the replication jobs are basically kicked off with SQL Agent jobs, and you need to monitor for failures of those and be watching that.

Steve: So you mentioned these jobs as SQL Agent jobs, and it seems like there are several of these jobs that are created when you initialize replication. What are all the jobs that are created, and what specifically do they do?

Chuck: So there is the Log Reader Agent job that’s basically sitting on the database you’re doing the publication on, reading the log file. Then there is the Distribution Agent, sitting in one of two locations, either at your subscriber or at your distributor. The reason is there is what we call a push and a pull scenario for getting the data to the subscriber. When you do a push, all the work is happening on your distributor server, and when you do a pull, it’s done on the subscriber side. So if you have a really gigantic environment and you have a dedicated distribution server, you may want to offload some of the work off of it, so you actually use what are called pull subscriptions. Basically you think of it as, “I’m the subscriber, I’m going to pull the data from the distribution server, so I’m expending all the work to figure out what data I need to grab, and I’ll go to the distributor and grab that information.” The push scenario is usually in your smaller environments, and by smaller I mean you’ve got 5 to 10 subscribers. In that scenario the distributor is doing all that work and pushing that data down to the subscribers. So that’s one of the performance tricks: switching from push to pull.
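
Since all of these agents run as SQL Agent jobs, a quick, hedged way to see what replication created on a given server is to filter msdb by the replication job categories:

SELECT  c.name AS category,      -- e.g. REPL-LogReader, REPL-Distribution, REPL-Snapshot
        j.name AS job_name,
        j.enabled
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.syscategories AS c
    ON c.category_id = j.category_id
WHERE c.name IN (N'REPL-LogReader', N'REPL-Snapshot', N'REPL-Distribution', N'REPL-Merge')
ORDER BY c.name, j.name;

These are the jobs whose failures you want alerts on, per the earlier point about care and feeding.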

Carlos: Right. Now, what’s interesting is we talk about these jobs, and you mentioned reading from the log, right? The log is a sequential operation, the way that it writes and things. Ok, and then you’ve talked about workload. Now, obviously the answer is “it depends,” but from a generalization perspective, is the reason that replication gets out of whack that it just gets overloaded and can’t catch up?

Chuck: That is one of the scenarios, and if you’re monitoring properly you should be good. I mean, you could have not picked the proper server for your distribution server; maybe it’s a poor performer, or doesn’t have enough RAM, or doesn’t have enough CPU. Typically it’s an I/O problem. So you find out quickly that you have tons of transactions you’re trying to replicate and you’re overwhelming your distribution server. Because in smaller environments you can set it all up on one server if you want to; I don’t know why you would ever want to do that, because the whole point is to offload work off of your publication server, but you could overwhelm any of those…

Carlos: So getting back in there, I think you brought up a major point, at least in my mind, and that is the reason you would use replication, again, the offloading component. I think a lot of times replication gets jumped to because you’re trying to keep two systems in sync or something like that. Again, some of the newer features may be a little bit better at that, but you’re potentially using replication for the wrong reason. And so replication gets installed on that main server, and it just has to do more work to keep up with all of that.

Chuck: Yeah, that’s all true. I think people try to use it because it’s available on all editions. It’s like, “Oh, I can use Standard Edition and save a lot of money,” because you still get replication there. I think database mirroring can be done in Standard Edition with two nodes, I can’t remember now, but now in 2016 Always On availability groups can do a two-node kind of cluster.

Carlos: Right, though you don’t get the read-only piece there; you can’t read from the secondary.

Steve: So jumping back a minute to when we were talking about the different jobs on the publisher, distributor and subscriber. One of the tips that I heard recently, and I haven’t actually tried this myself, was a suggestion that for those jobs you go in and add a second job schedule, so that if something goes wrong and that job fails, there will be a second schedule there that would restart it. Is that something that you’ve heard of, or is it a common best practice?

Chuck: No, I mean, replication does retry and retry, and the Distribution Agent will automatically retry when you hit a failure. If you’re automatically scheduling something like a snapshot and it fails, that could be a scenario where you might want multiple schedules. But like you said before, it’s the proper care and feeding of your SQL Server and monitoring those job failures. I could see it maybe in a snapshot scenario, but I wouldn’t in transactional replication, because it will just continue to retry. Basically what it does is it retries and fails, retries a minute later and fails again, and it will do that forever until you fix the problem. And usually the error messages it provides are pretty useful.

Steve:  Okay, great.

Carlos:  Well, so I guess keeping in theme with that, you know, the sky is the limit, but is there a general rule, percentage-wise, other than just that process getting overloaded? Why does that stuff get out of sync?

Chuck:  Yeah. You know, I think it’s these kinds of random hiccups that happen when you have these bigger environments; something happens. The common error that I see even in my environment is that the row doesn’t exist at the subscriber. Well, why doesn’t it exist? So why don’t you just add it?

Carlos:  Right, exactly. Like, you’re subscribing, you should have gotten that row. That’s part of the process, right?

Chuck:  What the heck happened to that row? And so then you have to spend time to figure it out, because everything starts backing up once you have that one error. So you have to do tricks: you can skip that one row, or maybe you figure out what row it is. There are replication stored procedures that can get you information about the transaction it was trying to apply to the subscriber, and you can figure out what row it is and go manually add that row. There’s also a command line tool, tablediff.exe. It can do a data diff between your subscriber and your publisher, tell you what rows are missing or out of whack, and actually fix them for you.

Carlos:  Interesting. So, to that point, the row being missing on the subscriber is the common one that I’ve seen. You’d be okay with us just basically scripting that out and inserting that row?

Chuck:  Yep, just to make it go past that error, but I usually find it’s a group of commands that somehow didn’t make it to the subscriber. And there are different agent profiles that you can set up on basically all these jobs. The one I typically switch to in the short term, if I just want to skip errors, because I know it’s trying to update a row that got deleted for some reason and doesn’t exist at the subscriber, is the one that skips errors. So you set up the SQL Agent job to use this different agent profile, as it’s called; I change the agent profile to skip errors and it will skip a bunch of errors. But then you really want to use tablediff to figure out what rows are out of sync so you can manually update the subscriber. Sometimes it turns out that somebody on the subscriber actually went in and deleted a row, but in reality I know that’s typically not the case, because I make the security read-only on the subscriber side. So there’s no way it got deleted, but somehow the row is missing, so you do what you have to do to update the subscriber, and you have full rights to do it. You can do whatever you want on the subscriber side and replication won’t stop you. You can even make a setting that says, “I don’t want to delete any of the data,” so when a delete happens on the publication you can turn on the setting in replication that says don’t delete this row, if you want that for your workload.
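
For the data-compare step, tablediff.exe ships with SQL Server and is run from a command prompt. This is a hedged sketch with placeholder server, database, and table names; the -f switch writes a T-SQL script that would fix the differences at the subscriber.

tablediff.exe -sourceserver PUBSERVER -sourcedatabase SalesDB -sourcetable Orders ^
  -destinationserver SUBSERVER -destinationdatabase SalesDB_Reporting -destinationtable Orders ^
  -f C:\Temp\Fix_Orders.sql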

Steve:  Okay. Something we talked about at a SQL Saturday recently, and I just want to clarify to make sure I have it right: if a stored procedure was updating a million rows and you didn’t replicate the stored procedure call, it would send all million of those updates across through replication; but you could just publish that stored procedure call so it gets called on the other side, and then it’s happening sort of independently on both sides rather than being pushed row by row. Is that right?

Chuck:  Yeah. Yep.

Carlos:  Okay, so now help me connect the dots there, because in my mind I’m thinking replication is by object, right? I go in and I replicate this column or this table and whatnot. How do I set up, or how do I visualize, the process of saying “when this stored procedure gets executed, that’s what I want you to take down” instead of the updates and deletes to this table?

Chuck:  Yeah. So you basically configure replication and tell it, “I want to replicate the calls to these stored procedures.” I honestly haven’t used that use case, because usually in my environment I can’t describe the work as one transaction; it’s related to maybe some other data within the system, or the data comes from something else. I haven’t ever been able to do that, but you basically just tell replication you want to replicate the commands for some stored procedures, and if everything’s right it will use that stored procedure call and replicate that rather than all the individual changes that happen to your table.
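
A hedged sketch of what that configuration looks like when the publication is scripted out, using sp_addarticle with the “proc exec” article type; the publication and procedure names are placeholders, not anything from the episode.

-- Run at the publisher, in the published database.
EXEC sp_addarticle
    @publication   = N'SalesPub',
    @article       = N'usp_CloseOldOrders',
    @source_owner  = N'dbo',
    @source_object = N'usp_CloseOldOrders',
    @type          = N'proc exec';   -- replicate the procedure call, not the row changes
                                     -- (N'serializable proc exec' is the stricter variant)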

Carlos:  Right. Well, I think that’s a pain point for a lot of people. And going back to the whole subscriber thing, and checking when the subscribers have issues: say you have more than a couple, let’s just say you have three. Now I have to troubleshoot, one, the source system; two, my distributor, which you said should also be another system; and then each of the three subscribers. So that’s five servers that I have to go and do something with, and I think that’s maybe one of the pain points around replication. Does it give you a good way to do all of that centrally?

Chuck:  Right. Well, Replication Monitor allows you to see all those servers and helps you with troubleshooting. You know, I was in an environment where we had like 8 publications and 30 to 40 subscribers with a dedicated distributor. We used to have Replication Monitor up on our NOC window and people would look for errors and stuff happening there, but that was just unrealistic, and that’s why I created those monitoring tools: to look for those kinds of errors, or replication latency where you’ve got tens of thousands of commands that haven’t been pushed to the subscriber. It sends an alert, and then every 15 minutes I can see the trend: if the trend is going down, everything’s good, and if the trend is going up, something’s wrong, because replication will continuously retry even though it might have hit an error. And if you don’t monitor that well enough, you come to the point of “it’s been broken for 48 hours, how come?” You don’t want to be in that scenario.

Steve:  So with the transactional replication on the subscriber side, if I’m using that as a reporting mechanism to offload some of the work from the publisher, if I need specific indexes on the subscriber to help with whatever the reporting is doing on that side, do I need to add those on the publisher and replicate them over or can they just exist on the subscriber?

Chuck:  Yeah, that’s a great question, because one of the most powerful features of replication is that I can say either replicate the indexes that exist on the source publication or don’t replicate them. The great thing about not replicating them is that your publication server has its specific workload and use case, and your subscriber’s is typically always different. What you can do in the replication setup is what’s called a post-snapshot script. It’s basically one file that can have any SQL code in there, which will then create anything you want on the subscriber side. It is run after the publication is initialized; initialization is just the method you use to publish all your data and get it to the subscriber. There are actually two scripts you can use, a pre and a post, around that initialization. So what I typically do is figure out my tuning on the subscriber side, what indexes I need, and I just add them to that script. The script checks whether the index already exists and, if not, adds it to the table. That gets applied after your initialization has happened, so your table has been populated with BCP (it uses BCP behind the scenes), and then it creates the indexes if you told it to. If you told it not to do the indexes, then it will basically just run your script if you have one; you don’t have to have one. You just add them there, and I think that’s the super hidden benefit of replication.
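
As a hedged sketch of wiring that up: the publication property is post_snapshot_script, and the script itself just needs to be idempotent. The file path, publication, table, and index names below are placeholders.

-- Point the publication at a post-snapshot script (run at the publisher).
EXEC sp_changepublication
    @publication = N'SalesPub',
    @property    = N'post_snapshot_script',
    @value       = N'\\fileshare\repl\SalesPub_post_snapshot.sql';

-- Contents of SalesPub_post_snapshot.sql, executed at the subscriber after initialization:
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = N'IX_Orders_CustomerId'
                 AND object_id = OBJECT_ID(N'dbo.Orders'))
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);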

Steve:  Oh yeah. It really seems like that could eliminate a lot of the replication bandwidth by not having to have those indexes in two places and synchronized over.

Chuck:  Yeah. I mean, it’s not going to replicate over all your index changes anyway; it’s just whatever happens in the data tables themselves.

Steve:  Right. But on the initialization, if you’re replicating the indexes, it’s going to have to bring over the whole index, right?

Chuck:  No, it won’t transfer that data. It replicates out the script: here’s what the index is, and it will create that after the fact.

Steve: Okay. So then you’d use this post-snapshot script only if you needed it, or if the indexes were only needed on the subscriber side.

Chuck:  Yeah.

Steve:  Wow. Lot of great information here. This is good stuff.

Chuck:  Yeah we only scratched the surface.

Steve:  Oh yeah.

Carlos:  Oh yeah, that’s right.

Chuck:  There’s so much, it’s crazy, the amount you have to get into. And just to carry on that earlier thought of having those three subscribers: you typically would have those behind a load balancer. So I have this common name, like “this is my order processing system,” and my application stack would just point to this one load-balanced name and it would get the information it needs from behind the load balancer. Then when I have maintenance, I can tell the load balancer to take this one server out. That then allows me to create a new snapshot, and I’ll pause all three of them, so all three of them will become stale, because once I do a new snapshot, on a new initialization everybody tries to grab it and pull it down, so you cause an outage if you don’t pause the distribution agents. So then I work on that one server, get the snapshot applied, and make sure the indexes are applied. Then I add it back to the load balancer and take my next server out so it gets up to date. And I may decide that I’m going to take all of the servers and just use this one that’s now up to date, or I may have one stale and one up to date while I’m maintaining the other. So that’s kind of what it looks like in a bigger environment.
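
A hedged sketch of the “pause the distribution agent” step: distribution agent jobs have generated names, so the job name here is purely illustrative, and note that sp_stop_job raises an error if the job is not currently running.

-- Pause a subscriber's distribution agent before re-initializing:
EXEC msdb.dbo.sp_stop_job   @job_name = N'PUBSERVER-SalesDB-SalesPub-SUBSERVER-1';
EXEC msdb.dbo.sp_update_job @job_name = N'PUBSERVER-SalesDB-SalesPub-SUBSERVER-1', @enabled = 0;

-- ...apply the new snapshot and the post-snapshot indexes, then resume:
EXEC msdb.dbo.sp_update_job @job_name = N'PUBSERVER-SalesDB-SalesPub-SUBSERVER-1', @enabled = 1;
EXEC msdb.dbo.sp_start_job  @job_name = N'PUBSERVER-SalesDB-SalesPub-SUBSERVER-1';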

Steve:  Okay, great. Well shall we wrap it up with the SQL Family questions?

Chuck:  Sure sounds good.

Steve:  So Chuck how did you first get started with SQL server?

Chuck:  Yeah, so here’s how it was. I didn’t really pursue databases as a career until the dotcom crash in the early 2000s, when unemployment paid for my MCDBA certification classes. Before that I was completely scared to write SQL when I was working in Access; I avoided it.

Carlos:  Sure.

Chuck:  And somewhere in those classes I thought, oh, this isn’t so bad. So I did some consulting work in SQL 7 and 2000. Then I became a contractor at Microsoft supporting SharePoint, the worldwide internal SharePoint farm, and worked with some guys like Mike Watson; they are kind of well-known players in that space, and they kind of looked to me as the go-to SQL guy. I’m like, “Holy crap, I’ve never done SQL clustering before, and load balancing stuff.” I had just done this MCDBA course, and that kind of gave me the bug, so I became a developer after that. And then I had a boss who said, “No, you can’t be the DBA, because I don’t think you’ve got the skills to do that, so just keep doing your SQL development stuff.” So I basically quit and found a new job doing SQL development and DBA work. And yeah, there’s a hidden lesson for managers out there: it’s pretty easy to piss off a developer by saying no, you know, to an ambitious person looking to learn new things. In my next gig I worked for 7 years as a DBA and a DBA manager and supported a gigantic replication farm.

Steve:  So, doing all the things your previous boss told you you weren’t qualified to do?

Chuck:  Yes. And I even got picked as a top 5 finalist for Redgate’s Exceptional DBA Award. It’s kind of the ultimate response to “no, you can’t do that”: okay, sure, I can totally do this, it’s not a problem. So now I’m a data architect; I’ve kind of gone the spectrum from developer to data architect, but I’m still a DBA as well. It’s tough to find good DBAs.

Steve:  It’s really interesting how telling someone they can’t do it will motivate them to just be wildly successful at doing it somewhere else.

Chuck:  Yeah. And I’m really thankful he did that, because otherwise I don’t think I would be where I am today. Yeah, I love being a DBA. It’s a great job.

Carlos:  Now as great as SQL server is, if there was one thing you could change about SQL server, what would it be?

Chuck:  Only one?

Carlos:  We’re going to take your top one today. Maybe the next time you can, we’ll dig deeper.

Chuck:  Yeah, right. I think we need to get licensing more in tune with modern hardware specs, because, you know, SQL Express caps you at 1 gigabyte of RAM and a 10 gigabyte database. I mean, you’re kidding me; you might as well just say, “Go to MySQL, because we don’t really care about you.”

Steve:  Yes. There are very few environments where SQL Express is actually a good fit.

Chuck:  It’s useful for some things, right; you can even use it in a replication environment, which is kind of cool. But I mean, my phone’s got more power and capability than SQL Express does. And then Standard Edition is kind of the same way. I mean, you’re limited to, what is it, 64 gigs of RAM, and you know they say you can use more for some things now with the later editions. But still, it feels like it was designed by non-technical marketing people, and there’s no rhyme or reason to where these numbers come from. So if we want to limit it to one thing to change, I think we’ve got to fix that.

Carlos:  Yes.

Steve:  Okay.

Episode 96: Building Community Tools

Have you ever created something and wanted to share it with others in the hope they will contribute and make it better? Sites like the now-defunct CodePlex and GitHub are full of functionality people want to share and work on with others; however, many projects remain dormant.  What are the characteristics that create an environment where people want to contribute to your project?  Steve and I discuss a few ideas and we interview two very successful community project leaders, Brent Ozar and Chrissy LeMaire, about how they got started and what it takes to put together a good community project.

 Episode Quote

“We don’t know that building community tools is for everyone… but if you have a problem that you’ve solved and you are committed to it, and you like the idea, I’ll say go for it.”

SQL Server in the News

https://azure.microsoft.com/en-us/services/analysis-services/

Transcription: Building Community Tools
Carlos: Companeros, welcome to Episode 96. Today, Steve and I are going to be doing something a little different: we are going to be talking about building community tools and referencing a couple of really successful examples as we go through it.

Steve: Yes, and those tools are things that people probably use every day in their work, so it shows how valuable they can be.

Carlos: Yes, so we are going to be talking about, I guess, the concept of: as you put something out there and you want to share it with others, how have some of these become “successful,” meaning lots of people have participated in providing feedback or giving their time to testing and things like that? If you want to replicate some of that, what goes into it, what’s involved with it, and why are some of these tools successful while others might not be? And we’re not going to talk about the ones that aren’t super successful; we’re not trying to embarrass anybody here. But we are going to point out some of the characteristics that we’ve seen and kind of walk through that.

Steve: And I think there is a big difference there between a community tool that somebody creates and nobody ever uses and a community tool that somebody creates and people use every day and people even contribute to every day.

Carlos: That’s right, and I guess we are going to focus on the contribution component. So, for example, the Ola scripts, the Minion stuff, sp_whoisactive: those don’t necessarily “qualify” for this, only because, while they may receive feedback, they are not necessarily openly soliciting it; they’ve just been developed and made available. We are focusing on tools that have a “hey, you want to help out, submit your idea here” kind of idea.

Steve: And it’s not to say that any of those tools are bad by any means, because they are all pretty awesome. It’s really just looking at how people contribute and how you go from…

Carlos: Collaboration components.

Steve: Yeah, from a single person or a single company as the contributor to the community actually being involved.

Carlos: Right, so we’ve already jumped into that. I do want to take a minute and give a shout out to Ernest, who lives in the DC area. We met up at SQL Saturday Baltimore this last weekend; he came up and we chatted a bit. He mentioned that he listens to the podcast, so I wanted to give a shout out to him. It can be a little bit daunting to reach out and say what you like or what you don’t like, and so I appreciate him coming up and saying hello. We were able to chat a bit and get his take on the show, what was good and maybe what we could do differently.

Steve: Yup. And another shout out came from Andre, and his comment was, “Yes, #93 (Episode 93) left me in a cold sweat thinking about some of the mistakes I have made and servers taken down. I asked myself what mistakes I’ve made in the 15 years dealing with SQL Server, and the honest answer is: what mistakes haven’t I made at least once?”

Carlos: Yup, we’re with you there Andre. And I guess ultimately hopefully we’re making some of those mistakes in the test environment.

Steve: Or for a previous employer not your current employer.

Carlos: That’s right. And yeah, as we get into the conversation here, a great way to engage the community is to talk about ways that others can avoid the same mistakes that you’ve made.

Steve: Yup, we had another shout out. This was in regard to Database Health Monitor, and it came in from Derrick Bovenkamp. His comment was one word: “Awesome.” What that was referring to is a new feature I added in last week’s release of Database Health Monitor where, in the future, when there is an update available it will tell you; you just click yes and it will download the file, install it, and restart Database Health Monitor, so you don’t have to go out and get the download file, run the executable, and do it all in several different steps. It just makes it a lot quicker. And then he said, “Now if only Microsoft would copy this. The new versions of SSMS are seriously frustrating, especially because it feels like every month it is updated.” I think what he is getting at, and this originally came from a conversation I had with him where he was complaining about how with SSMS you have to download the file and run the installer every single month, is that they could just do it automatically; and that’s where I thought, “Oh, well, let’s make Database Health Monitor do it quicker and easier.” So thanks for the input there, Derrick. I appreciate it.

Carlos: Yes, very nice little feature. So, a quick note on the Companero Conference: we’re excited to announce that all of our speakers are set. We’ve been able to identify them, or at least we think we have; some of that might be subject to change if schedules get updated. But we have basically all we can handle, and we appreciate everybody reaching out to us. That’s not to say that we couldn’t squeeze in a surprise speaker, but for our purposes the speaker lineup is set, and we’re going to be announcing them on the website at companeroconference.com in the next couple of days. Of course we’ll be talking about each one on the podcast as we get closer.

Steve: So look for those announcements. We do have a total of 6 speakers beyond Carlos and myself, so Carlos and I plus 6 makes 8. It’s going to be quite a few people presenting.

Carlos: We have a [inaudible – 5:50] crew there. One date to keep in mind is that we are going to have a price increase on June 16th, which happens to be my birthday, so if you would like to get in at the lowest possible level of $400 you can do that before June 16th, and we would love to see you there.

Steve: Yup and that’s the early bird pricing until that point I guess.

Carlos: That’s right.

Steve: Another thing relating to that early bird pricing: everyone who registers by June 16th gets something extra. Is that right, Carlos?

Carlos: Yes, we are going to sprinkle in a little something extra if you’re there on Tuesday evening before the opening social. For the early bird option, those who register beforehand, we’re going to invite them out to dinner with as many speakers as can make it. We’ll have dinner at a restaurant that will be announced a couple of weeks ahead of time, get together, and be able to kick things off with everybody there.

Steve: Yup. I think that would be a fun time to get to meet those early registrants and talk with all the speakers as well.

Carlos: Yup. Ok, now for a little SQL Server in the news. You know, it’s amazing, it’s impressive how much functionality continues to roll out of Microsoft, and one that I know people were asking about for quite a while, and admittedly I need to play a little bit more with it, is Azure Analysis Services, in Azure obviously.

Steve: Oh, very interesting, because if that’s what it sounds like, previously you needed to run Analysis Services locally and then push your data to Azure. It sounds like now you’ll be able to run Analysis Services in Azure without managing a server.

Carlos: Yes, so you’re going to be able to take advantage of the cubes and the aggregates and some of these other things without having to have some of that infrastructure, and do so in a way that you don’t have to worry as much about the servers. So it’s an interesting idea, and at first I thought, who would be a good candidate for this? I think it’s really those who have adopted Power BI and, I don’t want to say outgrown it, but maybe have more data; they’re throwing a little bit more data at Power BI than it can handle, and they need to cleanse it or smooth some of the edges of that data before they actually present it to the end users. I think that’s going to be the predominant use case for most of the early adopters of the service.

Steve: You know, another good use case I think would be people who have pushed a lot of data into the Azure parallel data warehouse and then need to build cubes or other analysis components off of that.

Carlos: Right, that’s true. So it will be interesting; I just had a discussion today with somebody who would like to adopt this, so we’ll probably be dipping our feet a little bit more into it, and it will be interesting to see how that service responds and behaves compared to what we’ve known and are used to.

Steve: Alright. Well, the URL for this episode is sqldatapartners.com/communitytools or sqldatapartners.com/96 for our episode number.

Carlos: That’s right, and so ultimately, again, our topic today is our thoughts on what it takes to get good participation from the community. And we should say first: if you have done anything in the community, there are no losers. You’ve done it. You’ve stepped out, you’ve stuck your neck out, and you’ve made something that you created available to others. So whether it’s your mother who’s downloaded it or a thousand people, you’re still pretty cool in our books, and congratulations, at least for taking the step to make that happen.

Steve: Yup, and I think one thing that we’ve come across that is really a key thing to look at when you’re doing this is: why are you doing it? Why are you building a community tool?

Carlos: Exactly. And so two of the tools that we’re going to highlight are the Brent Ozar Unlimited Blitz scripts and then the tools from dbatools.io. These two have had great success, if you will, not only in name recognition, which obviously helps, but in fostering people to give their time and suggest an improvement or a way to enhance what the tools are trying to do.

Steve: Yup, and I think between those two projects they’ve had different paths to where they are today. With dbatools, they started out, and for years have been, open source with many contributors. The Blitz scripts from Brent Ozar were originally built internally by Brent Ozar and team, and eventually they were let out into the wild as open source.

Carlos: That’s right. And in fact, we have little sound bites, if you will, from both of them that we’ll go ahead and share now.

Chrissy: Alright, so dbatools actually didn’t start out how it is now. Initially, way back in 2014, I created a bunch of migration commands and made a little project on GitHub called SQL Migration, and then I started adding a couple of things like Get SQL Server Key, where it dug into the registry and grabbed that information. And then I just kept on adding more, and over time I realized that it could be more than just a migration tool, so I called it dbatools. Then I just started marketing it that way, as a general PowerShell tool for DBAs, and it really took off from there.

Brent: Yeah, I think for me it was, like with our stuff it was always we built it ourselves to use it ourselves first. And I think what made it so successful is it has to be something that you’re going to code no matter what. Like you have to believe in it enough that you can evangelize it and use it every day yourself. If it’s something that you think you’re just, “I’m going to do this for the community”, you know and you’re coding it for someone else you got to be User #1. Chrissy was clearly User #1 of her stuff, you know, and then you look at everybody who started piling on to that because it was just so obvious that it made production DBAs lives better.

Carlos: You know, and Steve, even when we went back to Episode 86 and were talking with Sebastian about tSQLt, he talked about how he was working at a hospital; they had this large project, this elephant, if you will, that they needed to chew up and swallow, and he needed a way to break it into smaller chunks. That’s how the framework was built, and it has kind of been improved upon ever since. And I think, again, that first question of why you are building something: I think a key, and we can talk about how we want to define success, but a key to getting engagement is building something that you’re going to use, that’s going to solve some problem that you have.

Steve: Absolutely, and I think that with the example that Sebastian had on tSQLt. It was something that even if he’d never put it out in the wild I think he would have built most of it to do the job that he was trying to do with that hospital.

Carlos: Right, because it can go sideways, and I think sometimes, particularly as entrepreneurs, with things that we are trying to do we think, “Oh hey, people might be interested in this.” Brent actually mentions this idea of trying to build something for somebody else and it just landing with a thud. His quote is, “If users don’t have a reason to actively use it, and if you don’t have a reason to actively use it, it’s not going to catch on.”

Steve: Right, and I think that is so important. There are things that I look at and say, “Oh, that would be interesting to put out as open source.” But then I realize, well, it’s something that I wouldn’t actively be using; I’d just be kind of doing it here and there, and it’s maybe not worth doing at that point.

Carlos: Sure. Yeah, because it is interesting what people latch on to. I know when I was talking with Adam Machanic in Episode 22 about sp_whoisactive, it had been available for about 8 years at that point. Some of his favorite components of sp_whoisactive were not things that I was using. Admittedly, that may just be because I’m a knuckle-dragging Neanderthal and he is a lot smarter than I am.

Steve: And I think with that, I mean, another example is the Ola maintenance scripts, right, where, prior to discovering those, I had built some of my own. Then I saw his and I thought, “Oh, I could either put mine out as publicly available or I could just switch over and start using his, because his is really awesome.”

Carlos: Right, so by being the first out there you kind of stick your neck out. I don’t know the history there, because he does a couple of different things, but I think backups or maybe index maintenance is one of the things it’s popular for, and we were all kind of struggling with that. So it was like, “Hey, that’s a problem that a lot of people have. Let me put it out there and then make it better.” And I think as more people have used it, they’ve been able to provide some feedback; it makes it better for him in his environments and then for everybody else as well.

Steve: Yup, absolutely.

Carlos: Now, this kind of brings up the question of what success looks like for a community project. And I think we can use the default database answer of “it depends”: it depends on what you think success is. Obviously, engagement is a component, but it’s not necessarily the only metric that we want to be using.

Steve: Yeah, and I think the big part is being able to get those active contributors. I mean, that’s not an easy thing to do. Anyone can throw something out there and maybe it will get a little bit of acceptance, but getting people to contribute makes a huge difference between an active and successful community project and one that will just languish along the way. And I know we all get busy with different projects and things, and if you’re the only person contributing, well, that project may slow down during the times when you’re busy. But if there are other people contributing when you get busy, then it will continue to move along.

Carlos: Yup, and Chrissy LeMaire from dbatools.io actually brings up another point as to one of the reasons why she put together her tool.

Chrissy: I don’t know if you can hear my smile, but it’s true. I’ve seen people come and say, “Man, CK was walking me through this thing and I’m just like, dude, CK, you are awesome.” It’s been an incredible journey with CK, with Claudio, with Rob, and Aaron and everybody. It’s kind of out there, but it’s reduced, I don’t know about you guys, but for me, kind of being isolated here as both an IT worker and somewhat of an introvert and somebody who is living in a foreign country, it’s reduced the isolation, and I love hanging out. I hang out with CK all the time, I hang out with Rob all the time, I hang out with Aaron, and we all just kind of hang out together and we are all passionate about this project.

Constantine: Yeah, a lot of people know that DBAs are a lonely bunch. There are not a lot of people who understand what we are doing in the business, even if we work with them every day, and having a group of other DBAs that you can complain to, or laugh about the same problems with, or look at an execution plan with so many zeros that it doesn’t fit on the screen, that’s meaningful to people like us.

Carlos: Ok, so we have this idea, we have something that we think is helping us and we think will help other people. So it kind of begs the question: what does it take to keep the community going? And I think we can skip to the end, and a spoiler here for everyone is that it’s a lot of work. If you think building the tool was hard, being a community manager is even harder, right? For those of you who have done anything with your user groups, that might be a good parallel.

Steve: Yeah, and I think one of the keys that I remember from the dbatools podcast was how they talked about making everyone feel welcome. That’s not always easy to do in an online, GitHub-style environment, because sometimes things don’t come across as smooth or as nice as you would like them to. And I think having positive engagement is really the key there.

Carlos: And so both Chrissy and Brent give us some thoughts on what it takes to kind of keep their communities going.

Chrissy: So I do have to tend to the channel almost like it’s a garden. If I get sick, or, just recently my little baby, my cat, got really sick, I let everybody know, hey, “I won’t be around. I’m having problems with my cat.” I had to bring him to friends, which is fancy; I’m bringing my cat to friends to get some fixes, and I let them know. And I do invest a lot of time, you know, ensuring that the conversation continues, ensuring that the pull requests are merged. So there is absolutely a lot of time invested in developing those relationships, like you were talking about.

Brent: When someone contributes code, when they make a pull request, you can kind of bet on it being probably an hour of your life. You’re going to go through it when they submit a piece of code, even if you’ve worked with them before, so I have to spin up instances of SQL Server from 2008 to 2016. I’ve got to test it on case-sensitive databases, databases with large numbers of indexes, servers with large numbers of databases, because people who contribute code don’t necessarily have these lab environments to go test it in, and it’s not fair for me to ask them to do that. We have this huge lab in AWS and Google, so it’s easier for us to do that, but there goes an hour of my life just testing it across those. Even with automated testing, like I’ve got a few cheap and easy SQL scripts that just roll it out everywhere and bang on it, then I’ve got to find what the errors are and make sure that it’s actually going to work. Then you want to document it enough that other people can go on and use it. Sometimes that means writing documentation in the form of a blog post, because someone will write a check for something new in sp_Blitz, like a new crazy wait type, and if there is no good documentation out there on it, you have to go write it. So it can be one, two, three, four hours to do a complex pull request, but even for a basic one it’s not unusual to see an hour of your life there. Looking at GitHub right now, we have four pull requests out: Jeff Rosenberg, RW Howard, Eric has a couple of them out there. We’re now just looking at them. I know I’m going to go and spend like half a day on the weekend, or an evening, so that I can tackle these things and make sure that they are good to go publicly. And it’s not the contributors’ fault when they aren’t; often they are really using this code themselves in certain environments, but then they just encounter surprising gotchas, like the case-sensitive database thing that most people don’t have to hassle with.

Carlos: So we mentioned a little bit about the tech side there. But as you alluded to previously, Steve, it’s not just about technology at that point. It’s going to be about people.

Steve: Absolutely, and I think part of it is kind of the gamification side of it, and I think that in the GitHub system there is a lot of that, because you can go in and see how often people contribute, and you can see over time what the contributions are, and it's almost like people want to compete to get the most in there.

Carlos: That's right, so they have pretty good metrics there: how many downloads and pull requests and all of these other types of things. It lends itself to, "Yes, hey, I'm a project worth contributing to." But again, it does take a little bit of effort to put some of that together. I know Chrissy, for example, has her contributor wall on dbatools.io if you meet a certain metric. And I think it's a certain number of lines of code that you have to contribute, and then you get added to the wall of fame type of thing.

Steve: Yup, and it’s kind of like a high score board on who has had the most contributions.

Carlos: That’s right. Again it’s another facet of the overall project. It’s not going to make your tool be any better but it’s going to help engage and foster that community.

Steve: Well, I mean you say it won’t make the tool any better. It actually may help make the tool better because those people are going to want to contribute more.

Carlos: Sure, sure, you know, that's right. I think I misspoke there, in the sense that you're putting in effort that will indirectly affect the tool. You're hoping that that work will then be reproduced by other people who want to participate.

Carlos: Yup, and I remember, I don't know, many years ago I tried contributing on an open source Linux project around something I was working on at that point, and I just remember that the people I was contributing with made me feel stupid. Not necessarily [inaudible – 24:16] doing, but I just didn't understand their process or their flow. And I think that's one of the things that the DBA Tools folks do with the wall of contributors and the gamification and all that: making people feel comfortable.

Carlos: And then also, so Brent did mention this, we don't have a quote from it, but kind of those rules of engagement. And Chrissy mentioned this as well, as far as, "Here are kind of our standards. Here is how we expect certain things to be." Now, she did mention working with Constantine and doing those code reviews. And again, that's kind of individual, one-on-one time, but it helps [inaudible – 24:59] like, hey, when you contribute, this is kind of what we're looking for and this is what we think it should look like, and that will help people kind of self-identify as to whether they can help or not. But it doesn't always go your way. I mean, getting people to contribute might mean that they want to do things that are outside of what you thought your tool was going to do, and Brent has an interesting quote on this one.

Brent: Well, another thing you asked was how much time do you spend guiding versus building. One of the things that's tricky is sometimes you'll get requests, either pull requests or just requests for features, that don't line up with where you want to go. Even though it's open source, that doesn't mean it has to be a 100% community vision. It can still be your vision of where you want the tool to go, because when somebody contributes code, you're supporting that for the rest of your life. Just because somebody threw in a brand new feature doesn't mean they're going to be around to support it when it breaks. So there are a lot of times, I think maybe one pull request out of 4 or 5, where I'll say, "You know what, I totally get where you're going with that, but this just isn't going to be something that we do." The classic example is security stuff. People often want to add security stuff into sp_Blitz, and I'm just like, I get that it's important to some people, but someone needs to start a security script, because I'm not going to be the guy who explains how security works; that's simply way out of my wheelhouse.

Carlos: And I think this one is tough, because we've just talked about going to great lengths in creating processes and thinking about how you're going to engage people, and then when you finally get some folks engaged you're like, "I don't want to do that." And I think that could be tough.

Steve: Yeah, and I think a lot of that really comes down to the vision of what it is that your open source thing is going to do. If the vision is we're going to do these things but not include security, well, that's a very different project than if we're going to go build a security platform. Not that either one of them is any better or any worse than the other. It's really just focus, so that you can do a really good job at what you're doing.

Carlos: Right, and I think at the end of the day you have to know that if you accept it, it's going to stick with you forever. Or if not forever, it's going to be something that you at least have to take care of, tend to, or get rid of, and that could just cause problems for adoption. So it comes down to knowing where you want to go, and I think it's then ok to say, "Thank you for that, but we're not going down that path." But they do talk about it being a little bit like having another job.

Steve: Yeah I can see that. I think that there is so much involved in managing all of that from the code reviews to the people to even getting updates out when that’s appropriate that it could really be like another job.

Carlos: Right. And so Brent has an interesting quote about this.

Brent: Yeah, and it will be draining. This is going to sound awful, but there is a great post, you can go and find it, on how draining it is. It's like, I'm sorry, you're the new leader of an open source project, but there are going to be a lot of people who file bugs, and you are so proud of your code until the moment all of a sudden people start pouring in with, "Hey, this doesn't work." And they're not going to be polite. They're going to be short and to the point, and you have to think of that feedback as a gift, because for every person who tells you what's broken there are a hundred people who downloaded it, and tried it, and went, "This sucks!", and they're not ever going to take their time to go tell you what was broken. As bad as it sounds, even the short, snippy "Hey, this is broken when I do this" is still good feedback, and you should be thankful that people are actually using it. I tell myself that all the time.

Carlos: It's funny, I tried to get Chrissy to kind of nail it down, because I started to sense what she was doing, of course, in the Slack channel. She's quite busy, and even in our interview she talked about doing these individual code reviews, and I thought, "Oh my goodness, that sounds like a lot of work." And she kind of demurred a little bit, like, "Oh, you know, it's all just part of what I have to do." So I actually signed up as a DBA Tools watcher there on GitHub, so I get notifications. I signed up on April 7th, and between April 7th and the time of this recording there were 1,017 emails: discussion points, pulls, updates, changes, modifications to the code. That is a lot. Here we're recording on the 11th, so in just a little over a month, a thousand. You divide that by 30 and that's pretty intensive.

Steve: Oh yeah, and that's more than any solo developer would be able to do. And it's probably even more than a lot of small companies may be able to do with a development team.

Carlos: Sure, so I definitely think you are kind of making that commitment to putting in some processes, almost like the DevOps stuff we keep talking about: making your life a little bit easier and spending some time on things that are just going to make those processes a little bit easier. Now, one of the things that's going to help you with this is good metrics. And at least in our case with the podcast, we lament quite a bit that we can see the download numbers, but the people who actually engage with us are a teeny, teeny tiny fraction of that number. And so knowing if you're going in the right direction and getting that feedback is important, and I think Chrissy is a great example of this, because she is in there talking, promoting, cheerleading, engaging: "Hey, you mentioned you're going to do this for me. How is it going? What can I help you with?" Those kinds of things. And I think because of that she is getting good metrics. She can see a lot of this traffic and where people are going, and is paying attention to them.

Steve: And I think, as you mentioned, the podcast with a thousand listeners: that's one metric, but how valuable is that really? I mean, if you have a thousand downloads of something, is that a good number? If you don't get a lot of feedback, do you assume that everything is perfect, or do you assume that nobody cares enough? It's really hard to tell.

Carlos: Yeah, and I think again things like GitHub can help with that, but again, it's just another component that you then have to pay attention to. Now, we've been talking a little bit about tools, at least the ones that we've mentioned have all been tool-based, but it doesn't necessarily have to be a tool in order for you to engage with the community. And Steve actually is a really good example. As I was getting dressed today I happened to be wearing my Database Corruption Challenge t-shirt, and I thought that that was kind of an interesting way. You weren't necessarily creating a tool so much as you were creating a framework for people to be able to participate in and contribute to; you learned quite a bit and they also learned as well.

Steve: Yup, and that was something that kind of evolved and grew as I did it. I didn't really have a solid plan in the beginning of what it was going to be. But what it turned out to be in the end was a great training platform, a way for people to learn how to deal with database corruption. And with that, I could go and rant and rave all day about here is how you can deal with it, but with the contributors, the people who competed in the Database Corruption Challenge, providing their solutions, I was able to learn some really interesting things along the way, and I was able to share those with the community so that others could learn from those interesting solutions as well.

Carlos: Sure, so kind of a win-win scenario.

Steve: Yup, although it's technically not an open source thing in that there is no GitHub and people don't contribute in that way. But all of the sample databases and the solutions that are there are freely available for people to download, take a look at, and learn from. So it's almost like open learning.

Carlos: Right, exactly. I think because you had that mechanism where you were open to new suggestions, or "hey, have you tried this?" or "what about this certain scenario?", again it lends itself to people wanting to participate and engage. Another one, a slightly different take, which some of you may have participated in, is T-SQL Tuesday. Again, that's kind of taken on a life of its own. Each month there is a leader; the leader then picks a topic, makes a blog post, lets users know how they can interact with it, and then they make sure that all the posts and whatnot are put together in a single repository, if you will, generally on that page, so that those who are interested in learning more about that topic can go and peruse the blogs of their choice.

Steve: Yup, that's definitely another interesting example of training with community involvement.

Carlos: Yes, and again that's one of those things, maybe a good example of a way to engage without taking on a ton of responsibility. Originally Adam Machanic started that, and Steve Jones helped keep it alive, I think, by helping maintain the topics and who was organizing, and now they have their own website and different people organize it, so they are able to engage in a different way.

Steve: Yup. SQL Saturday is another example of community contribution.

Carlos: It is, that's right. I mean, today SQL Saturday is big, probably a big deal particularly in our community, but I think about Andy Warren and Steve, and I know there are a couple of others I'm not mentioning, who put together those initial SQL Saturdays, you know, "Hey, let's get some pizza, get together, and chat about tech stuff," and how much we now as organizers benefit from the structure that has already been created. If you've ever put one on, you like to fret and worry about all kinds of different things, but there is a whole lot of infrastructure that's been put together that you're taking advantage of, and branding of course, as a result of people giving their time and giving feedback in the past about how to make it better and different, change it up, things like that.

Steve: Yup, I think SQL Saturday is one of the more successful endeavors in that area. I mean there are a lot of other things people tried along the way out there but the involvement you get from people all over the world in the SQL community through SQL Saturdays is just amazing.

Carlos: Mind boggling. And then, to a lesser extent, our podcast is kind of a way to engage the community. Again, we lamented that we don't have as good engagement or metrics as maybe some of these other avenues, but we still like to get out there. It's something that we can do with a little bit of process around it, and people do give us suggestions on things we could talk about, which we appreciate and which we would like you to keep coming.

Steve: Yeah, I find it more enjoyable sometimes when we're building a podcast around something that somebody asked for, rather than just something that we thought up and that would be nice, you know. And I think that pushes us a little bit more, and we usually learn a bit more in those areas too. But it also leads to some more successful podcasts. And I think one of the big ones was our indexing podcast that ended up being a double episode. That was one that people asked for for a long time and we just never really thought of ourselves, "Hey, let's do an indexing one." The community asked and we responded, and I think we got some great feedback on that one.

Carlos: That's right, that's a great point. So kind of wrapping this all together: maybe you have something out there that you think you would like to share. So the question is, should you do it? We don't know that it's for everyone, and I think we've talked about some different ways that you could potentially engage without building a full-on tool. But if you have a problem that you've solved, and you are committed to it and you like the idea, I'll say go for it.

Steve: Absolutely. It's the kind of thing where, if it solved a problem for you and in your environment, there is a good chance that maybe it will solve a problem for someone else. And if you want it to be something that grows and has a life of its own, look for other people to contribute to it, and you may end up with something way more powerful than you ever thought of.

Carlos: And this is where we'll open up our podcast. If there is something you would like for us to talk about, we could probably incorporate that into the SQL Server in the News. Or if you want to just talk about the actual problem that you're solving, we've done that plenty of times, we would love to have you on, and we could talk about what the problem is and how you went about solving it. That would be an interesting conversation as well, or anything we could do to help contribute to that. Let us know and we would be happy to make something happen.

Steve: Yup, and sometimes that might be that we love what it is and we want to run with it and use it ourselves as you make it available. Other times it might be coming back and saying, "Hey, that sounds like a really good thing to go into sp_Blitz; go talk to Brent about that." Or it might be a great thing for DBA Tools, or just understanding what the right avenue is to get that thing you're doing out there and available for other people to use.

Carlos: Exactly. So we look forward to what the community continues to put forth and some of the neat ideas that are available. And so by all means, again, if you're on the fence and we can help give you a push, go ahead and do it, go for it, if the things that we talked about today haven't scared you away.

Steve: Yeah, and this isn’t for everybody. Some people may look at this and say, “No, I never want to do that.” That’s ok too.

Carlos: I think like you mentioned like getting plugged in to some other project might be the way to go, right?

Steve: Yup.

Carlos: Well, awesome. Upcoming episodes we actually have Chuck Lathrope. Did I say that name right?

Steve: Yup, and it’s on SQL replication.

Carlos: That’s right, so he has been talking about this quite a bit and so we thought, hey let’s have him on and talk a little bit about that. Again, the show notes for today’s episode are going to be at sqldatapartners.com/communitytools.

Steve: Or at sqldatapartners.com/96 for the episode number.

Carlos: Yes, and so companeros, thanks again. If you've made it this far we appreciate you tuning in. If you've been listening for a while, thank you! If you're new to the program, thanks for tuning in and we look forward to your feedback. We've been mentioning Twitter quite a bit, but actually we are getting more feedback from other places beyond just Twitter, all of the social media platforms. Feel free to reach out to us, and then of course at any in-person event, some of the SQL Saturdays, and we'd love to have you out at the Companero Conference if it makes sense for you to be there. So with that I guess we will close up this episode. Last thoughts, Steve?

Steve: Well, as far as social media goes, normally we close out with our Twitter handles, but I think this time, instead of saying follow us on Twitter, I'd like to say come find me on LinkedIn, and find Carlos on LinkedIn.

Carlos: There you go. Ok, well thanks again for tuning in companeros and we’ll see you on the SQL trail.

Episode 95: PolyBase

Big data is a term we have been hearing frequently as of late, and this might cause some concern for us in the SQL Server space. Microsoft has introduced some new functionality to help connect different data stores with PolyBase. We are happy to have Kevin Feasel from ChannelAdvisor back with us, and Kevin will discuss some of the basics around what PolyBase does. We'll be discussing a lot about integrations using PolyBase, specifically with Hadoop and Azure Blob Storage. We also touch on some of the technologies that will be supported in the future.

For those looking at implementing both structured and unstructured data stores, PolyBase will be a way to help bring these environments together. Kevin gives us a great overview and we think you will enjoy this episode.

PolyBase

 Episode Quote

“PolyBase gives you this integration and it’s opening this door to possibly getting rid of link servers.”

“PolyBase simplifies that a lot for us by making an assumption that there is a consistent definition for each row.”

“Learn something new… You learn something the first time, you can learn something again.”

Listen to Learn

– What is PolyBase?
– Technologies supported by PolyBase
– PolyBase integration with different data sources
– Some thoughts around which teams are going to own which pieces of the project
– How Hadoop integrators are responding to PolyBase

Kevin on Twitter
Polybase Guide

About Kevin Feasel

Kevin is a database administrator for ChannelAdvisor and the leader of the PASS Chapter in the Raleigh NC area.  Since he was last on the podcast, Kevin has been awarded the Microsoft MVP and will be a speaker at the Compañero Conference.  He also enjoys old German films.

Transcription: PolyBase

Kevin: My name is Kevin Feasel. I am a Data Platform MVP. I'm also the manager of a predictive analytics team here in Durham, North Carolina. I'm extremely pleased to be able to speak at Compañero Con, even though I can't pronounce it. I'm going to be speaking on a couple of topics. One of them is security, really getting an understanding of network security, getting an understanding of what a database administrator can do to help secure a SQL Server instance. I'm also really looking forward to talking about big data solutions, basically how do I get started with that. I'm a database administrator, I'm the only database administrator at this company, and somebody is coming to me talking about big data. Where do I start? What do I start looking at? What actually is the benefit? What kinds of workloads work well under this and which ones don't? And getting some ideas of what's happening in the industry and seeing how these different technologies are evolving and turning into a full ecosystem. Finally, showing how that ecosystem integrates with SQL Server.

Carlos: Kevin, our all-time podcast episode extraordinaire. Welcome back for another episode.

Kevin: Thank you! It’s good to defend the title.

Carlos: Yes, thank you for coming and talking with us. One of the things, and one of the reasons we continue to have you on, is that you're doing lots of different interesting things, and as database administrators we've been hearing about this idea of big data for a little while; it's kind of been at the door. Lots of people, even from a PASS perspective, have opened the doors to analytics to kind of join those two worlds. But for a lot of us it's still kind of an unknown entity, it's different technology, and we think that we have something here that will kind of save the day, if you will, in a sense. And so our topic today is PolyBase, and we wanted to talk with you about it. You've been talking about it, presenting on it, and working with it, so why don't you give us the tour of PolyBase? What is it and why would we be interested in it?

Kevin: Sure, here's the nickel tour version. PolyBase initially came about, I believe it was actually first introduced in 2010, as part of SQL Server Parallel Data Warehouse edition, which later became APS, otherwise known as the Extremely Expensive edition. Enterprise is expensive; PDW/APS is extremely expensive. In SQL Server 2016 this was brought down to the masses, or at least the masses who could afford Enterprise edition. It's been around for a few years, but 2016 feels like the first version for the rest of us who didn't have a chance to play with really expensive hardware. What PolyBase does, at a really high level, is allow you to integrate with other data sources. So before people start thinking, "Oh no, it is link servers all over again," it's not link servers. It's not that bad. As of today PolyBase supports a few different connections: you can connect to a Hadoop cluster, you can connect to Azure blob storage, and you can use PolyBase to migrate data from Azure blob storage into Azure SQL Data Warehouse. At PASS Summit 2016 there were a couple of interesting keynotes where they talked about expanding PolyBase beyond Hadoop and Azure blob storage, looking into Elasticsearch, MongoDB, Teradata, Oracle, and other sources as well.

Carlos: Wow, so basically we're going to have the ability, through SQL Server Management Studio, to interact with, and move data to and from, all of these different systems that you have mentioned?

Kevin: Yes, and be able to query it using just regular T-SQL. So when you create this table, you create what's called an external table. It's a concept that points to the source system, like the Hadoop cluster. The data is over in Hadoop, but when you query that table, select star from my external table, it's going to go over, request data from the Hadoop cluster, and pull that data back into SQL Server, where you can treat it like it just came from a local table.

Carlos: Got you. So now, is it going to store that on, like, a time basis, so that, you know, I run my select star and then 10 minutes later Steve runs his; is it going to pull that data back over again? Or is there some management now that we have to think about because the data is now on my SQL Server?

Kevin: So the data doesn't really get persisted to SQL Server. It's an external table, meaning that it will live in blob storage or on your Hadoop cluster. The mechanic that PolyBase uses to allow this to work is that it will pull the data into SQL Server into a temp table, but it's not a temp table that you need to know about as the developer of a T-SQL query. It's kind of a behind-the-scenes temp table that then acts as the table that you're going to query against. So you query dbo.myexternaltable; behind the scenes there is a secret temp table that has the form and structure of that external table, and the data gets pulled in, collected, and then processed as though it were local. But once it's done, it's gone.

Steve: So that process sounds very similar to the underlying workings behind when you run a query over a link server, where it issues a command on the other side, brings the results back, and basically stores them in a hidden format so you can use them in the rest of the query locally. And I guess, I'm just trying to understand the correlation there, is there a big difference in how that's done versus a link server?

Kevin: So there is one major difference and that is the concept of predicate push down. So the idea here is let’s say that I have a petabyte of data in this Hadoop cluster and petabyte of data in this folder, I want to be able to query, I’m sending a query that maybe I just want a few thousand rows or I want to aggregate the data in such ways that I don’t get a petabyte back, I just get the few thousand rows I need.

Carlos: Hopefully, because if you're returning a petabyte of data you're going to be in trouble.

Kevin: Yeah. I don't have a petabyte of data on my SQL Server instances. So I write this query, and in my WHERE clause maybe I do summations, GROUP BYs, HAVINGs. All of that predicate will get sent to the Hadoop cluster, and on the Hadoop cluster side PolyBase instigates a MapReduce job, or a set of MapReduce jobs, to perform the operations that you wrote in T-SQL. It generates all of the jobs, it creates the data set that comes back, and that gets pulled into SQL Server. With a link server, if I were doing a link server to another SQL instance, well, another SQL instance is a special case, but if I were doing it to Oracle, I have to pull the whole data set back, or if I'm querying out to Hive I have to pull the whole data set back, and then any filters get applied. So predicate push down is what lets you get back the rows that you need, only the rows that you need, and gets around that whole link server problem of, "oh yeah, I'm querying a billion rows, I'll see you tomorrow."
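
To make predicate push down concrete, here is a minimal sketch; the table and column names are invented for illustration, not something from the episode. It shows the kind of query whose WHERE and GROUP BY work can be shipped to the Hadoop cluster, along with the query hints SQL Server 2016 provides to force or disable that behavior:

    -- Only the aggregated result set comes back to SQL Server; the heavy
    -- lifting runs as MapReduce jobs on the Hadoop side.
    SELECT   DeviceType,
             COUNT(*)         AS Readings,
             AVG(Temperature) AS AvgTemperature
    FROM     dbo.SensorReadingsExternal
    WHERE    ReadingDate >= '2017-01-01'
    GROUP BY DeviceType
    OPTION (FORCE EXTERNALPUSHDOWN);   -- or OPTION (DISABLE EXTERNALPUSHDOWN) to pull raw rows instead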

Steve: Sure, very interesting. I’ve heard some people speculate that link servers are dead or will be going away because of what we can do with PolyBase. Do you think that that’s a fair assessment?

Kevin: I am crossing my fingers hoping that this is so. As soon as they announced at the 2016 PASS Summit what PolyBase is going to do in the future, I got really excited, because I thought, "Wait, what if I could connect to another SQL Server instance?" And there is one extra bit of PolyBase that I haven't talked about yet: the concept of head nodes versus compute nodes. There is this concept in massively parallel processing where you have a head node, this is the orchestrator, the server that knows what queries are supposed to come in and out, and then it passes off details to different compute nodes. In Hadoop you have a name node and you have a bunch of data nodes. Over in PolyBase there is actually a similar infrastructure: there is a head node, which is your SQL Server instance, it must be Enterprise edition, and it controls the jobs. But you can add different compute nodes. They call it a scale-out cluster. These are Standard edition SQL Server instances that can sit there doing work, connecting to the different data nodes on the Hadoop cluster and pulling data back. So unlike a link server, where I have to pull all the data over to my one instance, I can now have several PolyBase servers getting data, aggregating it locally, and sending that data, aggregated as finely as they could, up to the head node, where the head node finishes the aggregation and presents the result to the end user.

Steve: Very interesting.

Carlos: Yeah, kind of a scale-out approach. Now, I guess at this point it might be worth going back and talking about some of the things that I need to put in place. You mentioned that architecture: I have an Enterprise edition head node, I can have Standard edition compute nodes, but let's scale it down a little bit. I just have one node and I want to start using PolyBase. What are some of the things that I need to create, or steps that I would take, in order to set that up?

Kevin: Ok, so let’s take the easiest example, that’s connecting to Azure blob storage. On my SQL server instance, I have to install PolyBase. That’s part of the setup; there is a little checkbox you can select. But in order to install PolyBase you must install the Oracle Java Runtime Environment.

Carlos: Yes, I cheated, and I was looking at the docs here and I saw that and I thought, “What in the world!” It’s like sleeping with the enemy, right?

Steve: So just to recap then, if I want to query against Azure blob storage with PolyBase, when I install SQL Server I need to also install (and again, you get this as part of the install) the Oracle components for the Oracle Java Runtime.

Kevin: Correct. So you install those. There are a couple of configuration steps that are involved, like there is a setting in sp_configure that allows for external queries. Turn all that stuff on; there are configuration guides that can help you with that. Once you've got that done, what you do is create three things. The first thing that you want to create is an external data source. The external data source says: this is what I'm connecting to, this is the resource location. If I'm connecting to Azure blob storage, actually I think for Azure blob storage you just use a type of Hadoop. If you use a Hadoop cluster you just use a type of Hadoop. If you're writing an Azure elastic scale query, there is a different data source type for that, but that's a little bit beyond my ken; I haven't written those yet. Ok, so you create this data source. Actually, let me rephrase that, I may have said something wrong there.
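
For reference, the configuration step Kevin mentions looks roughly like the sketch below. The 'hadoop connectivity' value shown is only a placeholder; the right value depends on your Hadoop distribution or whether you are targeting Azure blob storage, so check the configuration guides he refers to:

    -- Tell PolyBase which Hadoop / Azure Blob Storage flavor to talk to.
    -- The value 7 is just an example; look up the correct one for your setup.
    EXEC sp_configure 'hadoop connectivity', 7;
    RECONFIGURE;
    -- SQL Server (and the PolyBase services) must be restarted for this to take effect.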

Carlos: No, that’s fine. We’ll make it right.

Kevin: Ok, so let me start over. So the next thing that you do after you’ve configured PolyBase is you want to create an external data source. For Azure blob storage we create this external data source that points to the WASB address of your blob storage container location. So you’ll point to the container and the account name.

Steve: URL right?

Kevin: Yeah, that is a WASB or WASB[s] address; it's an Azure blob storage location. You'll include your credentials, because if it's a secure blob you'll need to pass in credentials. So you create this data source. The next thing you want to do is create an external file format. That file format says: any files that are on a source that I specify are going to follow this format. There are a few different formats. One of them is delimited text, so just text, maybe colon delimited or semi-colon delimited or however you've delimited your text. You can use other formats as well. I would recommend starting out with delimited text; that is the easiest to understand. You can grab a file and look at it. But when you start asking about better performance, one of the better formats is ORC, which is a row-columnar format that Hive uses to store data. It's much more efficient for querying, especially aggregating data, but you can just use flat files.
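
As a rough sketch of what that looks like in T-SQL (the credential, storage account, container, and object names below are all placeholders, not anything referenced in the episode), the data source and a delimited-text file format might be created like this:

    -- A database master key and credential are needed for a secured storage account.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';

    CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
    WITH IDENTITY = 'polybaseuser', SECRET = '<storage account access key>';

    -- External data source pointing at a blob storage container.
    CREATE EXTERNAL DATA SOURCE AzureBlobStore
    WITH (
        TYPE       = HADOOP,
        LOCATION   = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
        CREDENTIAL = AzureStorageCredential
    );

    -- Delimited-text file format: the simplest place to start.
    CREATE EXTERNAL FILE FORMAT CsvFileFormat
    WITH (
        FORMAT_TYPE    = DELIMITEDTEXT,
        FORMAT_OPTIONS (FIELD_TERMINATOR = ',', USE_TYPE_DEFAULT = TRUE)
    );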

Carlos: So, knuckle-dragging Neanderthal that I am, how am I supposed to choose what kind of file format I need to use? If I don't know anything about Hadoop, how would I choose?

Kevin: Yeah, absolutely. So, knuckle-dragger: delimited file. Keep it easy for yourself. Once you get past that, once you kind of get past the doorway and you say, ok, now how do I do better, you have to think about whether your data is more for aggregation, like what you would find in a warehouse table. In that case, I would use ORC. If I'm storing the data and it's more of a row-store style of data, I would use Parquet. There are a couple of other formats as well, but those are the main two that are really supported within PolyBase.

Carlos: Well now, in that determination, so again I'm going to use the delimited file. I start, I don't know, three months in, right, I start writing queries. There are processes that I now have in place, and I decide, "Hey, I think I can do better. I want to change the format." Am I going to have to start redoing my queries, or what's involved if I want to change that format down the line?

Kevin: Great question. What would you have to do? Let's say you have delimited files now; you've created an external file format of the delimited type. Later on you say, well, I am actually storing this as Parquet, so you create an external file format that's Parquet. And now we get to the last portion of PolyBase: the external table. The table has a two-part name; it looks like any other table when you query it, dbo.something, or maybe external.mytable. You have the column definitions, so all of the attributes in your table, and at the bottom, the part that's a little different is that there is a WITH clause. Inside that WITH clause you specify the location of your data, so the specific file or folder that you want to point to, the data source, and the file format.
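
Carrying the earlier sketch forward (again, every name here is invented for illustration), the external table ties the location, data source, and file format together and then queries like any local table:

    CREATE EXTERNAL TABLE dbo.MonthlySalesExternal
    (
        SaleId   INT,
        SaleDate DATE,
        Amount   DECIMAL(10, 2)
    )
    WITH (
        LOCATION    = '/sales/2017/',   -- folder (or file) inside the container
        DATA_SOURCE = AzureBlobStore,   -- external data source created earlier
        FILE_FORMAT = CsvFileFormat     -- external file format created earlier
    );

    -- After that it reads like a regular table:
    SELECT SaleDate, SUM(Amount) AS TotalSales
    FROM   dbo.MonthlySalesExternal
    GROUP BY SaleDate;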

Carlos: Got it. So if I wanted to change file formats, I'm creating a new external table.

Kevin: Yeah or you just drop and recreate the one that’s there. The external table doesn’t have any data. It just has some metadata around it. So if you have a few second downtime you can drop that table, recreate the table, use the new format, maybe point to a new folder that has the data in a different format. All the nasty work of converting those files getting them into the other format, yeah you still have to do that stuff, but you can do that as a back fill process or you can do that kind of off to the side and just switch when you’re done. That way you don’t have to update any of your procedures or any calling code.

Carlos: Got you, ok, so that’s nice.

Steve: So when you say the external table doesn't really have anything more than just a definition there, that's the definition that sits on your SQL Server, defining where it's going to go and get that data, for instance out of Azure blob storage. So it's really just a pointer off to that data, and when you're switching it around, if you point it to a file in a different format, you have to give it the appropriate format type.

Kevin: Yeah, so the external table, yeah it’s just metadata. It’s just some basic information.

Steve: Ok, so then with that it’s pointing to a file in Azure blob storage and can you just start out with an empty file and then start filling in with data from there or does that file in Azure blob storage have to have been created somewhere else to meet those formats?

Kevin: That's a really good question. So you have the ability to insert data into blob storage or into Hadoop. There is another configuration option you have to turn on to allow for inserting, and once you do, each insert operation you run will create some number of files in blob storage or in Hadoop. So you have to have a folder as your write location. But every time you insert, maybe you're inserting once a month, taking last month's financial data, all the individual transactions, and writing it over to blob storage for long-term storage, that insert generates, say, 8 files over in Azure blob storage, and then the data is there. You can query it just like it was always there. But you cannot update that data from PolyBase, and you cannot delete that data from PolyBase.
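
The configuration option and the style of insert Kevin describes look roughly like this sketch; dbo.Sales and the external table are the hypothetical names from the earlier examples:

    -- Allow PolyBase to write out to Hadoop / Azure Blob Storage.
    EXEC sp_configure 'allow polybase export', 1;
    RECONFIGURE;

    -- Archive last month's transactions; this creates new files under the
    -- external table's folder. Rows can be inserted this way, but never
    -- updated or deleted through PolyBase.
    INSERT INTO dbo.MonthlySalesExternal (SaleId, SaleDate, Amount)
    SELECT SaleId, SaleDate, Amount
    FROM   dbo.Sales
    WHERE  SaleDate >= '2017-04-01' AND SaleDate < '2017-05-01';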

Carlos: Interesting. So now, obviously it's going to vary from place to place, but from a setup perspective, let's say, again, I'm the only database administrator in my organization and I'm not familiar with Hadoop or these other systems. Well, I guess when the other data stores get onboarded there will be more access, right? But when I think from a big data perspective, generally there's going to be another team; maybe a vendor comes in, installs Hadoop, starts loading data, things like that. We as database administrators are going to create all of those components that you just talked about, but are the Hadoop vendors familiar with PolyBase? Are we talking the same language here, or is this still kind of a very SQL Server centric idea? Does that make sense?

Kevin: I would say that vendors are not really going to know a lot of the PolyBase details. They're probably not going to be familiar enough with PolyBase itself to do it. I've had some discussions with people who work at Hadoop vendors, and they're very interested in the concept, but there is not a lot of internalized information there. These are typically not people who spend a lot of time with SQL Server, so they don't necessarily know how it works, how to set it up, what the positive and negative aspects are, or how you can shoot yourself in the foot.

Carlos: Well, so speaking of that so what are the ways we can shoot ourselves in the foot?

Kevin: Oh, you had to go and ask that. There are some assumptions that are built into the way that PolyBase works today. This is not a critique of the PolyBase team, of the developers, of the PMs; this is not at all a critique aimed at them. I like you guys, still, don't worry. One issue that you can run into: let's say you have just text data and your file has new lines in it, but the new lines don't represent new lines of data. Maybe it's a free-form text field where a person typed in new lines to symbolize a new paragraph. Well, PolyBase doesn't understand this idea of "ignore new lines unless I told you it's a new line." It will just pick up that new line and say, oh yeah, you've got a new line here.

Carlos: A new record basically.

Kevin: Right. There are some assumptions that are built in. You can also burn yourself by defining your result set: you create that external table and maybe you define a value as an integer. Well, if the value comes back as a string, because some of the data is malformed coming in, then those rows will be rejected, as they should be. You're going from a non-structured or semi-structured system into a very structured system in SQL Server. That semi-structured system is ok with you throwing whatever garbage you want into this file, but you have to define structure when you pull it out. Historically, on the Hadoop side, that structure was defined in the mapping and reduction phases, so MapReduce. It was defined by the developer putting together the data in such a way that the developer understood what each data point signified. PolyBase simplifies that a lot for us by making an assumption that there is a consistent definition for each row. So we say an integer age is the first value; well, it's going to assume that there is an integer value there and it's going to populate age with that. If maybe every 20th row we have something that's totally different, maybe instead of age it is eye color because something weird happened with our data, well, every 20th row gets rejected. The way you can shoot yourself in the foot: let's go back to having a few billion rows of data that you want to pull over. Maybe you want to get just the rows where the person is exactly 14 years of age. So you're scanning through this data, and every 20th row, instead of an integer age, it's actually a string. Every one of those rows gets rejected. There is a cutoff for the number of records that you are allowed to reject before the query just fails. That cutoff can be 0 or it can be as many as you want; it can be a percentage or a numeric value. So let's say 1 billion rows and you have a cutoff of 5,000. You're going to go through quite a few records to get 5,000 rejected rows. Once rejection happens, once failure occurs, the entire transaction rolls back and you don't get to see the data that was already pulled. It's rolled back. There was an error.
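
Those cutoffs are declared on the external table itself. A minimal sketch, with placeholder names and numbers, of the two flavors Kevin describes:

    -- Fail the query once 5,000 rows have been rejected...
    CREATE EXTERNAL TABLE dbo.PeopleExternal
    (
        PersonId INT,
        Age      INT
    )
    WITH (
        LOCATION     = '/people/',
        DATA_SOURCE  = AzureBlobStore,
        FILE_FORMAT  = CsvFileFormat,
        REJECT_TYPE  = VALUE,
        REJECT_VALUE = 5000
    );

    -- ...or fail once a percentage of the rows sampled so far have been rejected:
    --     REJECT_TYPE         = PERCENTAGE,
    --     REJECT_VALUE        = 5,
    --     REJECT_SAMPLE_VALUE = 10000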

Carlos: Oh, got you, that’s right, yeah.

Kevin: So you may be sitting there for an hour waiting for this data to process and it comes back and it fails.

Carlos: Yes, so you might almost think about it in a sense, again not trying to discount Hadoop, at least in my mind, knuckle-dragger that I am, I think about that almost like an Excel file, right? I want to load it into something that can accept it and then let me take care of finalizing it, looking at rejected rows and things like that. Almost like an ETL process, right?

Kevin: Sure. This is a fairly common pattern in the Hadoop ecosystem as well: ok, we have raw data coming in, and we put it into the data lake. So ideally the data lake has a lot of nice, clean data; in reality it's more like a data swamp. It's where you throw in a bunch of old stuff. You've got mattresses in there, just all kinds of dirtiness.

Carlos: Fish with three eyes.

Kevin: Yeah, exactly. And so you pull that stuff out and you try to clean it up in some process. Usually it's going to be a Hadoop process, maybe a Spark job or a MapReduce job, that scrubs this data, tries to give it some semblance of sense, and then writes it out to another directory where it's in more of a structured format. That way you can read it with Hive, which is SQL for Hadoop; you can read it with Spark SQL, SQL for Spark; or you can read it with PolyBase, SQL for SQL.

Carlos:  Got you, so that kind of takes me back to that idea again of who's working with whom, and it almost sounds like, if we wanted to, we could push some of that back: hey guys, can we work on this MapReduce job? Is that a fair question to ask, hey, can we work on this so that when the data comes back it gets cleansed before I see it? Or is it still kind of, you know, I need to, as a SQL Server person, assume all responsibility for that kind of thing?

Kevin:  I think that depends on your environment. It depends on relative levels of familiarity. But personally, my expectation would be that if you are, say, using SQL Server as the engine to see final results, then I believe it makes perfect sense to ask the people on the Hadoop side, "Hey guys, give me the data in a format that I can pull easily." So for example, maybe we are reading a lot of data coming in from IoT devices. We have Kafka set up; Kafka is a big distributed message broker, it's a really fascinating thing, and we're getting tremendous numbers of messages streaming into our Hadoop cluster. We're cleaning up those results, we're storing the results, and maybe we have some aggregations that we're doing to show hourly results by device type, and then we load that data into a file that PolyBase can read. As part of an ETL process you may pull that data over to SQL Server and persist it in SQL Server, so a query like SELECT from your external table, INSERT INTO the real SQL Server table, and you're keeping a smaller, streamlined data set that you can use to populate a Power BI grid or a web application. In that scenario, personally, I'd argue that yeah, the Hadoop-side people probably should be doing most of the cleanup work. If you are on both sides, it becomes more a question of what am I more comfortable doing; sometimes, if the data's relatively clean to begin with, or if we're willing to accept a certain level of failure, take it, bring it over to SQL Server; I can do really cool things in SQL Server.

Carlos:  So it kind of goes back right to the adage of knowing your data, right?

Kevin:  Absolutely.

Carlos:  Being familiar with it and then making a decision based on that.

Steve:  So then, back to that example with the age and putting that into an integer column in the table definition: I mean, there are lots of things that could be valid for ages in there. You could have "6 mo." to represent someone who's six months old, but then obviously when that gets pulled down and tries to go into an integer, it's got text data in there and it's not going to work. So do you find that people sort of shy away from those restrictive types in their table definitions and maybe just leave it as something that's more open, like a varchar(max) or something like that? Or do you find that people go through the battle of cleaning it up or filtering it ahead of time?

Kevin:  Unfortunately, probably more the former. It's more of, well, it's a string, every string works, so we will pull that in as a string and then we'll clean it up here. That is a downside. With a lot of ETL tools I can take a data element and make decisions based off of what that element looks like; like "6 mo.", I can do a substring, I can parse out whether there is an MO or YR or some known value there, and use conditional logic to convert that into something that is consistent across the board. PolyBase isn't going to give you that. It's going to give you the easy way of pulling data, but it doesn't do the transformations for you.

Steve:  Okay. So another area that I've thought a little bit about, and I know this is sort of jumping back to the whole link server example, is that when you're running a query over a sort of old-school link server, whatever's going on on the other side really gets hidden from execution plans. It's just blindly calling something on the other side across the link server, and your execution plan doesn't give you any details other than that it was waiting on something on the other side. Now, is there an option for seeing execution plans when you're using PolyBase, to get a better understanding, if a query's taking some time, of maybe where that time is being taken when it's connecting out to Hadoop or Azure blob storage?

Kevin:  Yeah. The short answer is yes. The long answer is yes if you look at the XML. So you look at the query plan XML, it will give you some details including there’s a remote query which is XML inside of the XML. So you have to deserialize the XML, decode the XML, and you’ll be able to see what the remote operation looks like. So it gives you a few indicators of what’s happening. It’ll show you the individual steps. Also, there are several dynamic management views that are exposed for PolyBase. And those DMVs will show you a lot of the same information. They’ll show you the individual steps that occur for this MapReduce process or for the data retrieval process.
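
A couple of the PolyBase DMVs Kevin alludes to can be queried along these lines; this is a minimal sketch, not an exhaustive list, and the execution_id value is just a placeholder:

    -- Recent distributed (PolyBase) requests.
    SELECT TOP (10) *
    FROM   sys.dm_exec_distributed_requests
    ORDER BY end_time DESC;

    -- The individual steps behind one of those requests.
    SELECT *
    FROM   sys.dm_exec_distributed_request_steps
    WHERE  execution_id = 'QID1234'   -- taken from the first query
    ORDER BY step_index;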

Carlos:  So, a very interesting topic, and we'll let you give last thoughts here, but one of the things that I feel confident about, or happy about, is that while there are still some unknowns here, having Hadoop in my environment, or being able to connect to it, Azure blob storage, all these other things that are coming down the pipe, at least I have a tool that I can use to integrate with some of these things on my own turf. And it's not completely foreign, where I have to go and, you know, pick up new technologies right away.

Kevin:  Yes. That's how I'm thinking of it. This is why I like it so much. This is why, honestly, I think this was the best feature in SQL Server 2016. A lot of people are going to say Query Store is the best feature, and Query Store is an awesome feature, but PolyBase gives you this integration, and it's opening this door to possibly getting rid of link servers. It's opening a door to distributing queries, distributing those really expensive SQL Server queries, kind of like what you do in Azure SQL Data Warehouse, hoping that maybe we get something like that locally.

Steve:  So I know you talked about how PolyBase is perhaps one of the best features in SQL Server 2016. I know that SQL Server 2017 Community Technology Preview 2, I believe, just came out recently. Is there anything new with PolyBase in there that you know about?

Kevin:  Nothing new with PolyBase.

Carlos: Got you.

Steve:  Okay.

Kevin:  There’s a whole bunch of really cool stuff I’m excited about but.

Carlos: Is it a fair question to think, or assume, that it will be supported in the Linux version as well?

Carlos:  Because it's a core feature, if you will. I know they've been working and talking with Travis, the PM over there for the Linux migration; that's what they've been trying to accommodate. Again, listening to the AMP conference, or event, or whatever it was called, they did mention some additional functionality that would be in the Linux version. I don't remember them specifically calling out PolyBase, but, you know, I have to imagine that it will be there even if it's not there on day one.

Kevin:  The answer that I think is safe to give is in today’s CTP, CTP 2 for SQL on Linux, there is not PolyBase support but there is no reason that PolyBase cannot be there.

Carlos:  Got you. There you go. But again, we did mention that this is ultimately an Enterprise-only feature, right?

Kevin:  Yeah, for the head node it has to be Enterprise edition. Even with SQL Server 2016 SP1, I think it is still required to be Enterprise edition for the head node.

Carlos:  Okay, got you. Yeah, I feel like PolyBase was in the list of things that they made available in the lower editions, but I'm not sure if that includes the head node or not.

Kevin:  Yeah, I know that the compute node was available in Standard edition but I’m not sure.

Steve:  Yep. So given that it’s been a little while since 2016 came out, around a year roughly, and with PolyBase sort of been mainstream available since then, do you see that a lot of people are actually adopting this and using it in production environments or do you see more people just sort of experimenting and trying things out at this point?

Kevin:  It's more experimentation. I don't know of many companies that are doing it. The way that I would put it is: okay, well, you have to have SQL Server 2016, which already cuts out a large slice of companies. You have to have Enterprise edition, and you have to have a Hadoop cluster, or you could use Azure Blob Storage and get value that way, but this is going to be a fairly narrow segment of the population even today.

Carlos:  Got you. Yeah, make sense.

Steve:  Well perhaps after this podcast more people will give it a check.

Kevin:  Yeah, I hope so.

Carlos:  That’s right. Compañeros if you are using PolyBase after what you’ve heard here today, I want to know about it. We’re going to report that to Microsoft. Let them know you heard it here first folks. Okay, so I know you’ve been on the show here before, Kevin, but we’re going to still go through SQL family.

Kevin:  Excellent.

Carlos:  Can we do it?

Kevin:  I think so. I may make up new answers.

Carlos:  Well, we do have a couple of new questions that I think have changed since the last time you were an individual guest, so.

Carlos:  Okay. So the first question is how did you get started with SQL server?

Kevin:  I got started as a web developer. It was about a decade ago, and I was an ASP.NET Web Forms developer. It was my first real job, and I was the person who was least afraid of databases. I had written SQL queries before, and we had a need for database administration, so I.

Carlos:  How hard could it be?

Kevin:  Yeah, pretty much. Like, hey, why is the server failing? Oh, it's because it's out of disk space.

Steve: Alright.

Carlos:  There you go, and now you know the rest of the story.

Steve:  So if you could change one thing about SQL server, what would it be?

Kevin:  That's a good question, because everything that I think of tends to happen, which is really cool; I like that. So last time around I said I want PolyBase to support Spark, and I'd like to see that happen still. I've wanted Python support for machine learning within R Services, which is now Machine Learning Services, and we just got that, so that's really cool. The thing that I want most right now is a really good client for Linux. I want Management Studio for Linux, or something Management Studio-esque for Linux that does maybe 70% of what SSMS does.

Carlos:  Interesting. In all flavors of Linux or do you have a particular flavor that you’re interested in?

Kevin:  I’m kind of okay with pretty much any flavor. I mean you can get it to work. Nowadays, I use Ubuntu or Elementary a lot. Previously I’ve done a lot of Redhat. I go back to Mandrake for people in the know.

Steve:  Right. Yeah, I know recently we heard that, what was it, sqlcmd, was going to be available on the Mac, and that was a big move. And I think we're a long way off from Management Studio being on other platforms. But who knows, I could be wrong there.

Kevin:  Yeah. I’m looking forward to whatever they are able to provide.

Steve:  No, I know that’d be certainly cool.

Carlos:  Although, we do have a request in to the PM for SQL Server Management Studio. We haven't quite been able to get them on the show just yet, but when we do we'll ask them that question.

Kevin:  Put them on the spot.

Carlos:  That’s right. Okay, so best piece of career advice you’ve received.

Kevin:  I’m going to flip this on its head, best career advice I can give.

Carlos:  Well, here we go.

Kevin: Learn something new. Especially if you're in a shop where you're on SQL Server 2005, take some of your own time and learn something new. It doesn't matter that much what it is, but expand out just a little bit. It could be features, it could be new versions of SQL Server, it could be learning a new language or a new part of the stack. But don't get caught in this one little part, only to find out someday, oh look, your job has been automated away and you've lost all of those skills. Learn. You learned something the first time; you can learn something again. So that would be my advice.

Carlos:  And that is why we’re going to have you as a speaker at the Companero Conference. So folks if you want to hang out more with Kevin and learn all of his wisdom, you can come to the conference and hang out with us.

Kevin: Wisdom and $5 gets you a cup of coffee.

Steve:  And on to our last SQL family question, if you could have one superhero power, what would it be and why would you want it?

Kevin:  We're getting close to episode 100, and nobody else has ever answered it this way. I want phase walking. I want to be able, like Shadowcat, Kitty Pryde, to phase through walls, phase through objects. Nobody else has answered that, so either I'm completely insane and picking the wrong power, or I'm ahead of the curve. I'll let the audience decide.

Steve:  Or it could be you’ve just answered the question several times before as well and you’ve had more time to think about it too.

Kevin:  That is also possible.

Steve:  Alright, very good.

Carlos:  Awesome, Kevin. Thanks again for stopping by. We always enjoy it.

Kevin:  I’m glad to come here.

Episode 94: Qlik Technologies

There are lots of reporting options, and as I watched SQL Server move up the Gartner Magic Quadrant, I saw another reporting tool moving up as well: Qlik. In this episode we will start by sharing information about Qlik, what it is and some background, in the event it gets adopted at your company. Today we are delighted to have Michael Armentrout as our guest. Michael is a Microsoft SQL Server DBA as well as a QlikView developer, and we discuss the fundamentals of QlikView, such as the associative model, in-memory storage, compression, and sharing, among others. We will also hear from Michael about the difference between QlikView and QlikSense, and some of the "competitors".

Michael shares some of his thoughts on using the technology and how it is different from the traditional Microsoft stack.  We think you will find it interesting.

Episode Quote

“I think a lot of others are starting to inch towards that in-memory model…it’s a new silver bullet.”

“The biggest thing is understanding of the data obviously.”

“No one will care about your career more than you, so it’s up to you to advance your career to whatever level you want it to be.”

Listen to Learn

– Qlik technologies/products
– QlikView functionalities
– Qlik products and their associative engine
– Differences between QlikView and QlikSense
– How they differ from the Microsoft Stack

Michael on LinkedIn
Michael on Twitter

SQL Server in the News

Azure total cost of ownership calculator – http://tco.microsoft.com/
Testing on SQL Server 2017 CTP2

About Michael Armentrout

Michael S. Armentrout is a 17-year Sybase/Microsoft SQL Server DBA who has worked in the legal, healthcare and financial industries on versions from SQL Server 6.5 to 2016. The past few years I have focused primarily on Qlik development while providing DBA support for a healthcare provider. Currently I am on contract with Summit Healthcare, in Chambersburg, PA, providing QlikView services as well as DBA services. In my downtime I enjoy my wife, my four kids and playing guitar.

Transcription: Qlik Technologies

Carlos: Michael, welcome to the program!

Michael: Thank you! It’s an honour to be here.

Carlos: Thanks for coming on the show today, and we're hoping that you can give us some insight into this new technology that I've been seeing a lot about, and actually I'm curious about its name. Steve and I were kind of talking about this and we were like quilk, click, click, clock. Tell us how you pronounce it.

Michael: It’s pronounced “click”.

Steve: Wow! That’s so easy to say that way compared to what we’re trying to do.

Michael: Actually I’ve seen it spelled many ways so…

Carlos: So ultimately the interest, and I think one of the reasons we wanted to have you on, is that we've been talking a little bit about reporting options, and we like to try to get around the edges, if you will, of the SQL Server community from time to time, just to talk about what others are seeing and what they might be experiencing. Last year, as Microsoft eclipsed Oracle on the Gartner Magic Quadrant, you could see where other technologies were in relation to that, and they continue to publish this out, and one that I saw that was moving up the ranks was this Qlik technology. And of course, having been talking with some folks, you being among them, that are listeners of the show and also users of this product, I thought, oh, ok, we should probably have the conversation, because if people are being exposed to it, it'd be nice to at least talk about the whats and the whys and how it might play into their environments. So give us the nickel tour of what Qlik and QlikView are.

Michael: Qlik, the company, was founded in 1993 in Sweden and the creators wanted to create software that mimics the way humans intuitively think which is through associations. So in 1996 they released the first version of QlikView.

Carlos: Is this a reporting software or…

Michael: It is. So it's ETL along with storage, or a data warehouse essentially, and then producing the UI/UX on the back end. So you don't need SSIS, although you can leverage it, but it's self-contained.

Carlos: Interesting. So then it would connect directly to your SQL server box and then just kind of take care of the rest?

Michael: Correct. I mean, there's a way to load data, and it's pretty much load column names from SQL and then a select statement, at its basic level.

Carlos: So tell us how you got started with it. I mean you were using some SQL server tools if you will. We talked a little about SSRS and whatnot. You mentioned SSIS. So how does an organization or maybe take us a little bit of history, how did you decide or what was the thought process like, hey, we should be using Qlik.

Michael: Originally, the company I'm at wanted to go the Microsoft BI stack route. But we're in a very rural area, so they learned pretty quickly that we weren't going to be able to staff up to leverage Microsoft. So we worked with another vendor. We had another product that worked with Meditech, our EMR, and we developed an app for about a year and a half and it just didn't pan out; it wasn't robust enough. So they went back to the drawing board and brought QlikView back in, or brought it in, excuse me. It was considered originally but…

Carlos: Didn't make the cut at first, but now it's getting a second look.

Michael: Correct.

Steve: So then, what type of things are you doing with QlikView, or where are you using QlikView in your environment?

Michael: A ton of clinical reporting, but I guess now might be a time to sort of pull back the covers and maybe explain what the advantage of the Qlik technologies is. So they have an associative engine. As it pulls data in, it compresses it based on cardinality. So there are something like 43,000 zip codes, I believe, in the US. So if you have five million customers, you don't store five million zip codes; you store 43,000 zip codes. Now if you have male and female, you store two values for that. Just to give an example, we have an app that pulls in the last 30 days of nursing orders. In SQL Server that's 557 MB, but when we pull that into QlikView, it's 27 MB. That's the same information.
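To make the cardinality idea concrete, here is a minimal Python sketch of dictionary encoding, the general technique Michael is describing: each distinct value is stored once and every row keeps only a small index into that symbol table. This is illustrative only, not Qlik's actual storage format; the row count and zip-code pool simply mirror the numbers in his example.

```python
# Minimal sketch of dictionary (cardinality-based) encoding; the general idea
# Michael describes, not Qlik's actual storage format.
import random
import sys

CUSTOMERS = 5_000_000
zip_pool = [f"{z:05d}" for z in range(43_000)]             # ~43,000 distinct zip codes
zips = [random.choice(zip_pool) for _ in range(CUSTOMERS)]

# Cost if every row materialized its own copy of the value.
naive_bytes = sum(sys.getsizeof(z) for z in zips)

# Dictionary encoding: store each distinct value once, plus a 4-byte
# index per row pointing into that symbol table.
symbols = sorted(set(zips))                                 # 43,000 entries
index = {value: i for i, value in enumerate(symbols)}
encoded = [index[z] for z in zips]                          # five million small integers

dict_bytes = sum(sys.getsizeof(s) for s in symbols) + len(encoded) * 4

print(f"per-row copies: ~{naive_bytes / 1e6:.0f} MB")
print(f"dictionary:     ~{dict_bytes / 1e6:.0f} MB")
```

The male/female column is the extreme case: two stored values no matter how many rows reference them, which is why low-cardinality columns compress so dramatically.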

Carlos: Interesting. There’s some compression going on.

Michael: Massive compression. And then both Qlik products, QlikView and QlikSense, we can get on the differences later, are all in memory.

Carlos: Got you. Interesting. We've seen some of that play out as features in the SQL Server realm. So this is interesting; QlikView may be one of those helping push Microsoft in that direction, it almost sounds like.

Michael: Correct. And I think a lot of others are starting to inch towards that in-memory model.

Carlos: Sure. Sure. It’s just so sexy, right. It’s in-memory, right.

Michael: It’s a new silver bullet.

Carlos: That’s right. That’s right.

Steve: So then with that, once you've got all that data in memory, and earlier you mentioned building UI/UX, are you then using that to build just the reporting UI/UX, or are you actually building applications that can change and manipulate data?

Michael: Good question. So the Qlik products are sort of read-only. You pull from your source, and then ultimately the biggest thing, the thing that makes everything else easier, is your data model, which you should build as a star schema. So once you bring that data in through your ETL and you manipulate it, and you can do that in the front end or in the ETL loads, there are places to do that, you bring it into a star schema. And Qlik's model is that everything is associated. Each data piece is associated with every other piece of data. So for your tables, you have one column that they join on, your primary key, and it can only be one column. Then through the visuals you are able to see associations. Qlik is known for their green, white, gray color scheme. So if you select a value in the UI and it's green, that's the selected value. You might choose all males. And then on your other screens you'll have values that are white, which means they are associated, and you'll have values that are gray, which means they are not associated. So it's an associative data model.
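For readers who want to see the green/white/gray idea in code, here is a small Python sketch of associative selection over two tables joined on a single key. The patient and diagnosis values are made up for illustration; the point is only how a selection in one list box partitions the values in another into associated and excluded sets.

```python
# Small sketch of the associative (green / white / gray) selection idea.
# The data and field names are made up for illustration.
patients = [
    {"patient_id": 1, "gender": "M"},
    {"patient_id": 2, "gender": "F"},
    {"patient_id": 3, "gender": "M"},
]
diagnoses = [
    {"patient_id": 1, "diagnosis": "Broken leg"},
    {"patient_id": 2, "diagnosis": "Broken arm"},
    {"patient_id": 3, "diagnosis": "Flu"},
]

# "Green": the value the user clicked in one list box.
selected_gender = {"M"}

# Rows associated with the selection via the single shared key column.
possible_ids = {p["patient_id"] for p in patients if p["gender"] in selected_gender}

# "White": values still associated with the selection; "gray": excluded.
all_diagnoses = {d["diagnosis"] for d in diagnoses}
white = {d["diagnosis"] for d in diagnoses if d["patient_id"] in possible_ids}
gray = all_diagnoses - white

print("associated (white):", white)   # {'Broken leg', 'Flu'}
print("excluded (gray):   ", gray)    # {'Broken arm'}
```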

Carlos: Just to take an example a little bit further, to make sure that I'm understanding. Again, we're talking about a medical facility here. So I'm kind of querying the data and I'm looking for patterns, basically, right? So I want to know the males that have broken legs, just to use an example, or what other things those people might experience, and then it's going to show that to me without having to build the report or build all the logic to show that data?

Michael: Correct. Absolutely correct. So, I mean, I love SQL Server, so what I'm about to say is not necessarily a knock on it, it's just a comparison. With query-based tools, for example cubes, you have to pre-build everything. You have to know what questions users are going to ask, and you have to build answers to them. If they change something, you have to rebuild the cube and they have to wait a day, two days, whatever it might be. So with query-based tools you have pre-defined joins, pre-aggregated hierarchies, and it's only part of the story. If you forget to add a column into your query or into your results or your cube or whatnot, you've lost it until someone recognizes that and you go back.

Carlos: Right. You know, I think you bring up a very interesting point, and one that people have actually asked us to talk about, and that is how the database landscape, if you will, in general is just changing. And you have these other database technologies that are coming up to help solve some of those exact problems. Because not every team of people has the data warehousing experience to put that all together, having tools that make it easier, like I don't need to be a data warehouse expert to get in there and start playing around with my data, is very attractive.

Michael: Yes. And that's sort of the selling point, and then we can get into the differences between QlikView and QlikSense and what audience each of those serves.

Carlos: Sure. Yeah. So let’s get into that now.

Michael: Ok. Sure. So, QlikView is what they term guided analytics. You build the UI, and that could be scatter plots, bar charts, pie charts, pivot tables; there are lots of objects within QlikView that you can build, which sort of guide the user. You kind of nudge them along, and then based on what you expose, they could go, “Oh, I didn't know that males with broken arms all get some certain medicine, or most of them get a certain medicine.” So unless you're asking that question beforehand, you won't know that in SQL.

Carlos: Right, got you.

Michael: So Qlik exposes those, QlikView exposes those through the UI.

Carlos: And then QlikSense?

Michael: QlikSense is their newest product; they're morphing toward web-based and cloud. It is what's called self-service analytics.

Carlos: Aha, there’s another buzz term.

Michael: Yes. So with QlikSense, I mean, the syntax is the same; you could take QlikView syntax, drop it into a QlikSense application and it'll work the same. You have a different sort of layout and constraints in your UI, but ultimately what people would do, would build, is what's called a master library of items. And this is where data governance comes in, too. If the organization decides this is the definition of length of stay, from point A to point B, then they might build that, and then users on the back end could theoretically drop that onto a chart that they want, as a measure or as a dimension or however they want to use it. And then it ultimately ends up being self-service.

Carlos: Got you. Now, you've thrown out terms: dimensions, columns, joins, and some of that kind of stuff. I guess take us through some of that. And, well, it's been a while since you've used SSRS, you mentioned that as we were getting ready to go live here. What are some of the hurdles, potentially, or maybe some of the differences that you've seen, what's the ramp up is more my question, to start using a tool like this?

Michael: The ramp up is, the biggest thing is understanding of the data obviously.

Carlos: So that doesn’t go away?

Michael: No, not at all. You have to understand the data, and then for DBAs, for SQL people, what was really hard for me was that it loads columns from a SQL statement, and I wanted to write these big SQL statements. But they told me to just suck it all in and work it within QlikView, it's much easier. And I've learned they're actually correct, it's much easier. So just bring in all the tables you need, and then you model it; you might do some cleanup. Ultimately, it ends up being less work. Everything is in one environment. You don't have to go to multiple environments or use multiple tools. Instead of having an SSIS package write to some table in your data warehouse that SSRS pulls from, you could theoretically do it all in one product. A lot of companies, I guess, are using this as sort of a replacement for a data warehouse. You could use Qlik as a replacement for a data warehouse.

Carlos: Right, again, that self-service model: just suck the data in, let the tool figure it out and do some of those hard things for you; totally understandable.

Steve: So then with that if you’ve got, I mean there’s a listener out there who’s a DBA, or BI developer, or someone who wants to try it out for the first time. Is there a developer or trial edition or something like that that they could try out as a proof of concept?

Michael: Yes. Everyone can go to qlik.com and download QlikView or QlikSense, and it's a fully functional, no-limitations product. What you can't do is, I can't create an application and send it to you, Steve or Carlos, and have you guys open it up on your machines and have it work, so that's the limitation. But on your desktop it's fully functional.

Steve: Ok, interesting. So then let's just explore that comment you made about creating an application and sending it. If you build an application in the paid version, the full-featured version, and it's something you want to share with Carlos and me, for instance, would that application then contain a copy of all the data at the point you built it when you ship it off to us, or would it be something where we would have to be on the same network with access to the SQL Server and just be querying the data as it's needed?

Michael: Good question. Again, we get back to both products. If I create a QlikView application, I can share it within my organization because I put it on the publisher server, and then we have security, and it can be shared with or accessed by any groups, whatever your permissions are. I'm not aware of being able to share that with you outside the organization. There might be some external-facing option, but I'm not aware of it.

Carlos: So you would think there should be a kind of shared repository, right, because even though the data is “in-memory” it could probably just bundle that up.

Steve: So when you talk about sharing then it’s really internal sharing inside of your organization?

Michael: With QlikView it is internal. With QlikSense they have QlikSense Cloud, and they have a business model which is more on the enterprise side. So if I create an app in QlikSense I can upload it to my QlikSense Cloud and invite you and Carlos to view it or utilize it, and if you have QlikSense on your desktops or an account on QlikSense Cloud, you could use it.

Steve: Ok, so then do you see Power BI being a direct competitor of this, or do they do different things?

Michael: They are a competitor. They end up at a similar point, just in a different way, and I'm not super versed in Power BI. That's one of the things on my list, sort of like Tableau, which is the other competitor. So I need to spend some time and just understand the differences, so I'm not super versed in the Power BI stuff.

Carlos: Well, I think that answers your question. If Tableau is a competitor then Power BI is definitely in the mix.

Michael: Absolutely, oh yes.

Carlos: I think there are different comparisons from Tableau to Power BI, probably around that strength, the association strength. And again, I'm not coming at this as a Power BI guru, but while it can definitely suck in that data and you can play with it, we've talked with some folks about doing things like mobile reporting, where you definitely want to limit some of that data, and it sounds like QlikView is a little bit more robust in that sense. But yeah, I would think it comes down to what tool people want to get comfortable with and whether it suits their needs.

Michael: Right, and it ultimately boils down to culture, is what I've found. If you have a, how do I say, younger culture that is open to different delivery methods, then it's more accepted, I think. I think when you have an older culture they go, “Wow, that's really cool. Can you email me that every morning?” “Yes, we can.” But that's not the point of it. The point is for you to go and discover on your own, to click around, no pun intended, but click around and make the associations and discover things.

Carlos: Got you. So that’s an interesting little tidbit you brought up. So you can schedule reports and kind of send snapshots if you will of, “Hey, here’s your data.”

Michael: Yes. They have a product called NPrinting; they purchased it probably about a year ago and have been incorporating it. So that's where you schedule, so you might have an entire application that has 50 objects in it, and you might schedule that every day at 8:00 AM these five people get these three objects emailed to them. So there's that functionality built into the products.

Steve:  Okay, so on your comparison to Tableau you mentioned earlier, I know one of the things I've seen with Tableau is that it's often driven from sort of the business side of an organization rather than the IT side, and do you see the same thing happening with Qlik, where somebody purchases it to analyze their data and then it sort of ends up in IT's lap later?

Michael: In my n-equals-one experience, we say, hey, we have this tool that can do all these great things, here are all the bells and whistles, and they go, yes, we want that, and we want to do an application around, say, sepsis. So then it comes back to IT. In the QlikView model it all comes back, I guess, to IT, for your developers to create the solution, whereas they're moving towards the QlikSense model, which is self-service. So IT is still involved, and we're creating the master items; there might be 50, could be 200, but users can then go in and drag and drop onto the various bar charts or pie charts, scatter plots, whatever it might be, and then we have predefined, agreed-upon, governed measures. So it's an iterative process and it takes time to build a library like that, but that's the ultimate goal.

Carlos:  Give us a quick overview of the architecture. We've talked about it a little bit, right? Just from a components perspective: do I have a QlikView server, and I connect it to my SQL Server, and then it does the rest of it, like the matching of the data and the UI components? Give us an overview of that architecture.

Michael:  Ok, so what I would do on my local PC is connect to my data sources and bring the data in locally. I would develop a full application, so the UI and the modelling: the ETL, the modelling, the cleanup, the UI, everything. And then what I would do is port that application to the publisher server, and that exposes it out on our internal network, and then those folks that have the correct AD credentials can see it. So that's sort of the security model, one of the security models with it. With QlikSense we have not purchased the enterprise level, and I imagine it's a similar concept. One of the things with QlikView is, if I expose a QlikView application, the end users can only consume it; that's all they can do. Now in QlikSense there is the ability to basically download my application, and they could create their own tabs or dashboards or whatnot. So it's sort of two products, the old school and the new school, and they're sort of morphing towards each other.

Carlos:  Last question here, from a job perspective or a demand perspective. You mentioned you're actually transitioning as we record, and by the time this comes out you will be at your new location. You've put a couple of years into learning this, and you mentioned you're in a small mid-western town. Are there opportunities? When you decided to make the transition here, were you seeing opportunities? Do you think this is still kind of a growth area?

Michael:  It's still a growth area, and there are a lot of opportunities. I only say that based on the emails I receive; it's a lot of contract opportunities, and that's exactly where people reach out. So based on my emails there's demand, normally for QlikSense, because that's what folks are pushing nowadays since it's the newer product. And one of the differences I mentioned earlier was that with QlikView you kind of build it and design it for a resolution, and it can be consumed on a mobile device, but QlikSense is all HTML5, so I build an app and it renders and resizes whether it's on my laptop, on an iPhone, or on an iPad; it doesn't matter.

Carlos: Yeah, it’s responsive?

Michael: Yes, and so what Qlik has done is build a base product, which they keep building upon, but they expose their APIs, and so now there is a market growing for folks creating extensions. If there's a certain kind of chart that maybe QlikSense doesn't come with out of the box, there's a big market, a boutique market if you will, springing up creating different extensions.

Carlos: Interesting. Now, obviously you're moving to a new place, so we can't speak to that just yet, but at least in your current experience and some of the discussions you've had as a Qlik developer, it sounds like, very similar to the SQL side, while you can have some reporting experience, there are still some administration components associated with it. Or I guess I should say, in essence, the database administration and the reporting are still kind of under the same umbrella of responsibility. Is that fair?

Michael: In my case, yes, because we're a small shop, so there are multiple hats. But I can't say if that's the norm. I'd say there are larger shops that obviously might have a more definitive separation of duties, where it's more like, “Okay, here's SQL Server, here's db_datareader, suck it all in and pull what you want into your app, because you store those QVDs, the data files, compressed on your server, not mine.”

Steve: So from the perspective of the DBA who’s responsible for that server that all the data that is being pulled from are there any gotchas or anything that they should really be aware of as they’re allowing access to that data to be used through Qlik?

Michael: Yes, and we learned this the hard way. So being both the DBA and the QlikView developer, we had a lot of projects, we were doing Agile methodology, so we were cranking out applications relatively quickly, and each of us, in a vacuum, created our own queries back to the database. So what DBAs should be aware of is the same queries coming back; both of us might be pulling the last 30 days of orders, which is bad compared with doing a full sort of data warehouse model, where every night there is a separate job that pulls in the changes from the last 24 hours and then we hit that to pull our information. So that's one thing I would caution DBAs on: get that sort of governance in place and make sure it's only pulling once.

Steve: So does that mean that the different developers just have to sort out the data that's being pulled in, using the same data that's there, rather than writing their own queries to get it? Or is it more complex for the Qlik developer than that?

Michael: It really depends. There's nothing stopping each developer from pulling their own set of data; in a perfect world you would have a governed set of data. So for example, you might all agree that, okay, here are the 30 fields that constitute patient data for any patient, and so every night we're going to go out and pull the changes into this one QVD, which is our QlikView data file, and then, “Hey, developers, you're going to write your applications against these QVDs. You're not going to hit the SQL Server 24 times, 20 times; you're all going to pull from this agreed-upon, governed set of data.” We didn't go that way; currently we're all just pulling what we need when we need it, and it has caused contention at times.
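As a rough sketch of the “pull once, share the extract” pattern Michael is describing, the following Python outline has one scheduled job pull the last 24 hours of changes into a single shared extract that every developer reads, instead of each application querying SQL Server directly. A Qlik shop would write QVD files from the load script; pandas and Parquet stand in here, and the connection string, table, and column names are invented for illustration.

```python
# Rough analogue of "one governed nightly pull, many readers".
# A real Qlik shop would write QVDs from the load script; Parquet stands in
# here, and the connection string, table, and column names are invented.
import pandas as pd
import sqlalchemy as sa

ENGINE = sa.create_engine(
    "mssql+pyodbc://etl_user:secret@sqlprod/EMR?driver=ODBC+Driver+17+for+SQL+Server"
)
EXTRACT = "shared/nursing_orders.parquet"   # the one agreed-upon extract file

def nightly_refresh() -> None:
    """Single scheduled job: pull only the last 24 hours of changes."""
    delta = pd.read_sql(
        "SELECT order_id, patient_id, order_dt, order_type "
        "FROM dbo.NursingOrders "
        "WHERE order_dt >= DATEADD(hour, -24, SYSDATETIME())",
        ENGINE,
    )
    try:
        existing = pd.read_parquet(EXTRACT)
        merged = (
            pd.concat([existing, delta])
              .drop_duplicates(subset="order_id", keep="last")
        )
    except FileNotFoundError:
        merged = delta
    merged.to_parquet(EXTRACT, index=False)

def load_for_app() -> pd.DataFrame:
    """Every developer's application reads the shared extract, not SQL Server."""
    return pd.read_parquet(EXTRACT)
```

The win is the one Michael calls out: SQL Server sees one incremental query a night instead of many applications each pulling the last 30 days on their own schedules.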

Steve: Alright good stuff there.

Carlos: Alright, cool, I guess last words on Qlik.

Michael: I would just like to give an example that DBAs might be able to relate to. So you can create an app that goes out and pulls in backupfile and backupset; let's say you look at the average size of the backups or whatnot. Well, back to our modelling: when you bring those two tables in, they have about six or seven columns in common, so as a QlikView developer what I would do is go in and rename the columns in one table so there's one column that they have in common that's sort of the key to it all, like your backup_set_id. So we do that, and then I have unique data that's joined by one column, and then I start creating my UI pieces. For example, I have a bar chart: what's the average backup size for every database, and it creates a bar chart. Now I can create what are called list boxes, which are just unique lists of values, so for example database, which gives me all the unique database names, and then in our case there might be different compatibility modes. So I'll have a list box for compatibility modes, and through the application I might say, okay, list all the databases descending by backup size, but how many are at compatibility level 100? So once I click 100, I can visually see that two of my databases on this server, or three of them, are at compatibility level 100. The other ones are grey, which means they are not associated; they are not at that level. So even though I may not have known they all had different compatibility levels, through a tool like this everything is associated; again, green is selected, white is associated, and grey is not associated. I can visually see that, and it might lead me down another path to ask more questions. That's just one example that I tried to create in the SQL space that DBAs might be familiar with.

Steve: Yeah, I think that's a great example of something that I'm familiar with, and it makes total sense at that point. But do you find that DBAs actually end up using Qlik to do that type of work, or is that more of a theoretical type example?

Michael: It's probably more theoretical. Again, n equals one, unfortunately, is my sample pool. But I use it for things like that, just to quickly throw together these two, three, four, five tables and look for something, and then nine times out of ten, when I'm looking for something, I discover something else. So in the example I gave you, another list box could be what type of backup it was, a database or a log backup, and I may click a database to see what its average backup size is, and the L may turn gray, and that tells me there have been no log backups on this database. Now there are other methods to figure that out, but you can leverage it visually and go, “Oh wow, I didn't know that,” and then you take some action.

Carlos: I think what's cool about that is, and again we all have our preferred ways to go about it, but if you have that skill and the tool is there, it would be very easy to go in and start asking some of these questions that maybe I wouldn't be asking because I'd have to go figure out what a column name is, or whatever. I could just go in and start picking at it without having to read a lot of documentation, or figure that out afterwards once I start seeing some correlations or whatever.

Michael: Right, so there's definitely a learning curve with the syntax, which is not difficult, but there will be a learning curve. Once you're comfortable with that, you can just bring data in. So for example, Meditech is an old-school, '60s, '70s-style programming application, but they push all the data every night to a SQL Server, which is the Meditech Data Repository. A lot of the time the analysts who know Meditech inside and out don't know SQL Server. Now, I know SQL Server, but I don't know Meditech inside and out. There are tables within Meditech that map columns to tables within Meditech and can give you hints about where the data came from. So I built an application off of two tables that brings them in and maps them, so if I want to search for a module, like I want to see all the lab modules, well, in my application I just click lab and it shows me all the lab tables. And then if someone says it has to do with admissions or admitting time, okay, I can search admit and it brings up columns or tables that share that name, and then I can use that to sort of narrow down where to find the data on the SQL Server that they're seeing in the application. It's kind of a data dictionary, if you will. It's probably better with visuals.

Steve: Alright, any last thoughts as we wrap it up?

Michael: No. I was just a SQL Server DBA, a Sybase DBA, for many years, and I stumbled across this and saw the power. Once you get past a couple of months of the learning curve, you're able to leverage the Qlik solutions, either View or Sense, to visualize data even as just one person, let alone with an entire organization that is all ordering off the same menu, if you will, seeing the data the same way, and that gives them the ability to ask their own questions in QlikSense.

Carlos: Awesome, should we do SQL family?

Michael: Absolutely.

Steve: So Michael how did you first get started using SQL server?

Michael: I was the typical DBA, but I started in the Sybase world and did that for a couple of years, and noticed my other partners all had many more years of experience than me. So I had an opportunity at a company that was on Microsoft SQL Server 7.0 before it went public, so we were sort of a beta site, and I thought, “Alright, those are easy skills to transfer; it's relatively the same, just with a GUI on top,” and made the move, and I've been in SQL Server ever since 1999.

Carlos: Now if you could change one thing about SQL Server what would it be?

Michael: This was the toughest one. Nothing really major, I guess, just some minor things that I would probably change. Sometimes it irritates me that the different size or time increments are in different units: milliseconds, seconds, gigabytes, megabytes, kilobytes. Having to do all the conversions gets to be annoying at times.

Carlos: Oh, got you. You'd like a little bit of standardization in some of the reporting data.

Michael: Correct, here's the formula I have to use on every size field because they're all in kilobytes. And then today we had an issue at work with some deadlocks, and I kept going into SQL Server through the GUI, into the log, and as soon as you expand the error log it pops up and defaults to the SQL Server Agent log, so you have to uncheck that and check the current SQL Server log. Just little things, you know, no show stoppers, but those kinds of annoying little things.

Steve: What is the best piece of career advice that you’ve ever received?

Michael: I had a CIO at a company years ago; he was a Marine Corps reservist who has since gone on to retire as a General. But the point being, he said, “No one will care about your career more than you.” So it's up to you to advance your career to whatever level you want it to be.

Steve: I like that, so true.

Carlos: Michael our last question for you today, if you could have one superhero power what would it be and why do you want it?

Michael: I’ll probably go vanilla here and say to fly.

Steve: Don’t be ashamed. Flying is a great superhero power.

Michael: Predicting the future and all that stuff would already be cool, but I would say the flying. I do a fair amount of driving, so it would be nice to get to places more quickly.
Carlos: There you go.

Steve: Alright, very good.

Carlos: Well awesome, Michael. Thank you so much for being on the show today.

Michael: Thank you for having me. I appreciate it and hope I have provided value to the companeros out there.

Carlos: Hey, that's great! Well, companeros, if you didn't find value, you let us know and I will let Michael know. That would be very nice.

Michael: Absolutely.

Steve: I know I learned something along the way, thanks, Michael.

Michael: Thank you guys! I appreciate it.