Episode 81 Indexes Part I

Part of what we like to do on the podcast is to present topics and discuss different ways you might use features. In this episode, we go back to basics and discuss indexes. Like most topics, we made sure to explain all the important aspects. Because this is such a broad topic, we decided to carry it over into our next episode as well. Using the phone book as an example, we chat with our guest Randolph West and explain heaps, non-clustered indexes, and clustered indexes. We also explain best practices for using them when creating new tables.

We didn’t forget to cover the problems you might run into. For example, you might not think about indexes until your data grows very large. Sure, you can attempt to resolve the problem by moving to SSDs and increasing memory, but this only pushes the issue further into the future. In this episode, we will discuss considerations for indexes and why database architecture is so important.

We would like to hear your opinions. What are your strategies when using indexes? Use the hashtag #sqlpodcast and let us know!

 Episode Quote

“The clustered index should be chosen very carefully because your entire table is going to be sorted according to that column or set of columns, and what you’re looking for is a unique value for every row.“

“All that an SSD is doing is making your problems go away until a later time. The problem is still there. It’s just being hidden by the fact that you’ve got faster drives. In fact, the same goes for adding more memory. It’s just that happens to be cheaper than trying to fix your indexes.”

Listen to Learn

  • Heaps and indexes types explained
  • Randolph’s thoughts on clustered indexes vs. non-clustered indexes
  • Similarity between clustered indexes and primary keys
  • The results of using Management Studio Designer for creating new tables
  • How to create covering indexes
  • The negatives of having too many indexes and what you can do about it
  • How do default fill factors affect indexes? Is faster storage a good reason to avoid fixing indexes?

Randolph on IMDB
SQL Skills article on BTree
Randolph’s Blog
Randolph on Twitter
Clustered and Non-clustered indexes described
Heaps
Creating Indexes with Included Columns

About Randolph West

Randolph West solves technology problems with a focus on SQL Server and C#. He is a Microsoft Data Platform MVP who has worked with SQL Server since the late 1990s. When not consulting, he can be seen acting on the stage and screen or doing voices for independent video games.

 

Transcription: Indexes Part 1

Carlos: Ok, so let’s, I guess we’ll go ahead and kick it off. So Randolph, welcome to the show.

Randolph: Thank you very much for having me.

Carlos: Yeah, it’s great. I know that you and Steve have done a little bit of work together and some of your adventures have come up from time to time as we’ve talked about it, and so it’s great to have you here on the show with us today.

Randolph: Well, thank you! It’s been great fun working with Steve. He is a consummate professional, and I would hire him. I’m just saying.

Carlos: There you go.

Steve: And technically I would say I would hire you too because we’ve done that on projects.

Randolph: Yes we have.

Steve: I think we first met on the Database Corruption Challenge in 2015.

Randolph: That’s where it was, yes.

Steve: Oh yeah, yes, and then, yeah you did an amazing job on some of the corruption we worked on there. And then you and I have worked on fixing a couple of corruption issues for clients since then, and then we also worked together on another project sort of a longer term disaster recovery moving to Azure type project for our client.

Carlos: So today our topic is indexes and now this can be a pretty broad topic and just some of our pre-planning we thought, “Gosh, there’s no way we’re going to cover it all.” Some of the components are just a little complex to potentially discuss all of them over an audio format. So some of that will be pointing to our show notes and pointing people to additional documentation there. But let’s go ahead, I guess let’s start with that conversation and kind of maybe even defining an index and how that works. Now I think, so the first one, so we talked about indexes, we’re talking about copies of data, the way our data is structured, right. The very first example we have of that is a heap. Basically, a heap is a table with no indexes on it so there’s no structure or order into the way that the data is saved. Obviously, you have columns, right. You can have data types, and all those kind of constraints but the way the data is actually saved is not part of the definition if you will of the table. That makes it a heap.

Steve: Yup, so then really from there we take a look at how do we get to the data faster, because with a heap, to find any specific piece of data you’ve really got to search through the entire list, or the entire table, and look at every single row. And say, “Is this what I’m looking for? Is that what I’m looking for?” Eventually, after you’ve looked through the entire pile, heap in this case, you come back with the result. But another way to do this is by reordering some of the data. The way I like to think of it is sort of like a phonebook or a phone list. And I know growing up I’m used to seeing the yellow pages and the white pages. The yellow pages were the business listings and the white pages were the listing of every person and business with a phone number, organized by name. Now, if you were just to take everyone on that phone list and just put them in a random order, perhaps in the order that they signed up or purchased their phone service, that’s sort of the equivalent of a heap. There’s no specific order to it that’s useful for searching on it, unless of course maybe you’re searching on the order that they got their phone service installed. But it would make it really impossible for anyone using that phonebook, if they were just in that order, just random order basically, to find anybody’s phone number. So what they’ve done with the white pages is they’ve organized all of the people ordered by last name, then, when there are multiple people having the same last name, you have the first name order, and when there are multiple people with the same first name you have the middle initial order. And that’s really the equivalent of creating a clustered index on last name, first name and middle initial. What that effectively does is it orders the entire table, or phonebook in this case, by those items so that when you need to do a lookup and you’re looking for someone whose name is Smith, you don’t have to start through the As and go all the way through to the Zs and confirm you found all the Smiths. You can just jump to the S section and then track down where Smith is from there really quickly. So then we look at the yellow pages, and the yellow pages are the business listings where the business is organized by business type. So if you’re looking for an accountant you can jump to the A section, find the section where all the accountants are, and then look up all the accountants that are listed in the yellow pages. But if you’re looking for an exterminator, you need to go to the E section for exterminators and then you can find the list there. Now, with the yellow pages, it’s sort of like a filtered, non-clustered index in that it’s a copy of data that exists in the yellow pages, sorry, it’s a copy of the data that exists in the white pages, but it’s ordered differently, and the reason it’s filtered is that it only includes businesses. So we’re kind of throwing two things in there: one, it’s a non-clustered index and a copy of the data, and two, it’s filtered because the people have been thrown out and only businesses are shown.
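To make the phone book analogy concrete, here is a minimal T-SQL sketch (the PhoneBook table and its columns are invented for illustration): the table starts life as a heap, and the “white pages” ordering corresponds to a clustered index on the name columns.

    -- A heap: no clustered index, so rows are stored in no particular order
    CREATE TABLE dbo.PhoneBook
    (
        LastName      varchar(50) NOT NULL,
        FirstName     varchar(50) NOT NULL,
        MiddleInitial char(1)     NULL,
        PhoneNumber   varchar(20) NOT NULL,
        IsBusiness    bit         NOT NULL DEFAULT (0)
    );

    -- The "white pages": sort the whole table by name, i.e. a clustered index
    CREATE CLUSTERED INDEX CIX_PhoneBook_Name
        ON dbo.PhoneBook (LastName, FirstName, MiddleInitial);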

Carlos: One thing that’s kind of important there is that that filter won’t necessarily change, right. That category of business, you’re not going to take that out of your system. While you can potentially filter on dates, it is a specific date. You’re not going to be keeping, for example, a rolling 90 days in an index. That wouldn’t be a good use of an index. But if you only wanted to see, for example, from the beginning of 2017 forward, then that might be an option, something like that.

Steve: Right, yup. So then next, there’s what’s often referred to as a reverse phone directory. Typically not included with your white pages or yellow pages, but it would be ordered by phone number so you can look up a name. That would be the equivalent of a non-clustered index, or a copy of some of the data that’s been ordered in a different way to make it quicker to find something. So if you want to find out who has the phone number 555-1212, well, you can look it up by the phone number and then find out their name. And that would be just a plain old non-clustered index with no filter in that case, because it would have the entire list. Not sure if the yellow pages and white pages are international or available in all countries, but hopefully everyone has sort of understood where we’re going with that as an example of how those indexes work.
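Continuing the same hypothetical PhoneBook table, the “reverse directory” and the “yellow pages” map roughly to a plain non-clustered index and a filtered non-clustered index:

    -- The "reverse directory": a copy of some of the data, ordered by phone number
    CREATE NONCLUSTERED INDEX IX_PhoneBook_PhoneNumber
        ON dbo.PhoneBook (PhoneNumber);

    -- The "yellow pages": a filtered non-clustered index that only contains businesses
    CREATE NONCLUSTERED INDEX IX_PhoneBook_Businesses
        ON dbo.PhoneBook (LastName, FirstName)
        WHERE IsBusiness = 1;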

Randolph: Yeah, that makes sense.

Carlos: So let’s see, Randolph, what about thoughts on clustered indexes vs. non-clustered indexes?

Randolph: Well, the clustered index should be chosen very carefully because your entire table is going to be sorted according to that column or set of columns, and what you’re looking for is a unique value for every row. And you want to make sure that the clustering key is narrow so that it doesn’t take up that much space, because that clustering key is reused in non-clustered indexes. Because you have to have a way for that copy of the data to match back to the original table, the way that the database works is it stores the clustering key, the clustered index, inside the non-clustered index so that it can match back. Your clustered index, your clustering key, has to be narrow. It has to be unique preferably, and ever increasing. What you don’t want is a random value, for a number of reasons which we can get into later. A random value causes something called page splits, and it kills performance, and it kills the way the storage mechanism works, which just makes things ugly. So you want a number that is ever increasing, like an integer, or a big integer if you’re going to have more than 2 or 4 billion rows. You just have to choose carefully.
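A minimal sketch of the kind of clustering key Randolph describes, using an invented Orders table: a narrow, unique, ever-increasing integer rather than a wide or random value.

    CREATE TABLE dbo.Orders
    (
        OrderId   int IDENTITY(1,1) NOT NULL,  -- narrow, unique, ever increasing
        OrderDate datetime2(0)      NOT NULL,
        Amount    decimal(10,2)     NOT NULL
    );

    CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderId
        ON dbo.Orders (OrderId);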

Carlos: One of the things I like there about that idea, I guess we can kind of jump in there, so you mentioned page splits. Even Steve using the analogy of the phone book makes it a real, almost physical example. If you think about taking pieces of paper, right, and then basically writing on them and filling them up, that’s kind of the way that SQL Server works, and these pages, at least in SQL Server, are 8K chunks, right, so it has to sort them. When you talk about that sorting and the clustered key, clustered index rather, it’s going to store them in that order, and I think to your point is that when you’re doing that in kind of a random order it has to resort that every time, and those pages can get out of whack because then all of a sudden I’ve got to fit in data that wasn’t there originally.

Randolph: Exactly, so what happens there is you have to wait for the index to be resorted and that can take time. It can kill your performance badly, so that’s why we try and steer clear of wide random columns for a clustering key or a clustered index. For example, GUID columns are not a good choice for a clustering key. That doesn’t mean they would make a bad index, but they would make a bad clustered index, because remember the entire table is sorted according to that index, because the clustering key is how your data is sorted and that is your actual data. It’s not a copy of the data.

Carlos: So it’s worth asking at this point, right, maybe like a small little poll here. So we talk about clustered indexes, which are separate and different from, but kind of go hand in hand with, primary keys most of the time, right?

Randolph: There’s an interesting conversation here. Microsoft, for I think a good reason (it wasn’t the right choice, but a good reason), decided that if you create a primary key inside the Management Studio Designer, and in fact even before that in previous versions of the tool, Enterprise Manager I think it was called in the old days, it will automatically make that primary key a clustered index. I think the reason for that default was because they wanted you to have a clustered index at least. And because the primary key is unique by definition, it made sense from a designer point of view, and for people who are not intimately familiar with internals, to have at least a unique value as your primary key and also as your clustering key. It makes sense from that point of view, but it can run into trouble because, let’s face it, a GUID is unique. It’s a very good candidate for a primary key, but it may not be a good candidate for a clustered index. In fact, it isn’t. There are reasons why you would not want to use the designer on particular tables. In fact, I prefer not to use the designer anymore when I’m creating tables because it has a bunch of defaults that may not be appropriate for the design that I’m implementing. It doesn’t mean I don’t use the designer. In fact, I use it a lot for creating relationships and stuff because it’s easier. But in terms of clustered indexes and primary keys, the designer will make those the same set of columns, so it’s something to watch out for.
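One way to get the best of both, sketched here with an invented Customers table: keep the GUID as the primary key (it is unique and the application can generate it), but declare it NONCLUSTERED and cluster on a narrow, ever-increasing surrogate instead. This is a sketch of the idea, not a one-size-fits-all pattern.

    CREATE TABLE dbo.Customers
    (
        CustomerGuid uniqueidentifier NOT NULL
            CONSTRAINT PK_Customers PRIMARY KEY NONCLUSTERED,  -- unique, but not the sort order
        CustomerKey  int IDENTITY(1,1) NOT NULL,
        CustomerName nvarchar(100)     NOT NULL
    );

    -- Cluster on the narrow, ever-increasing surrogate instead of the GUID
    CREATE UNIQUE CLUSTERED INDEX CIX_Customers_CustomerKey
        ON dbo.Customers (CustomerKey);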

Carlos: Well, now, having said that, kind of from a poll perspective. What percentage of time are your clustered indexes different from your primary key?

Randolph: Well in my case, 1%.

Steve: Yeah, I think I would put that really into two camps. One of them is if it’s a new database that I’m architecting from scratch or new tables that I’m building out where I have control over it. Oftentimes they may end up being different, the clustered index versus the primary key. But if it’s a, I don’t know, third-party application database that I’m jumping in to do some maintenance on or something. Like 99.9% of the time those are the same.

Carlos: Ok, and so I think that’s been my experience as well most of the time, right, so that 99.9% of the time. It’s pretty much like when I’ve identified an issue, or they have, for instance, a GUID as their primary key and as the clustered index, then we decide, “Hey, maybe we should make a change here.” I got into a little bit of, so I’ve got some feedback. I’ve just put out this book “Zero to SQL”. And one of the things, actually I put a sentence in there. I said, when talking about non-clustered indexes, “We don’t have to include the primary key of the table in the non-clustered index because this automatically gets included in the index.” Now, somebody kind of took me to task, and they’re ultimately correct in the sense that it’s not the primary key that gets added, it’s the clustered index that gets added. I just thought, I’m just curious, how many times is your primary key not your clustered index? That’s kind of where I was going with that.

Randolph: Yeah, when you’re designing from scratch, as Steve said, you might have a reason for that.

Steve: So I guess you had just gone into it a little bit, talking about how you said in the book something about not putting the same columns that are in the primary key into your non-clustered indexes because they are included there as well. What was the feedback you got on that?

Carlos: So technically correct. It’s not the primary key that gets added, it’s the clustered index that gets added to the non-clustered index.

Randolph: And in fact, to go a little bit deeper into that, it gets into the leaf node guaranteed. But if it’s a unique index then it’s not in the intermediate levels, if that matters.

Carlos: Ok, and so here we start talking about, and this is the part of the index that will be a little bit hard to describe over the audio portion. But one of the things we came up with was the idea of the game of the $10,000 Pyramid, or even Pyramids, the card game, if you’re familiar with that. And that idea that you have a structured top and the data kind of goes in the pyramid format, and that’s one of the things that Randolph mentions there. So they have leaf nodes and other types of names, if you will, for some of those sections, so that as SQL Server is looking through the data it can decide, do I need to keep going or not.

Steve: Yup, so really there when the index is being used, it’s that pyramid or like a tree type structure. And it uses that to be able to quickly jump to the pages that you’re looking for rather than scanning every leaf node. It goes through that tree to get to the data quickly. Randolph, did you want to talk about the B-Tree and how that part works?

Randolph: Well, it’s difficult to speak about with just audio. But if you think of the top of the pyramid, there is only ever one page there and that is called the root node. Only ever one page, and it’s also an 8K page. And it would be a type of page, instead of being a normal data page, it’s called an index page. But otherwise it’s exactly the same as a table structure or a clustered index, because remember your clustered index is your data. So your non-clustered index and your clustered index have a similar structure. You’ve got the single page at the top and then inside that page are references to other pages. And those pages contain the index information. So if you need to go to Stedman, you would look on the root page for where the S starts, and it will tell you which page you should go to where the actual data is, and then inside there it will be able to find the data much quicker than having to read the entire table to find Steve Stedman’s number in there.

Carlos: Almost like an index of sorts, right, like in a book.

Randolph: It’s exactly what it is, right? It’s just an implementation detail of exactly what you’re talking about. You flip to the back of the book, you look for S, there’s Stedman and you know which page to jump to. That’s exactly what’s happening.

Steve: So then one of the differences with that is the amount of time it takes to get there. If you’re going through a heap, or a clustered index that isn’t clustered on last name, and you’re searching for me, Stedman, if there’s a million pages in your table you have to touch all million of those to check and see, to find the rows with my last name on them. But with the index in that tree structure there, you might have to read two, or three, or four pages, depending on how deep that index is, to get to the S section, and then scan through just a few pages once you’re there to find Stedman. And the difference is you’re touching maybe 8, maybe 10 pages rather than…

Randolph: Yeah, ten’s a lot for a seek. What you’ve just described, if you’re checking the entire white pages or the entire index, that’s called a “scan”, and if you’re just looking for a particular record because you know how to get there through the index, then that is called a “seek”. There’s a whole conversation here about which is better. The answer is neither of them, because it depends on what you’re doing. If you need 10,000 records a scan is much more efficient than if you just need one. So a seek would be more efficient if you’re just looking for one or two records. But if you’re looking for a million out of your 10 million then a scan would be much more efficient, because it would end up reading fewer pages overall.
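A quick way to see the difference on the hypothetical PhoneBook table is to compare logical reads for a query that can seek against one that touches most of the table; SET STATISTICS IO ON reports the page counts Steve and Randolph are describing.

    SET STATISTICS IO ON;

    -- Likely a seek: one name, found by walking down the index tree
    SELECT LastName, FirstName, PhoneNumber
    FROM dbo.PhoneBook
    WHERE LastName = 'Stedman';

    -- Likely a scan: most of the table qualifies, so reading the leaf level is cheaper
    SELECT LastName, FirstName, PhoneNumber
    FROM dbo.PhoneBook
    WHERE LastName >= 'A';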

Steve: Yup, good point there. I think that’s one of those things that comes back to really what it is you’re trying to do. Sometimes the query optimizer has to guess at what it is you’re attempting to do with your query. Usually it gets it pretty close, but sometimes it’s not quite right and you end up with a seek when a scan may have been better, and vice versa.

Randolph: Yeah, and that gets into conversations about are your indexes correct? Are they stale? Do they need to be updated? Do they need to be maintained? There’s a whole bunch of stuff here that we can talk about. There’s statistics as well.

Carlos: I guess we can jump into some of that. Before we do that I think one of those ideas is, are my indexes good enough? Because the optimizer can do some different things based on the information that it has, right? And then there’s this idea of a covering index. Take us through that.

Randolph: What happens there is, we spoke about the structure of the index briefly, where you’ve got your root node, you’ve got your leaf node where your data is, and then there are these intermediate levels. What happens with an included column is that if you have a table where you need to bring back the last name, the first name, the telephone number, but there’s a whole bunch of other data in there as well, then your index would only need to contain those three columns. But let’s say that some of your queries that happen quite often would also require a middle name. You could add the middle name as an included column. That middle name would appear in the leaf level, so the bottom level of the index, but it wouldn’t appear in the intermediate levels, and that’s called an included column. Your index doesn’t have to be huge just because it has four columns, because you wouldn’t be using it for all of your queries, but it would be a suitable covering index for the ones that require the middle name. I hope I haven’t confused everybody by saying it that way.

Steve: Just a little bit more detail on the covering index, because sometimes when I talk about covering indexes people say, “Oh, how do I create that? Is that a different create statement than a non-clustered or clustered index?” And the answer to that is really, no. A covering index is simply an index where all of the data that you’re asking for in your query happens to be in that non-clustered index. It can return it all back to the query without ever touching the heap or the clustered index. But if it’s not covering, what that means is it uses that non-clustered index to find the rows that you need, and then it jumps back to the heap or the clustered index to bring in the data that you’re looking for. So that last step of jumping to the heap or clustered index is skipped with a covering index, and that can save quite a bit of time too.
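As a sketch on the hypothetical PhoneBook table, a covering index for the middle-name example is just a regular non-clustered index whose key and INCLUDE columns together contain everything the query asks for:

    -- This query can then be answered entirely from the index, with no lookup:
    --   SELECT LastName, FirstName, MiddleInitial, PhoneNumber
    --   FROM dbo.PhoneBook WHERE LastName = 'Stedman';
    CREATE NONCLUSTERED INDEX IX_PhoneBook_Name_Covering
        ON dbo.PhoneBook (LastName, FirstName)
        INCLUDE (MiddleInitial, PhoneNumber);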

Randolph: Yup.

Carlos: And that’s where in your execution plans you’ll see that lookup, right. That’s what it’s doing. It is basically saying, I don’t have the information I need, so I need to jump and grab it. So whether you should then create a “covering index” will depend on how many times a query gets run, the performance of that query, and all those kinds of things.

Randolph: And how many columns there are in the lookup and all sorts of stuff. Yeah, it’s an art. It’s not a science.

Carlos: Right.

Steve: So then with that when you’re looking at the query plan and there is a suggested index. And I see this quite often where it says suggested index with two or three columns and then include, and it includes like 25 columns in it. That kind of thing would certainly give you a covering index but what are your thoughts when SQL Server is suggesting those types of indexes?

Randolph: Don’t do that.

Steve: And why?

Randolph: Ok so, when SQL recommends an index, a missing index, it is only specific to that query. If we know anything about query plans, you can add a space to a query and it will suddenly be a different query plan, maybe. Don’t just create missing indexes as they come up. What you need to do is evaluate against other missing indexes or other recommendations for indexes on that table, and maybe combine the different recommendations into one or two new indexes, or modify existing indexes, to try and accommodate as many of those recommendations as possible. With an index that includes all of the columns of the table, you might as well just be recreating the clustered index. It’s just not efficient.
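A sketch of the consolidation Randolph describes, using two made-up missing-index suggestions against the hypothetical PhoneBook table:

    -- Two hypothetical "missing index" suggestions for the same table:
    --   1) key (LastName)            INCLUDE (PhoneNumber)
    --   2) key (LastName, FirstName) INCLUDE (MiddleInitial)
    -- Rather than creating both, one consolidated index can cover both query shapes:
    CREATE NONCLUSTERED INDEX IX_PhoneBook_LastName_FirstName
        ON dbo.PhoneBook (LastName, FirstName)
        INCLUDE (MiddleInitial, PhoneNumber);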

Steve: So one of the things that I see oftentimes when I’m doing performance tuning and looking at indexes that are inefficient or not used: often you see them named something like the missing index name, which is a clue that somebody just copied and pasted the suggested missing index and ran it, and they end up with an index that may not really be what they were looking for there.

Randolph: In fact, this goes back to Pinal’s interview that you did in December, where he said his reference point is, “Do you have more than ten indexes on a table?” He’s going to start looking into that and see why. Because you can have, on a legacy system or even a vendor system, third-party stuff that you’re not allowed to touch, where people have over the years gone and added indexes because it helps with a particular query. You can end up with duplicate indexes, or indexes that are only different by, say, an included column. You’ve got all of that maintenance that’s going into keeping these indexes up to date, and statistics and all the other stuff that goes with that, when in fact you could consolidate and reduce your indexes quite easily by just analyzing the system and how those queries are running.

Steve: Ok, so then let’s say I’ve got too many indexes like you’re talking about there and some of them are not efficient, or they’re not being used. They may help someday with some query but what are the negatives to having too many indexes specifically what’s going to slow down when you have too many of those?

Randolph: Well, there are two things that I would like to address. The first thing is, if this index is useful once a month, then create it once a month and then drop it again. If you’ve got a reporting-style query that has to run once a month and generates a report, and without the index it would take 10x as long, go ahead and build that index just before the report runs and then afterwards drop it, because it’s not in use the other 28 or 29 days of the month. There’s nothing wrong with creating and dropping an index when you need it, as long as it doesn’t obviously impact business hours. That’s the first thing. The second thing, to answer the question, is that every time you modify the table it has to make sure that the indexes are up to date. So if you’ve got 10 indexes or 15 indexes on a table, every time you insert or update a row, or delete in fact, which is the same as an update because it’s changing the data, you’ve got to make sure that that row, or the change to the row, is replicated in each one of those indexes. So it has to lock the table and it has to lock the indexes while it’s updating them, and that can cause performance problems in and of itself. It can also cause problems because you are locking data, so you cannot make any changes, you cannot select from it, unless you start using tricks that you shouldn’t be using, query hints, which I’m not even going to mention. All because you have too many indexes and you’re trying to satisfy a possibility that might happen in the future, when in fact you’ve got statistics that tell you, DMVs that tell you, exactly what’s going on with your database. Use the information available. It’s not SQL 2000 anymore. We have better ways of analyzing how your queries are running.
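The once-a-month pattern Randolph mentions might look something like this sketch against the invented Orders table (dbo.MonthlyOrderReport is an invented procedure name):

    -- Build the index just before the monthly report runs...
    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
        ON dbo.Orders (OrderDate)
        INCLUDE (Amount);

    EXEC dbo.MonthlyOrderReport;  -- hypothetical reporting procedure

    -- ...and drop it afterwards so it doesn't slow down writes for the rest of the month
    DROP INDEX IX_Orders_OrderDate ON dbo.Orders;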

Steve: Ok, so then one of the things that often comes up, and it’s highly debatable in places that I’ve been recently, is fill factor. And the concept of what is a good default fill factor setting and what does that really mean to your indexes?

Randolph: Ok well, fill factor in theory is about leaving enough space in the data pages themselves so that if you do need to insert rows, and it is a random type of column like we spoke about before with GUIDs, it leaves space in the data page so that it doesn’t have to split as often, and it can go and insert data into those pages without having to resort the entire index every time. That’s kind of what it’s about. Do you want to add anything on that definition?

Steve: Well, I was just thinking, I heard a debate a while back about someone arguing that with SSDs and faster storage your fill factor changes versus spinning storage. I don’t know if I entirely agree with that.

Randolph: I completely disagree because all that an SSD is doing is making your problems go away until a later time. The problem is still there. It’s just being hidden by the fact that you’ve got faster drives. In fact, the same goes for adding more memory. It’s just that happens to be cheaper than trying to fix your indexes.

Steve: Yup. Ok then, so with fill factor, I think what we end up seeing oftentimes is that people have a default fill factor setting on their database. They think that’s going to cover whatever they need, that it’s going to be right for all occasions. But really it comes down to the fact that fill factor needs to vary depending on the data type, the frequency of change, whether the data going in is sequential or ever growing, or whether it’s completely random. Fill factor really depends on the data.

Carlos: Do you have different fill factor settings based on your clustered vs. non-clustered indexes? Because that kind of would be a good sorting factor, in the sense that, in theory, we talked about our clustered indexes, again assuming that they’re unique, that they’re ordered, you shouldn’t be seeing as much fragmentation there because they’re kind of in that order, going back to the idea of them being the primary key or unique, ascending, ever increasing; an identity is a good example of that. Whereas the non-clustered indexes are going to be a little bit more random, if you will, and you could see a little bit more shuffling.

Randolph: The magic word you used there was fragmentation. This used to be a huge problem, as Steve mentioned, with spinning drives. While some might say not worrying about fill factor at all is ok, I disagree; you should worry about fragmentation. You should worry about these things purely from the point of view of knowing that your database is in a good condition and is well maintained. At the same time though, index fragmentation with faster drives doesn’t matter as much as, for example, making sure that your statistics are up to date. In fact, I now say in my performance tuning session, if you want to get the best benefit from your maintenance window, it is far better time spent updating your statistics than, for example, rebuilding an entire index. Firstly, from an I/O point of view, an index rebuild is a size-of-data operation for large tables. It’s going to need transaction log space, and it’s going to need space in the data file itself. It’s going to need a lot of resources, and memory, and all of those things. And if you’re replicating, it’s going to replicate it all. It’s a big deal to rebuild a large table, whereas it’s a much more effective use of your time to do a statistics update. And comparing the benefit of an index rebuild versus your query plan being more effective, fragmentation is less of a problem now. So from that point of view, Steve, yes, I will agree with your statement before about fill factor. But yeah, the one thing I wanted to mention on fragmentation is that there was confusion a few years ago about fill factor of zero and fill factor of 100. What’s the difference? There is no difference. They’re exactly the same thing. A zero fill factor means there is zero space left free in the page; a fill factor of a hundred means it’s 100% used, so it’s the same thing. I don’t know why they decided to distinguish, but who knows, it is Microsoft. A fill factor is a percentage from 1 to 100, so zero is a hundred. What you want to do there, as you mentioned, is it depends on the type of column you’re addressing. Less so the type of index you’re using, but more about the type of column. So a GUID, as we spoke about before, is a random value, so you would want to have a lower fill factor on that index for that type of data, because then it reduces the number of page splits, which reduces the amount of fragmentation. However, you will still need to maintain that index from time to time to make sure that the fragmentation is reduced.
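Fill factor is set per index, so a sketch of what Randolph describes might leave free space on a random (GUID) key while packing an ever-increasing key full; the 80 here is only an example value, not a recommendation, and the index names come from the earlier invented tables.

    -- Leave roughly 20% free space per page for a random GUID key to reduce page splits
    CREATE NONCLUSTERED INDEX IX_Customers_CustomerGuid
        ON dbo.Customers (CustomerGuid)
        WITH (FILLFACTOR = 80);

    -- FILLFACTOR = 0 and FILLFACTOR = 100 mean the same thing: pack the pages full
    ALTER INDEX CIX_Orders_OrderId ON dbo.Orders
        REBUILD WITH (FILLFACTOR = 100);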

Steve: One of the interesting things I’ve come across there is that if you’re in a scenario where somebody has a GUID as their primary key and as their clustered index, you might be in a position where it’s too big to change that clustered index in a given maintenance window, or too time consuming to change that. One of the workarounds that I’ve found there is to use NEWSEQUENTIALID rather than NEWID, so that you can turn GUIDs into sort of a pseudo ever-incrementing value.
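A sketch of that workaround: NEWSEQUENTIALID() can only be used in a DEFAULT constraint, but it generates GUIDs that are roughly ever increasing on a given machine, which behaves much better as a clustering key than NEWID().

    CREATE TABLE dbo.Sessions  -- invented table for illustration
    (
        SessionId uniqueidentifier NOT NULL
            CONSTRAINT DF_Sessions_SessionId DEFAULT NEWSEQUENTIALID()
            CONSTRAINT PK_Sessions PRIMARY KEY CLUSTERED,
        StartedAt datetime2(0) NOT NULL
    );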

Randolph: Right, and that’s 16 bytes wide so.

Steve: Yup, it’s still big.

Randolph: It’s still big but it is sequential as you mentioned.

Steve: Yeah, it’s better than random at that point.

Randolph: It’s better than random, but the reason people use GUIDs is usually from a development-first perspective, so the developers want to manage the index, sorry, the primary key, in the application as opposed to the database handling it. For a number of legacy reasons that was a good idea 15 years ago. It’s not a good idea now. So there are a lot of habits, and this-is-the-way-we-do-things thinking coming from the development side, and the new stuff in SQL Server is not being adopted as effectively because there is 15-year-old news on the Internet that says SQL is not good at this. The other thing is we have to educate people to say, listen, there is a reason this RDBMS is in the top 3 in the world. There’s Oracle, there’s MySQL and there’s SQL Server; those are the top 3 RDBMSs in the world, Relational Database Management Systems. So SQL Server is in the top 3 for a reason. It’s a good product. It’s in fact one of the best products out there to handle your data, so you should be using the newer features and the newer compatibility levels and things like that. And get away from the old habits of 15 years ago, because they’re bad habits and things have gotten better in the last few years.

Steve: Yup, and that is oftentimes a challenging debate to have. And I found that coming in and showing examples of what the performance difference is between one way of doing it versus the other way is a good way to show what’s better there. Ok, so then another one to talk about, with the different options around indexing that we have, is the concept of filtered indexes. And I know I threw this term out when we were talking about the phonebook directory, around the business names in the yellow pages. So filtered indexes, any thoughts on pros and cons and sort of how that works that you want to talk about here?

Randolph: Well, the idea of a filtered index is that you have a very particular condition in the WHERE clause of your query. Let’s say that you want to grab all of the Stedmans in the phonebook where their middle name equals J or starts with a J. If that WHERE condition happens often, then you could create a filtered index that uses that WHERE condition, and then the index will only include those rows that satisfy that condition. The problem with the filtered index there is that if you use a parameter in your WHERE clause it’s not going to use the filtered index. So it’s solving a very specific problem, in my opinion. It’s a good problem that it solves, but it’s a very particular way of solving it. And you have to be very careful when you create your indexes and create your queries that you’re using them correctly.

Steve: Yup, so a real-world example I came across where filtered indexes were very handy was around a very large table and needing to purge data beyond a certain date. I found that when I just ran a DELETE statement to delete anything older than a given date, it was forcing a large index scan to find that, and it was taking like 40 minutes to delete a thousand rows. So what I did instead was I went in and, deleting one day at a time that was beyond the date range, I added a filtered index just for that one day. That filtered index gave me the shortcut to be able to delete a thousand rows at a time in a split second, like less than a second to delete those, rather than an extended amount of time. I found in that case I could add that filtered index, do the work I needed to do, and then drop it, and then add a new one when I was working on a different day, and that was really a handy way to use it to improve performance there.

Randolph: Yeah, that’s an excellent example of when a filtered index would make a lot of sense. Another example is if you have a very large table and you have a bit-typed field, a nullable column, and you’ve only got a few of those rows that have a value of true or false. You could have a filtered index on either the null values or the populated values, depending on what type of query you’re writing, and it can identify those rows that have that bit column set much more easily.
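Two sketches of the filtered indexes just described, against the invented Orders table (the IsProcessed column is made up for the bit-column example):

    -- Steve's purge pattern: a filtered index covering just the one day being deleted
    CREATE NONCLUSTERED INDEX IX_Orders_Purge_20170101
        ON dbo.Orders (OrderDate)
        WHERE OrderDate >= '20170101' AND OrderDate < '20170102';

    -- Randolph's sparse bit example: index only the handful of rows still flagged
    CREATE NONCLUSTERED INDEX IX_Orders_Unprocessed
        ON dbo.Orders (OrderId)
        WHERE IsProcessed = 0;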

Steve: Yup, yup, another good example there.

Carlos: Exactly, yeah, that’s been my experience as well.

Steve: So then let’s jump a little bit more into the care and feeding side of things, and I know we touched a little bit on rebuild and a little bit on statistics but if we look at what are the general things that we needed to do to take care of indexes and keep them well structured on an ongoing basis?

Randolph: I’d be interested to hear your side, because I’ve been speaking a lot. I’ll chime in if I have any feedback.

Steve: Ok, I mean, so really I think on that there’s the concept of reorganizing versus rebuilding as you get fragmentation over time, and then there’s statistics. Let me start with a very bad way of doing it that I see quite often, which is if you go and use one of the default maintenance plans. In the maintenance plan you just go and click let’s rebuild my indexes, let’s reorganize my indexes, then let’s rebuild statistics. One of the things you end up doing is rebuilding, which basically rebuilds the entire index from scratch. If you’re on Enterprise Edition you can do that in online mode, but if you’re not on Enterprise Edition that’s an offline operation. But what people sometimes check in the default maintenance plan is that they say, “Let’s rebuild them,” and that gives you the best structured index you’re probably going to end up with, and it also rebuilds your statistics. And then they reorganize them, and then they rebuild statistics with like a 5% sampling or something like that. And in that example you end up with a whole lot of work that’s actually giving you sort of a negative impact. Whereas if you had just rebuilt them to begin with, and then not done the reorg and not done the statistics rebuild afterwards, you would be in a lot better shape.

Randolph: So, you know, those defaults that you’re talking about, the reason that happens is because the maintenance plan wizard has got rebuild, then reorg, then statistics, in that order in the list.

Steve: Absolutely, yeah, people just click the checkboxes or select them and that’s what you get.

Randolph: So when I see that I blame Microsoft for that. That’s purely Microsoft’s fault, because generally people who are using the maintenance wizard are using it because they don’t know how to do it with, say, Ola Hallengren or MinionWare tools, just to name two, there are many. So that’s Microsoft’s fault, and every time I see that I do not blame the DBA, or the accidental DBA, or whoever is managing it. It’s not their fault. But it is causing a whole bunch of work that is unnecessary.

Steve: Yes, it may not be their fault, but it’s their pain they have to deal with. When I see that I generally do a little bit of education to explain why that’s bad and turn those off. And like you said, replace it with the MinionWare or Ola Hallengren scripts, both of which are far better solutions for doing that than the default maintenance plans.

Randolph: So the one thing that I wanted to mention, Steve, is that an index rebuild is, as I mentioned before, a size-of-data operation. So if your table is 10GB and you need to rebuild it, if you’re doing a clustered index rebuild it’s going to require 10GB of space in the data file and then it’s going to require at least 10GB in the transaction log. And if you’re rebuilding a clustered index it is going to, by default, rebuild all of the non-clustered indexes on that table that refer to that clustered index, and that means all of them. So if you’ve got a 10GB table and you’ve got 30GB of non-clustered indexes on that table, it’s going to require at least 40GB in the data file and 40GB in the transaction log, because it’s going to rebuild everything. So if you’ve got a maintenance plan that says let’s do a clustered index rebuild, and before that you had the non-clustered indexes rebuilding, it’s going to do them again anyway, so it’s a nightmare. This is why I say that statistics updates done more regularly are going to be a lot more effective than doing index rebuilds over the long term. You still do need index maintenance, but let’s not do it every night.

Steve: And that’s where I like the third-party solutions that we mentioned there, for the ability to rebuild those where you have options to say, given a certain level of fragmentation I want to choose to reorganize this, or if it’s even more fragmented then we’re going to choose to rebuild it. And some of them even have the option of only doing this for a certain amount of time.

Randolph: Can you go into detail about what the difference between a rebuild and a reorg is?

Steve: Ok, so with the rebuild, it’s basically, technically, the equivalent of if you dropped the index and then recreated it. It’s going to be completely rebuilt from scratch, and it will take up the log and disk I/O like you talked about a moment ago. But with the reorg, what that’s going to do is go into the index and basically restructure or compact the index pages as needed. It will go through and shuffle them around to compact them down into a smaller amount of space, and it will do that page by page as it goes through there, and that can be done while things are online in Standard Edition. But it’s slow, and the end outcome is not as good as a complete rebuild. It might get you real close, but it’s not going to be as good as a complete rebuild; the benefit is you can keep it online.

Randolph: And you still need to update the statistics after you’ve done a reorganization, which is something that people tend to forget as well. The statistics need to be up to date so that the query optimizer knows where the data is, how much of it is in there, and what the distribution of the data is in that index, so that it can make the right choice about what type of query plan to create, because a lot of queries come down to sorts and joins, and that’s where you find your biggest performance bottlenecks. A sort is a blocking operation, so everything in the query has to stop until the sort is finished. So if your index is up to date and it’s sorted correctly, then it doesn’t have to do that operation; it’s already done. There’s a whole bunch of things about loops we could talk about here as well, which I’m not going to. An up-to-date statistic is much more valuable to the query optimizer than a defragmented index.
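The sequence Randolph describes, sketched against the invented Orders index: a reorganize is online and compacts pages, but it does not refresh statistics the way a rebuild does, so follow it with an explicit statistics update.

    ALTER INDEX CIX_Orders_OrderId ON dbo.Orders REORGANIZE;

    -- Sampling level is a judgment call; FULLSCAN shown only for illustration
    UPDATE STATISTICS dbo.Orders CIX_Orders_OrderId WITH FULLSCAN;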

Carlos: There’s another point there, even, that we’ve talked about care and feeding of the indexes, right, rebuilds vs. reorgs. However, we can also have statistics that are outside of the indexes, right, on the table. Those are also providing feedback to SQL Server, to the optimizer. And if we neglect those we’re still in kind of the same peril, because the optimizer may be using those statistics but we’re not giving them that care and feeding.

Randolph: Correct, so it’s valuable to have a, I hate this word, holistic approach to your care and feeding, in that you need to look at not just indexes, and not just fragmentation, and not just statistics, but all of them, and in a way that is effective and efficient for your maintenance window. You don’t want to go and rebuild indexes every night because it doesn’t make any sense. But you also don’t want to stop doing statistics updates.

Steve: Yup. One thing I commonly see with that is the rebuilding of statistics, or updating the statistics, where somebody will have a job that goes through and blindly updates every single statistic in the entire database every night. And that can have a pretty big performance impact on things depending on the size of your database. A better approach is to look at which tables or indexes actually have changes and only update those that have a fair amount of changes to them on that frequent basis, and then maybe on a regular basis of once a month or once a week you do an update across all of them.

Randolph: Yeah, that’s a very good point. The amount of data that changes in a table matters if you’ve got automatic statistics enabled. We’re going to go on a little bit of a segue now: by default, automatic statistics are enabled. There are other settings that you can choose in 2016; there’s a whole bunch of new stuff per database which you can choose. Let’s talk about everything before 2016, because that’s what most people are still using. Statistics will only update themselves if a certain percentage of the table has changed. The value has changed across versions recently, but the fact is that statistics will only update themselves if a certain percentage has been changed. So if you’ve got a really large table with tens of millions of records, or rows, and you have only one million that have changed recently, statistics will not be updated automatically. So it does pay you to use one of the third-party tools we’ve mentioned, like MinionWare or Ola Hallengren’s maintenance solution, where it will go and check how many rows have changed, and then you can update the statistics there, and that will help your query optimizer as well.
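To see how much has changed since statistics were last updated, sys.dm_db_stats_properties (available from SQL Server 2008 R2 SP2 and 2012 SP1 onwards) exposes a modification counter; a rough sketch against the invented Orders table:

    SELECT  s.name AS stats_name,
            sp.last_updated,
            sp.rows,
            sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('dbo.Orders');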

Steve: The other side effect you can get with that is that if you do hit that threshold where it decides it’s time to rebuild those statistics, that might be in the middle of your peak load during the day. And right when you’ve got a lot of traffic coming to your database, an index or statistics rebuild occurring can have an impact on things too.

Randolph: That in fact is one of the recommendations for SharePoint style databases. There’s a lot of things I don’t like about SharePoint but the fact is a lot of people use it so one of their recommendations is to turn off automatic statistics updates on any SharePoint style database. That includes CRM, it includes GreatPlains, all of those even. What’s the one that I used? Whatever it is, there’s a whole bunch of them where you should turn off statistics updates automatically and then include statistics rebuilds in your maintenance plans. So it’s just to keep in mind each database is different.

Steve: It will probably apply to Navision and Dynamics as well. 

Randolph: Yes that’s the one. Yeah, Dynamics is the one I’m trying to remember. Thank you! So any SharePoint style database they do recommend turning off statistics updates automatically and to do them in a separate maintenance window. So it pays to do your research to make sure that you’re doing the right kind of maintenance plans for your database and your instance.
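The database-level switches Randolph is referring to look something like the following sketch (the database name is made up; check the vendor’s own guidance before changing these settings):

    ALTER DATABASE [SharePoint_Content] SET AUTO_CREATE_STATISTICS OFF;
    ALTER DATABASE [SharePoint_Content] SET AUTO_UPDATE_STATISTICS OFF;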

Steve: Yup, very good point.

Carlos: So I guess a couple of different things we talked about all of these components. And I guess let’s now talk about some of the benefits right, so all of these things we have to go in, right, kind of the holistic approach, kind of having to know our data, getting more familiar with it. Ultimately to what end are we going to do that? I think, so we’ve talked a little bit about performance. I guess we should probably talk about how that performance gets into the system if you will or the mechanisms that cause the improvements?

Randolph: Ok, so when I do my performance session, what I say is, “You’ve got expensive items like your CPU. You’ve got less expensive items like your RAM, and you’ve got even less expensive items like your hard drives.” So your CPU is the most valuable thing to you because the SQL license is per CPU core, and you want to make sure you’re using your CPU and memory as efficiently as you can. What an index does is, we spoke about this before, it is a copy of the data, so you want to keep your indexes as efficient as possible. If you’ve got a large table, you don’t want your index to be large as well. You want it to be smaller so that less of it is in memory, because that’s what this game is about. SQL Server is all about having as much data as possible in memory. For Standard Edition, up to 2016 even, you’ve only got a certain amount of memory that you can access. In 2016 Service Pack 1 the limit is still there, but that limit applies to your buffer pool in memory as opposed to everything. But the fact is that there are limits to the amount of memory you can use for SQL Server, specifically with Standard Edition, because not everybody can afford Enterprise. So you have to manage the amount of data that’s going into memory as much as you can, and that is most effectively done by the right kind of indexes for your queries. And that’s also why you don’t want to have duplicate indexes, because they will also be in memory. Also, you don’t want to have wide indexes, because they will take up more memory than they need to. And that’s why included columns are a very handy way to reduce the size of your indexes, and that’s why we have filtered indexes. All of these things try to reduce the amount of data that is in memory so that we can do a lot more with what’s in the buffer pool.

Carlos: So then how do we go about it, or what are the best ways to determine… we talked about looking at the execution plan, which kind of gives the recommended index there. So what are some good strategies for deciding which indexes get created and which ones we need to ignore?

Steve: One approach I take on that is if I’m looking at an execution plan and it suggests an index, if it’s a development environment, one of the things I’ll try initially is just create the index and see how it helps, and then I’ll drop it and go look and see if there’s a similar index. I mean, if the index that it suggested was beneficial, I’ll go and see if there’s a similar index that could be modified or added to that would give a similar effect to what the suggested index was doing. And sometimes that means you’re taking an existing index and just adding an include to it, or taking an existing index and adding another column into it.

Carlos: Ok, so that’s kind of a dev environment, one query at a time.

Steve: Yup, and the other approach that I take there is I go and look at some of the missing index DMVs to figure things out. Actually you look at DMVs for a handful of things. One is to figure out what the missing indexes are and figure out the overall cost associated with those missing indexes, and then come up with what are some good ones to add that are not going to be duplicates of other indexes, and then see how that improves performance. You can also, using DMVs, go in and figure out what your duplicate indexes are. If you have duplicate indexes, oftentimes you’ll see that one of them might not be used at all and the other one is being used a lot. You can go in and drop one of those. However, you need to be careful when you’re looking at unique indexes. You don’t want to drop a unique index versus a clustered index or non-clustered index; you want to look at them and drop the right one there. Another thing to look at is big clustered indexes; that’s another thing to track down, what are the clustered indexes that are really wide, because really wide means it’s going to ripple through all the non-clustered indexes with that extra size. You can also track down unused indexes. What are the indexes on your system that are not being used? Now when you look at that you need to be very careful, because that’s only going to be the ones that haven’t been used since the SQL Server instance restarted. Also, if you rebuild an index, I believe it clears those index usage stats.
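A rough sketch of the unused-index check Steve mentions, using sys.dm_db_index_usage_stats (remember these numbers reset when the instance restarts):

    -- Indexes with little or no read activity since the last restart, least-used first
    SELECT  OBJECT_NAME(i.object_id) AS table_name,
            i.name                   AS index_name,
            us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
           ON us.object_id = i.object_id
          AND us.index_id = i.index_id
          AND us.database_id = DB_ID()
    WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    ORDER BY us.user_seeks + us.user_scans + us.user_lookups;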

Randolph: Sometimes.

Steve: Sometimes, exactly. So it’s something that if you see there’s an index that is not being used, you might want to track it over a few weeks to a month and confirm that, yeah, it really isn’t being used, and then go and take a look at dropping those.

Randolph: That goes back to one of the best practices, which is to have a baseline. So know what your system is doing and track that over time, and then you can refer back to that baseline and say, “Well, this is exactly what’s happening. This is different.” And go from there. Yeah, that’s a good point.

Steve: Yup, and that reminds me of a project, Carlos, that you and I worked on, where we set up a monitoring component that ran for a month and kept track of unused index details, and logged them every few hours, and then we went back at the end of the month and reviewed the ones that over the entire month had no utilization and then suggested those as possible candidates to be dropped.

Randolph: That reminded me of Pinal’s comment as well, that he’ll only look at a database that’s been running for a week. The one thing I wanted to mention is the DMVs that you’re referring to. There’s one in particular, well, a group of diagnostic queries that Glenn Berry, Glenn Alan Berry, from SQLskills, uses and maintains. He’s got a couple of them, and so does Brent Ozar and a whole bunch of other people. Brent Ozar’s one is called sp_BlitzIndex, and Glenn Berry’s is included in his diagnostic scripts. It’s a very similar query, and what it does is it weights your missing indexes according to the number of scans, the number of seeks, and something called average user impact. Now that average user impact number, I think it’s unitless, I’m not quite sure how that number is calculated, but if it’s a really high impact, like tens and hundreds of thousands, then usually I will go and look at that first and say, “Well, how many times would this index have been used, in terms of the number of seeks and scans?” And if it’s a high-usage index that is missing, or a high impact, then I will usually create that without too many modifications, if it’s not too wide or doesn’t have too many included columns.

Carlos: I think it’s the number of times requested multiplied by the number of hits.
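That weighting is roughly what the missing-index DMVs expose; the following is a sketch of the kind of query sp_BlitzIndex and Glenn Berry’s diagnostic scripts build on (the exact formula varies between tools):

    SELECT  d.statement AS table_name,
            d.equality_columns, d.inequality_columns, d.included_columns,
            gs.user_seeks, gs.user_scans, gs.avg_user_impact,
            gs.avg_total_user_cost * gs.avg_user_impact
                * (gs.user_seeks + gs.user_scans) AS estimated_benefit
    FROM sys.dm_db_missing_index_details AS d
    JOIN sys.dm_db_missing_index_groups AS g
        ON g.index_handle = d.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS gs
        ON gs.group_handle = g.index_group_handle
    ORDER BY estimated_benefit DESC;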

Steve: Yup, and just to jump in there, I think I’d feel a little bit left out if I didn’t have the chance to mention Database Health Monitor on that. In Database Health Monitor there are many indexing reports similar to what Randolph has described there.

Randolph: Certainly. Steve Stedman’s Solutions has a magnificent and free tool called Database Health Monitor, which I have used. I don’t even get paid for this statement. It’s a good product. It’s free, which is even more amazing. This is the great thing about the SQL Server community. There are a lot of free tools out there that are adding value all the time, and all that people ask is that you recognize them, and I recognize Steve as a valuable member of our community. That ends the sponsorship message.

Steve: Thank you for the shameless plug.

Carlos: So it’s interesting, I guess, and maybe I’m being a little bit paranoid, but when I’m looking at that impact I will take that into account along with the ones that I’m interested in adding. Of course I want to do what Steve mentioned, looking for duplicates or what’s already there, right? Then, am I going to get over my threshold of ten or whatever, just kind of taking into consideration what additional load am I putting on this table by creating the index? And while you want to test that out, mostly the environments that I’ve worked on just don’t have a good mechanism for creating a load similar to what I have in production. So when I go to implement that index, one of the first things I’ll start to look at is those usage stats, right, because I want the usage stats on the index that I just created to be going up. If they’re not, I know that, “Ok, wait a second, what did I do? Is this still a really good candidate?”

Randolph: Yeah, that’s a very good point.

Carlos: So I guess, other than that, that’s how we’re going to go about creating them. A couple of other things that we didn’t really get into are some of the other index types, like columnstore or even XML indexes. But I guess, any other thoughts about when to start looking at these, or should we even try to approach that now? Maybe I feel like we should punt on some of those.

Randolph: I can briefly mention about XML indexes that, the way they work, they're going to be larger than your table or your XML column. The reason is that SQL Server will create an internal table according to the structure of the XML document or XML field that you're indexing. So it actually expands the XML data out into an internal table and then indexes that, so you could end up with a column that is, say, 100 KB maximum, that's a small one, and you can end up with an index that is 500 KB, or a gigabyte, or tens of gigabytes, because it's creating an internal table under the covers, a system table that it is then indexing. So be wary of XML columns: if you're going to index them, make sure they're sensible indexes that are only indexing certain nodes inside that XML document, and be aware of that. And also I'd like to add here that if you use the XML data type, the data going into that XML row or column is not going to be the same as what you get out. It does modify the XML data going in, for whatever reasons.

Carlos: Say that one more time.

Randolph: If you have XML data that you put in for auditing reasons, for example, and you pull it out, it's going to have been modified somehow. Either the tags will be slightly different or the spacing will be slightly different, so do not use an XML data type to store audited records of XML documents. If you want to keep an absolute record of what your XML looked like as it came out of a web service, for whatever reason, store it in a varchar or nvarchar column instead, because there it stays identical. If you put it into an XML data type it will modify the data. It is still the same data, but it will be slightly modified; the tags will be slightly different or whatever.
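As a sketch of what Randolph describes, here is a primary XML index (which builds that internal node table) and a secondary path index on top of it; the table and column names are hypothetical, and the table needs a clustered primary key:

-- The primary XML index shreds the XML column into an internal table,
-- which is why it can be much larger than the column itself.
CREATE PRIMARY XML INDEX PXML_Orders_OrderDetails
    ON dbo.Orders (OrderDetails);

-- A secondary index over specific paths, built on top of the primary XML index.
CREATE XML INDEX SXML_Orders_OrderDetails_Path
    ON dbo.Orders (OrderDetails)
    USING XML INDEX PXML_Orders_OrderDetails
    FOR PATH;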

Steve: So then as far as column store indexes I think that’s something we could probably dedicate an entire episode just to talk about.

Randolph: We definitely can.

Carlos: I think you’re right.

Randolph: A column store index is neither an index nor clustered, really, so it's complicated. In fact, there is a series of posts, I think it's Nico who has done over one hundred posts, on how column store indexes work. We could talk for days on that. They are fascinating, and they completely subvert what you think you know about indexes, and data, and row-level storage and all that kind of stuff. It's fascinating stuff.

Steve: Yeah, and just a few notes on that I think that it’s one of those things that is great for data warehousing or OLAP type things. And may not always be the best option for your OLTP side.

Randolph: At the same time, you could have reporting-style queries in your database, and with 2016's optimizations for column store, which can now be updatable and clustered and all sorts of stuff, you could have some indexes in your OLTP environment that are OLAP-style indexes. Yeah, there's so much to think about there.

Carlos: We saw that in Episode 78 with Brian Carrig. They talked about using their column store index in a transactional environment.

Steve: Yup, and then one of the misunderstandings I've seen around column store, because column store does the compression on each column, is that people think, "I don't have to have any non-clustered indexes if I have a column store index." And that's not true at all. If you're looking at a bigger table and you're using column store, you may still need to have some non-clustered indexes on it as well.

Randolph: Oh yeah, so do your research. Read all one hundred and whatever posts. Become an expert, then implement them.
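As a small illustration of Steve's point, assuming SQL Server 2016 or later and a hypothetical fact table, a clustered columnstore index and a traditional non-clustered B-tree index can coexist on the same table:

-- Columnstore for the analytic scans...
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
    ON dbo.FactSales;

-- ...and a regular non-clustered index is still useful for selective single-row lookups.
CREATE NONCLUSTERED INDEX IX_FactSales_OrderId
    ON dbo.FactSales (OrderId);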

Carlos: So we’ve talked a little bit about, so implications, how do we know what to use. So space considerations we talked a little bit about more in the rebuild process, right? That we’re going to need additional space in our data file and our log file things like that. I think we’ve had a pretty good conversation there. I guess one last idea I’ll throw out that we can bat around a little bit. We talked a lot about having multiple TempDB files and so one of the thoughts that I’ve seen out there is if you’re potentially, I won’t say, so what’s the value there is that if you’re looking to spread out or move your indexes from where your data might reside. One way to do that is to create another data file potentially on a separate storage and then rebuild those indexes.

Randolph: Ok, there are two things that I can talk about there. The first thing I wanted to mention is that if you do need to move your table into a different file or file group, all you have to do is a clustered index rebuild and target that file group as the new location for the clustered index, because remember the clustered index is your data. So that's a handy way if you need to move your table into a different file group; that's how you would do it. The other thing, by virtue of that definition, is that you could have your non-clustered indexes in a different file group than your data, and you might want that for improved I/O performance, or if you've got your read-only stuff in a different file group, or all sorts of reasons. It's very handy for splitting your load at the I/O level. It's less of a problem these days, but it's a nice thing for large tables to split your non-clustered indexes from your clustered index so that you're not reading from the same portion of the disk, or the I/O subsystem, or both.

Carlos: Right.

Steve: Just to note on that index rebuild for a clustered index to move it to a different file group: although that will move the table and the entire index, one of the things that can be left behind when you do that is any of the varchar(max) or nvarchar(max) or other large objects that are stored outside of the data pages for that table.

Randolph: Yeah, off-row data will be affected. Steve, I don't know, because I've never tried: what happens if you tell it to do the index rebuild with LOB compaction enabled?

Steve: You know, that’s a good question. I have not tried that for a long time.

Randolph: I smell a blog post.

Steve: Yup.
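A sketch of the file group move Randolph describes, with hypothetical table, index, and file group names; note, as Steve points out, that off-row LOB data is not moved by this rebuild:

-- Rebuild the clustered index onto a new file group; because the clustered
-- index is the data, this moves the table's in-row data as well.
CREATE CLUSTERED INDEX CX_Orders
    ON dbo.Orders (OrderId)
    WITH (DROP_EXISTING = ON)
    ON [FG_Archive];   -- hypothetical target file group

-- Off-row LOB data (varchar(max), nvarchar(max), xml, and so on) stays in its
-- original file group after this operation.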

Carlos: Ok, very good. Well, awesome. I think that was a great conversation, and obviously there's a lot more we could talk about with indexes, but I think this was a noble attempt at covering some of the basics and getting into some of the nitty gritty as well.

Randolph: Yeah, the fact is that indexes don't stand by themselves. They are a very important part of everything, including statistics and everything else, so don't think that once you become an index expert you become a performance tuning expert, because that's not true. You have to have a very broad knowledge of how things work in a number of different fields to get the best performance out of your system. And there's nothing wrong with good enough. You don't have to have 100% defragmented indexes. You don't have to have indexes rebuilt. You can have them reorganized. You don't even have to have them reorganized at all if your statistics are up to date, in certain contexts. There are a lot of tradeoffs that you have to think about when doing your maintenance plans, and indexes form just a small part of that.
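For the maintenance trade-off Randolph mentions, here is a rough sketch of checking fragmentation before deciding whether to reorganize, rebuild, or leave an index alone; the page-count filter and the statistics example use commonly cited rules of thumb and a hypothetical table name, not hard rules:

-- Check fragmentation and page counts; tiny or barely fragmented indexes
-- usually are not worth touching at all.
SELECT
    OBJECT_NAME(ips.object_id) AS table_name,
    i.name AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id
WHERE ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Often keeping statistics current matters more than defragmentation.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;   -- hypothetical table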

Steve: Yup.  

Carlos: Great.

Steve: Very good point.

Carlos: So shall we do SQL Family?

Steve: So how did you first get started with SQL Server, Randolph?

Randolph: Steve, that’s a good question because when I originally got this question I had to think very hard about it. I have been fascinated from a personal level about organizing data. Back in my youth I would create lists and lists of things CDs, books all that kind of stuff and I would do them in the paper equivalent of Microsoft Excel. And then when Lotus 123 came along I started playing with that because my dad worked at the bank and he was a 123 expert. If I’d know about VisiCalc I probably would have used it. And then I got into Microsoft Access because it was there. It was a very old version of Access. I think it was version 2 that I started with and then I started working at a PeopleSoft implementation partner in South Africa where I am from. And that’s where I first work with SQL Server and Oracle at the same time, and I was not a fan of SQL Server. I have to be honest. At that time it was version 6.5 and version 7 that’s around the time I joined so there were issues because as some of your listeners may know SQL Server’s engine was rebuilt around 7 time, 6.5, 7 time. In fact, if you look on MSDN you can download 6.5 and you can download 2000 but you cannot download version 7. It is very difficult to come by. There are maybe good reasons for that. I got exposed to SQL Server that would have been in 1997, around there. So yeah, that’s where I first got exposed but then I didn’t really play with it until I got to the bank. I was more of an Oracle guy. Got to the bank in 2006 so there was a large gap of not playing in SQL Server and then I couldn’t figure out why DESCRIBE, the keyword DESCRIBE wasn’t working in SQL Server. For all of you Oracle office you notice there I prefer SQL Server now because as I said before a lot has changed. It is a much better product than it was in 1997. In fact, I think it has surpassed Oracle. I think it’s better than anything else as well and that’s because I’m trying to keep abreast of all the new stuff. I don’t want to be stuck in the past and have assumptions about the product. I want to play around with the new stuff so. That was a long way of saying 20 years or so.

Steve: Ok. Well I know that in the late 90’s Oracle was a much better product than SQL Server.

Randolph: It really was.

Steve: Yup, and that has changed. That has changed significantly in the last 17 years.

Randolph: Yeah, it has.

Carlos: If you could change one thing about SQL Server what would it be?

Randolph: I had a pat answer for this, but I'm going to change my mind. The pat answer is in-memory processing in SQL Server, in-memory OLTP, which was code-named Hekaton. There is a transaction log operation that happens even for stuff that you don't want kept. It's still being backed by disk somewhere. It may not be in the transaction log itself, but it is still being backed by disk somewhere. I would like to have in-memory objects that do not have anything on disk backing them at all. They must just be there, and if the server goes down, oh well, tough luck.
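The closest thing in the product today is a non-durable memory-optimized table, a sketch of which follows, assuming the database already has a memory-optimized filegroup; Randolph's point is that even this still has some disk activity behind the scenes.

-- A non-durable (SCHEMA_ONLY) memory-optimized table: the schema survives a
-- restart, the data does not.
CREATE TABLE dbo.SessionCache
(
    SessionId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    Payload NVARCHAR(4000) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);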

Carlos: Alright. Yes, I remember seeing you in the Bob Ward presentation at SQL PASS talking about the in-memory OLTP.

Randolph: There’s some very very cool stuff with in-memory OLTP that I’ve been playing with. I’ve been blogging about it as well on my blog bornsql.ca. That speaks about some in-memory ops and I found. Interestingly I found another issue that I’ve spoken to Jonathan Kehayias about where Temp tables are slower than Temp variables in certain cases. So I’m investigating that at the moment and it had to do with my in-memory investigations.

Steve: Oh interesting, I’d like to see what you find there.

Randolph: That was my pat answer about what I would change in SQL Server. What I would actually change is having a TempDB per user database as a configurable extra.

Steve: That would be nice.

Randolph: In other words it doesn’t have to be default but if I need a TempDB for a particular database I would like to have that as a separate entity so that I could manage it separately and keep track of the objects that are happening for that particular database. The architecture for that is going to be really difficult to do so I don’t know if Microsoft is going to do that but that’s what I would like.

Steve: Alright, so what’s the best piece of career advice that you’ve received.

Randolph: Oh, it’s going to be along answer as well, sorry. “Let your work speak for itself” is the first and foremost one. So it doesn’t matter what people say about you if your work can speak for itself then you don’t have to say anything. That’s the best piece of advice that I had. But the other is “Everything you say and everything you do is being recorded somewhere” so please treat people with respect. Treat people as you would have them treat you and don’t say things on voicemails that can be played back to you at meetings with the CEO and saying Randolph don’t do that.      

Steve: I like that too.

Carlos: We won’t ask for the personal experience that led you to that advice.

Randolph: At the end of the year, or every now and then when it's a quiet time on my blog, I write about career limiting moves, and that was one of them. So there are a bunch of career limiting moves that I can recommend you don't make, not because it's good advice but because I've done them. So don't do that, because I've done it, and that will be one of them.

Steve: Alright.

Carlos: Our last question for you today. If you could have one superhero power what would it be and why do you want it?

Randolph: I thought about this and I honestly don't know. I liked Pinal's answer, where he could touch a server and know what's wrong with it. The problem is, whenever I touch a server, if it's going to break, it's going to break then and there. Maybe I want the reverse of that superpower.

Steve: To just know without touching the server.

Randolph: Yeah, you know, preventative maintenance is one thing. But if I'm in a room and it is going to break, it is going to break while I'm there, which is good, I guess, in a way, because then you know you can figure it out then and there. But I'd like the reverse of that; that would be cool.

Steve: Ok.

Carlos: Well, awesome. Randolph thanks so much for being with us today.

Randolph: Well, thanks for having me. It's a topic near and dear to my heart, indexing and maintenance and stuff like that, so I'm glad I got an opportunity to speak to you guys about it.

Episode 80: Containers

When was the last time you deployed code and then found an issue with the version of a stored procedure, a setting, or even a service pack?  While we data folk don't generally have as much trouble with environment issues as our developer counterparts, it can still be a big deal.  Spinning up a VM may not be such a big deal anymore; however, most of us still have to request one and wait.  What if you could take care of the OS and the SQL Server and not worry about setting up another development environment?  Our guest today is Andrew Pruski, and he talks to us about how he is using containers to support his environments and the flexibility it provides to him and his co-workers.

While the Linux containers seem to get lots of love, one unique thing about Andrew's setup is he is running Windows containers with older versions of SQL Server.  What is cool to me is there are tools out there that can help those of us running Windows get up and running without having to wait on our infrastructure to upgrade to Windows Server 2016.  If you are using containers, I would love to hear about it.  Use the hashtag #sqlpodcast and let us know!

Episode Quote

“Containers are getting more and more popular, with more and more diverse software that you can run in them.”

Listen to Learn

  • What containers are and how SQL Server can live in them.
  • The benefits of containers
  • How Andrew gets containers to run on Windows
  • Resources for more information

Andrew on Twitter
Andrew’s blog
Windocks
SQL Server Containers

About Andrew

Andrew Pruski is a SQL Server DBA with over 5 years in database development/administration. Originally from Great Britain, he now works in Ireland. He is a regular speaker at SQLSaturday events.

Transcription: Containers

Carlos: Andrew, welcome to the show.

Andrew: Hi guys! Thanks for having me.

Carlos: Yes, so one of the reasons I reached out to you is that ultimately our topic today is containers, and you had put out a series of posts on your blog about containers. I wanted to have you on because I think containers are going to impact database administrators here in the coming months, if not years. Most of what we've been hearing about containers comes from the Linux side, but it seems like the Windows side is starting to speed up as well. The analogy that I heard was that containers are to environments much like VMs were to servers: just kind of carving them up and making them their own little islands, if you will, so that you can then move them around much more easily. That is what the container idea brings to a specific application, and in this case we are talking about SQL Server in a container. Is that a fair representation or understanding of containers? I guess, why don't you start us off and take us through that definition of what containers are and how you've been using them?

Andrew: Yeah, definitely we’ll say that’s a good analogy. Their own horizon especially the Linux has been out for long as more established technology but the Windows side is coming up fast. And in, I’ll just say, months or years we’re going to be looking at containers getting more and more popular. And more and more diverse with the amount of software that you can run in them and especially with Microsoft announcing support SQL Server with their containers. It’s going to be an interesting time in next year to see what happens. I’ve been looking at containers as opposed to VMs is if you have a lot of different apps that you want to run in an environment. Maybe you have a VM with only apps running but you have the same app that you want to run multiple instances of then containers can definitely be the way forward for you.

Carlos: Right, so it’s kind of that encapsulation of an application that I can then move the application around.

Andrew: Yup, definitely, the portability features these give you are very, very good. I mean, with SQL Server I think the main advantage that you're going to get with containers is the ability to spin up an instance in a very, very short period of time. So if you need quick, throwaway instances of SQL Server to do some development or testing on, containers are definitely the way forward.

Carlos: Right, so talk to us about how you’ve started integrating containers into your environment?

Andrew: My QA and Dev departments have a couple of old servers sitting in the backroom somewhere that they've installed a hypervisor on, and they've got access to it. They're constantly building VMs, testing their apps and then blowing away the VMs, and they do this every single time. Each time they do that, they're also installing a local instance of SQL Server. Now, the servers they're running on aren't exactly high spec; we're talking 7.2K disks and CPUs that are five years old, probably older, so the install time for these VMs was taking up to an hour. They'd install SQL and then start restoring all the databases from baselines kept in TFS, and this process wasn't exactly reliable either. When it did fail, they'd have to go digging through error logs and whatnot. Looking at this process, I was always thinking, you know, there has to be a better way, but what to do? I mean, this process had been in place for a while, and I didn't know what other options to give them, until probably about this time last year, when I started reading about containers and what options are out there. I started to look around at the companies that are out there, Docker being the most prevalent one, and started researching containers and how SQL can integrate with them. What I basically did was go off and have a word with a couple of my system admins and ask if there was a spare box lying around that I could provision. I wiped it, installed an OS on it, and then got the containers up and running. The idea is that instead of these guys building their VMs, deploying their apps and then installing SQL Server, they still deploy their apps to their VMs, but the apps then connect to a container that's been spun up on a separate host, our container host. And the benefits that we saw were, well, we went from an install time of 40 minutes to an hour, to spinning up a container in about two minutes. That's a new instance of SQL Server with all our databases available. Bear in mind they are small databases, but they're the same databases that we were using for the install. And so from 40 minutes to about 2 minutes, that's a huge saving.

Carlos: Right, now does that require an application change? You mentioned instead of pointing them to the database they could point to the container. Does that mean that the application, instead of being given port 1433 and a server name, is being given a container name now? There's another layer there?

Andrew: No, so one of the prerequisites I had was that I didn't want them to change that process too much. All I wanted to do was remove the install of SQL Server and have everything else exactly the same. What we did was, I've written a PowerShell script that goes off and creates the container by calling the Docker daemon, and then it creates a bunch of SQL client aliases on the application servers, because all our applications use DNS entries to contact the server. So instead of using DNS, the client alias overrides the DNS entry with the new container port number, and the app doesn't know anything about it. The app still thinks it's talking to an instance of SQL when really it's talking to a container.

Carlos: Got you.

Steve: Ok, so then to take an example like what I do is I work on a Database Health Monitor Application and part of that is testing it on lots of different versions of SQL Server everything from 2005 all the way through 2016, and different installs from Express, to Standard, to Enterprise. And to do that I have a number of VMs that I’ve spun up and then I also have stacked instances on each of those VMs. So it sounds like if I was using containers instead of having all those VMs and stacked instances I could just have a container that I spin up with that test environment. Do what I needed to do and then turn it back off again. Am I following this correctly?

Andrew: That’s exactly, yes. Instead of having your instance there you could literally just kick off, say a little script that you’ve got to say create me a new container from a saved image that I’ve got especially for this one test purpose and off you go.

Steve: Ok.

Carlos: So again, the components there are, so there’s a host server, right? That host server is going to have an operating system and then I’m installing the containers on top of that. And my application could be a container and my database is a container and they can talk to each other.

Andrew: Yes, absolutely. Instead of having another layer with a hypervisor there, you have the base OS and then you have your container daemon running, which creates your containers for you. It's basically the service that you call to say, spin me up this container, spin me up that container.

Carlos: Very cool! And so we talked about Docker being kind of the leading technology there. It seems like, as we've mentioned, all the love was kind of going to the Linux world, but you're doing all of this in a Windows environment.

Andrew: I am. So one of the things that tripped me up at the start of the year was, well, we got support for containers on Windows, which is actually brilliant. I think Microsoft announced that partnership with Docker way back in 2014, so we'd all been sitting there going, come on then, when is it going to happen? But it started to progress with Windows Server 2016, through the community technology previews, and Windows 10 as well: if you're on the Insider preview track with the Enterprise Edition, you can run containers on Windows 10. And so yeah, as you said, Linux was getting all the love, and we're finally starting to see some of that come true now. We're getting the same functionality that those guys have been playing with for a while.

Carlos: Right, so now what are the apps or components that you're using on the Windows side to get that done? Because from an operating system perspective, it seems like Windows Server 2016 was the operating system where that finally arrived.

Andrew: Yeah, so that’s the operating system that Microsoft announced first supported feature. It is actually an in-built feature that you can switch on when you install the operating system so it’s just like anything else like .Net and things like and you’re away. You could be up and running with it in less than 10 minutes.

Steve: So when you say less than 10 minutes is that less than 10 minutes for someone who is a pro with Docker or less than 10 minutes for someone who’s just getting started.

Andrew: Well, it literally is enable the feature and run, I think, two scripts, and you've got your service up and running, so it's very, very simple.

Steve: Ok, so then today I know if I’m working with VM to sort of make my own environment. I remote desktop to that VM, I do whatever I needed to with it and then I could shut that VM down later. But with containers then is it similar that you can connect to that container and then you have a choice of saving what you’ve done or aborting it and going back to that original state.

Andrew: Yeah, so what you can do is initially connect to the Docker engine, the Docker daemon, and you can create your own base images from templates. You have the base OS, which I think is Windows Server Core on 2016, and then you can get a SQL image, download that, and spin up your SQL container. Then what I did was install a lot of databases, shut that container down and save it as a new image that I could then create new containers from. So instead of having to restore databases each time into each container, when I spin the container up those databases are already in the state I want them in.
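The restore step Andrew describes, run once inside the running container before saving it as a new base image, is ordinary T-SQL; here is a hedged sketch with placeholder database names and paths:

-- Restore a baseline database inside the container, then commit the container
-- as a new image so future containers start with the data already in place.
RESTORE DATABASE SalesDB
FROM DISK = N'C:\baselines\SalesDB.bak'       -- placeholder path inside the container
WITH MOVE N'SalesDB'     TO N'C:\sqldata\SalesDB.mdf',   -- logical names must match the backup
     MOVE N'SalesDB_log' TO N'C:\sqldata\SalesDB_log.ldf',
     REPLACE;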

Carlos: Well, I guess I'm not sure that I quite understand that, because I thought the databases were not actually part of the container. Maybe the data in those databases, I thought they were separate, like I can move my container around and those database files are not part of the container.

Andrew: Yup, ok, the portability feature of containers is something that I haven't actually played with that much, so I'm looking at getting into it. What I've been mainly concerned with is that I want an instance of SQL up and running very quickly. With the software that I'm working with, which I mentioned a bit, called Windocks, I don't think you get that portability functionality, because it is obviously dependent on the I/O subsystem. One thing I have tried is that with the Windows containers, the ones supported on Windows Server 2016, the database and the SQL instance are all wrapped up in the container, which is I/O independent. I need to double check that, I'm afraid, guys, but you can basically push your image up to the Docker hub, save it there, and then pull it down onto a different server, and the databases will all be there.

Carlos: Right, now again that’s what I’ve been seeing from these demos, all these part of what they’re talking about. It seems like from an application perspective it’s a little easier to understand and grasp. But from the database perspective it seems like there’s a little bit of leap there still.

Andrew: Yes, still, again, it's very, very new. I mean, obviously apps are stateless, so it doesn't really matter where they live, and they don't depend on the I/O subsystem. But of course SQL Server does depend on the I/O subsystem, so it takes some getting your head around how it's all actually working in the background.

Carlos: Right, and you also mentioned the Windocks tool, and one of the things you like about it, the reason you're using it besides the fact that you're on Windows, was that it helps support previous versions of SQL Server.

Andrew: Yes, so that was another thing that sort of tripped me up from the start as well. When it comes to this new software, the new functionality available on Windows Server, what you've got in the Docker hub at the moment is Express Editions of 2014 and 2016 and then the full edition of the next version of SQL Server, which I think has all the features that are available on Windows Core, so it's a pretty fully fledged database engine there. But again, it's a higher version of SQL than I'd say anyone is particularly using right now. There's no point in getting my QA and Dev people testing on a higher version than I have in production. We're running Windows Server 2012 R2 with SQL Server 2012, and as far as I know we have no plans to upgrade in the next year anyway, so what I did was go out and start looking: is there a software solution out there that will give me SQL 2012 running in containers? I eventually found this company called Windocks, who have basically built a custom port of the open source software that Docker put on the market, which allows earlier versions of SQL Server to run in containers on earlier versions of Windows Server; I think it is 2008 upwards in both cases. I mean, I was pretty pleased with that because it was exactly what I needed, so I went off and reached out to them, and they've got a free Community Edition which I went and grabbed. I touched base with my system admins, grabbed my old beat-up server, installed Windows Server 2012, and followed the Windocks installation instructions, which again were pretty simple. They've even simplified it more since I had my first run with it. It's basically a PowerShell script to get the engine up and running, and once you've done that you're good to go. That has allowed us to get SQL Server 2012 in containers on Windows Server 2012, so I've got my Dev and QA teams now working on containers at the same version and patch level as my production environment.

Carlos: So what’s the difference between the Community Edition and in the paid version?

Andrew: It’s basically the number of containers that you can have.

Carlos: Ok, so what’s the Community Edition limit you to?

Andrew: I think it limits you to two. Literally it’s just a soft gateway to show you how the functionality works and what you can do.

Carlos: Got you.

Steve: Ok, so in that solution then, if you’re downloading the container with the SQL Server version to match what you’re using in production. How is the licensing then done on that? Are you then having to put your own license codes in for SQL Server?

Andrew: We’ve used the Developer Edition so we neatly side step all of that.

Steve: Oh, perfect.

Andrew: I think the way it’s going to work with say like Standard. I think it would be nuts to go with Enterprise in the container at the moment but I think it would work the same way as how you would license your host and you can build from there. 

Carlos: Yeah, I think that’s a great way to make that adoption and to get comfortable with it is to using containers in those lower environments because they do tend to move. You want to spin them up, you want to take them down. You want to shake and bake them a little bit more than your production environments obviously, and that’s what actually containers are really really good at as well. That’s an interesting approach.

Andrew: Definitely treat the containers as throwaway objects. If something goes wrong with one, don't spend time restoring databases into it and fixing things. Blow it away, run a script, build a new container.

Carlos: Right, so now what happens if you wanted to add a new database to the container? What’s that like?

Andrew: It behaves and works exactly like a normal instance of SQL through Management Studio. You don't get the Agent, so that's the only thing you'll miss. But you create a database, and if you want to then replace your base image, you stop that container, blow away your original image and save that as a new image, and you've got an updated image with your new database. I mean, that's part of one of the reasons as well that we went for containers instead of deploying to VMs. The guys were deploying our databases from baseline backup files kept in TFS, and then applying a load of updates to each and every database to get it to the level of production. Whereas what we're doing now is, every week we have a certain number of releases, and at the end of the week I deploy all our releases to our base image, update that image, blow the old one away, and save a new one. And then next week they can all blow their containers away and build new ones in a couple of minutes, and they've all got up-to-date containers exactly the same as the production environment.

Carlos: Got you.

Steve: So then for that development environment you're able to get those containers up to date a lot quicker, it sounds like, than if you had even a standalone SQL Server Developer Edition installed on every developer's machine. You're able to just grab that latest package or latest container, and then everyone is on the same baseline rather than each developer maybe getting out of sync somewhere.

Andrew: That’s it, yeah, so we can say to them shut down, you’ve all got your Dev VMs pointing to the containers. Your containers are now out of date, blow these away, build new ones, and you’ve got, every single one of you has got a replica of the exactly the same image which is up to date with production.

Steve: Yeah, I could see how that alone would just save a massive amount of time.

Carlos: And confusion.

Steve: Yup.

Andrew: The deployment worked on my VM but it's not working on yours. Why is that? Because there was a mismatch there, which we've now managed to get rid of.

Carlos: Yeah, exactly. How many times have you heard that? Right, "well, it works on mine."

Steve: Yeah, that’s one of the standard developer lines I thought.

Carlos: So I guess we’ve talked a little about some of the pieces that you’ve used. I guess anything, any other advice as far as getting up and running.

Andrew: I mean it is very simple to get it up and running. Bear in mind that this is brand new technology, we’ve had a few issues with integrating it into our own processes. I think the bit of advice I would give to anyone looking to go with containers is don’t go with containers just because you want to use containers. Have a look at your businesses processes and practices and see how they’re going to fit in there. Don’t try and work containers into them if it’s not going to be right because they are great technology to work with but you really need a business case for them there.

Carlos: Ok. But I guess I’m going have to say that I think particularly that scenario that you just provided that almost everybody has that circumstance with their lower environments. Well, I guess assuming their development shops, right. If you’re installing SharePoint for example and that’s your application then maybe that wouldn’t necessarily apply. But I think most development environments where you could make a case for that.

Andrew: Yeah, agreed. I would suggest having a look and seeing what they can do for you. I'm very guilty of deciding which technology I want to use because I want to play with it, but I really had to put a business case together to get this in. It's been working pretty well. I mean, we've had some issues in that sometimes the containers are up and running and, it being new, one has crashed, and then we have to redeploy and do things again. But we've got into the habit of treating them as throwaway objects. Get rid of them, rebuild, and you've got another one right there.

Carlos: Exactly.

Steve: Alright, so then, it sounds like containers have been around on Linux for much longer than they have on Windows, obviously, because this is brand new on Windows. But are there any real conceptual differences between containers on Linux versus containers on Windows, other than the obvious operating system differences?

Andrew: I won’t say there are any conceptual differences. I think the main thing is that, as you said, they’ve been around on Linux for a lot longer so that technology is a little bit more mature and some of the features. I think some of the networking stuff isn’t available on the Windows side yet but it’s really just a case of just watch this base I think.

Carlos: Right.

Steve: So then on the networking side, I know if I spin up a VM with a hypervisor it creates a virtualized network interface for that VM. Is there something similar for containers, where they're getting a virtual network interface, or are they using the network card of the OS directly?

Andrew: I think they use the network card. I'm not too sure there, I'm afraid. The way I've been working with the containers is that each one of them, each SQL instance, listens on its own port that you can specify when you create the container, so you can map the host port to the container port. When you connect locally you connect on the host port, and when you connect remotely you use the individual port you specified when you created the container. So that's one thing we did when we were building the containers, and the aliases that I mentioned: we use aliases because you can't use, say, a hosts file entry, because you can't specify a port. But with an alias you can say connect to this server on this port, and that takes them straight into the SQL instance.

Carlos: Right, so that mapping actually happens in the container, where you can specify that when the incoming port is this, send it here.

Andrew: Yes. So it’s just a little switch when you specify say, run, create a container from this image, this port to this port, off you go. It does mean there are differences when you connect say, if you’re on the VM where the host is running, where the containers are running. You’ll be connecting all different ports as you were if you were outside of that VM.

Carlos: Ok, so then for someone like me who has not used containers before, but it sounds interesting and I want to go and give it a try, is there any advice that you would recommend, or anything to watch out for or be aware of on that first try?

Steve: Well, I think if I was going to do that, just to jump in here, Andrew, obviously on the Windows side you have the Windocks Community Edition to get started with. And then I would jump to Andrew's blog and start there.

Andrew: I mean, there are blog posts there I've written, from Part 1 to Part 4, that will take you through initial setup and then playing around a little bit, and then actually working with the Docker hub as well, which will allow you to push repositories up and pull them down and things like that.

Carlos: Alright.

Steve: And that’s side is dbafromthecold.wordpress.com. Is that right?

Andrew: That’s it, yes.

Carlos: Yeah, and we’ll have that available in our show notes as well which will be available at sqldatapartners.com/containers.

Steve: Perfect. Well, I think that wraps up the questions I’ve got about getting started.

Carlos: Ok, awesome. Well, that was a great conversation about a great, you know, interesting technology. Thank you for coming and walking us through some of the use cases you're using it for, particularly on the Windows side, because admittedly that's where I feel more comfortable. I'm interested in trying Windocks and kicking the tires there, and seeing how it goes.

Andrew: Yeah, thanks for having me, guys. There are a couple of things I'll come back to you on, especially about the network adapter. Those were interesting questions, although I wasn't too sure. And I also want to double check the upload and download of containers to the Docker hub, and see whether it just keeps the actual state of the SQL instance or takes the databases with you. So I can come back to you on those if you'd like.

Carlos: Alright, yeah, that would be great.

Steve: Or I’ll have to keep an eye on your blog and see if it shows up there I supposed.

Carlos: So shall we go ahead and do SQL Family?

Steve: Yeah, let’s do that, so how did you get started first using SQL Server?

Andrew: Well, the first job I got coming out of university was in local government, and it was writing Crystal Reports. I didn't have any idea what the backend was. I was just writing SQL queries. I thought it was Oracle at first; it was actually SQL Server. Things were running slowly, and I started doing a lot of googling around how to speed things up, and basically how to do certain things with SQL that I didn't know. I eventually stumbled onto sqlservercentral.com and found this huge wealth of information about this entire industry and profession that I'd had absolutely no idea about. It piqued my curiosity from there, and eventually I decided that I wanted to become a DBA.

Carlos: Very cool. How then did you get your first job as a DBA?

Andrew: I think completely out of luck. I was very, very inexperienced. I was applying for jobs and eventually got an interview with a place called the United Kingdom Hydrographic Office. The interviewer there gave me a chance and gave me my first job as a DBA. That was about 5½ years ago now.

Carlos: Very cool, ok. Now if you could change one thing about SQL Server what would it be?

Andrew: I want to have a dark theme for Management Studio, but it's not going to happen anytime soon.

Episode 79: SSIS Catalog

Although the SSIS Catalog DB is created with Management Studio, it does not behave like other databases.  Our conversation in this episode revolves around the catalog, SSIS packages, and some of the complexities of migrating packages.  Steve and I are excited to chat with our guest Andy Leonard about his thoughts on the catalog and how this feature provides some really interesting benefits for ETL architecture.

Episode Quote

“[The SSIS Catalog] is really there to facilitate a lot of Enterprise functionality that is needed in any data integration engine you use.”

Listen to Learn

  • The benefits of project deployments
  • Some of the issues migrating packages
  • Why restoring the catalogdb can be problematic
  • The various levels or components of a package
  • How you can view reports without going into SSMS

Restoring the SSISCatalog DB
Integration Services stairway series
WorldWideImporters Sample Database
DILM Suite

About Andy Leonard

A former chicken farmer, Andy is now a Data Philosopher at Enterprise Data & Analytics, an SSIS Trainer, Consultant, and developer; a Business Intelligence Markup Language (Biml) developer and BimlHero; SQL Server database and data warehouse developer, community mentor, engineer, and farmer. He is a co-author of SQL Server Integration Services Design Patterns, and author of Managing Geeks – A Journey of Leading by Doing, and the Stairway to Integration Services.  He lives with his family in Farmville, Virginia.

Transcription: DILM Suite - SSIS Catalog Viewer

Carlos: So Andy, welcome to the program.

Andy: Thank you!

Carlos: We wanted to have you on the show ultimately, one, because you're pretty close to Richmond, and of course you're Richmond famous, if nothing else, for helping start the local group, and we're very appreciative of that. You've been doing quite a bit in SSIS, written several books on it, and been kind of putting together this DILM Suite, which we will talk about here in a minute. But ultimately, I guess, our conversation today is going to revolve around the SSIS Catalog. So let's go ahead and maybe jump into it and give the 101 overview of what the SSIS Catalog is and what it's going to do for us.

Andy: Well, first I want to thank you, Carlos, for that warm introduction. I'll just say that there are a lot of people involved in the Richmond, Virginia SQL Server community, and in all of our technical communities. We have a great collection of really different communities there, from JavaScript across to the .NET community, to SQL Server, and as you know there's a lot of interaction between the groups. And I'll also give you props for the amount of amazing work you've done in the past few years after stepping in. I stepped away; I've been an attendee and a person who's been enjoying a lot of the events that y'all have put on recently. So thank you for that. It's not just us; there's a huge team of people involved, and we could name names, but they know who they are. So with that said, I'll jump into a Catalog 101. And this is aptly timed: Kent Bradshaw, a friend and co-worker at Enterprise Data & Analytics, and I earlier today put on a free webinar where we talked about this very topic, an introduction to the SSIS Catalog. So what the catalog is, it's an SSIS framework. And what do I mean by that? It manages SSIS package execution, logging and configuration. Before the SSIS Catalog was introduced in SSIS 2012, a lot of people wrote their own, and I was one of those people. But it's really there to facilitate a lot of Enterprise functionality that is needed in any data integration engine you use. It doesn't matter if it's Informatica or SSIS or one of the others, you always need some kind of framework. You need something to manage execution, something to log, something to manage configuration, so at a very high level that's what the catalog is.

Carlos: Now, that’s an interesting topic. I didn’t realize that you could use the Informatica packages with the SSIS Catalog.

Andy: You cannot but Informatica has its own functionality built in, its own framework, so it does a similar set of functions as the SSIS Catalog.

Steve: So then, with the SSIS Catalog, people can still run SSIS packages without it but you get a lot more functionality in the logging and capabilities there if the catalog is installed and used. Is that correct?

Andy: Yeah, you absolutely can, Steve. The way Microsoft built SSIS starting with 2012, they realized they were offering this big shift in execution paradigms, and they made the existing way of operating backwards compatible and very easy to accomplish inside the new paradigm. So what they did was give the old way of executing packages a name. They called it the package deployment model, and that is the old way of doing things. And then they created a new default, the project deployment model. So in order to execute packages in the catalog you need to have them in that project deployment model, and that's a property of the Visual Studio solution, the SQL Server Data Tools or SSDT solution, that you build. And they created this really rich, robust wizard, I've never seen it fail once, for walking through, first, importing packages into SSIS 2012, 2014 or 2016; they come in as the package deployment model if they are earlier than 2012. Once you walk through that wizard they're in the package deployment model, and after that, if you want, you can convert that project to the project deployment model, deploy it to the catalog and run it there. But you're right, Steve, you can do everything the way you've been doing it for 10 years if you'd like, even in 2016.

Steve: However, it sounds like there's a lot of benefit to using the catalog rather than doing it the old way, the package deployment way.

Andy: There are some benefits to doing it that way; I'd say logging. Really, the way the logging engine has been architected for the catalog was a big step in the right direction. Microsoft decoupled a lot of things in the way that logging works, and just because of that you get a package performance boost of anywhere from maybe 20% to a couple hundred percent. So you can see a performance boost just by moving projects from, say, running in the file system to running in the SSIS Catalog, and again, it's all around that logging.

Steve: Interesting, that’s a good point to know then, so if you’re having performance issues it might make sense just try moving them into the SSIS Catalog and see if that improves things.

Andy: Absolutely. You know, a lot of people schedule SSIS jobs in SQL Server, and if you're using SSIS you have to have SQL Server. Although the engine itself is free, and you can develop and run anything you'd like in the debugger, in order to have it run outside of that development environment you need to have a SQL Server license on that machine; that's always been the case. But certainly, if you pull SSIS projects that are pre-2012, or even if you're running 2012 or 2014 projects in package deployment, pull those into the Visual Studio designer, do the upgrade right there in Solution Explorer, convert to the project deployment model, and then you can right-click the project and deploy it right over to the catalog. Yeah, you'll see performance improvements. It's not just the logging, but the logging is probably the largest one. Probably the next largest performance boost you'll see is that they've got a different execution engine that they built specifically for running SSIS packages that execute in the catalog, so it runs a little faster, which isn't surprising. When the catalog executes packages, it executes them in that engine. And I don't know if you've had this experience, but I have: I've built a version of something and then come back a few years later, been asked to upgrade it, make a new version. And that second version is always better.
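For reference, executing a catalog-deployed package comes down to a few SSISDB stored procedures; this is the same pattern SSMS scripts out, with hypothetical folder, project, and package names:

DECLARE @execution_id BIGINT;

-- Create an execution for a package deployed to the catalog.
EXEC SSISDB.catalog.create_execution
     @folder_name     = N'Finance',          -- placeholder folder
     @project_name    = N'ETL',              -- placeholder project
     @package_name    = N'LoadSales.dtsx',   -- placeholder package
     @use32bitruntime = 0,
     @execution_id    = @execution_id OUTPUT;

-- Optional: set the logging level (1 = Basic).
EXEC SSISDB.catalog.set_execution_parameter_value
     @execution_id,
     @object_type     = 50,
     @parameter_name  = N'LOGGING_LEVEL',
     @parameter_value = 1;

EXEC SSISDB.catalog.start_execution @execution_id;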

Steve: Oh, sure. Yeah, you know all the mistakes, or the things that you would do differently, the second time around. And it sounds like that's what they've done: they've just done a better job that second time around.

Andy: I think so. There’s spots in it where it could improve. And that’s part of the definition of software development, right? Anytime you build a database or piece of software you can always improve it. And I’m sure, I know for a fact talking to some of the folks involved with the project they’ve looked back and said, “Gosh, yes, we should have done that, we should have done this”, but seriously kudos to them. It is a stable engine and it does what they built it to do, and it does it well I think.

Carlos: I am getting flashbacks to the 463 exam prep with all this talk of converting from package to project deployment.

Andy: Yeah, I mean it’s involved. But once you’ve done it a couple of time, Carlos, I don’t know if you had that experience but it’s really not hard. The wizards are, you know in my opinion the wizards are work of art. It’s well architected and it’s something that is really easy to, you know, the kind of fall out of a project plan. The importance of those conversion wizards or either upgrading projects or changing them, they’ll go back and forth. You can go to a project deployment and back to a package. Not only that but the deployment wizard that they built for deploying stuff into the catalog, again it’s another snappy piece of curd. Very, you know, there’s a lot of Enterprise features in there. And if nothing else, I’m really glad we’re talking about this because I don’t think people realized just how well some of that software works and how flexible, and well architected, well designed it is.

Carlos: Yeah, when we talk about the flexibility, I think the big benefit I've been able to see is the ability to change the environment parameters and then have it execute in different places, without actually having to create all these copies of the project or those packages and then have version issues and all that kind of stuff.

Andy: It’s definitely more aware of lifecycles then in any previous version. There are still some corners there when you start thinking about it in the lifecycle management perspective especially for Enterprise software. You know, some of it is just, some of it is not Microsoft’s fault or anybody’s fault. Really, it is things like the nature of XML. So you know that just any XML engine, any document object model that is semantically managed if position, relative position in a file for instance doesn’t really count. What counts is where it is in the tree. And so when you start doing things that you would expect in an application lifecycle management like compare between two versions of a package. You may have identical versions of a package functionally. But internally the text maybe moved around because it is XML. And from a semantic perspective it is identical but coming up with a good semantic compare is not easy.

Steve: Yeah, and I think that’s one of the interesting challenges that I’ve come across and trying to do source control around SSIS packages is that you committed to source control and then it tries to do a text based gif and you end up with a lot more gifs than what you might have expected because things shifted out in the XML.

Andy: It is very challenging. Yeah, and even the act of deploying an SSIS package to the Catalog alters some of the value that are stored in there. Some of those are just insignificant things like there’s a last deployed field that maybe updated. But there are some other things that actually happened to the package XML itself for instance the package protection level is set to a value that in 2012, 2014, 2016 does not appear in the list of protection levels. So in the old days there were six and they were, don’t save sensitive. It’s all around sensitive information so there was don’t save sensitive, encrypt sensitive with user key, encrypt sensitive with a password, encrypt all with a user key, encrypt all with a password, and then the sixth one which isn’t in the new stuff. In the old days there was one called the server storage. And that primarily existed so that you could set your packages to be encrypted on server storage and that was only used when you deploy to the MSDB database which you can still do that’s part of package deployment. But what they did when they create a project deployment is they made the SSIS Catalog and the SSISDB database. It uses encryption and in order to create it you have to supply a password that will solve the encryption. In order to restore it you also need that password and that we’re going to talk about that in a minute. But what happens is in the act of deploying it changes whatever protection level you set for the development environment into the equivalent of server storage. But again that’s not in the list. You can’t pick that as one of your options on your side but the catalog will switch whatever of those five you pick. It will change it to server storage and it’s using SQL Server native encryption, transparent data encryption. And, you know, that’s pretty rock solid. I know a lot of people have moved to that and I was absolutely thrilled when SP1 came out. At the time of this recording we’re doing, we’re doing this at the end of December 2016 back in November at the MSConnect Conference when they announced SQL Server 2016 SP1. They made encryption available across all of these additions and so much more. It was really nice to see that.

Steve: Prior to that announcement, the TDE option, or transparent data encryption, was only available with Enterprise. Did that mean that you had to have Enterprise Edition of SQL Server to use the SSIS Catalog in this encryption mode?

Andy: Yeah, that’s a great question, Steve, you did not. They had encryption, I don’t know how they did that but you had it in Standard. What happens when you create an SSIS Catalog is there’s a backup of the SSISDB database that shifts with SQL Server. It’s actually still there if you go look. Any place that you’ve checked the box for integration services in your installation if you go dig through program files into DTS directory. You’re going to find an SSISDB.bak file. And what you’re doing with the GUI that they supply in SSMS when you create a catalog you have to enable SQL CLR. It has to have that if it’s not already enabled because all of the execution and a lot of validation and other functionalities that occur in the catalog is actually happening down the .Net framework so there’s a lot of SQL CLR stored prox in there. So that part has to happen and you have to supply a password because you’re going to be using that encryption.

Steve: Ok, so that’s a good transition I think for us then into the topic of restoring an SSIS Catalog database. And I think that’s one of the things as a DBA a lot of people might be really familiar with normal backup and restore but they may have never done it with SSIS Catalog. What kind of thing do you need to know or you do differently when you’re doing that?

Andy: Well, there’s a great article out in the MSDN, and I think there’s a companion article over in TechNet. Both say basically the same things. So they walk you through this, this process, there’s in a lot of ways it’s like restoring any database that uses encryption. So whenever you’re using encryption you have certificates, and you have passwords and all of that sort of stuff. The catalog has all that as well. In addition to that, there are some users, some logins that you need to create to fully restore to operational status in SSIS Catalog. One of the things that I get calls, it’s a tragic call, is someone will have lost the database and they will have been backing it up for years, and SSISDB is out there and it’s all backed up. And they say, “I’m having trouble, I restored it.” You can actually restore it without the password you can do a standard restore. And it will come in there and a lot of your metadata will be press and you can get to it and read it. But when you start executing packages from that, or when you try to export a package for instance, or deploy a package to that catalog all of a sudden you’re getting this errors you never saw before. And what’s happened is the encryption is broken. And in order to make that work and work correctly you have to follow those instructions out there at the MSDN and TechNet sites. I wrote a blog post about it, and I know you guys sometimes include blog posts. I blog at SQLblog.com, and If you go to SQLblog.com and click on my name and then you can search for “restore SSIS Catalog”. I took it from the perspective of what it’s like when you try to deploy an SSIS package or project to a catalog that’s been restored incorrectly. So I started with those error messages and kind of work through, “This is the problem you have now and here’s how you can solve it.” I included my template script that I use all the time. I gave it away but it creates the users and the login you need. I put dummy passwords in for a bunch of stuff and you just go in and everywhere I mark places where you need to update it because I don’t you using the same passwords. But if you do that it restores at about 5 seconds. And you’ve got a, well, an empty one restores about 5 seconds let me say it that way. Your mileage will vary. The more stuff is in the catalog your mileage will vary but the key here is really understanding the importance like so much in our field. Like you see, like the password for the encryption. You’ll think I never need that again, right? Export the keys, export the encryption keys and you see these warnings all the time for years with Reporting Services. You know, all of these products that are great but you can really leave your Enterprise in a lurch if you haven’t done these best practices.

Steve: I think that shows there is a difference between, I won’t say a normal database, but maybe an application database, and one that is so tightly integrated with SQL Server Management Studio and several of those components that it kind of requires its own special handling.

Andy: It really does, and you know, for the longest time, and I’ll still say this, in fact I said it today on the webinar: it’s just a database, the SSISDB database. But I would hang an asterisk on that and say “with encryption,” and then you kind of need to hang another asterisk on it that says “with a special set of users,” because the logins happen to be hard-coded into some of the interactions that happen between the database and the applications, especially those .NET Framework classes. And they’re specific to each version of the catalog, by the way, so you can’t mix and match too much. Yeah, you can get yourself into a world of hurt there just by skipping a step, and it’s a very easy step to skip. And like I said, I’ve had the tragic calls from clients saying, “How do I hack this password out of here?” And I’m like, “I don’t know.” I send them to someone like you guys and say it’s now a database issue; these guys can help, if anyone can, because that’s over my head.

Carlos: The links that you mentioned earlier will be available on our website for the show notes today at sqldatapartners.com/ssiscatalog. We’ll make sure to put all of those components there.

Andy: Awesome!

Steve: Now, one more thing that you mentioned when we were talking before we started recording the show was your Data Integration Lifecycle Management, or DILM Suite, work that you’ve been doing. What exactly is that and who would want to use it?

Andy: Well, it’s great timing, Steve, to bring that up at this point in the conversation, because an alternative to remembering all of your old passwords, and stuff that you may or may not have available, is to export this metadata. The SSIS Catalog has a slew of metadata associated with it. Carlos mentioned earlier environment variables and parameter values; you can store all of this configuration metadata in the catalog itself. And as Carlos said, you can set up a lifecycle where, say, you’re moving from dev, to test, to hopefully QA, and then production, something like that. Years ago, at one place I worked, we ended up setting up six of these tiers all the way through. And, you know, the more the merrier; it’s actually a good way to do it. I imagine you could get crazy with a dozen or something, but I always say you need a minimum of three tiers in your environment, and you want to run each of them as close to production as makes sense. Different people will define what makes sense differently; here’s what 10 years of experience with SSIS has shown me makes sense. I want three environments at a minimum because I want to be able to build in one environment. I’m actually going to deploy to the catalog in that environment, and I’m going to run local unit tests and integration tests there. I’m going to look for functionality: I want to make sure the rows are moving and any transformation is happening the way I want. Then I want to deploy that to something other than production, and here’s why. When I build it, I’m going to design it with all of these defaults configured. I’m going to have default connection strings pointing to my local laptop, or a virtual machine, or some development server that I’m logged into at a client; I’m going to have everything pointed natively at that. If I deploy straight from there to production and I have forgotten to externalize something, maybe a connection string or a variable value or a path, then I’m going to figure that out when I try to run it in production, and that’s called testing. And we don’t want to test in production, certainly not any application, but definitely not databases. So you want to send it to some third location that’s not production and not where you developed, and when you test it there you make sure it does what you want it to do. When you start making that transition, there are really five things you need to think about in an SSIS Catalog. There are the folders, because there’s a logical separation of objects in there: all projects have to be in a folder, all environments have to be in a folder. We just talked about environment variables. There are projects and packages, of course, and those are managed really well by .ispac files, which wrap up the deployment in a nice little container; you double-click one and it starts that wizard I was going on about earlier. I like that wizard. Those are managed well, that’s number two. So we’ve got folders, and projects (projects and packages mostly go together). The third thing is those environments that Carlos mentioned earlier; they have a collection of environment variables inside of them, that’s number three. Number four is really some catalog-specific overrides; in the catalog they’re called literals. And the fifth thing is really a reference, and this is how you get environments and projects and parameters talking to each other. So think about projects and packages: they both can have parameters in them. And then over here we’ve got environments.
And you can’t see me, but I’m doing hand signs for this; maybe I should turn my camera on. A reference is really a bridge between this project and this environment. And then beneath that, just as environments have environment variables, you have these little attributes that I call reference mappings. What they do is connect an environment variable to a project parameter or a package parameter. So if you think about the hierarchy going on there, if you’re talking all the way down to, say, a package, you can have something that’s linked through about six or seven layers.
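
To make the moving parts Andy lists a bit more concrete, the catalog exposes stored procedures for most of them. A minimal, hedged sketch (the folder, project, and parameter names here are made up for illustration):

    -- Create an environment and a variable inside it.
    EXEC SSISDB.catalog.create_environment
         @folder_name = N'Sales', @environment_name = N'Test';

    EXEC SSISDB.catalog.create_environment_variable
         @folder_name = N'Sales', @environment_name = N'Test',
         @variable_name = N'SourceConnString', @data_type = N'String',
         @sensitive = 0, @value = N'Data Source=TestServer;Initial Catalog=Staging;';

    -- The "reference mapping" Andy describes: point a project parameter at the
    -- environment variable. @value_type = 'R' means referenced rather than literal.
    -- (A reference between the project and the environment also has to exist;
    -- catalog.create_environment_reference handles that piece.)
    EXEC SSISDB.catalog.set_object_parameter_value
         @object_type = 20,                    -- 20 = project, 30 = package
         @folder_name = N'Sales',
         @project_name = N'LoadSales',
         @parameter_name = N'SourceConnString',
         @parameter_value = N'SourceConnString',
         @value_type = 'R';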

The project and the package Microsoft covers well; migrating the folders, not so much. And as you get down to environments and references, the environment variables and the reference mappings, the story from a lifecycle management perspective just sort of falls apart. To get to the environments you have to double-click an environment or right-click and choose Properties. You may have a couple of dozen environment variables in there, with data types and values configured, and some of them may be encrypted; there’s a sensitive bit you can set for every value individually, so you can choose what’s encrypted and what’s not. And what you see when you open it up after you’ve configured it is different from when you first configured it. Well, the DILM Suite: when I started working on that, I was actually working on something else called frameworks. I’ve built frameworks forever, and I was working on an interface for my framework, a GUI. After about three weeks I realized you can’t move this metadata in the catalog; there is no clean way to do it. And I needed that working before I could do anything with frameworks, so I wrote a product called SSIS Catalog Compare. If you go to dilmsuite.com, there are about six projects on the page. That’s the only one that I sell; the rest are either open source or just flat-out free. One of them is called Catalog Browser. It’s a tree view that will show you everything associated with the SSIS projects deployed to the catalog: the reference mappings, the references, the environment variables with their values and data types, and the environments. If you’ve ever looked at what’s under that node in SQL Server Management Studio, you can get to everything I just listed, but you have to open different windows to see it all. This just presents it in one view, and that’s one of the free tools out there, called Catalog Browser.

Steve: Ok, so the Catalog Browser and Catalog Compare those are then applications that are built using the DILM framework. Is that correct?  

Andy: Yeah, that’s a fair way to say it.

Steve: Ok, so if somebody wants to do something that’s maybe different but wants access to those same pieces, they could download and use the DILM Suite. That’s your community edition, right, so people could write code to do that. But if they wanted to jump in and use Catalog Browser or Catalog Compare, they could just download or purchase those from you and use them as well.

Andy: They certainly could. What Catalog Compare can do is really manage that metadata. It’s built to facilitate and support DevOps: anybody practicing any kind of lifecycle where they are promoting code. Stage control is a piece of this, and source control is a piece of this, although I’m not touching source control; I’m just looking at what’s in catalog 1 and catalog 2. I’m not so much focused on the piece that Microsoft really does a great job on, which is the projects and the packages. I’m focused on the rest of that stuff, the folders, the environments, environment variables, the references, the reference mappings, because like I said, it looks like they got to about that spot and stopped for whatever reason. Without using some of the very clever and cool solutions that other people have developed out there, you have to either write your own clever and cool solution or do a lot of copy and paste. But the new version is coming out, I’m hoping, some time in the next few weeks; I’m testing it now. It’s going to include a command-line interface. Everything in the Data Integration Lifecycle Management Suite is built around this idea of achieving a piece of the puzzle so that we can do continuous integration with SSIS. That’s really what I’m after. It’s not going to have all of the pieces. I’m never going to touch testing, because frankly Pragmatic Works’ LegiTest does a fantastic job of testing SSIS, and it’s hooked right into the testing harness that comes with Visual Studio; John Welch and Brian Knight and the crew down there have done a bang-up job. I don’t need to build that, that’s already built. But what I’m hoping to do with this command-line interface: one of the commands is DCD, for Deploy Catalog Differences. So Carlos, you may be developing SSIS packages on your laptop, and Steve, you may be developing on an integration server in the office, and what I want is the ability for you guys to run a script every night that says, compare this to the catalog on that integration server over there, and then just move whatever is different. So whatever I built today, since the last time the script ran, gets deployed to that integration server. Now we’ve got that piece of continuous integration going, right? We’ve got a target server where everything syncs up. So imagine that piece, and then imagine having some way to automatically execute SSIS packages, and then tests that go look at the results of those executions and provide feedback right into TFS or whatever testing framework can talk to Visual Studio. You’ve got a big piece of the picture now, right, for continuous integration.

Steve: Oh yeah. I can see how that would really save a lot of time in getting that continuous integration environment setup there for your SSIS work.

Andy: That’s what we are hoping for. I kind of mentioned in passing you need to execute the packages. Well, if you’re looking at the dilmsuite.com website, that SSIS framework community edition is a free framework that will actually do that. You have to do a little more work than if you pay for the commercial edition of the framework or the enterprise edition but that’s what it’s there for.

Steve: Wow, good stuff.

Carlos: Very cool!

Andy: Thank you! Yeah, thank you, trying to solve that problem.

Carlos: You know, that’s right. So thanks for the conversation, Andy. Should we go ahead and do SQL Family?

Andy: Absolutely.

Steve: So Andy, how did you get first started with using SQL Server?

Andy: So it was in the 1990s, and I was doing manufacturing automation at the time. On one of my projects we were storing the outputs of tags, as they’re called in Human Machine Interface software, in an Access database; I believe it was Access 2.0. To stress test it, I turned on all of the tags, over a thousand of them, set the software to record the value every second, and left for a holiday weekend. I came back and there was a 3GB MDB file on the drive, and when I tried to open it, it wouldn’t open. I went to altavista.digital.com, typed in “Microsoft database,” searched, and I saw this thing called SQL Server. This was about 20 years ago, maybe a little more. Carlos, I don’t know how long you’ve been around the Richmond area, but back then they used to have these computer and software sales and shows over at the Showplace in Mechanicsville. I went over there, and I’m going to say this because it’s been 20 years and I think the statute of limitations is over: I purchased an NFR copy of SQL Server 6.5 for like $20. About a week after that I took it down to the plant, we installed it, I ran the same test, and it succeeded. I was able to open up Enterprise Manager, I believe it was Enterprise Manager, maybe it was Query Analyzer, and still run queries against the tables and get data out of it, even though it was 3GB of data. When they saw that it worked, that the promises were true, they bought a licensed copy. Actually they bought 7.0 because it was out, it had just come out, I believe, and we upgraded, and of course tested it again, and it ran. That’s what got my toe in the water with SQL Server.

Steve: Interesting. I’m amazed at how often we hear that: started with Access, it didn’t do what we wanted, so we did something different, and that’s how we got to SQL Server. It’s been a pretty common theme with other guests as well.

Carlos: If you could change one thing about SQL Server what would it be?

Andy: It would iterate better. So all software has corners, right? Everything has something that it does really, really well and something it doesn’t do so well. I do training; Tim Mitchell and I actually deliver the SSIS immersion events these days for SQLskills, and one of the questions I’ll ask students is: are cursors bad? What do you think, are cursors bad?

Carlos: Depends on the usage, right? Your mileage may vary.

Steve: I know, in my experience I’ve seen some cursors used very badly. And I’ve seen some cursors that have been used more appropriately and they’ve worked well. However, they’re not as efficient as I’d like to see them.

Andy: And that’s why I’m going with the whole thing about iteration. It’s not just about cursors; it’s any time you’re doing any kind of looping inside the engine. Again, you know, Microsoft loves it when I say it this way: cursors aren’t bad, SQL Server is bad at cursors. I feel that’s accurate, and I get why the advice is out there, you know, don’t use cursors. I had this experience about 10 years ago, or maybe it was more than that, where I was converting some Oracle code over from PL/SQL to SQL Server. There were some cursors in it that were like three layers deep and they were nasty. At the time I only knew as much as I knew, and I promise I’m not picking on anybody because I’m still learning, but I looked at that and went, “This is just wrong. How could this ever perform?” We had a test server set up with it, I ran it, and darn it, it flew. I rewrote it, obviously. I think that may have been my first experience with, and the name of the tables flew out of my head just now, but Jeff Moden wrote about them back around that time and I was looking at it. Some kind of tables; I’m getting old, I can’t remember what things are called. But basically you create a little lookup table and you just get what you want out of it, and they’re lightning fast.

Carlos: Tally tables.

Andy: Tally tables thank you. I can never remember what they are called. Yeah.

Steve: There we go.

Andy: Yeah, tally tables. So I had my first experience with those, but we had a solution that worked, and I checked that box and moved forward. Yeah, that would be my answer to that question; I’d like to see it do that, as long as they can do it without breaking any of the other cool stuff, because SQL Server does an awful lot really well.
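
For anyone who hasn’t run into the tally-table pattern Jeff Moden popularized, here is a minimal sketch of the idea: build a table of sequential numbers once, then join to it instead of looping. Table and column names are illustrative only.

    -- Build a small tally (numbers) table once.
    CREATE TABLE dbo.Tally (N int NOT NULL PRIMARY KEY);

    INSERT dbo.Tally (N)
    SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b;

    -- Then replace a loop or cursor with a set-based join:
    -- for example, one row per day of 2017 without a WHILE loop.
    SELECT DATEADD(DAY, N - 1, '20170101') AS TheDate
    FROM dbo.Tally
    WHERE N <= 365;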

Carlos: That’s right.

Steve: So what is the best piece of career advice that you’ve ever received?

Andy: Oh gosh! I’ve received a lot of really good career advice. I’ll share what has probably impressed me the most recently. You know, I’ve been a big believer in doing what you’re passionate about. I’ve modified that belief since listening to Grant Cardone and other speakers talk about this, where they say, “Become passionate about what you’re doing.” And I agree with that more. I think I was very fortunate in that I’m able to make money doing what I’m passionate about, technology, and it kind of just fell into place for me. But there are a lot of folks out there who want to get into technology who aren’t particularly passionate about it, and they come into the field hesitant and resistant and afraid. I’ve been tailoring most of my career advice these days to those people and saying, “You know, find a way to get passionate about this,” and sharing some of the things that I like and don’t like.

Carlos: Andy, our last question for you today is if you could have one superhero power what would it be and why do you want it?

Andy: Oh wow! That’s a toughie. I’m a Batman fan, ok? I’m 53 years old. My wife has a picture of me in a kiddie pool when I was about 18 months old with a Batmobile toy. I am a half century into Batman. I have always been a Batman fan, but the truth is Batman isn’t so much of a superhero. Well, I’ll say it this way: he doesn’t have powers like the ones I think you’re asking about here. He just trains really hard, and he has the discipline to do all that, and his tragedy is what drove him to it. I’ve been a huge Batman fan, but I don’t think he qualifies as having a superpower, so I’m going to say that first, because first and foremost I’m a Batman fan.

Carlos: There you go.

Andy: I would want to fly. I think Superman’s powers are cool, but I would want to fly. You know, I like what Dr. Strange does, right, with folding time and being able to manipulate stuff. But as I’m saying that, I’m thinking about what I just said about failing, and I would probably blow up the universe; probably not a good choice after all. Maybe, you know, my third choice, I’ll go with speed. I’m not a fast person at all in any aspect, mentally or physically; I’ve always been a really slow runner and a poor athlete. It would be cool to run as fast as the Flash. I’ll stay with that one.

Steve: Nice I like that.

Carlos: Well, Andy, thanks so much for joining us in the program today.

Andy: Thank you guys! It’s been a great conversation. I hope it turns into a fun show for people listening. I really appreciate the opportunity. I think what you guys are doing is great. I’ve listened to several of your podcasts. I haven’t listened to all of them but it’s great to listen to people and learn more about what they’re thinking. And I love the SQL Family questions, those are awesome.

Steve: Alright, well thanks, Andy.

Andy: Thank you guys!

Episode 78: Conversion to SQL Server 2016

As database administrators, we will all upgrade our environments at some point; however, we don’t normally have the opportunity to upgrade to the next version of SQL Server before it becomes available to everyone else.  In this week’s episode of the podcast, Steve and I chat with Brian Carrig about the journey ChannelAdvisor took to implement SQL Server 2016 in their environment, what it was like working with the SQLCAT team, and how they went about making use of some of the new features.  Brian shares some of the struggles they were having along with how the 2016 version helped address these issues.

 Episode Quote

“SQL 2016 is probably the version of SQL Server that has had most attention paid to performance improvements in quite some time probably since the 2005 release. They, I believe, I won’t say they promise but it’s not atypical to get 20% performance gains just right out of the gate.”

Listen to Learn

  • What it was like to work with the SQLCAT team
  • The features ChannelAdvisor was after, but also how they went about using them
  • How the in-memory options help with bursting
  • Why Brian had to create a Connect item and what it deals with

Brian on Twitter
BWIN in-memory OLTP whitepaper
Using memory optimized table variables
Memory Estimates for memory-optimized tables
Connect Item to store query store in a separate filegroup
SQL Server setup checklist in Github

About Brian Carrig

Brian Carrig is a Microsoft Certified Master of SQL Server and manages a team of talented DBAs at leading e-commerce cloud solutions provider ChannelAdvisor. In a previous life, Brian spent some time as an academic and holds a PhD in Computer Science. He is a native of Dublin, Ireland but now lives with his wife and two daughters in Cary, North Carolina.

Carlos: So welcome compañeros, this is Episode 78. I am Carlos L. Chacon.

Steve: And I am Steve Stedman. And today’s guest is Brian Carrig.

Carlos: Yeah, Brian is the DBA Manager over at ChannelAdvisor. He is currently living in Raleigh, North Carolina, but you’ll notice an accent; I know that he is not from that area. It’s great to have him, and we’ll be chatting with him a little bit about the 2016 features and some of the issues, or challenges perhaps, and experiences that they’ve had in rolling 2016 out. They’re actually a preview customer, so they had access to it before it was released, and we’ll be going through that. A couple of things we wanted to talk about first: I wanted to remind everyone that our SQL Server checklist is out on GitHub, and if you’re interested in contributing to that we’d love to have you. That’s an opportunity for you to contribute, give your ideas and two cents, and make it a little more useful for others as things continue to change. We’d love to get your experiences there as well.

Steve: Yup, and you know what I think is nice about that is if we get more input on that and update that we’ll be able to put that checklist out maybe a couple of months or once a quarter which is here’s the update with everyone’s suggestions. And perhaps build one of the best SQL Server setup checklist out there.

Carlos: There you go. Yeah, it’s interesting. Even though people aren’t necessarily installing SQL Server all the time, it’s still something to reference as they’re going in and reviewing environments; it’s just a kind of, “Ok, am I set up correctly?”

Steve: Absolutely, and I’ve found that, I mean a lot of those setup items get missed when a SQL Server initially gets built. People who set it up are not always the ones who know the right things to do for the SQL configuration.

Carlos: That’s right. Ok, so with that let’s go ahead and welcome Brian to the show. Brian, welcome!

Brian: Alright, thanks!

Steve: Yeah, Brian, good to have you on the show.

Brian: Yeah, good to be here.

Carlos: Yes, thanks for coming in. I know we’re all a little bit under the weather. The winter weather has kind of come and then it’s worst and so thanks for taking a little bit of time with us. So ultimately our conversation today is about your experience in migrating to SQL Server 2016, and what that experience was like and some of the challenges that you had? So I guess, first, set us up with that story, what ChannelAdvisor is doing? I guess why did you feel that need to upgrade to SQL Server 2016 before it was released?

Brian: Ok. Well, just to give some background, I guess not everybody will know who ChannelAdvisor are and what we do. We’re an e-commerce software solution provider. We’ve been a software-as-a-service solution since 2001, probably since before the name existed. What we do is help retailers and brands to sell online, right, on marketplaces like eBay, Amazon, Walmart, Jet. And there are actually hundreds, possibly thousands, of these marketplaces throughout the globe. Basically, we help companies sell across these marketplaces, manage their inventory and all that kind of stuff, so that they don’t have to worry about writing to the APIs for the various marketplaces. They can just interface with us and we will manage their inventory, manage their online sales, digital marketing, all of that. We power about, I think, 3,000 retailers.

Carlos: Oh wow!

Brian: Yeah, with some pretty big customers: Macy’s, Samsung, Under Armour, Staples. Yeah, there are quite a lot of big names in there. Before I moved to the US, you mentioned my accent; when I go home to Ireland people say I’ve lost all my accent. So, apparently not.

Carlos: Yes. Isn’t that the case. My dad has the same way. My dad is from Costa Rica. He came here when he was 15, lived here for 30 some odd years. Goes back to Costa Rica now and people are like, “Is Spanish your first language?” He was like, “Yup.”

Brian:  Well, when I got home they say I sound like an American now so.

Carlos:  You’ve been Americanized.

Brian: Yes. So we recently just went through what we call the Cyber Five, right at Thanksgiving, that whole weekend through Cyber Monday. During that weekend last year we were close to a quarter billion in sales. I won’t say this year’s number because I believe the results are not actually official yet; I don’t want to get in trouble with the SEC. But suffice it to say we do a lot of processing during that weekend, and that’s kind of what drives our use of SQL 2016, or why we wanted to move to SQL 2016.

Carlos: So that’s your peak usage time?

Brian: Yes. Yes.

Steve: So then, given that that’s the peak usage time, I would assume that probably in July or August you’re really starting to ramp up for that. Is that about the right time for it?

Brian: Yeah. Generally, we’re trying to get the infrastructure in place around July or August and gearing up for the load in the 4th quarter. Then, based on just natural growth, we might see a bit of a dip in Q1 of 2017, say, and then we start to ramp up again from natural growth, and we try to get ahead of that. And SQL 2016 is probably the version of SQL Server that has had the most attention paid to performance improvements in quite some time, probably since the 2005 release. I won’t say they promise it, but it’s not atypical to get 20% performance gains just right out of the gate. So that’s what interested us. We’re quite sensitive to cost, even though we have quite a large SQL infrastructure, because we make money when our customers do. So if we just go crazy and expand and deploy a huge amount of infrastructure, we can impact our own profitability.

Carlos: Sure. Now, has any of that, I mean using cloud technologies or is it still On-Premise.

Brian: Pretty much everything we do is on-premises. We do have a footprint in AWS. We have some customers we run SQL Server in AWS for, because we have geographic restrictions on where their data can reside, so we leverage AWS’s data center in Ireland to run their workload.

Carlos: Ok, got you. So you’re an early adopter; you had it before it was available. I guess, what was that process like?

Brian: It was fantastic. I can’t compliment to SQL CAT Team enough. It was a real eye opening experience. I once have had terrible experiences with Microsoft support prior to these but this is another level entirely. So yeah, it was really enjoyable.

Carlos: Well, so you mentioned you were looking for a 20% increase, that’s what the marketing was telling you, right? So you get it in, you get the SQL Server 2016 version. How do you go about testing that and kicking the tires to see what it’s going to do for your environment?

Brian: Right, that is difficult, right? Currently, we have 118 SQL Server instances in production, I think close to 3,000 databases, and our environment peaks at about 1.6 million transactions per second. It’s really, really hard to get a dev environment that reflects that, but you know, we do what we can. In particular, we knew we had some problem areas, and one of them was In-Memory OLTP. We’ve been using that since SQL 2014. While the performance gains were incredible, the feature was kind of limited in terms of what data types and SQL syntax you could use. We also had a lot of stability issues with In-Memory OLTP, particularly around the checkpoint engine. So we had cases, sorry go ahead.

Steve: Oh no, I’m just going to say what type of issues did you run into with that that’s interesting?

Brian: Oh ok. I think I found the Connect item on this, or somebody did; it was fixed in a hotfix. But we had an issue where, if you ran out of transaction log space, so basically your transaction log filled, even before autogrowth could kick in, the checkpoint thread for the In-Memory OLTP engine would die and fail to respond. So the only way you could get that process to restart was to take your database offline and bring it back online, which in highly transactional environments is problematic.

Steve: And given that the checkpoint files are really the only place that new data is going to exist, I can see how big of an issue that might be.

Brian: Yeah, so the first time it happened we did not notice and the transaction log on the server was 600 or 700 gigabytes and it did not get truncated until it filled because of this. It would basically have a full transaction log that was 600 gigabytes in size. It ended up being quicker to restore the database from backup rather than first taking the database offline and having it start and go through crash recovery.

Carlos: Oh wow!

Brian:  Yeah. That’s a lot to roll forward.

Carlos: Yeah, exactly, scenario you see every day. Now that’s on the 2016 environment or that’s 2014?

Brian: That was on the 2014 environment. They actually fixed it in a service pack for 2014, but I guess that was the point at which we were kind of looking around and saying, you know.

Carlos: Ok, we need something different.

Brian: Yeah, we heard that they had rewritten a lot of the engine for In-Memory OLTP for SQL 2016. So that really prompted us, apart from performance gains. That was another thing that kind of prompted us to go, “Hey, we’d really like to get into 2016 as quickly as possible.”

Carlos: Ok, so then, I guess, were there any other changes to the In-Memory objects, if you will? Or was it really just “I need a new version”?

Brian: There was the stability side but also availability of constraints. I think that using In-Memory OLTP has become a lot more viable now that constraints are available in terms of foreign keys, and default constraints, and unique constraints.

Carlos: Now, it reminds me, so the foreign keys, the In-Memory objects. Can I have a foreign key from an In-Memory table to a disk? What are they calling them? What do we call them, not In-Memory table?

Brian: Disk based tables.

Carlos: Disk based tables. Yeah.

Brian:  I actually do not know. I have not tried that.

Carlos:  But you’re talking about foreign keys between two In-Memory tables?

Brian:  Yes.

Carlos: Oh yeah, ok. I wasn’t sure. I didn’t think that was an option and I thought ok maybe that had changed. So ok, so then I guess can you tell how many In-Memory tables do you actually have?

Brian: Right now? I want to say 3 or 4. So it’s not extensive. It could be more. Kevin designed a lot of them so he’ll probably correct me after this confessional.

Carlos: Another excuse to get Kevin back on the podcast.

Brian: There you go. He can back and explain everywhere where I was wrong. I want to say 3 or 4 of the regular In-Memory tables that’s in the database. But what we’re using extensively now and I don’t know if enough people know about this is you can create a table type that is memory optimized and use this as a table variable. Either as a table variable in the procedure or, you know, if you use TVP you can pass it in as a TVP. We’ve seen some incredible performance gains from this. And it’s really simple to implement. I think I have given an example where we reduced the CPU consumed by a procedure by about 75%.

Carlos: So when you’re doing that at that way basically you’re kind of skipping TempDB for the type of jobs or the type of tables that would normally go into TempDB and you’re doing it as a memory only, In-Memory OLTP table, is that right?

Brian: That’s right. It doesn’t touch TempDB at all. What we found after we deployed SQL 2016 and got the kind of performance gains we mentioned is that, like always, you’re just shifting the bottleneck somewhere else. Where we started to see some stress was contention in TempDB around reading and writing the system tables, and the way we looked at reducing that was basically to take some load off TempDB by using memory-optimized TVPs.

Carlos:  Always the big system. How do you take into account the memory considerations now? Because obviously one of the things with memory-optimized objects is that you have to account for that memory now, whereas previously it was just, ah, we’ll allocate 85% or whatever the percentage was of the memory to SQL Server and be done with it. Now we have to carve that up a little bit. Can you talk to us about how you decided to do that, or what your considerations were there?

Brian: Yeah, certainly. So if you read Books Online, they say for your fixed tables, right, not temporary objects, that you should allocate about 3x the size of the table. But really what it depends on is your frequency of insertion into and deletion from that table: basically, how much work does the background garbage collection process need to do and how much overhead do you need to handle all the row versions. And typically, if you’re looking at In-Memory, you probably have a really high frequency of insert, delete, and update operations.

Carlos: Right, that’s why you’re using it, right?

Brian: Right, it wouldn’t be a problem if your table wasn’t heavily accessed; you wouldn’t be looking at this. And also, you know, concurrency, because the more sessions doing this, the more versions will be held in memory. So we have found that in some cases you might need to allocate as much as 10x. Yeah, and we talked to a company called bwin. They’re like the pioneers of In-Memory OLTP. They’ve worked a lot with SQLCAT, and they’ve got some great white papers out there about using In-Memory OLTP to really boost your performance and the number of transactions you can support, and they had similar experiences.

Carlos: I admit I don’t have extensive experience with the In-Memory objects, but the recommendation is to plan for 3x and you’re seeing 10x. Is it the DMVs, or what are you using to monitor how much memory the tables are actually taking up?

Brian: So I use Resource Governor and there are also DMVs that will tell you how much memory it is using. And one of the tricks that you can do and I recommend doing this because in a lot of cases your big concern is that you will run out of memory and you will cause the server to crash. If you have just memory optimized objects in one database you can use Resource Governor to bind that database to a resource pool and fix the amount of memory that it can use. So if you’ve run out of memory that you’ve allocated to that database, queries will fail but your server will stay up.
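
A minimal sketch of the pattern Brian describes, with placeholder pool and database names: cap the memory with a resource pool bound to the database, and watch actual consumption through the xtp DMVs.

    -- Cap how much memory the In-Memory OLTP database can consume.
    CREATE RESOURCE POOL InMemoryPool WITH (MAX_MEMORY_PERCENT = 40);
    ALTER RESOURCE GOVERNOR RECONFIGURE;

    -- Bind the database to the pool; the binding takes effect the next time
    -- the database comes online.
    EXEC sys.sp_xtp_bind_db_resource_pool
         @database_name = N'SalesDB', @pool_name = N'InMemoryPool';
    ALTER DATABASE SalesDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
    ALTER DATABASE SalesDB SET ONLINE;

    -- See what the memory-optimized tables are actually using.
    SELECT OBJECT_NAME(object_id) AS table_name,
           memory_allocated_for_table_kb,
           memory_used_by_table_kb
    FROM sys.dm_db_xtp_table_memory_stats;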

Carlos: Will stay up.

Steve: That’s a very creative approach. I think oftentimes Resource Governor really gets downplayed a little bit as not being that useful, but that sounds like a great way to be using it.

Brian: I love Resource Governor. There’s not many people that use it. But we use it extensively here.

Carlos: Interesting, that might be another topic, because I agree. It seems that Resource Governor is kind of on the wayside, but scenarios like that do make it very appealing.

Brian: Yeah, because we have a lot of different product teams that often use the same database infrastructure, we sometimes use Resource Governor without doing any actual limiting or governing of resources. What it does is split your plan cache. So if you happen to have teams that use procedures in different ways, they get their own plan cache and you can reduce parameter sniffing issues.

Steve: Oh, interesting. So another question around the In-Memory OLTP and this is kind of the concern that I came across as I’ve been exploring it, is that and this probably comes from my background with database corruption and checkdb and things like that. But checkdb and checktable won’t check In-Memory OLTP tables. And I guess is that something that you’ve run across or have problems with?
Brian: We have not encountered any problems with that. Yeah, it’s definitely a concern. I mean, the tables that we use for In-Memory OLTP can largely be thought of as staging tables for ETL processes, and obviously we extensively use the memory-optimized TVPs. And just to say, hopefully without frightening some of your listeners too much: if you’re just using the memory-optimized TVPs, since these are all temporary objects, you probably won’t even notice the amount of memory consumed by In-Memory OLTP because they’re so short-lived.

Carlos: Yeah, exactly, it will be almost like Temp tables, right, you know, in a sense.

Brian: Yes. But no, we haven’t encountered any type problems there. We have encountered corruption on our disk based tables but nothing in the In-Memory OLTP space yet.

Steve: Well, ok, that’s good news on the In-Memory OLTP side.

Carlos: Yeah. So I guess that’s maybe an interesting point that could take us off on a tangent. But how does the corruption on disk happen? I guess something gets flipped there. As long as the database knows what records are there, how could corruption occur? Your memory would have to become corrupt, and that would be more about your hardware.

Steve: Yeah, you mean with the In-Memory tables, Carlos, right?

Carlos: With the In-Memory tables, yeah.

Steve: Yes, so the thing I’ve seen and experimented with a little bit is, if you have corruption in one of the checkpoint files, where all of the changes to an In-Memory OLTP table are tracked, you’re not going to know it until you try to back the database up, either with a full or differential backup. At that point your backup is going to fail, and it will usually report a check failure that you’ll see. And then you can’t actually back up the database, and the only thing you can do at that point is go back to a previous backup of that database.
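
In other words, the failure tends to surface at backup time rather than from CHECKDB, which is one reason regular backups (ideally with checksums) double as a corruption check for memory-optimized data. A minimal sketch with a placeholder database name and path:

    -- Reading the checkpoint files during backup is where In-Memory OLTP
    -- corruption typically shows up, since CHECKDB skips memory-optimized tables.
    BACKUP DATABASE SalesDB
        TO DISK = N'D:\Backups\SalesDB.bak'
        WITH CHECKSUM, INIT;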

Carlos: Do you have anything else you want to add there, Brian?

Brian: I mean, the checkpoint files are really just used for recovery purposes, right, and also for the backups. So in that scenario, where you find that the backup has failed and you have some kind of storage corruption, presumably your actual data will be fine, right?

Steve: Well, if you’re pulling that data from another table that is a disk-based table, yeah, it will be fine. But if it’s going into an In-Memory OLTP table, that data may only exist in the checkpoint file until it gets backed up.

Brian: Oh, ok.

Carlos: Got you. Yeah, crazy.

Steve: Yup. In-Memory OLTP is definitely a huge performance gain there, and it sounds like that’s what you’re using it for and you’re definitely seeing those results.

Brian: Yes. I mean, I think our plans for 2017 definitely involve more In-Memory OLTP.

Carlos: But that’s not the only feature that you’re using in 2016?

Brian: No, another feature that we’re using extensively is Query Store.

Carlos: Right, and this made waves recently because they had enabled it by default in Azure SQL database.

Brian: Oh, that’s right. Yeah, that’s right.

Carlos: So, you know, who knows, in SP2 it might become enabled by default. This is pure conjecture, this is not an announcement, but I could see that coming in future versions, being enabled by default. Currently it’s not.

Steve: Yeah, I could see it. Even during the preview TAP program they were changing the defaults almost on a monthly basis. I don’t know if anyone was paying a lot of attention to the CTPs, but almost each month they got tweaked.

Carlos: Yeah, I know when I talked with Borko, the PM, about that, they were looking at those usage patterns. They wanted to get that feedback, see how people were using it, and be more responsive, if you will.

Brian: Yeah, for us, because we have a lot of ad hoc queries, the switch to clean up the queries that are not frequently used was important.

Carlos: Ok, so I guess we should quickly review what the Query Store option is. From my perspective, it’s basically tables inside the same database that keep a record, a history, of all the executions that have happened on your database for further analysis. Is that fair?

Brian: Yeah, I think that’s fair. I think Borko describes it as a flight recorder for your database. And I think that’s a really good description.

Carlos: Yes, that’s right.

Brian: So it keeps a record of all of the queries as they execute and then the plans that they use. So even though, probably the primary use case is that, right, looking for plan changes that are problematic. There’s a lot of rich data in there that you can use for other things as well.  Even cases where it might be queries that are aborted, or cancelled, or timed out for some reason that’s all data you can pull out of there.

Carlos: Right, I know, from the DMV perspective, you know, that you can only see either cumulative totals or the last execution counts. And so that scenario where, “Oh, this morning I was having a problem but now it seems fine”, you know, not much that you can do about it now.

Brian: Yeah, so we typically, prior to Query Store we would typically run Adam Machanic’s sp_whoisactive every minute and dump that to a table. So we would have some kind of history of what’s executing when a developer or a customer comes to us and says, “What, you know, there was an issue an hour ago or two hours ago. What happened?” Right, it’s really hard to tell. So that’s what we use that for and that Query Store is kind of like a more enhanced version of that.
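
For reference, the pre-Query Store pattern Brian mentions is usually just sp_WhoIsActive on a schedule writing to a table; a hedged sketch (the logging table name is a placeholder, and sp_WhoIsActive can emit the matching CREATE TABLE script via its @return_schema option):

    -- Typically wrapped in a SQL Agent job that runs every minute.
    EXEC dbo.sp_WhoIsActive
         @get_plans = 1,
         @destination_table = 'DBA.dbo.WhoIsActiveLog';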

Carlos: Ok, interesting. So, I mean, you talked about aborted queries, you know, that you’re starting to look at or I guess understand better. Other things that you’re getting out of there that maybe you haven’t thought of before?

Brian: We used plan guides, at one point extensively, and we certainly still have areas where we use them. If anybody has worked with plan guides to force plans, I mean, Query Store is amazing, right? You execute a procedure, you pass in two parameters, and your plan is forced. None of this messing around with huge blocks of XML or trying to capture the correct plan from the cache.

Carlos: Ok, so now you’re actually saying I can go back into Query Store. I can capture the plan. I have two plans, one is good, one is bad. And I basically say, “I want to use this one.” And that’s your plan guide?

Brian: Yeah. It is a force. It’s not, “Hey, would you mind using this one?” You’re forcing it to use this plan.
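
The procedure Brian is referring to is Query Store's plan forcing; a minimal sketch (the query and plan IDs are placeholders you would look up first):

    -- Find the query and the plans Query Store has captured for it.
    SELECT q.query_id, p.plan_id, rs.avg_duration
    FROM sys.query_store_query AS q
    JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
    JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id;

    -- Force the good plan: two parameters, as Brian says.
    EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;

    -- And to undo it later:
    EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 7;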

Carlos: Sure. Well, that’s kind of an interesting option, and I say this only because I was just writing test prep questions and so I’ve been looking at forcing. Force is kind of an interesting word, because any plan guide is really a force, right? You’re telling it, “Hey, you need to use this,” instead of what it would otherwise choose.

Brian: Sure, but you’re not giving the optimizer a choice. So the optimizer is being told, “Here is your plan, execute that.”

Steve: And I think that’s a much more direct and guaranteed way of using plan guides rather than saying here’s the plan guide and here are some suggested hints to use.

Brian: Yeah, certainly what we have found is that we were often using hints to try to get to the plan that we want. As you say, this is more direct: here, I know the plan I want, use it.

Carlos:  Right, right, right.

Brian: I mean that should be a pretty rare scenario. Usually the best thing to do is to leave it to the optimizer to figure out the best plan. But there are always those cases that you have to deal with.

Steve: Yeah. I’ve seen that where, if somebody adds a bunch of hints into a query that then make the query behave poorly, you can use plan guides to override those bad hints that were given to the original query, and if you can pull that information out of the Query Store, that would be extremely useful.

Brian: Yeah, and you can use it to address the same problem that plan guides are largely used to address: cases where you can’t edit the query to put hints in there. Where you would use plan guides to force the behavior that you want, you can use Query Store in the same way. So if you deal with a lot of ISVs and you’ve got issues with parameter sniffing, this could be a way to address it.

Carlos: Very good. So the last feature we want to talk about is the column store indexes.

Brian: This is one you’re definitely going to have to have Kevin back in for. He is our columnstore guru.

Carlos: Yes. So one thing I’ve always wondered, I guess, is because the example they frequently give for columnstore indexes is something like state, right? And ok, I get it, state makes sense: I’m constantly grouping by that, I’m reporting by that, things of that nature, so that makes it a good fit for a columnstore index. But outside of that, is it basically just looking for common things that I am either grouping by or reporting on, and that’s where I want to go for my columnstore indexes?

Brian: Possibly. I mean, one of the things that maybe gets overlooked a little bit is just how good the compression can be on columnstore, and a lot of your performance gains might come from that. We had a case where a 4TB fact table was reduced to about 600GB in size.

Carlos: Wow!

Brian: By making it a clustered columnstore index, yeah. Also, you can generally reduce the number of nonclustered indexes that you need when you basically rearrange your table as a columnstore.

Carlos: And I guess that’s the new feature in 2016 is I can now create that as a clustered index whereas before I couldn’t.

Brian: Yeah, you can basically mix and match now, so you can have a clustered columnstore index with nonclustered B-tree indexes. Or, probably less common, you can have a regular B-tree clustered index with nonclustered columnstore indexes on the table. Yeah, and one of the gains with 2016 as well is batch mode execution for columnstore.
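
A minimal sketch of the 2016 mix-and-match Brian describes, with made-up table and column names:

    -- SQL Server 2016: a clustered columnstore index can coexist with ordinary
    -- nonclustered B-tree indexes on the same table.
    CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;

    CREATE NONCLUSTERED INDEX ix_FactSales_OrderId
        ON dbo.FactSales (OrderId);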

Carlos: Right, now batch mode meaning being able to update and write to.

Brian: Yes, it will execute in batches and you can get considerable performance gains there.

Carlos: Right, very cool. Very cool! Well, awesome. It’s always good to talk with other folks about why they’re doing certain things and how they’re doing them. A lot of times on our podcast we can get kind of carried away; I think we always try to dive into why we’re doing something, but discussions like this help bring the problem a little bit more to the front, and then how we go about solving it. Should we do SQL Family?

Steve: Yes, let’s do SQL Family then. So one of the first questions is about keeping up on technology: how do you keep up with technology and all the changes that are happening, specifically around SQL Server?

Brian: That’s kind of easy here, because we have so many people working here who are really into SQL Server, so everybody can be everybody else’s filter. Everybody is looking at blog posts and so forth. We have a very, very active user group in the Triangle area. So yeah, basically just conversations at work will usually keep you firmly in the loop.

Carlos: Yeah, it’s interesting. Brian, you’re actually the third person from ChannelAdvisor we had on the podcast. We’ve had Mark. Mark Wilkinson was actually the first, and then Kevin, and then now yourself. Yes, we run into your folks from time to time of course being on the East Coast.

Steve: Yeah, I think you’ve got like, maybe 10 more SQL professionals to go if you want to get the whole set.

Carlos: Only 10, there you go. How did you first get started with SQL Server?

Brian: In a previous life, I was a Linux administrator, mainly working with MySQL, and the company I worked for at that time took on a project. They didn’t have any in-house developers, so they took on a project that was going to be developed in Delphi against a SQL Server backend, and nobody was around to manage it, so I did.

Carlos: Ok.

Brian: I started from there.

Carlos: Alright, and then if you could change one thing about SQL Server what would it be?

Brian: Apart from the licensing cost? I will say, TempDB, and I think that’s a major bottleneck for SQL Server still. I would definitely like the ability to allocate TempDB on a per database basis.

Steve: Yup. Alright, so little bit of background then there, Carlos, this question about changing one thing with SQL Server. Where did that originally come from?

Carlos: Yes, we reached out to the audience and asked them what questions should we asking and Brian came up with that one. And it’s been interesting to get, it’s been a good one, right, lots of different thoughts there.

Steve: Give credit where credit is due.

Brian: Thank you! I only realized before the show that I didn’t necessarily have an answer myself.

Carlos:  Yeah, it’s much easier to ask the questions than it is to answer sometimes. What’s the best piece of career advice you’ve ever received?

Brian: Alright, well, I’m going to credit the leader of the user group in Dublin for this piece of career advice, a guy named Bob Duffy; some of you in the SQL community have probably heard of him. The advice was that I should sit the Microsoft Certified Master exam. When they announced that it was going to be closed, I kind of assumed my chance was gone. He was the one who really said, no, just go for it. And I kind of squeaked in at the end there.

Carlos: Wow, ok, very good.

Brian: Early days before it finished.

Steve: So if you can have one superhero power, what would it be and why would you want it?

Brian: Ok, I was going to say the amazing power to reduce SQL licensing costs; I don’t know if that counts. But actually, now that I think about it, I think flight would be good. I fly back to Ireland and, you know, when you’re paying for tickets for four people it kind of adds up. So if I could just fly everybody over with my superhero power, that would work out great for me.

Carlos: There you go.

Steve: Yup. And I will be looking for that new comic book of the new superhero that reduces the SQL Server licensing cost.

Carlos: Marvel, if you’re listening, it was our idea first. Ok, well Brian, thanks so much for being on the show today.

Brian: Thank you, happy to be on here.

Steve: Yup, thanks Brian. It’s been fun.

Episode 77 Reporting Improvements SQL Server 2016

We invited Paul Turley on the show to chat with us about some of the new features in SQL Server Reporting Services, but were pleasantly surprised to have the conversation take a different direction.  Paul talks about some of the new training available for those interested in visualizations and why this community can be hard to define–and connect with.  There are so many visualization options, and even within Microsoft we have many products that overlap in functionality.  In addition to talking about the pick-list items of new features, Paul gives some guidance on why you should choose certain products and what makes good use case scenarios for some of the new features.

With the new analytics features now available in SQL Server 2016 via SP1, I think there are going to be additional requirements for data professionals to provide better analytics capabilities, and this episode is a good starter for how to frame those conversations.

 Episode Quote

[There is] “this challenge where the industry is very fragmented. It’s moving in different directions because businesses are adapting technology in different ways.  The reason there are different tools is because there are different needs and disciplines.”

Listen to Learn

  • Why it can be hard to connect with others in the visualization community
  • Some of the new features in SQL Server 2016
  • Why choosing a visualization tool can be difficult
  • The initial architecture of reporting services
  • Good candidates for mobile reports
  • High level architecture for mobile reports
  • The new architecture of reporting services and support for multiple browsers

edX course on data visualizations
Paul’s book on Reporting Services in 2016
PASS Analytics Day

About Paul Turley

Paul Turley is the owner of Intelligent Business, LLC, a consultancy with dedicated Business Intelligence practitioners, visualization designers, mentors, instructors, project managers and solution architects.  He is one of the leaders of the Portland SQL Server users group and is both a Microsoft MVP and a Dun and Bradstreet MVP.


Episode 76 Availability Group Improvements

Availability groups provide some exciting features mixing high availability and disaster recovery; however, from a performance perspective, there are a few drawbacks.  With the advances in SQL Server 2016, our guest Jimmy May set out to test the features and see if he could get some really high performance out of an availability group with two synchronous replicas.  In this episode he talks with us about some of his findings and some of the pains associated with getting there.

Episode Quote

“I’ve been a big fan of flipping the faster bit ever since even before I was a member of SQLCAT.”  Jimmy May

Listen to Learn

  • What improvements have been made to the log transport process
  • Changes to the log redo.
  • A new startup parameter -k and what it does
  • Why hardware validation is still important
  • Jimmy was the guest of what celebrity from the 90’s?

Jimmy on Twitter
Get the whitepaper with all the notes from Jimmy’s test

About Jimmy May

Jimmy May is a SQL Server technologist for SanDisk, a Western Digital brand.  His primary skill set is related to tuning and optimization of SQL Server databases. He formerly worked at Microsoft on the SQLCAT (Customer Advisory Team).  When he is not working on databases, you can find him skiing wherever there might be snow.

Carlos:             So Jimmy, welcome to the program.

Jimmy:            Hey guys, thanks for having me. I really appreciate it.

Carlos:             Yes, it’s nice of you to come out. Always interesting to talk to the former Microsoft folks. Guys have worked on the SQLCAT team, lots of experience under your belt. So thanks for coming and being willing to share a little bit of your time with us.

Jimmy:            Sure. Thanks!

Steve:              And I know, I’ve seen Jimmy present a couple of times and I’ve always had a great time learning along the way so I hopefully will have that same experience today.

Jimmy:            Crossing our fingers.

Carlos:             And one last thing just to butter this bread a little bit more: Jimmy is one of the nicest people, I think, in the SQL community, always willing to talk, get some feedback, or share his thoughts and opinions with folks at conferences, SQL Saturdays, you name it. So if you ever get a chance to go and say hi to Jimmy, take advantage of it, because you won’t regret the time.

 

Jimmy:            I really appreciate that guys.

 

Carlos:             Ultimately, what we want to talk about tonight is your experience putting together an Availability Group, trying to understand some of the pain points customers are experiencing, figuring out how to get around those pain points, and kind of putting the proof in the pudding, if you will, on some of the advancements in SQL Server 2016.   Yeah, so let’s go ahead and dive into that. Give us a quick recap of how you put that together and then we’ll dig into what you found.

Jimmy:            Ok, well, the goal was, as you stated, to put together an Availability Group architecture that was actionable. We weren't just trying to set super high world record numbers, something we could toss over the transom for an arbitrary customer to read and implement. And along the way, of course as you know, I work for a company that sells a lot of flash. And I've been a big fan of flipping the faster bit ever since even before I was a member of SQLCAT. And so that was a big part of the picture, and there are some challenges even on flash, storage latency, things like that. And we have a handful of lessons learned. We proved out the really exciting log transport improvements; that was a bottleneck in Availability Groups in 2012 and 2014. No longer; we get that data across the wire in near real time. However, there are still some issues with the log redo at the high end of performance, and along the way we also implemented some other findings. The –k startup trace flag is something that virtually any installation can take advantage of. We'll go into it in more detail, I hope, later, but we have a startup option, fully supported, not well documented, to throttle checkpoints at a level that you designate, and that's pretty exciting. So there are a number of lessons learned.

Carlos:             So why don’t you take us through I guess what that setup looked like. You have a couple of boxes and ultimately you wanted to use Availability Groups and you wanted synchronous commits from the primary to both secondaries.

Jimmy:            Yeah, the setup itself was largely pretty vanilla, and we did that by design. The only "exotic" pieces were some of the hardware components to make sure we didn't run into bottlenecks. But we had three 2U boxes – HP DL380 G9s. Our friends wouldn't like us to call them commodity boxes, they're super fantastic and amazingly performant, but they were off-the-shelf boxes. Two sockets; we used Broadwell processors. We also started with Haswells but, during the duration of the project, we upgraded. 256 gigs of RAM, nothing crazy. And we had some pretty high network throughput; we didn't come close to hitting the bandwidth that we had. For availability, we put in two [inaudible – 4:15] cards, 40Gb apiece, and we teamed those using standard Windows teaming. Pretty simple to set up and configure, something with which I had not a whole lot of experience, but it turns out to be very easy, and even a non-network geek could do it with a lot of ease. So we have a total aggregated bandwidth of 80Gb if you want to use it, and that would serve us well if we wanted to put together multiple AGs, Availability Groups. And what else can I tell you? Oh, the storage, of course, the stars of the show. Flash doesn't come just in what a lot of us are used to seeing, the old Fusion I/O PCIe cards, right? Flash comes in a 2½ inch standard small form factor flavor now. And what we did was we started out with eight and decided to end up with ten 2½ inch flash disks. Only 800GB apiece, relatively small; there's a 1.6TB flavor coming. And in fact, our company makes 4TB 2½ inch disks. It's amazing to hold a 2½ inch disk in your hand and know you're holding 4TB. And very, very soon we're coming out with 8TB disks. By the way, these were a total of ten 800GB disks.

Carlos:             Inserted right to the box?

Jimmy:            Right to the box. Yeah, that's one of the amazing things about these "commodity" servers off the shelf. A variety of vendors, including HP, make these boxes with 24 slots. Not a special order. You would say, "Hi, I want the box with 24 2½ inch slots in the front," and you get it. And so think about that: you put 4TB disks in there and you've got 96TB of raw storage. Put the new 8TB disks in there and you've got almost 200TB of raw storage in boxes that are about as big as two pizza boxes. It's amazing. It's really amazing, and you've got the new processors from Intel; it's just amazing that you have the CPU to drive that capacity and that performance. Anyway, we used six 800GB disks for the data and four 800GB disks for the log.

Carlos:             And then it’s all yours. You’re not going to share those disks.

Jimmy:            That’s correct, it’s a non-shared storage. Oh my! All yours.

Steve:              I'm just listening to you describe that, and I'm thinking about configurations we have where we have shared storage or a bunch of virtual machines sharing the same hardware. And I was just thinking the configuration you've got there is certainly going to be way more performant than what I typically see.

Jimmy:           No question about it. Well, wait until I start talking about the performance numbers. You know, again, I'm a guy who used to dedicate his career to flipping the faster bit. We got some latencies that were so low that I had to double check the math. We will get into that later, hopefully.

Carlos:             Ok, so I guess, let's go back and now kind of set the stage for some of the pain points you managed to resolve there that we'll talk about. And so ultimately again, in our podcast we want to make sure that we kind of set a level playing field and we're going to try to explain some of these concepts. So with the Availability Group, you have a server. You're setting it to synchronous commit, meaning you have a second server. The transaction has to write to the first server, be sent to the second, commit on the second, get a reply back, and then the transaction is complete.

Jimmy:            Exactly, except we have three replicas, not just two: one primary, two secondaries. We actually have to get the ack from two servers before we proceed.

Carlos:             Right, the ack meaning an acknowledgement?

Jimmy:           Yes, yes, before the data can be hardened or committed back on the primary log.

Jimmy:            Well, the way it works in terms of AGs is the log is flushed from the primary database. Let's call this the main server, the primary server. The data is flushed to the Availability Group's log. In this case we have a simple one database, one AG. Ok, and the data in that log is what is shipped across the wire. It's received by the secondary AG logs and then it's applied to the replica data files on that side.

Steve:             So when it's applied on that side, that causes the write to the data file appropriately, and then that log gets saved as the local log file. Is that correct?

Jimmy:            Yeah, it's the local log file. And on failover, for example, that log file becomes the primary log file. On the secondary, the recovery queue is emptied. The secondary AG does a little bit of internal manipulation to say, "I'm preparing myself to be the new primary," and then what was the secondary becomes the primary, and the user transactions transpire there just like they did originally on the primary.
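For anyone who wants to picture the topology Jimmy is describing, here is a minimal T-SQL sketch of a three-replica, synchronous-commit Availability Group. The server names, endpoint URLs, and database name are hypothetical placeholders, and a real deployment also needs the WSFC cluster, database mirroring endpoints, and the secondary-side joins configured.

    -- Minimal sketch (hypothetical names): one database, one AG, two synchronous secondaries.
    CREATE AVAILABILITY GROUP [AG_Demo]
    FOR DATABASE [SalesDB]
    REPLICA ON
        N'SQLNODE1' WITH (
            ENDPOINT_URL      = N'TCP://SQLNODE1.contoso.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = AUTOMATIC),
        N'SQLNODE2' WITH (
            ENDPOINT_URL      = N'TCP://SQLNODE2.contoso.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = AUTOMATIC),
        N'SQLNODE3' WITH (
            ENDPOINT_URL      = N'TCP://SQLNODE3.contoso.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = MANUAL);

With both secondaries in synchronous commit, the primary waits for the transaction log to be hardened on the secondaries before acknowledging the commit, which is exactly the ack Jimmy mentions above.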

 

Steve:              So then you mentioned some enhancements in SQL Server 2016 around that process of getting the logs transported and then the logs redone on that end.

 

Jimmy:            Ahh! We were so pleasantly surprised. As far back as CTP1 we were seeing these enormous performance improvements in getting the data across the wire. Historically, and this is part of the hashtag "it just runs faster," SQL 2016 has a lot of amazing performance improvements implemented. And again, this is Jimmy May, former Microsoft employee, former member of the product team, but a third party now, ok? So I can say this with complete impartiality: huge fan of what the SQL Server product team has done. SQL Server 2012 is when AGs were introduced, and there was a bottleneck. It was tragic. AGs were pretty cool conceptually, you know, an enhancement of database [inaudible 11:09], but we were limited to 40 or 50MB per second across the wire. No matter what hardware you had, no matter how many CPUs, no matter how much memory, no matter how broad your network pipe was, there was a hard coded bottleneck built into Availability Groups. It was a vestige of the database [inaudible 11:30] code buried very deeply in the bowels of it. When that code was written, way back when, there was no hardware in the world that required the kind of throughput that you could get 4-5 years ago, and certainly not today. So in SQL Server 2012 and 2014 there was this bottleneck that you just could not work around, 40-50MB per second. In 2016, they unleashed the hounds. Again, with no special tuning, same server, same hardware, you just use a different version of SQL Server – 2014 vs. 2016. And I've got some great charts. In fact, I've got a whitepaper; you guys need to publish the link to it as part of the podcast. It has the documentation for all of this and what I hope is a very lucid presentation. And the appendices are amongst the most valuable pieces of the whitepaper. You've got the usual "yada-yada" in the body of the paper, but the appendices have some very interesting findings that a geek like me and like you guys would be interested in, including charts comparing head to head matchups of the SQL Server 2014 log transport vs. 2016. Again, just changing the version of SQL Server on the same hardware, we went from that 40-50MB per second up to a quarter of a gigabyte per second. Boom! And that genuinely expands the scope of applications that are suitable: data warehouses, ETL, any kind of index maintenance, etcetera. It was easy to hit that bottleneck in 2012 and 2014. And the problem, let me back up a little bit, the problem with this bottleneck isn't just that you're bottlenecked and can only get 50MB per second across the wire. While that data is being generated by the primary application, it is queued up on the primary, waiting to get across the wire, waiting to get inserted into the log transport mechanism. And if something happens to that primary, you've lost data. I mean, it's possible you can regenerate it depending on your application, but at a fundamental level that data is gone. And that's not a formula for high availability. Whereas now, easily up to 250MB per second, and with a little tuning you can get more, is sent across the wire in real time via the log transport mechanism and hardened on the secondary logs in real time. It's just amazing. So on failover, boom, the data is there. No data loss. So not only do you get better performance, you've got better HA and DR. Ok, so that's the log transport mechanism. Do you want me to dive into another aspect of the AG log related stuff?

 

Carlos:             Yeah, I think we are going to talk about the log redo?

 

Jimmy:            Yes, exactly. So we've just been talking about the log transport and the exciting performance improvements that 2016 gave us. Log redo is the other piece of the Availability Group log related mechanisms that is required for the process to work properly. And the log redo is simply a continuous restore process. Your data that's shoveled across the wire is hardened on the secondary log and then, via the redo process, it's applied to the constituent secondary replica database files. This is something you alluded to a little while ago, Steve. Nothing too exotic about it, just a continuous restore process. The challenge here, though, is that at the high performance levels we were generating, the redo can't keep up. Now the team, the SQL 2016 team, did a great job. They've turned it into a parallel process. That helps, but it still just can't keep up. And so at the maximum kind of levels we were driving, the redo was falling behind. That's not a formula for true high availability. Disaster recovery, yes, but in the event of failover, if the recovery queue on the secondary is building up, when you do try to fail over that recovery queue has got to empty before the server is actually available to accept transactions. Now the good news is that at this point there aren't very many applications that require the kind of throughput we were throwing. You know, again, it's pretty impressive: these 2U boxes, two sockets, two CPUs, a little bit of flash, and a workload generator throwing 250MB per second across the wire. Now how many applications require that kind of performance? Well, not that many, frankly, today. Even among the apps I used to work with at SQLCAT, just a handful of them. So the great news is, as always, the SQL Server team is very receptive. They are aware of the issue and they are actively working on it. So unlike the log transport issue, which took them two versions of SQL Server to remediate, this issue, which again is hardly a problem for anybody in the world today, is likely to be remediated in the relatively medium term, before it becomes a problem for the rest of the world, for your day to day application.

 

Steve:              So then just to recap to make sure I understand that bit: if you're pushing the data through on the log transport at up to that quarter of a gigabyte per second, then your log redo queue is going to fill up on the secondary side and get backlogged to the point that it may take a while to catch up when you fail over.

 

Jimmy:            Exactly. Exactly correct. Good summary, very pithy there. I wish I could be that pithy. Yeah, and so the log redo recovery queue just continues to grow at these high rates of performance. Most high performance applications aren't even close to that kind of throughput, so you're going to be ok. But this is mostly a theoretical event. It's really interesting though; again, I mentioned the appendices in this whitepaper we just published. The appendices have this in detail. They actually show some of the internals, how you can interrogate what's going on on the secondary replica, etcetera, to see for yourself what's going on.
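If you want to watch the send and redo queues Jimmy is talking about on your own Availability Group, the HADR DMVs expose them. This is a generic sketch, not the specific queries from Jimmy's whitepaper.

    -- Rough sketch: log send and redo queue sizes (KB) and rates (KB/sec) per database replica.
    SELECT
        ar.replica_server_name,
        drs.database_id,
        drs.synchronization_state_desc,
        drs.log_send_queue_size,   -- log waiting to be sent to this replica (KB)
        drs.log_send_rate,         -- KB/sec
        drs.redo_queue_size,       -- log hardened but not yet redone (KB)
        drs.redo_rate              -- KB/sec
    FROM sys.dm_hadr_database_replica_states AS drs
    JOIN sys.availability_replicas AS ar
        ON ar.replica_id = drs.replica_id;

A steadily growing redo_queue_size on a secondary while log_send_queue_size stays near zero is the pattern described above: transport keeping up, redo falling behind.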

 

Steve:              And we’ll have to include the links to that in the podcasts notes.

 

Carlos:             That’s right which will be at sqldatapartners.com/jimmy.

 

Steve:              Alright, so on to another topic. I know you mentioned the –k startup flag. Can you maybe jump into that a little bit and what you found there?

 

Jimmy:            Ok, this is one of the most exciting takeaways of this work. I know during our podcast prep, when we were level setting, I asked you guys if you'd heard about it. And the truth is very few people have heard about it. I first heard about the –k startup trace flag in SQL Skills training during immersion, IE 01, with Paul, Kim, Jonathan, Erin, Tim, Glenn, etcetera, recently.

 

Carlos:             I went to look it up on MSDN and it's not listed in the engine services startup options.

 

Jimmy:            It is fully supported, and it has been, believe it or not, since 2005. So we need to get that fixed; you know, we have a great Connect item. It is hard to find documentation. In fact, I remember first using it. I was involved, again as a third party person, post Microsoft; I was invited by the CAT team to come in and assist with a lab about a year ago. And I first saw this implemented in real life with one of the customers that does crazy cutting edge stuff. My friend Arvin, one of the first MCMs in 2008, was implementing it with this customer, and so I got the chance to see the behavior of –k in real life. And it's amazing. It's something that I'm pretty sure a lot of places that need cutting edge performance are going to start adopting. And here is why. Even in conventional shops, you know, as a consultant for Microsoft services 10 or more years ago, my old day job was big, fast databases, and checkpoints. But yeah, we haven't said what –k actually is. What is –k? It throttles the checkpoint to a user defined level. Many of us who have any experience with I/O are familiar with the checkpoint spikes that occur: SQL Server flushes the dirty pages in the buffer pool to disk sufficiently often to keep database recovery on restart, I should say restart rather than failover, independent of Availability Groups, within bounds. Now, this –k is important not just for Availability Groups but for any standalone SQL Server instance that experiences challenges with checkpoints overwhelming the disk I/O subsystem. And it's very, very common; it has been for years. And these challenges can go anywhere from a spike when the dirty pages get flushed to disk, to a stalagmite, you know, a little bit of a pyramid where the checkpoint basically never finishes. And so 5, 10, 20ms latencies for spinning media, or 1 or 2ms latencies for flash, can suddenly turn into 100ms, 500ms latencies over a long period of time. And it just hammers performance, and plus you've got inconsistency problems; for the duration of the checkpoint it throttles the entire system. So in our case, for example, with the default behavior, without implementing –k, we were getting 99,000 transactions per second between checkpoints. During checkpoints we were only getting 49,000 transactions per second. So if you look at the chart in the whitepaper you'll see what looks literally like a rollercoaster. Just high, vhoov, high, vhoov. It's quite a ride, but it's not something that your users want or your system wants. You implement –k by simply adding it at startup like any other trace flag, followed with no space by an integer representing the number of megabytes per second of checkpoint throughput you want SQL Server to provide, but no more. And in our case, we did several tests, and the sweet spot was 750: -k750 said, "Hey, SQL Server, I don't care how many dirty pages you have, never shove more than 750MB per second down to my disks." And doing so took this rollercoaster ride manifested by the default behavior we all know and hate to remarkably flat system performance. Checkpoint pages per second are throttled at 750MB per second, and because you define it, you do the testing, you define the sweet spot, suddenly CPU is basically flat, your transactions per second are basically flat, log flushes per second are basically flat and consistent. And the great news is that not only do you get consistent behavior, but the average is far higher than the average of the rollercoaster ride. And so, yeah, I'm telling you, I posted the paper when it was published to the MCM distribution list and I had several comments back about it. "Oh, where's this been all my life," and that kind of stuff. I think you'll start to be seeing some adoption.
And so again, one of the biggest takeaways of this Availability Group paper isn't just the great things that we can do with a 2U box, a little bit of flash, and the improvements in 2016, but also, documented for the first time I've ever seen anywhere, the impact of the –k startup trace flag.
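As a rough aside, and not part of Jimmy's whitepaper, you can check from T-SQL which startup parameters (including a -k value, if one has been added through Configuration Manager) an instance is running with, and watch the checkpoint write activity he describes through the Buffer Manager counters:

    -- List the startup parameters (SQLArg0, SQLArg1, ...) the instance was started with;
    -- a throttled checkpoint would show up as an argument such as -k750.
    SELECT value_name, value_data
    FROM sys.dm_server_registry
    WHERE value_name LIKE 'SQLArg%';

    -- Observe checkpoint write activity while a workload runs.
    SELECT object_name, counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Checkpoint pages/sec';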

 

Carlos:             And when we start talking about trace flags, one of the things we want to do is at least make sure we understand the downside, or at least why it is not enabled by default; maybe there is a good reason or a good thought behind that. And if I think about it, what you're potentially going to increase, again, your mileage may vary depending on your sizes or what not, is that when that instance restarts you may have to go through more of those redo logs to get back. Is that fair?

 

Jimmy:            Ah, that’s a good guess. That’s a very good guess but first let me be clear. This –k isn’t directly related to anything involving Availability Groups. It’s basically.

 

Carlos:             I guess I said redo log when I meant the log, right. So you're going through the log and moving forward or back.

 

Jimmy:            Right, so you could set a –k that is too low and you basically get a sustained checkpoint that never actually finishes, and that's part of what we tested, seriously. And you would run into exactly that situation: the recovery on database restart, on server restart, will take far longer than you want. But it turns out, when I refer to the sweet spot, part of the sweet spot means not just limiting the value so that it's below what the disk I/O subsystem can sustain in terms of how many dirty pages it can take per second. You also don't want the value so low that you run into that situation where you restart the database and it takes forever to recover. And so that is a potential downside, and because, you know, we didn't talk about this beforehand, I'm impressed that you came up with that. But the reality is, with just a little bit of testing, and we tested that thoroughly and it took less than a day, half a day probably, you get a number where recovery is normal, you can restart the database in a few seconds, yet you're not overwhelming the disk I/O subsystem. You asked the question, Steve, why isn't this on by default? That's something I hadn't thought of, but the answer came immediately to my mind. The SQL Server product team, by its nature, does things conservatively. Do no harm, etcetera. It's like trace flags 1117 and 1118. How long have we wondered why those aren't the default? Why don't they create a number of TempDB files, four or whatever, by default? Well, they were conservative, right? They finally realized it in 2016. The case was made: "We're going to implement 1117 and 1118, at least in the context of TempDB, by default." It's a known bottleneck. It's been the best practice for years for users to implement it. We're going to save them from trouble and finally make it the default, even though it's been the best practice for 3, 4, 6 versions, who knows; I'd have to think really hard about how far back we have to go to discover when they were introduced. So I think it's kind of like that. Especially as flash becomes more and more the predominant host for SQL Server data and log files, and with the crazy performance we're able to provide with contemporary servers, it's a shame that people are going to suffer needlessly with the default checkpoint behavior. So it won't surprise me if in some subsequent version of SQL Server we see that implemented by default.
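For anyone curious which trace flags, such as the 1117/1118 pair mentioned above, are already enabled globally on their own instance, one quick generic check is:

    -- Show every trace flag currently enabled, and whether each is global or session-scoped.
    DBCC TRACESTATUS(-1);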

 

Steve:              However, you would still need to have a way to adjust it to your specific environment even if there was a default for it, I would assume.

 

Jimmy:            No question, no question. Or, you know, maybe SQL Server can be smart enough. Maybe, because, you know, first do no harm: it would be easy to implement a default that in some cases could be harmful. You're right there, Steve, absolutely. But SQL Server is getting smarter all the time, so who knows what they'll think of next.

 

Steve:              Ok, so on that then, how about any other best practices that you learned while you were going through this process of building this out and testing it?

 

Jimmy:            Oh man, I could go on. At least give me time for two, please, hopefully a bit more. The two next most important ones, I think: first, hardware validation, and again this is independent of AGs, independent of SQL Server 2016. It's been a long-time best practice of mine as a geek parachuting into cities all over the planet for Microsoft's consulting services. I evangelize, evangelize, evangelize: "Validate your hardware before you put it into production." You know, if you flip that production bit on that big million dollar SAN before validating it thoroughly and then you realize you have a problem, you're in trouble. Seriously, problems like that are not going to be fixed very easily, and no one is going to be happy about it. So the time to validate performance is before you put your hardware into production, whatever the hardware is, and especially storage. So the goal is, I would say, when you buy new hardware you need to hit two different thresholds. One, you need to hit the specs that are documented, which you can download from whoever your vendor is. The second one is to hold the sales geeks' feet to the fire for that vendor: make sure you're able to hit the promises they made, and if you can't hit them, get their people in there until you do. And so, circling back to the work that we did, we were struggling. We didn't comply with our own best practices, believe it or not; the cobbler's kids have no shoes in my shop. I wasn't able to hit the numbers I needed. I said, "What? Here are the specs. What's going on here?" I was consistently 20% below what the nominal specs were. And we struggled for a couple, three days until we realized we had the wrong firmware on the SSDs. We were really scratching our heads and had to call in some bigger geeks than I am. For the record, validating the hardware wasn't on my plate. It wasn't my job to do that.

 

Steve:              Someone else's job, but it's your problem, right?

 

Jimmy:            Yeah, exactly, but the point is we discovered this not halfway into the hardcore testing, where I would have had to restart all the experiments; this was at a fundamental, early part of the work, and we got it done. The takeaway from this is not merely, yes, validate your hardware, but also, and this is very important, this is flash related: most of us don't upgrade the firmware or drivers for our spinning media, ok? I certainly never have. Occasionally, when I'm asked, I say, "Yes, yes we do." But ask no questions and I'll tell no lies. Flash, like most of the other electronics in our servers, requires updates, both drivers and firmware. Make sure you have the latest and greatest. Test and verify and all of that, of course, but get your latest and greatest stuff out there. And I know from experience working with SanDisk, now Western Digital, and formerly Fusion I/O, that stuff can make a big difference. So two takeaways there. One other one I want to throw out, one of the takeaways I have: we have the luxury of working in an all-flash environment using crazy parameters to do our backups and restores. We back up and restore multi-terabyte databases routinely, as a matter of course, in a matter of minutes, and it's pretty cool. Whereas in environments I've worked in in the past, a multi-terabyte database could literally take hours, overnight, multiple days. So it's cool to be able to sling those bits around in a matter of minutes. And we do this by modifying some long-time parameters that have been available to us in SQL Server for, well, gosh, a decade and a half, two decades: the number of devices, max transfer size, and buffer count. My historical default has been to use 8 devices for data files. Max transfer size, you can't make it more than 4MB, and the default is 64KB, so with it set, at every request from a data file the backup process will grab a 4MB chunk instead of a 64KB chunk. And also buffer count, which is how much memory the backup process is allowed to use; the default varies, as I understand it, based on the hardware configuration that the database is on. But to maximize backup performance, we multiply the number of logical cores times four. This is something that my old buddy and mentor Thomas Kejser taught me, and I've been doing that for years. Well, you may remember, circling back, the purpose of this whitepaper was to provide some actionable guidance for people to implement. And part of a real life, real world scenario is doing backups, wherein we chose to implement our log backups every five minutes. And we needed to find values for both the database backup and the log backup that wouldn't impact the latency of the application. So instead of tuning purely for backup performance we had to balance that with application performance, which is something I wasn't used to because I haven't been a production DBA for well over a decade now. And that was a lot of fun. So if you were to download the whitepaper you would find the parameters we used; I don't need to go into the details here, but we used different parameters for the data files vs. the log files. Easy to implement, but the point is we did some of the heavy lifting for you and provided a template that you could use in your own installations.
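To make those three knobs concrete, here is a hedged sketch of a striped backup using them. The database name, file paths, device count, and the buffer count (here assuming 16 logical cores x 4 = 64) are illustrative placeholders only; the actual values Jimmy used are in the whitepaper.

    -- Sketch only: stripe the backup across multiple devices, request 4MB transfers
    -- instead of the 64KB default, and raise BUFFERCOUNT toward logical cores x 4.
    BACKUP DATABASE [MyBigDatabase]
    TO  DISK = N'D:\Backup\MyBigDatabase_1.bak',
        DISK = N'D:\Backup\MyBigDatabase_2.bak',
        DISK = N'D:\Backup\MyBigDatabase_3.bak',
        DISK = N'D:\Backup\MyBigDatabase_4.bak'
    WITH
        MAXTRANSFERSIZE = 4194304,  -- 4MB, the maximum
        BUFFERCOUNT     = 64,       -- assumed 16 logical cores x 4
        COMPRESSION,
        STATS = 10;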

 

Steve:              Yeah, that’s great. I would bet that probably 99% of the backups I see out there are just going on with the defaults on those.

 

Jimmy:            And it’s tragic.

Carlos:             Yeah.

Steve:              Absolutely.

Jimmy:            I know tragic is probably overkill of a metaphor but we are talking squandered resources. Okay. So there we go.

Carlos:             Well, so Jimmy, thanks for coming on and chatting with us today. I know that you’re willing to take the pains and kind of take one for the team and then share with us so that we can learn from your experience.

Jimmy:            Are you kidding? This is the best job I've ever had in my life. I mean, SQLCAT was a lot of fun, but this job is just a blast, so I don't mind. These are the kind of hits I have to take; fine, keep them coming.

Carlos:             Before we let you go shall we do SQL family?

Jimmy:            Sure.

Steve:              Alright, so the first SQL Family question is on keeping up with technology. How do you go about keeping up with all the changes that are continuously happening? Besides writing whitepapers.

 

Jimmy:            That's really a tough one. You know, because SQL Server is so broad and deep, and, you know, impostor syndrome, a lot of people have it. Gosh, I could go 20 minutes on this. I'm keenly aware of my deficits; they are broad and they are deep. Man, I just fake it until I make it. At this point of my career I can be candid about my deficits. And my buddy John Stewart, our shiny new MVP, told me, I think it was John who told me, "If you stay one chapter ahead of your customer you're an expert." That's part of the secret, and the truth is I'm actually taking on new tech. I live an intentional life; seriously, I create a vision statement from month to month, quarter to quarter, year to year, and I craft my goals. And one of the reasons I'm very excited about this December is I have a couple of really hardcore weeks off, and between ski days I'm going to craft my vision statement. And that includes a hardcore plan: not just "I'm going to read powershell [inaudible — 37:18]," I'm going to plot when I'm going to do those chapters. Michael Fall, another shiny new MVP, highly recommended that book, among other people. And also I've enrolled in, and have just started, the Microsoft Data Sciences professional curriculum, and my goal is to finish it by the end of December. So to answer your question: besides faking it until I make it, besides being candid, besides not buying into impostor syndrome, I live an intentional life and I plot out my goals. And one other thing, speaking of impostor syndrome: a dear, sweet, beautiful gem of a friend, Marilyn Grant, during a training session at Microsoft Consulting Services, said the following, and this is very important. It really helped me in my dark days, when I thought I wasn't smart enough to tie my shoes, surrounded by brilliant people within Microsoft and on the customer side, and that is the following: you are as smart as they think you are or you wouldn't be here. And that's a very valuable aphorism for me. Okay, I hope that answers it, not as pithy as you guys, but I hope you enjoyed it.

Carlos:             So you've worked with SQL Server for a long time, Jimmy, and you did talk about that log redo, but if there is one thing you could change about SQL Server, what would it be?

 

Jimmy:            That would be it right now: the log redo. It's the big bottleneck. It's the thing right now that has the potential to keep SQL Server from actualizing its true capabilities. Now again, not a lot of people are hitting it right now, but it's limiting us. I can't invest time documenting AG solutions if we have this bottleneck. If that bottleneck were remediated tomorrow we would be off to the races, working with various OEMs, documenting one Availability Group solution after another.

 

Steve:              So what’s the best piece of career advice you’ve ever received?

 

Jimmy:            Man, you guys are hammering me. This is actually a question I've heard before, so I know the answer to this. When I was in grad school there was a guy, I don't know his name; we were doing our thing that day and he, without solicitation, gave me this advice. He says, "Specialize. Find a niche and dig deep." And that's paid off for me. You've heard other things maybe from other people, but it's really paid off for me. Some of you may know my original "claim to fame" was that I published the stuff related to disk partition alignment. I evangelized it. Hey, you know, I didn't invent that, but I was, let's call it clever enough, smart enough, had my head enough about me, that when I heard about it I was just gobsmacked and couldn't believe it existed. And, you know, no one really knew about it. And so I searched for an opportunity to actually do the testing, seized that opportunity, validated it for myself, and we were off to the races. That is what brought me to the attention of the SQLCAT team, and the SQLCAT team has been a very valuable part of my success and put me in a position to be successful at SanDisk/Western Digital, for example, and with the Fusion I/O people. So specialize, find something and go for it. One other thing, I'm sorry, can I interject two other things? They just came to mind.

Steve:              Of course.

Jimmy:            Oh, thank you! Focus on the fundamentals. If you notice, when you go to your SQL Saturday sessions, etcetera, some of the sessions on fundamentals are some of the most well attended. And it's true, fundamentals are important. You can't get too much of them. And speaking of those sessions: community, community, community. You can't get too much community. You know, you guys referred to the SQL Family questions; this whole set of questions is based on SQL Family, this close knit group of folks we hang out with. Pretty cool.

 

Carlos:             So you’ve mentioned a little bit about your background and some of the places that you’ve been. But give us the nickel tour how did you actually get started with SQL Server?

 

Jimmy:            Oh my gosh, I could get in trouble with this answer. Okay, you said nickel: I went to grad school, went to California, came home a few years later with my tail between my legs and got a job. That's the part that gets me in trouble. But I got a job way back in the days of Windows 3.1, and we did everything on paper and I said, "We can computerize this." I didn't know anything about computers but I thought I could figure it out. And the computer that I wanted to use, the one that had Excel on it, was always busy. So I went to this other computer; it didn't have Excel but it had this thing called Access on it. And I thought, "Well, that looks close enough to a spreadsheet to me. I'm sure I can figure it out." And I had a hard copy manual of Access 3.0. I smuggled it home with me and curled up with it every night. I never slept alone.

 

Steve:              So Access was your gateway to SQL Server.

Jimmy:            Yeah, Access was my gateway. In fact, I remember the first time I saw SQL Server. I sat down and I thought, “Where’s the GUI. What am I supposed to do now? What is this crap?”

Steve:              Alright, so I think our last question is if you could have one superhero power. What would it be? And why would you want it?

 

Jimmy:            Oh, I wish I had more charisma. I know I'm a nice guy, but I wish I was more charismatic. But the truth is, I want to be indefatigable, man. I need my sleep, you know; as I approach middle age, wink wink, I need not only my 8 hours or I start turning into Mr. Pumpkin. In fact, I was invited to go to a Microsoft holiday party the other night and my friend actually got a commitment from me that I wouldn't start whining after 9 o'clock, that I would man up and hang out until 11. So that's my thing: if I could get by with six hours of sleep on a consistent basis instead of 8, I'd be a different man.

Steve:              I'm with you there; making it until 11 is much harder than it used to be.

Jimmy:            Yeah, but I will say, related to health, the importance of fitness. I know, Steve, you've lost some weight. Carlos, I don't know what your situation is. I've lost 85 pounds three different times. Speaking of a yoyo, you know, a rollercoaster ride: three different times I've gained and lost 85 pounds, so I can't tell you how many times I've gained and lost 50 pounds. I have finally gotten myself to a point where I am relatively stable and healthy, and I make fitness a priority. My claim to fame: I mentioned my misspent youth awhile ago, where I came back with my tail between my legs. While I was out there on the left coast, Richard Simmons, you guys have heard of Sweatin' to the Oldies?

 

Episode 75: Testing Storage Devices

Storage testing and validation is something we need to add under our roles and responsibilities as DBAs. Every database we ever manage is going to need storage, but how often do we kick the tires? Many times we're basically told to go verify that array, or we're doing this POC, we're testing this new storage, but are we really only testing connectivity?  In this episode of the podcast, we chat with Argenis Fernandez about how he goes about testing a SAN array and the important metrics you should consider for your storage.  If you are still using SQLIO or even Diskspeed to test the I/O for your SQL Server, don't miss today's episode.

 Episode Quote

“Storage testing and validation is actually something we end up doing as DBAs.”

Listen to Learn

  • Why the now deprecated SQLIO tool was poorly named
  • Why you should question your storage vendor about allocation unit sizes
  • Why you should consider garbage collection when you test SAN arrays
  • How compression and deduplication make testing more complex.
  • Why testing storage in the cloud is difficult.

Argenis on Twitter
Argenis blog at Pure Storage
Hyper-convergence 
DiskSpd Utility
VDBench

About Argenis Fernandez

Argenis is a Solutions Architect with PureStorage, a Microsoft SQL Server MVP, VMWare vExpert, and a well-known face to the #SQLFamily community, especially since he’s now the Director-at-Large of SQL PASS. He’s a regular speaker at PASS events, including SQL Server Summit. He also founded the PASS Security Virtual Chapter.

Transcription Storage Testing

Carlos:             Argenis, again, welcome back to the podcast.

 

Argenis:          Thank you so much for having me again. You guys, it's always awesome. You know, I actually requested this one, so to be upfront with everyone in the audience, I did tell you guys to please go ahead and schedule me again because I wanted to go on yet another rant. Like I went on the one rant on the previous one, I'm ready to go on another one, man. Let's go!

 

Carlos:            Very good, and as you all know, compañeros, what Argenis wants, Argenis gets. And of course we're happy to have him back here on the show today. So last time we talked it was kind of just an SSD conversation, talking a little bit about disks. And today our topics might be wide ranging, but we want to kick off with testing: testing the arrays or testing the disks that you're using in your servers.

Argenis:          Yeah, so storage testing and validation, which is actually something we end up doing as DBAs. You know, a lot of times we're basically told, "Yeah, you need to go verify that array." Now we're doing this POC, we're testing this new storage, you need to go ahead and test it. And what is it, you guys answer this for me, what's your favorite tool to use when validating storage?

 

Carlos:             Yeah, so they deprecated SQLIO, so Diskspeed is the other one that comes to mind there.

 

Argenis:          Ding, ding, ding, right, so that's what everyone looks for. So SQLIO, I love SQLIO, rest in peace now. Because it was completely the wrong name to use, quite possibly the most awful name for that tool, because it actually had nothing to do with SQL Server whatsoever. It just so happens that somebody on the SQL team wrote that utility, but it was actually never meant to validate SQL Server patterns. It was just a storage benchmarking tool, basically. It wasn't even good for baselining; well, I guess you could save the outputs and then compare them to some other output that you took in the future, right, to see how you're doing, etcetera.

 

Carlos:             Exactly

 

Argenis:          But you don't see those things in SQL Server like that. For example, you would get all these fancy scripts from people out there, you know, that would run SQLIO at multiple block sizes and multiple threads. You would get some numbers, and what's funny is that as DBAs we would actually never know if that was good or not. We would be like, "Yeah, we got some numbers. They looked alright. Let's go." It's actually kind of funny; at least that was my experience way back when. I can't remember who the manufacturer of that array was, and I was basically told, "Here, you go test this stuff." I was like, "Ok, what do I do?" So naturally, I found SQLIO, and SQLIO is what I ran, and I got a whole bunch of numbers and a bunch of pretty graphs, and then I showed them to the storage guy and the storage guy is like, "Yeah, yeah, yeah." Ok, what does that mean? Is that good, is that bad? If I put my databases on this thing are they going to work? Or, you know, what is it? So, in the era of magnetic storage, and this is changing super fast, like way faster than any of us expected. As you guys know, I work with all-flash arrays all the time, so I'm not really motivated or particularly interested in validating storage on magnetic anymore. But back in the day, when we had the old HDD-based arrays, magnetic arrays or spinning rust or whatever you want to call them, we wanted to test using something that made sense. So SQLIO would be a thing because it would actually generate some workload against the array, regardless of the fact that it would be a patterned dataset, which is actually very important to understand. SQLIO would generate a pattern of data to be sent down to the storage array, not even close to a real dataset, not even close. At that point, whatever performance characteristics you saw on your storage array, you would be happy with, because you would basically ask the storage guy, "Hey, I'm getting, I don't know, 800 a second on that thing. Is that good?" The storage guy would be like, "Yup, that's good." "Alright, I'm done. I validated that my performance was ok." You would look at the latency and see if the latency was acceptable at different block sizes. And you would commit the most frequent mistake of all, which would be tying your allocation unit size on NTFS to the I/O block size that gave you the best latency. That's what everyone would do; they would actually make that mistake right there and then. You would go ahead and see what SQLIO told you in terms of latency for a given I/O block size, and you would say, "Yup, I need to format my volume at this particular allocation unit size because that's what's going to give me the best performance." That couldn't be any further from the truth; like literally, I have no idea why people got stuck on that myth. And I actually have a blog post, which I'm sure we can reference from the podcast here, that covers I/O block sizes and SQL Server in general.

 

Carlos:             So Argenis, just to be a knuckle dragger here.

 

Argenis:          Yeah, please.

 

Carlos:             Because everybody knows, isn't it that you just run this test and it's telling me, "Hey, this is the allocation unit size that you should be using." And then you're like, "Ok, well that…"

 

Argenis:          That's not what the test is telling you. The test is telling you that at a given I/O block size your storage array behaves a certain way. The I/O block size, as you all know, has nothing to do with the allocation unit size that NTFS uses. They are two completely different things, so it makes no sense. If you got the lowest latency at 4K, you're not going to format your NTFS allocation unit at 4K. That's just not it, right? Because allocation units are meant to be used for allocation; that's what they are for. And so with larger allocation unit sizes, so 64K, and quite possibly larger than that with newer file systems like ReFS, which is becoming a thing nowadays, you get less metadata for your file system, and you would not consider using smaller allocation unit sizes as long as that metadata doesn't become a contention point for allocations. This should not be a contention point because this doesn't work like DFS [inaudible –]. You're not going to be continuously allocating new extents for your database all the time, like one by one, right? It doesn't become a contention point for your database in terms of growth. Yeah, Steve, you wanted to ask something, go ahead.

 

Steve:              So then with that, I mean, it used to be you'd run SQLIO and then you'd format the partition on your disk to match whatever gave your best throughput. And then along came the general rule to always format it with 64K.

 

Argenis:          Just go 64K. Yeah, that's what you should do. There are other storage arrays that still tell you to format at a different allocation unit size. I literally have no idea why they ask you to do that, because I don't even know if these people are guided by the same things that I've been mentioning here all day, or are just looking at one thing, right? At my company, we just told everyone: format at 64K, you'll be fine; it's just going to perform fine. Other storage array vendors tell their customers to go 4K on the transaction log and 64K on data files. I have no idea why they say that. I don't have any idea how they architected things at so granular a level that it actually matters. To me it shouldn't matter on any storage array, for the most part. Now, would you see differences with SQLIO? Yes, because you are allocating a file upfront for SQLIO. That's one of the things that matters, right? SQLIO actually takes time to allocate that file, and Diskspeed does that as well. So were we actually swayed by the fact that SQLIO was creating a file, or what? I think that is actually part of the problem.

 

Carlos:             Another example might be if I'm moving or upgrading from one server to the other. I already have a larger file and I have to allocate all of that space, right? In the case of growth, I start with a very small database and I'm growing in chunks. Those patterns are also different.

 

Argenis:          Well, I mean, at that point you need to start thinking in terms of the file system, right? You have a set of database files that are sitting on a given file system. If you're migrating a database from one server to the other and you're just using a restore, the destination file system doesn't have to match the allocation unit of the source file system by any means. If you're using something like Storage vMotion, or Storage Live Migration on Hyper-V or whatever it is called in the Microsoft world, you would just migrate something and end up with that same allocation unit size, because you're performing basically a block-by-block copy of the entire thing, including the file system. Alright, so, new ways of doing things, etcetera, etcetera, but what matters in the end is, me personally, I say ask your storage vendor what it is that they like. If they like 64K, you know, it's probably safe to end the conversation right there and format at 64K. If they ask you to format at 4K, ask them why. Why is that a thing? Are they doing the right thing by telling you to format at 4K? Anyway, rant over on that one, guys, let's not go into that. Let's take a step back. We started talking about SQLIO, right? And we said rest in peace, it had the wrong name to begin with, it generates patterned data, it was replaced by Diskspeed. But guess what, I love Diskspeed and I kind of hate Diskspeed at the same time. Because you know what Diskspeed does? It also generates patterned data. Why does that matter? Most of the storage arrays that they sell today, pretty much every single one out there, has intelligence built into the controllers, so whenever you send a patterned dataset to it, it will basically just mark a little metadata entry for it and nothing else, so you actually wouldn't be stressing it. This is actually super important for everyone to understand. You would not be stressing your final media. So if you're laying data down, pushing data down to your hard drives, you wouldn't be stressing them. If you're pushing data down to SSDs you wouldn't be stressing those either, because the controllers will basically drop all that data at the front end, and they would basically just say, "Yup, I got some activity going on here, but it wasn't enough to merit me using the SSDs or the hard drives at all." That's kind of very important, right?

 

Steve:              So then in that case Diskspeed may still be good for testing local drives that are installed in a server, but not the storage array? Is that right?

 

Argenis:          Here is what Diskspeed is good for: yeah, if you have local storage, obviously Diskspeed is good. However, if you have something like, I don't know, Storage Spaces Direct or whatever that might be, something that might have some other backend behind the scenes, you still have to go through all of that, right? You may have an intelligent controller on the other side managing disks for you. You don't know that. As a DBA you're not told exactly how your storage stack is laid out. So if you're connected to some storage array, they're not giving you the specifics of whether that stuff is doing something a certain way or not. So your job as a DBA is to make sure that you test with the best dataset possible and the best workload possible. So is Diskspeed the thing that's going to generate the most realistic workload for you and the most realistic dataset for you? I think the answer to that is a flat, resounding no. So this is why I only use Diskspeed to validate plumbing, meaning I'm getting good throughput, I'm getting good latency really quick, and my channel is good; I have a good line of communication between my host and my storage. That's all I am testing with Diskspeed. If you have an old-fashioned, just dumb SSD, for example, that you attached directly to a server and you want to run Diskspeed against that, yeah, that's good, but is that actually going to test everything on that SSD? I will dare say no, because there's more to it. An SSD is actually comprised of NAND, right, NAND chips. And the NAND actually undergoes garbage collection, right? The SSD has a controller, and that controller will decide at some point to perform garbage collection on that NAND, because NAND is not byte-addressable. NAND is actually addressed kind of like database pages: entire pages are written at a time. You don't write on a byte-by-byte basis. So that actually makes a huge difference, right? When you're testing something that is byte-addressable, like a regular hard drive used to be, sector-addressable, right, you wouldn't trigger garbage collection on an HDD because there is no need for that. But on an SSD you will trigger garbage collection, and the only way to stress an SSD so that you make sure you trigger garbage collection on it is by priming it, so filling it with data and then starting to write data to it again. That way you trigger the garbage collection mechanisms, ok? Do you see how storage testing is a complex story? It's not something that's straightforward. It's not just running Diskspeed and seeing what kind of numbers you get. It is so much more than that. It's running a continuous workload at different I/O block sizes with a given reducibility of the data. Actually, we haven't even talked about this: reducibility of the data is a thing on all-flash arrays and it matters a whole lot, because most of the storage arrays out there are doing deduplication and compression of some sort. Like ours does, right? The company that I work for does it, EMC does it, SolidFire does it, Nimble Storage does it, THL does it. Everyone does compression, right? So when you want to validate that a storage array is going to do the things that you want, that it's going to show certain performance characteristics for a given workload and a given dataset, you want to test with the real workload and the real dataset. So again, Diskspeed doesn't take you there. Diskspeed only helps you test the plumbing. Make sure that, you know, whatever is between you and your final storage device is not bottlenecked somewhere. It is really good for that, and it is also really good for testing dumb byte-addressable storage. Here is another thing that a lot of people don't know: Diskspeed actually has a way to generate random datasets. You can tell Diskspeed, "Here, go ahead and generate a random buffer," but getting it to work... there's actually a huge bug in Diskspeed, and getting it to work is really complicated. I don't remember exactly all the conditions you have to meet. Like, you have to use a file of a certain size and then the buffer that you pick has to be evenly divisible by the size of the file, or something like that; I can't remember what it was. Because it's buggy and it's not quite perfect, you have to meet all these conditions for it to generate a random workload. But then that random workload will be so random that it won't actually be representative of your real dataset either. So it will be the complete opposite of what I said before. You could start with a patterned dataset, which is very easy for the controller to drop, right? And then at the complete opposite end of the spectrum you're sending a completely garbled, high entropy dataset that makes no sense whatsoever and is not actually reflecting your real workload. This is why I keep telling people you want to test with a real workload and a real dataset. So restore your database, replay your queries against it. That's the best way to actually validate that your storage array is going to do the things you want it to do. And you can see how it reacts to your workload, and keep it in a continuous loop, right? Run it over and over and over. Increase the frequency of your workload if you can. There are multiple things that you can do to kind of replay your workload and kick off more threads and stress the storage array a little bit. Or, if you can't do that, there's synthetic testing that you can do against the storage array that will kind of mimic database access patterns, but it won't be your real workload. In the end, what I want people to do is forget about running Diskspeed, forget about running SQLIO. If you want, run Diskspeed really quick just to make sure that you've got good connectivity to your storage array. But in the end what you want is to replay your production data: your production datasets. So restore your production dataset and replay your production workload against that storage array. That's really what's going to give you the best picture.

 

Steve:              So at that point, when you say replay your queries, replay your workload, I assume you're talking about the Distributed Replay controller at that point then.

 

Argenis:          That's one way of doing it, right? You also have Orca, which is another way of doing it. There are other tools out there from vendors that allow you to do that; I believe Quest Software has one, if I'm not mistaken. There are others, right.

 

Carlos:             What’s that second tool that you mentioned?

 

Argenis:          I'll just mention the three that I know about. And even if you don't have time to do that, you can just run an ETL, right, which most people have, or you can write very intensive read queries or write queries, provided that you do things cold cache. So you flush your buffer pool with DBCC DROPCLEANBUFFERS, then run your I/O intensive queries against that restored copy of your database and see what kind of behavior you get. But more importantly, when you're testing and validating a shared storage array, it's not just one workload that's going to be hitting that storage array, so things get even more complicated. And this is why, you know, I could be talking about this stuff literally for weeks.
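Here is a minimal sketch of the cold-cache approach Argenis describes, with a hypothetical query and table name; dropping clean buffers is something you would only do on a test system, never in production.

    -- Flush dirty pages first so DROPCLEANBUFFERS empties the whole buffer pool,
    -- then run an I/O-intensive query against the restored copy and measure it.
    CHECKPOINT;
    DBCC DROPCLEANBUFFERS;           -- test systems only

    SET STATISTICS IO, TIME ON;

    SELECT COUNT_BIG(*)              -- hypothetical I/O-heavy query
    FROM dbo.SalesOrderDetail;       -- hypothetical table in the restored database

    SET STATISTICS IO, TIME OFF;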

 

Carlos:             Even then, Argenis, you just flushed the buffer cache, but that's on the database server. And you mentioned there's also caching at the SAN level. The first time you pull that data it's not going to be in that cache. But how do you...?

 

Argenis:          Well, that's another thing, right? If you have caches on the SAN side, how do you invalidate those caches? If you're going to have a cache on your SAN, do you want to use all that cache for writes or do you want to use all that cache for reads? It depends on your workload. So what I tell people is, you know, what kind of workload do you have? If you have a data warehouse, there are going to be times during the day when you do ETL and load data into that data warehouse when you want that cache to be just for writes as much as possible. And then at some point you're going to start querying that database, and at that point the cache is better used for reads. Is that the best way to set it up? It depends on your storage array. It really, really does, and this is why working with the systems engineers from your storage vendor matters a whole lot. You need to understand what knobs are available to you, and whether you even want to tweak them or not. When you're validating a storage array it's very important that you understand whether there's a cache or not. So ask that question of your storage guy: "Hey, is there a cache involved in this thing?" Because if there is a cache, you want to make sure that you do enough testing that you overrun that cache and see what the behavior will be after you overrun it and it starts operating at a different speed. Because that's something that storage people love to do. They love to abstract you from the fact that you're running on slow storage by throwing another tier of storage in there. Is that solving the problem? Maybe, right. Maybe it does fix your problem for you. But is it a long-term solution for a workload that's continuously growing and continuously getting more aggressive? Maybe not.

 

Carlos:             Well, it’s the old adage. I mean, it’s just a new version of throwing more CPU and memory at the server, right?

 

Argenis:          Right. Tiering in storage has always been a thing. Even at the CPU level you have different kinds of caches: you have an L1, an L2, and an L3 cache for instructions and data on the CPU. So this is not a new concept, right, by any means. It's just, you know, how many times do you have to change your storage before you get to a solution? That's also a very big thing, right? How many times do I have to migrate my database before we actually run on something that actually supports my workload? It's a pretty common problem out there. A lot of people hearing this podcast will probably identify with this: I migrated my database once when we switched to new storage, six months into it we realized that it sucked, and we had to migrate it again. So data migrations are very much a thing. But in the end it's the storage testing and the validation that you did that gives you the confidence that you're using that storage array the right way and making the right purchasing decisions. It's very important, and going back to the point I was making before, reducibility of the workload is very much a thing, right? If you just create one database that's completely empty, what are you testing there? Absolutely nothing, right? The performance characteristics of a TDE-encrypted database that you place on an array that performs deduplication, compression, and things like that are going to be different than if you send a database that's in the clear, without any encryption or compression whatsoever. The performance characteristics will be different, so you want to validate both. You want to make sure that your array reacts a certain way for your datasets. So if you have TDE-encrypted databases, make sure that you restore TDE-encrypted databases to that array and run your performance tests on top of that. The future is filled with arrays that perform data reduction all the time, everywhere. And even if your array only offers certain features and you know you're going to use some of them, for example there are some arrays out there that have data services turned off but let you choose to do compression only, then make sure that you enable compression on the volumes that you are going to use for your databases and drop your data in there in whatever final form it's going to be. So if you place compressed data on top of an already compressed dataset on a volume that performs compression, what kind of performance characteristics are you going to see out of that? In general, it can be eye opening to see how storage arrays react to different datasets. A lot of people don't think about this because they just think that running a quick Diskspeed will get them out of the way quick. That's not what you want.
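A small, hedged helper for that point about knowing the "final form" of your data before you test: check whether a database is TDE-encrypted and which of its tables already use row or page compression. These are standard catalog views; what you then choose to restore and test on the array is up to you.

    -- Databases with a TDE encryption key (encryption_state 3 = encrypted)
    SELECT DB_NAME(database_id) AS database_name,
           encryption_state
    FROM sys.dm_database_encryption_keys;

    -- Partitions in the current database that already use ROW or PAGE compression
    SELECT OBJECT_NAME(p.object_id) AS table_name,
           p.index_id,
           p.data_compression_desc
    FROM sys.partitions AS p
    WHERE p.data_compression <> 0;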

 

Steve:              So let's say you've got your production server running on a new storage array that's shared, and you start out with SQL Server there and everything is running great. Then over time the SAN administrators decide to load that storage up with additional systems: a bunch of backups written there, or file stores, or other systems using it. One of the things that I've seen when this happens is that your SQL Server performance degrades over time. And I guess as far as testing goes, one of the things that I'm really curious about is how do you test on a regular basis to know that your production performance is not degrading on the actual I/O side of the storage?

 

Argenis:          So, I mean, you nailed it when you mentioned that this happens very frequently, and that's just because people are not doing their homework, right? Every single storage array has limits. There's a given amount of workload that you can fit on it at first and it works just fine. As you add more workloads to that array, you start seeing different performance characteristics from it, because you're adding different workloads with different patterns. That is just natural. All of a sudden you have your data warehouse running at full throttle on that array, then you start adding a bunch of VMs that are running, I don't know, Exchange or SharePoint or whatever, and you're going to see a different kind of behavior from that storage array. So how do you test that? The only way to test it that you can be comfortable with is mimicking what it is actually going to be as much as possible, and as much work as that takes and as painful as that sounds, it is the only way to make sure that your storage array is performing a certain way. So take your real workloads with your real datasets and hit your array with that. If you want to fit an additional workload and you don't even know what that workload looks like, then you do have some homework to do. You need to understand how busy your storage device currently is and whether it will accept that additional workload without tipping over, or whether it will start generating a lot more latency, or whether your throughput will drop because of that additional workload. Because now, you know, before you had one person drinking from a big fire hose and now you have two people drinking from it, not just one. It kind of works like that. Every single computing element in your data center has limits, and your storage array or your storage devices are just one more. You need to understand what those limits are, and when you are going to feed more workloads into it you need to do your homework, understand how busy that storage device is, and make sure that when you drop that new workload in, it fits nicely and it's not going to tip it over.
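One way to do that ongoing homework from inside SQL Server is to baseline file-level latency and watch how it trends as new workloads land on the shared array. This is a minimal sketch using a standard DMV; the counters are cumulative since startup, so capture snapshots over time and compare the deltas rather than reading a single result in isolation.

    -- Average read/write latency per database file since the instance started
    SELECT DB_NAME(vfs.database_id)                            AS database_name,
           mf.physical_name,
           vfs.num_of_reads,
           vfs.num_of_writes,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY avg_read_latency_ms DESC;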

 

Carlos:             Yeah, it's interesting. Even recently we put out our SQL Server checklist, and, you know, lots of administrators are trying to standardize their processes. Cookie cutter in a sense: repeatability, automation, right, that's the big mantra. And it almost sounds like, "Eerrk, hold the presses here." There are certain areas where, not that you can't automate all of it, but at least you're going to have to understand it. You're going to do some homework first and then choose your own adventure, if you will.

 

Argenis:          You nailed it. That's exactly what happens. And you know what, a lot of people think they can get away with not doing this in the cloud. "Oh, it's just the cloud, so it's fine." Oh, it's not fine, alright, let me tell you. In the cloud you have a bunch of, you know, Amazon may not tell you, Microsoft may not tell you, Google may not tell you, but they do all these things that we talked about, right? They do the caching, the tiers, and they have these devices behind the scenes that do a whole lot more than just storage. They might actually have data services involved in all of that. So it's not as straightforward to test storage in the cloud either. What I like to do is tell people: remember the basics. You can get some idea of what the throughput and the latency will be like by using kind of dumb synthetic tools like Diskspeed or Iometer or, I don't know, CrystalDiskMark, which in the end uses Diskspeed under the hood, or ATTO, or whatever. There's like a hundred of them. You can definitely use those. They'll give you an idea, but testing with real workloads and real datasets is going to be the thing. That's why I keep telling people this. I literally have a friend who was spanking me on Twitter the other day. They bought this hyper-converged solution. It looked great when they ran SQLIO and Diskspeed against it. But guess what that thing was doing? The exact same thing that I mentioned before, right: it was dropping the patterns on the controller side. So they never actually saw it perform with real workloads and real datasets. They put their workload on it, they started seeing all this latency, and the workload was behaving kind of the same way it was when it was running on magnetic, because they didn't test the fundamentals, which is to test with your real workloads and real datasets. I keep hearing about hyper-converged, by the way. It's a thing now, everyone is talking about hyper-converged. I need to remind people that hyper-converged is great, but not necessarily for databases. One big problem with databases is that they need low latency. You're going to be doing transaction log writes. You're going to be doing a lot of reads, and those reads better be performed quickly, because that quick turnaround from storage is the thing that's going to give you better performance.

 

Carlos:             And so I’ve been hiding under a rock Argenis. Describe to me what hyper-convergence is?

 

Argenis:          So hyper-convergence is kind of the term in vogue now for describing a new kind of architecture where compute and storage are all the same thing, and even the networking is kind of hidden. On the same compute nodes you can have storage attached, and everything is kind of flat, and it gets you that kind of cloud experience when it comes to provisioning storage. But in the end you're sharing resources between storage nodes and compute nodes. So you can end up in a situation where your databases are running on the same exact node as your storage device, and the actual serving of storage takes compute resources, and those compute resources end up colliding with your databases. There are obviously other kinds of designs or features in hyper-converged where you can have storage-only nodes, but that's really no different than having your own storage device out there; they're basically spinning storage back out into its own node. It's basically having fewer things to configure on your network, so it's more like the appliance approach to computing where you just buy something, hook it up to your network, and two seconds later you have it up and running. Well, provisioning is only part of it, right? Provisioning and getting things to run fast is one part of it, of course, but it's the ongoing operation and ongoing performance after that that really matters a whole lot. So if you're looking at hyper-converged solutions, please, please, please make sure you test with the right tools. And if I can mention one synthetic tool that actually works really, really well: I personally hate Oracle, but this is one thing that comes from Oracle that's quite decent. It's called Vdbench, V-D-Bench, so Victor David Bench. That is quite possibly the best storage testing and validation tool for getting a better idea of how your workload and your dataset are going to behave on whatever storage device you're testing. It actually allows you to specify the deduplication and compression settings for your workloads. So you can say, oh, this is like a database, so it will dedupe 1.2 to 1 and it will compress 2 to 1, or it will compress 5 to 1. Or, I'm testing VDI, so I'm going to have a lot of reducibility in my workload and it's going to reduce 19 to 1, and I can test it that way. You can also generate workloads by saying, I have these kinds of I/O block sizes at this time of day, I have these peaks and valleys. You can even specify that the target device needs to be filled with data before it can be tested. So there's a whole bunch of sweet, sweet features that you would love to leverage when testing and validating your storage.

 

Steve:              Ok, great. So one of the questions I have here is that it seems like when we talk about testing with an actual real workload, that often times, in my experience, happens after someone has already made the decision on purchasing the storage, and it arrives in the environment and it's "here's what you have, figure out how to best use it." I mean, is there a process that you generally go through, like in pre-sales when people are looking at different storage, where you can do this kind of testing?

Argenis:          So most of the storage vendors out there will want you to perform a POC. Some of them actually don't like it because it takes resources away from them, right? They have to ship you an array, they have to be on top of you, and you only have so many days to complete that POC, etcetera, etcetera. But most storage vendors in general, not just storage array vendors, will happily give you something to try, and it's during that period that you are able to validate that storage array or storage device. If your storage vendor doesn't do that, and your management was sold on that storage array and you got it right away, I need to remind you that most things you can actually return. So if you get something, test it right away. Drop everything you're doing and test it, because you don't want to pay for this in the long run, especially if you're the DBA who's going to be babysitting the processes that run on top of this thing. Because imagine you end up with a storage array or storage device that does not give you the performance you need. You are going to pay for that dearly. You're going to pay for that with your weekends. You are going to be suffering. You are going to have to watch those backups, and watch that maintenance, and watch those statistics updates. It's going to be painful, right? So make sure you test things as much as possible, that you inject yourself into the process of acquiring the storage as much as possible, and that you become knowledgeable on the storage side as much as possible, because storage is fundamental to every database out there. Every single database is backed by some sort of storage, and if you don't understand how that storage works, and you don't get into it a little bit, even a little bit, then you're going to pay for it down the road.

 

Steve:              And I think that's where, if you're working in a shop where there's a wall that's been built between the storage administrators and the DBAs, that's where you get the most trouble. But when it's the same person doing both sides of it, or there's no wall and there's very good communication between the DBAs and the storage admins, then you're able to make those kinds of things happen.

 

Argenis:          This is something that you have to foster. You have to make sure that whatever walls exist in your environment today, you can overcome them. You become best friends with the storage people. You know exactly what they have. The storage people know why you care about certain things. They understand why databases are the pain in the butt that they are for storage. They are, they are a pain in the butt; the storage people hate DBAs for a reason, because databases are nasty, especially when you take backups, right? Who has a storage guy that loves to see the performance of the storage array when taking backups? Nobody, right, because backups are just nasty on every single thing. Backups are nasty on the source storage array, backups are nasty on the network, and backups are nasty on the target storage array, or a JBOD or whatever; it may not be an array, but whatever you end up backing up to. So it's a thing. People need to understand that if you don't talk to the people in your company who do different things than you do, but who in the end have power over and oversee the things that matter so much to you, you're just hurting yourself. And, you know, we could go on another rant about why you should be best friends with your developers as well, but it's basically the same thing, right? Everyone is working towards the same objective, which is making sure that your company continues to operate at the highest level and that you can crank out features and new functionality as fast as your business wants, so don't become an obstacle. That's actually something that I advocate to everyone. Don't become an obstacle on the database side. Don't become an obstacle on the storage side. Don't become an obstacle on the virtualization side. Offer solutions. Tell people, "Hey, this isn't working, let's go and see what we can do to make it better." Engage more people and make sure that everyone understands what's actually going on in your environment. Because the last thing that anyone wants is surprises: "Oh, we never told anyone that this actually wasn't working." You need to make sure that everyone knows the status of your environment.

Steve:              Right, right.

 

Carlos:             Sounds like very good career advice.

 

Steve:              Absolutely.

 

Argenis:          Alright, man, I went on a little rant there. Man, I need to catch my breath here.

 

Steve:              Oh, it’s all good stuff.

 

Carlos:             Well, thanks again for being with us today, Argenis.

 

Argenis:          It's my pleasure. Thanks for the time, you guys. I really, really wanted to do this because, you know, I want to make sure people got a taste of what it's like to pay attention to the little details on the storage side. A lot of us have gotten kind of complacent in the past and said, "Oh, I just got some storage array, some storage device that I got from this guy. I'm just going to quickly run something. Ah, numbers look good. I'm moving on." That's not it. There's so much more to it. So you need to understand how things work and why testing in a different way matters so much.

 

Carlos:             That's right. So before we let you go: we've had you on the show before and we've done our SQL Family questions with you already, but for those who may not know all of your history, give us the brief story of how you first got started with SQL Server.

 

Argenis:          The year was 1998. I remember that Google was barely getting started back then. I remember when Google came out I was like, "Oh, I'm not using Yahoo anymore. I'm not using AltaVista anymore," or whatever we were using back then. That felt awesome. I remember it was SQL Server 6.5 that was powering the backend for this product I was working on. I was working as a data administrator for an ISP, an internet service provider, the largest one in Venezuela, and the mandate was to migrate from a FreeBSD environment to a Windows NT 4.0 environment. So you guys can imagine, that was really, really controversial, and we got a lot of pushback. But in the end that was the mandate that came from management, and we purchased this product from Microsoft called Microsoft Commercial Internet System, MCIS, right? And the version of that was version 2.0. So that suite of products was NT 4.0 with the Option Pack, so IIS and all of that, and you had an additional set of binaries for running an SMTP server, a POP server, an LDAP server. And that LDAP server was powered by Site Server's commerce edition, Site Server 3.0. So, you know, ancient folks like me that worked with that technology way back when will remember all of this stuff. Actually, I can't remember if it was Site Server 3.0 or an earlier version in the first version of MCIS that I used. The truth is that it was powered by SQL Server 6.5 at first, and then they developed support for SQL Server 7.0. So that's how I got started with it, maintaining that database; I was basically one of the accidental DBAs for that database, and there were many of us. And so we were sent up here to Redmond, that was January 1999, when we came here for the first time. We spent a month here training, and we were basically given a crash course on Windows administration and SQL Server administration. So that's how I got started with SQL way back when. And, you know, I changed jobs a hundred different times, and made my way through sysadmin work a little bit, and I did development for a little bit, and I did management for a little bit, and I did consulting, even for Microsoft at some point. Through all of those positions I was always working with SQL Server in some way or another. So it's been 18 years now. It's been a great, great career, and I just went to the MVP Summit. I knew some of you guys were there. Man, the MVP Summit was awesome because they laid out in front of us the future of SQL Server and what it looks like. It's now going to run on Linux, so if you guys out there haven't seen that, you need to go take a look, because I think it's going to be all the rage. SQL Server on Linux is going to be a huge, huge thing.

 

Steve:              Oh yeah, I think so too. I know I’ve got it running on a virtual machine here on Linux and it’s pretty nice.

 

Argenis:          It is, it is. So that’s kind of the backstory on me starting with SQL Server way back when.

Steve:              Excellent.

Carlos:             Awesome. Well, again, Argenis thanks so much for being here and taking some time with us today.

 

Argenis:          Hey guys, thank you so much for letting me do this again. I really, really appreciate that you guys gave me another slot on your podcast, which is wonderful by the way, to get some more of my opinions out there. I know the previous podcast actually generated a lot of controversial discussion out there about where my stance is. I actually owe some people blog posts to follow up on some of the points that we talked about, and people will probably ask me for a blog post on the things that I talked about today as well, so I promise, I promise I will make those blog posts happen, and we'll get people really, really interested in testing their storage the right way.

Steve:              Sounds good. We’ll definitely look forward to reading those.    

Episode 74: 2016 in review and 2017 goals

As the year 2016 comes to a close, Steve and I share our thoughts on our accomplishments for the year and what lies in store for 2017.  Steve reviews some of the changes to the database health monitor tool and Carlos shares details about his new book and how it came to be.

 Episode Quote

“2016 has been a busy year”

Listen to Learn

  • Some of the accomplishments Steve and I have had over the last year
  • What we are looking forward to in 2017

Carlos has a new book – Zero to SQL
The SQL Server checklist repository

Transcription: 2016 in review

The transcription will be available shortly.

Episode 73: SQL Server Performance Tuning

At some point all data administrators will have to tune their environments and there are many ways you might go about that.  In this episode, we talk with Pinal Dave about how he does performance tuning and the thought process behind the approach.  What I found interesting about this discussion is we each have our own version of how to do things, but there are some basics that everyone should have covered.

This discussion does not get heavy into the how to for all scenarios; however, I think you will find the discussion very informative.

 Episode Quote

“I let them explain in the first five to ten minutes their own problem. Understand their concern first.”

Listen to Learn

  • The first steps in performance tuning
  • Why MSDB can play a role in performance
  • Why server setup is still so important
  • Different approaches to troubleshooting performance
  • Why DTA still gives us heartburn
  • Pinal’s take on how long the server should be up before you use the DMVs to diagnose issues
  • Who Adrienne Frost is

Missing Indexes
Missing Index Advisor
Pinal’s session from PASS Summit 2016
Adrienne Frost
Pinal on Twitter

About Pinal Dave

Pinal Dave (pronounced Da vey) has been a database professional for nearly 20 years.  He is the author of the very popular blog sqlauthority.com, which just completed its 10th anniversary.  He has written a book on interview questions and is a frequent speaker at conferences.

Transcription: SQL Server Performance Tuning

Carlos:  Ok, well Pinal, welcome to the show!

Pinal: Thanks for having me! It's wonderful. I've been wanting to talk with you guys for a while, so I'm so glad to be here and I'm super excited.

Carlos: Yes, it's almost a pity that it has taken us this long to have you on the show. But thanks for taking a little bit of time and coming to talk with us. Ultimately, our topic today is performance tuning. And this is one of those topics that it seems like we can never get enough of, right? There's always something to be tuning, some setting, some tweaking, some metric that we need to capture and then compare against. So I guess let's just go ahead and kick it off: what are some of the top things that you look for when you want to start performance tuning a SQL Server database or instance?

Pinal:  That's a very good question! So, this is interesting: what I want to look into and what people want me to look into are two entirely different things. And I find it funny. I'm an independent consultant for SQL Server performance tuning, so people come to me, hire me, and say the system is slow. So far we are together. And this is where the difference comes in. They come to me and say, "We want you to look at our indexes and tell us why my index is not used by SQL Server." Or sometimes they come back and say, "Hey, look at this particular query, I believe it's running very, very slow." Or they come back and say, "Our application is generating a lot of unnecessary data, can you help us clean it up so it only generates necessary data?" They come back with these kinds of requests, and that's so interesting; this is what they want me to do. But you know what, I know behind the scenes that most of the time they are just seeing what they are seeing, and the real problem lies somewhere else. Probably in TempDB, probably in I/O distribution, maybe they are facing deadlocks or [inaudible – 02:14]. So I think that is the key difference. Do you want to know how I approach this particular problem when a customer comes and says this is what they want to do, and how I try to bring them to the real problem?

Carlos:   Yeah, I think that’s a very interesting point. Even as database administrators we’re kind of stuck in that rut as well where someone will read an article, right? Some new webinar comes out with a specific metric and they’ve decided that that is the key that will solve all their performance problems.

Pinal:   Absolutely, so most of the time when a customer comes to me and says, "This is the particular problem," I never say, "That may not be your problem." I always say, "Well, then we need to fix it." And I let them explain their own problem in the first five to ten minutes. Once I hear them out properly and they feel confident that they have explained the problem, we try to look into it. So, I know those first five to ten minutes are always a little bit back and forth, but that gives me an idea of what level the customer is at, what they have done so far, and where the problem situations are. And then I try to bring them to a couple of very interesting points which are standard in my performance tuning consultancy; the point is, these are the things we need to do to identify where your problem really is. So, definitely wait statistics is where I start, then I start looking at their I/O distribution, because I have found that most people sometimes put all the data on the C drive. They sometimes put everything on a D drive where log, data, and indexes are all together, and backups are there too. So if they have a situation like this, ok, you know something can be improved. And the third thing is that I try to find their long-running queries, CPU-intensive queries, and I/O-intensive queries and make a list of them. I have found that as soon as I reach that point where I start bringing up the I/O-intensive or CPU-intensive queries, the customer immediately starts recognizing them: "Oh, this makes sense. Oh, this is also our problem. Yes, this was the reason." And we end up looking at it together comprehensively, and then I put it all together. I say, "Look at this. You were saying you have a problem with this particular query, but the reality is that this is what is happening." Yesterday, I fixed a very, very interesting problem, and I was so proud of myself when I fixed it. Maybe I will share it with you in one minute and that will sum it up. A customer came to me and said, "This particular query suddenly started slowing down." And we're like, "Oh ok, why is that?" We tried to look into it, and they said, "This query runs only once a day; it used to run for two seconds, now it's taking ten minutes." And we're like, "Ok, that's an interesting problem. Let me look into this one." We looked there, and it was sending an email. And we were like, "Ok, it seems like it's sending an email." So I said, "Ok, I do not know how to fix it yet, but let me go and do an analysis of your system." We did an analysis of that system and figured out that they had a problem sending pretty much every single email, not just that particular one. But the other ones were asynchronous, so nobody cared much when the email arrived; the emails coming to the mailbox were getting to the customer eventually. But for this particular query they had received reports that it was sending email very late. Now, email was the problem, so what does that mean? Is SQL the problem? Well, in fact it was in that case, because they had the MSDB database, where they were using SQL mail to send email. And the MSDB database stores all the history of email by default, and that email table, I forget the exact name, was huge, over 10GB. Inserting data into that table was taking so long, and this is why there was a problem for this one query.
We fixed that problem in MSDB and suddenly the system picked up, and they were saying, "I never knew I was able to get such a big performance gain." Now, they had a problem with a query; we fixed something else, somewhere else in the database, and got amazing performance out of it. These kinds of challenges make me happy: how we diagnose it, how we figure out where the problem is, how we fix it, how we get the performance. A long answer to your short question, but I was very proud of it yesterday when I fixed it, and it was very interesting as well.
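Since wait statistics come up as the standard starting point here, this is a minimal sketch of that first look; the excluded wait types are just a few common benign ones, not a complete filter list.

    -- Top waits accumulated since the last restart (or since the stats were cleared)
    SELECT TOP (15)
           wait_type,
           wait_time_ms / 1000.0        AS wait_time_s,
           signal_wait_time_ms / 1000.0 AS signal_wait_time_s,
           waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                            N'SQLTRACE_BUFFER_FLUSH', N'BROKER_TASK_STOP')
    ORDER BY wait_time_ms DESC;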

Steve:    Well, I know that definitely sounds interesting to me, because it seems like anytime MSDB grows to a certain size, it's usually an indication that there's something in there that is bigger than it should be, whether it's job history, or mail history, or other histories being kept there. And I think that's a common thing that people overlook, going in and finding that and speeding up the email, where the assumption is usually that it's going to be the query, but it turns out that sending the email is the real problem. I've seen that more than once in MSDB in my experience.
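For the mail-history case specifically, here is a hedged example of trimming Database Mail history in msdb (assuming Database Mail rather than the older SQL Mail); the three-month cutoff is purely illustrative.

    -- Remove mail items and mail log entries older than the cutoff
    DECLARE @cutoff DATETIME = DATEADD(MONTH, -3, GETDATE());

    EXEC msdb.dbo.sysmail_delete_mailitems_sp @sent_before   = @cutoff;
    EXEC msdb.dbo.sysmail_delete_log_sp       @logged_before = @cutoff;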

Pinal:    Absolutely.

Carlos:   And I think Pinal puts an interesting twist on that first step, and that is to let the user explain what they think the problem is. Because, again, in his example, it ended up being something different. Originally they thought they were going to be doing performance tuning on a query; it ended up being something very different. But when you report back to them, you need to indicate how what you have found relates to the problem that they identified, so that they can feel like, "Ok, yes, now my problem is solved."

Pinal:    Absolutely. And that gives them more confidence. One of the things which I see with a lot of customers and a lot of consultants is a little bit of arrogance, and sometimes I do not like it. They think they know and the customer does not know, and they go in there with such a heavy ego and say, "I know what I'm doing, just step aside. Let me fix it for you. Just participate when I'm done, and I will show you the internal things." When any consultant, let's say a plumber who comes to my home, or a carpenter, or maybe a pest control consultant, does this kind of thing, I also feel a little annoyed. Let me explain my problem. And I think the same goes for SQL Server, because now I'm a consultant and I want to make sure that my customer or my client explains the problem and understands it, and that I understand it. And I go back to them saying, "By doing this, I fixed your original problem."

Steve:    Yup. So listen, understand then go track it down and put your ego aside and all things will come out good.
Pinal:    Absolutely. I think this is something I have shared with Steve as well: no matter what you do, Steve, you always answer very politely. And you know so much about corruption, which I learned just sitting with you the other day. You were so kind to just open a book and explain, and that motivates me. See, this kind of thing about the community motivates me. People are willing to share, and that's what it's all about, sharing, and that's why SQLAuthority is out there: what I know, I want to share. And it's just like what all of you have been doing. Carlos, this podcast is sharing, right? People are learning just from hearing it. So, you know, we do a lot for the community and we share, and I think it's amazing that we have that feeling of giving back to everybody.
Carlos:   So, we've talked a little bit about a scenario in which we kind of dug in. You mentioned wait stats as the area where you want to start looking for performance tuning, then I/O, and then your data and file allocation. Those seem like, well, maybe not quite server setup, but when I start thinking about performance tuning, some of the, I'll just say, ground-level problems if you will. You mentioned TempDB as well. It almost comes back to how the SQL Server was set up. And interestingly enough, we have a previous episode in which we went through and talked about the checklist, if you will, of the things you should be doing, the knobs you should be turning, when you install your SQL Server and what you should be looking for. It seems like in certain instances it's about going back to the beginning, making sure everything is set up properly, and then we can get into the "hard stuff."

Pinal:    Absolutely, because, you know, execution plans and understanding people's queries is such a difficult thing to do. I mean, even if it sounds simple, we can definitely hit the execution plan and say, "Aha, there is a bottleneck," or "Aha, this one is not a good operator." But what are the alternatives? We don't always have alternatives, because 95% of the time the developer comes back to us and says, "Well, if I change this I will have to do a lot of testing, I need approval, I need to reproduce that." Or sometimes they just come back and say, "Hey, we don't have control over the code because it's a third-party tool." Sometimes they come back and say, "Look, we understand what you want to do, but we have this business need and this is how we've been doing it. We don't want to change it." I have found that 50% of the time when I want to change code I get pushback. I get people saying, "We don't want to do this." And that's why I think the focus of performance tuning is more about setting things up right. And I think, as you said, the beauty of SQL Server, well, beauty or whatever you want to call it, is that it's so easy to install. But it doesn't tell us a lot of things when you're installing, so you just go next, next, next, done. Wow, it's so easy to install.

Carlos:   Exactly. Everybody can do it. Right, we want to make that as simple as possible.

Pinal:     Right. And that comes with these kinds of issues: the file growth is 1MB, your fill factor has no relationship with your workload, you have the log file and data in the same place, probably on the C drive, and even though you have an extra drive, nothing is using it. Your Model DB will be just like that, and a lot of people do not even understand that they should not be playing around with the Model DB, or should only touch it once they understand it, because when a new database is created it's a copy of the Model DB. Now, these kinds of basic things are missing in explanation, or understanding, or education, whatever or whoever you want to blame. But ultimately you end up with a system where there are a lot of things you can fix. Here's an interesting story I feel like telling since I mentioned the Model DB. One of my customers said, "Every new database I create is created at 1GB." And I was like, "Ok, maybe, what are you doing there?" And he said, "We are not doing anything. Every database is 1GB." And I was like, "Interesting!" I went back and checked. They had done some kind of data loading into the Model DB in the past, because nobody knew what it was, so there was a huge table being replicated again and again, that kind of thing. And there were so many stored procedures in there that were never used. I was laughing and I said, "Don't do that." And now they were very worried, because some of those things were being used and they didn't know how to do the cleanup. The older databases were, oh, it was such a mess, and hours and hours of consulting where I was just sitting on the other side of the screen hearing their conversation about it. They would come back to me and say, "Should we drop this or not drop it?" and we would do some diagnostics, that kind of thing. So when people do not know the right thing and make a little mistake, sometimes they pay a lot in consulting hours or developer hours afterwards.
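A quick, hedged way to see whether the model database has picked up anything it shouldn't have, and what file sizes and growth settings every new database will inherit from it:

    -- Any user-created objects that snuck into model will be copied to every new database
    SELECT name, type_desc, create_date
    FROM model.sys.objects
    WHERE is_ms_shipped = 0;

    -- File sizes and growth settings that new databases will start with
    SELECT name, size * 8 / 1024 AS size_mb, growth, is_percent_growth
    FROM model.sys.database_files;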

Carlos:    Another interesting point you made. Talking a little bit about pride: you talked about it in the developer's sense, but even as DBAs, right, we have the data, it belongs to us, we're masters of the domain. And at one point there are all these monitoring tools out there that will help us collect historical information about the SQL Server, and I think at times we shy away from them. I know even when I first started in the PASS community, I started learning about all the DMVs and thought, "Ok, I'm going to build this monitoring program myself." And I actually heard a talk from Adam Machanic, who I have a lot of respect for; he's been on the show, and SP_WhoIsActive goes through all of those DMVs to collect that information. I remember him making the comment, "Oh yeah, but we put a monitoring tool on all of our environments." Now, it happened to be at an event put on by a company that has a monitoring product, and I thought he was maybe just saying that because he was at their event. So I followed up with him afterwards and I'm like, "So, is that for real?" And he's like, "Yeah, it's for real. Why do all that hard stuff yourself? People have figured that out. Get something that works for you, that will capture that history, and then you can start digging in, particularly in the beginning."

Steve:     And I think that there are just so many different… Go ahead Pinal.

Pinal:     No, please go ahead, Steve. No, no, I think your point is the same thing I was going to say: there are so many different things out there that there is not one solution that can do it all. But I think you were saying the same thing.

Steve:     Yup. I think there are just so many different ways to do what people are trying to do with monitoring and tuning that anytime you can avoid reinventing it yourself, if you can reuse something or use some tools that are already there, you can save a lot of time and a lot of money. And I think that on the performance tuning side of consulting, having those tools available to you can really make a big difference.

Pinal:     Alright, and I think, Steve, you also took an attempt at building one of these, right? I think you are the right person to talk about it because you have your own monitoring tool, Database Health, right, which you just updated in October. I installed it on one of my computers and it does a lot of things. I'm sure you built it thinking you wanted to solve one problem, and then you kept adding more things as people requested them. I think that's how the monitoring tools out in the industry are built as well: they come up with one thing for a similar request and grow from there. I think the same thing happens with me and my consulting; people keep asking for more things. They always want to solve one little problem, but once that problem is solved they want to add more things, and so it keeps going. And Steve, maybe one good idea would be, at the end of the webcast or maybe on the podcast page, you can add your databasehealth.com link. I think people can try it out, and I see you just updated it in October.

Steve:     Yup. You know, that's a great idea. Thanks for mentioning that. I know that we try and mention it when we can. There are more and more people using it throughout the world every day. But yeah, it's one of those things, it's just a monitoring tool that's there, and it does a lot of the things that people try to invent on their own, so you can save a lot of time doing it that way.

Carlos:    So, let's turn back a little bit. I know you had a presentation at PASS Summit, I think on performance tuning, just a few weeks ago. Didn't you?

Pinal:     Right, I did. Yes. It was fun, I had 400 people and I got amazing ratings. I was so honored looking at the good stuff people said about me. I was honored, and I'm humbled. People are very, very kind. Yeah.

Steve:     You know, that's true. The PASS community, yeah, you are good at what you do, but I think the PASS community is an amazing group. There are a lot of kind people out there, definitely. So one of the things that was on the list there was about transaction durability and its impact on queries. Maybe you can talk about that for a few minutes.

Pinal:     Well, that's a good point. So again, durability I think is one of the very, very confusing topics. Every single time I try to present it to people, I learn that even though we all believe that we know ACID, not all people really know ACID, because it's theory; we do not understand every part of ACID. We keep on talking about, "Oh, ok, ACID, I know about it. The data should be durable, and the data should be there after we updated it." They do not get that each of the components of ACID is also related to performance. And SQL Server 2014 came up with this new feature around the durability of a transaction, and it directly impacts how the performance works but not what shape the data ultimately takes. So, for example, it definitely helps if you have an application where you are doing a lot of transactions. In that case, instead of committing after every statement, you can say, "Let's delay committing the data." You build everything up in memory and at the end you commit it together, and that reduces the durability of your data. What it means is that your data doesn't go to disk immediately, which means your operation completes faster because everything happens in memory. It then gets pushed back to disk, but if something goes wrong in between those little moments, your data durability will be impacted. It's a very, very powerful thing. One of my customers started to use it and they were like, "Oh, it's giving amazing performance." But under the hood, it's not true that you get performance out of it all the time. It has to be used carefully, on those kinds of applications where you want control back from your statement, or control back from your stored procedure, very quickly. I think this feature can help you very much there; otherwise you would just be over-using the feature and it won't get you anywhere. It does not give you an advantage, because a lot of the operations which we do in SQL Server are serialized. Things which are serialized are going to stay serialized and you will not get performance out of it, so one has to be very, very careful when using this kind of feature. And that's a good question; not many people understand it. People try it and eventually say, "Ah, I don't understand it. I'm going to quit on this." So you need to identify the right application, and this is true for any feature. A lot of the features introduced in SQL Server 2014 and 2016 require you to identify where exactly you will use them. Otherwise, using them does not give you any performance, and people end up saying, "What's the good of this particular feature? It doesn't do anything, or there are so many restrictions." Just like In-Memory: a lot of people are thinking In-Memory is not doing them any good. It got attention when it was released initially, but now I see slow adoption for that particular feature.
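For reference, this is roughly what the feature being described, delayed durability, looks like in T-SQL; the database and table names are hypothetical, and the trade-off is that the most recently committed transactions can be lost if the server crashes before the log buffer is flushed.

    -- ALLOWED lets individual transactions opt in; FORCED applies it to everything
    ALTER DATABASE SalesDemo
    SET DELAYED_DURABILITY = ALLOWED;

    BEGIN TRANSACTION;
        INSERT INTO dbo.AuditTrail (EventText)   -- hypothetical low-value logging table
        VALUES (N'low-value event');
    COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);  -- commit returns before the log flush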

Steve:     Yup. And I like to think of all of those features, all those new things, all those different ways of doing things, as different tools. Sometimes one of them is the right tool for the job, and sometimes it's the wrong tool for the job. And I think one of the things that it sounds like you do well with the performance tuning work you do is truly understanding what the right tool is, so you can use the right tool when it's appropriate.

Carlos:    The absolute worst thing would be to enable a feature that you're not quite 100% sure about, or where you don't understand what the downside effects are, right? While we might not agree with all of the defaults, if you will, there is a reason why they are the way they are. If you don't understand what you're changing when you enable or change defaults, then you need to do it carefully.

Pinal:     Absolutely. That's the very point. And I think for us, learning is also shrinking, because our attention span is getting less and less as we go forward. People do not have patience. And I think that is also changing how consultancies are shaping up, as well as how these features and tools are coming out. Previously people had patience; they would collect all the data and do things themselves. Now they just want to use the various tools, and that's why I think there are a lot of vendor organizations and a lot of talk about tools, and, let's say, we all try to build something which will just automate things and make our lives easier.
Carlos:    Ok, so with that, I know you mentioned adding indexes where they're needed, or if you just have tables that are heaps. But could you talk a little bit about some examples of removing indexes in order to improve performance, maybe specifically around some of the DTA indexes?

Pinal:     Right, so I do not like DTA at all. Ok, I'll say some bad stuff about it, and I'm sure people can go back through the episodes and see every single good consultant saying bad things about it, so I will save time and just say DTA has a lot of limitations. As a matter of fact, since it was released until today, I do not see many enhancements in the product itself either. It will just analyze your workload, and based on each query it comes up with indexes with weird names. And then, suddenly, it will create like 10 or 20 indexes on your system, and they are not efficient. Sometimes they are on a single column rather than multiple columns, and now included columns are there, columnstore indexes are there, and a lot of In-Memory and operational analytics indexes; I don't know if DTA has caught up with all of that, I haven't checked it recently. So the indexes pile up and the DTA indexes are just sitting there, and a lot of DTA indexes are no longer used by SQL Server because the workload changes. So one of the queries which I give to people, it's there on my blog, lets them run an analysis and figure out whether an index is really used or not. If it is not being used they should go back and drop those indexes, so I give them a drop script. Another thing which I do is go and check if there are tables that have more than 10 indexes. Now, this number is debatable. You can say, "Oh, my number is 8," or "my number is 5." Ok, that may be your number. But I have reached the point where, if I see 10 or more indexes on a table, I think that table will have a lot of trouble with inserting data into it. That's what my experience is. Again, you can go with 12 indexes or 15, but 10 is the number which I start with and then I tune from there. A lot of people say an index is great for selecting data but it slows down inserts; "we know, but we do not care about inserts" is what I keep hearing. You know what, I want to add one interesting point here: if your inserts are going to be slow, then while that insert or update is happening, SQL Server is going to put some kind of lock around that data. It may be a page lock, an exclusive lock, or a table lock; we do not know what kind of update it is, but your update is going to lock a certain part of the data. Now what happens if a select comes to that particular part of the data which is locked? It has to wait. So even though you have a lot of indexes, if they are slowing down your updates, that may indirectly impact your select statements and your overall performance will go down. So do not be liberal in creating too many indexes; this is what I try to tell people. There is another script as well for figuring out which good indexes you should create. I would be happy to share both scripts here, and if there is a place on the podcast page you can put them, people can just download those scripts and start working with them immediately, and for more detail they can engage me for consulting. But I would like to, just like you, give away all the scripts for free.
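The exact scripts live on Pinal's blog, but as a rough, hedged sketch of the same two checks, this looks for DTA-named or write-only indexes and for tables carrying ten or more indexes. Treat the results as candidates to investigate, not an automatic drop list, since the usage DMV resets on restart.

    -- DTA-named indexes, or indexes that are being written to but never read
    SELECT OBJECT_NAME(i.object_id) AS table_name,
           i.name                   AS index_name,
           us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
           ON us.object_id   = i.object_id
          AND us.index_id    = i.index_id
          AND us.database_id = DB_ID()
    WHERE i.name LIKE N'[_]dta[_]%'     -- Database Tuning Advisor naming pattern
       OR (us.user_updates > 0
           AND us.user_seeks + us.user_scans + us.user_lookups = 0);

    -- Tables carrying ten or more indexes
    SELECT OBJECT_NAME(object_id) AS table_name, COUNT(*) AS index_count
    FROM sys.indexes
    WHERE index_id > 0
      AND object_id IN (SELECT object_id FROM sys.tables)
    GROUP BY object_id
    HAVING COUNT(*) >= 10;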

Steve:     Yup, so Carlos, we usually have a place we can put links. We can probably put those links there as well.

Carlos:    Yeah, well, our episode show notes today will be at sqlpartners.com/pinal, P-I-N-A-L, and we'll have all the links and everything we talked about in today's episode.

Pinal:     That’s fantastic!

Carlos:    One question I had on attacking those indexes: now, you do make the point that you kind of get them 80% of the way there. And I'm assuming that you are coming at it from a DMV perspective. You know, SQL Server is collecting information about how many times an index has been requested and the impact it would have on the queries. You aggregate all of that up and then you make an assumption, you know, as to whether you should deploy this index or not. Do you have a minimum amount of time, or a minimum age of the plan cache, that you require before you start making those assumptions?

Pinal:       Absolutely. So I request this of people, and that's the very point. I always say, if the SQL Server has not been restarted for 7 days, then I am willing to touch it, because if it has only been up for one day or two days, that is not going to cut it; it might just be a weekend kind of workload. So I want your SQL Server to have been running for at least 7 days under business workload, maybe Sunday to Sunday, Friday to Friday, or something like that, and then I would like to touch it, so that we can be confident it has the right amount of plan cache in it and I can do a proper analysis. If it has run for only one day, I think it's not going to help at all.
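A quick way to check that precondition before trusting DMV-based index analysis is simply to look at how long the instance has been up:

    -- How much history the DMVs and plan cache can possibly contain
    SELECT sqlserver_start_time,
           DATEDIFF(DAY, sqlserver_start_time, SYSDATETIME()) AS days_up
    FROM sys.dm_os_sys_info;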

Steve:     So that brings up an interesting point. How often do you run across clients that you are working with where they attempt to speed up their SQL Server by rebooting it regularly?

Pinal:     Okay, so the number may be surprising to you. Like, every single time. I think there's something about Microsoft, right, that you just restart when things do not work. All my customers, before they reach out to me, might have restarted the system at least five to ten times; I just see that. And a lot of customers are just ok hitting restart whenever they like. Like one time we were asked to change something, you know, cost threshold for parallelism on a system, and the customer said, "Now I have to restart it." I said, "No, you don't have to, it takes effect automatically without restarting." But they would not believe it until they restarted it. So after changing every configuration, even though I proved it to them with a demonstration, they would just go ahead and restart it. And I said, "What about your customers?" They said, "They will just think it's a timeout." I just see this kind of thing; people love to restart. Restarting is like caring for your system, that's the point people have come to. And as a matter of fact, while I was talking to you I just restarted my phone because it was a little bit slow. So I think we have all built this kind of habit: when things don't work we just go and restart before we invest our time in investigating the real cause.
Carlos:    Yeah, that may also be because we don't know what else to do, and so rebooting it makes us feel like we're doing something.
Steve:     Yeah, that’s what I feel when it comes to my cellphone.

Carlos:    Yeah, exactly.

Steve:     Alright, shall we do SQL family?

Carlos:    Let’s do it.

Steve:     Okay, so Pinal, how do you keep up with technology, with all the changes that are happening, especially with SQL Server and all the great things that have been coming out recently?

Pinal:     Wow! This is a tough question. I honestly have a hard time keeping up with all the new things that are coming out. For example, I haven't even touched the R language and R Services, and there are so many things, like PowerGUI, that are still so far from me. So I have a hard time learning it all, a hard time catching up, and I think I'm doing a pretty bad job at it, but there are a couple of things I have made a point of doing. I listen to podcasts like yours, and the second thing I do is try to read a lot of blogs, because I figured out that if I'm not learning something, somebody else is, so I go out and read lots of different blogs and try to learn what they are talking about. And if one of those things interests me, I go and double-click on it. Previously there was a time when I used to open the Microsoft documentation, take each of the words, try to learn it, dive deeper, and build demonstrations on it. Nowadays that's just impossible, so I go out, read blogs, listen to podcasts, and see what words people are throwing around. One of those words will catch my fancy and I will invest time in it. That's the only way I can learn, and when I learn, I blog.

Carlos:    Another good point there, right? When you teach somebody, you end up learning it yourself.

Pinal:     Absolutely.

Steve:     You say that when you learn, you blog. Well, if you look at your blog, it's pretty apparent that you've learned a lot over the last 10 years given the number of posts you have.

Pinal:     Yeah, thank you! It has been 10 years now, and I've been blogging every single day since I started. I haven't missed a single day, and the 10-year anniversary just happened this November 4th, so yeah, I'm very proud, just like a father would be. The blog started out like a baby and it became my master. It drives me in a different way now; it keeps me motivated and going. The blog is like everything: it keeps the household running and it feeds my family, so in a way it is a provider to us as well. The blog is playing multiple roles in my family right now.

Carlos:    Yeah, that is impressive. Ten years, every day for ten years. You've set the bar pretty high there.

Pinal:     Thank you! No, it’s a passion. It’s a part of life now.

Carlos:    If you could change one thing about SQL Server, what would it be?

Pinal:     This is an amazing question. I have actually thought about it in the past. There are two things I want to change. First, I just don't like the default settings, but they are there for some reason, so I want Microsoft to educate us about what each of the settings does. The information is there on MSDN, but not when you are installing the product. For example, when setup asks about something like max degree of parallelism, what does it actually mean, right? You have to already know. There is pretty much no help on the installation page, no link, no pop-up, nothing. What does it mean? What value should I be keeping? I really wish Microsoft could at least give us some kind of question mark that, when you click on it, takes you to the MSDN page so you can read what it means for your version. That would make things much easier for people to consume, because otherwise we are just doing internet searches on each of these words and we end up with a lot of bad advice on the internet as well. So the two things are related: one, I want Microsoft to educate us about what the defaults do, and two, I want them to link us to the right, authentic information.

Carlos:    Great point.

Steve:     Okay, so what's the best piece of career advice that you have ever received?

Pinal:     I see, okay, I have a clear answer for this one. Years ago, when I was attending one of the summits in the United States, I was very worried about how I present myself, how I talk to people, and whether my content is up to everybody's standard, because I would see all these great people in the world producing amazing content. I felt like I did not know so many things: should I be blogging, should I be writing? I have done 13 SQL books, but before that I was so scared of writing a single word; I would end up thinking about what people would think, and everything. This is the time when I met Arnie Rowland, who is from Portland. I expressed my concern to him and he gave me one line of advice: "If you stop worrying about what others will say, then and only then will you grow." So I think he said: grow up, write whatever you want to, face the consequences of what you do, and keep improving yourself, but do not ever stop because of what others might say. That was so powerful that since that day I have never stopped, and I keep going with what I'm doing. I definitely improve; I listen to everybody, I pay respect to everybody, and I try to be a person who hears them and improves himself. I learned that because Arnie told me I should never stop myself, and that's why the blog is now 10 years old. So yeah, that's the best career advice, and I'll tell everybody out there: don't stop with your passion. Keep on doing it, but take a moment to fix things if you are doing something wrong.

Carlos:    Great stuff.

Steve:     I think that's a great point. I know that every time I talk to Arnie it seems like he always has good advice for me, so maybe we should have him on the podcast someday to talk about career advice.

Carlos:    Yeah, sounds good.
Carlos:    Pinal, our last question for you today. If you could have one superhero power, what would it be and why do you want it?

Pinal:     Oh, I know the answer to this one. It's Adrienne Frost; I think she is from the Marvel Universe and she is associated with the X-Men. Adrienne Frost has an amazing psychic power, I think they call it psychometry, where she can touch any object and instantly know the history of every event concerning that object. I would love to have that one: I could touch anybody's SQL Server and, without running any diagnostics, I would know what was done to the database last evening. I would know which developer put malicious code in. I would know what kind of workload change was made in the last week. And I would know which malicious guy, before leaving the job, left something running inside the system. I want this power of Adrienne Frost in me, so that when I go consulting, I am the man. Every single performance tuning engagement would come to me if I had this particular power, because just by touching their Management Studio or server node I would know what happened to it in the past, things that sometimes even the DMVs fail to expose to us.

Carlos:    There you go. You wouldn’t be able to hide from Pinal any longer.

Pinal:     Sweet. Yeah, I really, really want that particular power. I keep telling my customers, tell me what you did in the last day, and then I would know. It was working before and now it's not, so something must have come in between.

Carlos:    Alright. Well, Pinal, thanks so much for being on the show today. We’ve enjoyed it. It’s been great.

Episode 72: Testing Automation for Business Intelligence

In the data space we’re hearing a lot about dev ops, continuous integration and the programmers are getting a lot of love there with lots of tools to help them; however, if you’re on the data side–not so much.  If you’re running Agile in your environment and then you’re trying to do data warehousing or other data development we’re not quite getting the same love. Sure, there are a couple of unit testing software offerings out there, but it’s a little bit cumbersome.  This episode our conversation focuses on testing automation in a business intelligence or data warehouse environment–even in an Agile system.  Our guest is Lynn Winterboer and she has some interesting thoughts at the intersection of business users, data professionals and tools.

 Episode Quote

“I would say what we need the business users to do, if we’re on the IT side, what we need from our business users is their specific expertise and we need to be using them for things that require their specific expertise.” Lynn Winterboer

Listen to Learn

  • Where the burden of final testing should be
  • Roles for the business and the IT sides for testing automation
  • Ways to get started with testing automation
  • How to find the product owner for the data warehouse
  • How your test environments might look different than they do today
  • Testing automation is the goal, but there are several baby steps to get there
  • Why automation is not the most complicated piece of testing automation

Lynn on Twitter
Lynn on LinkedIn
Winterboer Agile Analytics

About Lynn Winterboer

“I’m primarily a business person who has also spent over 20 years working in the data warehousing world as a business analyst, project manager or being the business lead on a data warehousing project.”

Lynn is an Agile Analytics Coach and trainer with Winterboer Agile Analytics, based in Denver, Colorado.  She has two daughters and works with her husband on the business.

Transcription: Testing Automation Business Intelligence

Carlos L Chacon:               Lynn, welcome to the program.

Lynn Winterboer:            Hi there. Nice to be here.

Carlos L Chacon:               Yes, thanks for coming on with us today. Interestingly enough, we met in Kansas City at a SQL Saturday out there and your session topic intrigued me. In the data space, or I guess in the technology space, we're hearing a lot about dev ops and continuous integration, and the programmers are getting a lot of love there, lots of tools to help them with that; however, particularly if you're in the Agile space, or you're running Agile in your environment, and then you're trying to do data warehousing or other data development, we're not quite getting the same love. Sure, there are a couple of unit testing software offerings out there.

It’s a little bit cumbersome. It’s kind of hard to wrap your mind around so ultimately our conversation today is on that thought of testing automation in a data warehouse environment and then even a step further, those who are running an Agile. I thought you had some really interesting questions about it or thoughts around it and so we wanted to have you on talking a little bit about this. I guess, first just let’s kind of set the stage here and talk about why testing automation is so important.

Lynn Winterboer:            Okay. Thank you, first of all, for inviting me to be on the podcast. It’s great to be reaching your listeners. I do feel really passionate about this topic because … I’ll give you a little background about myself. I’m primarily a business person who has also spent over 20 years working in the data warehousing world as a business analyst, project manager or being the business lead on a data warehousing project. What I’ve found over the years is that, in general, the data warehousing industry doesn’t have a whole lot of focus on testing. We’re very good at things like data quality and data modeling and managing large volumes of data and managing complex transformations so we’ve got some really good strengths that lead to great business results but we also haven’t had as much support in the industry for testing as a discipline and as a skillset.

The reason that I really care about this is because it is not uncommon for the final testing to be on the shoulders of the business stakeholders, and they are the ones who know the data the best and they are the ones who will do the best job of validating that what the data warehouse team has built is correct. However, it is not the best use of their time, nor the best way to build a strong relationship between the business and the development team, if what they're catching are errors that we should have caught early on. You combine that with … We put a lot more burden on our business stakeholders than a lot of development teams do, I think, in my experience. You combine that with a lot of the Agile concepts that are really, really … they're exciting and they are effective and they're guiding projects really well. Of pulling your testing earlier and making testing something that the entire delivery cycle is involved in, or quality I should say, more quality than testing.

Some of the Agile concepts like having acceptance criteria to each user story or each small requirement and the acceptance criteria’s expressed from the point of view of the business of, “Here’s how I’ll know that you’ve met my need, when you can show me this.” If we can take that acceptance criteria, turn it into an actual test that everybody, the business, the developers, the testers can agree is a good test to prove the development is complete and then automate that, you have a really, really nice set of not only automated tests that prove to the business that you’ve done what they’ve asked but also regression tests that can be used down the road, which is … I would say really critical to being able to move quick and to be agile, regardless of what Agile framework you’re using.

Carlos L Chacon:               No, that's right, and it's interesting that you mentioned lessening the work of the business stakeholders because, interestingly enough, in our last episode, episode 71, we had Kevin Wilkie on. We were talking a little bit about Excel and kind of using it for the analytics piece and we actually, from the tech side, we actually came to it like, “Hey, the business users need to do more,” and so now it's interesting to kind of throw that back on us and say, “No, no, no. We have enough to do as well.” We need to figure out some way that we can move forward. I think it is interesting that we all do need to kind of get together on the same page and put some of these things in place so that we can move faster altogether.

Lynn Winterboer:            I would say what we need the business users to do, if we’re on the IT side, what we need from our business users is their specific expertise and we need to be using them for things that require their specific expertise. If we can pull that expertise earlier through asking for acceptance criteria, that’s a … and clarifying that acceptance criteria. I did a webinar a couple of years ago where I said the most common acceptance criteria in the data warehouse team is the least useful and that is our acceptance criteria is that the data is correct. No duh. That’s our job, right? That is what we do. That’s what we focus on. That is what we’re dedicated to. How do we know what correct means from your point of view, Mr. and Mrs. Business person? I do think pulling the users in early to have concrete discussions about what do … I hear that you want this. What do you mean by that?

For example, if somebody says … I'm going to give a very, very simple example here just so that nobody gets lost in the domain. Let's say they want a summary of product revenue by quarter, so revenue by product by quarter. We say, “Okay.” We have our common conversations where we say, “What do you mean by product? When you say ‘revenue by product’ is that the top three product lines the company has or is that the 30,000 SKUs that we have?” We have that typical conversation, but it's also helpful if what we come out of that conversation with is them giving us a set of orders, for example: if you can calculate product revenue by quarter correctly on this specific set of records, I will know that you've calculated it correctly, because these records represent each of our three major product lines. They go down as far, in terms of having all the right SKUs in them, as I want to look at, and, actually, in that conversation we might find out that it's not just product revenue by quarter, which might be, say, three numbers if you have three product lines, three summaries. It may be that they're looking for 30 numbers, because when they say “product” they mean throughout the whole product tree.

Coming up with concrete actual examples is one of the most important things. Honestly, as a business analyst working in data warehousing, that's been one of my go-to tools since the '90s, and for lots of good BAs who have worked with data teams, their go-to tool is an actual example: what exactly do you want? What that does, if they can give us a handful of records, meaning a defined set of records … Even if it's all the orders in Q4 of last year, that's okay. It's a defined set and they can pre-calculate it themselves, so this gets back to maybe the Excel comment you made earlier. If they can pre-calculate themselves what they expect that number to be for product revenue by quarter, or that set of numbers, then we have something we can test against. We can automate it. You have to know your expected result to be able to automate something. If it takes a human to look at it and go, “Yeah, that feels right,” it's hard to automate. You can automate the running of the test but you can't automate the assessment of whether it passed or failed.
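
Lynn's point lends itself to a concrete sketch. Assuming some hypothetical table names (dbo.Orders, dbo.AcceptanceTestOrders, and dbo.ExpectedRevenueByQuarter are stand-ins for your own schema), an automated acceptance test for "product revenue by quarter" over the agreed set of records is just a comparison of the calculated result against the pre-calculated expectation:

    -- Any row returned here means the test fails; an empty result means the calculation matches the expectation
    WITH actual AS (
        SELECT  ProductLine,
                DATEPART(QUARTER, OrderDate) AS OrderQuarter,
                SUM(Revenue)                 AS Revenue
        FROM    dbo.Orders
        WHERE   OrderId IN (SELECT OrderId FROM dbo.AcceptanceTestOrders)   -- the defined set of records
        GROUP BY ProductLine, DATEPART(QUARTER, OrderDate)
    )
    SELECT  e.ProductLine, e.OrderQuarter, e.ExpectedRevenue, a.Revenue AS ActualRevenue
    FROM    dbo.ExpectedRevenueByQuarter AS e
    LEFT JOIN actual AS a
           ON a.ProductLine  = e.ProductLine
          AND a.OrderQuarter = e.OrderQuarter
    WHERE   a.Revenue IS NULL
       OR   a.Revenue <> e.ExpectedRevenue;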

Steve Stedman:                Lynn, what you’re describing there sounds a lot like what I’ve observed in development organizations where you have backlog grooming and planning meetings and the product owners are there and the developers are there discussing the user stories and acceptance criteria. Do you see that the business people are getting in the same room with the analysts that are in the same room with the developers that are building out the data warehouses or the BI structure and working in the same way development teams would there?

Lynn Winterboer:            Yes. That is what I'm talking about, and you mentioned the role of product owner, which is an important role, and typically that person is from the business and is very knowledgeable about the business. I shouldn't say “typically”; by requirement they are. A product owner is supposed to meet three criteria. I'm going to start with the first: the authority to make prioritization decisions, and typically that's somebody who's pretty high up in the business organization who has that authority, or that authority has been bestowed on them by somebody high up. Second, they have the knowledge to do the deep digging to come up with, for example, the subset of records and what the expected results might be in this case, and third, they have the bandwidth to do all that work.

It's pretty hard to find a person in a large data warehousing team supporting a large company. It's going to be hard to find somebody who has the authority, the knowledge and the bandwidth to represent all the various business entities that that warehousing team supports, even if it's a specific project, let's say, for a finance team. A good product owner might be a financial analyst who's worked at the company for a long time and who has the trust and support of the CFO to be making some of these prioritization decisions, or at least to be going out and getting the information on prioritization from the executives and bringing it back.

We do expect the product owner to be deeply knowledgeable in the business and the business needs, however, that doesn’t mean the product owner is hung out to dry all by him or herself. It means they are expected to bring in the people they need to do to get the … to meet the need, the need of product ownership. I’m starting to talk about product ownership instead of product owners because I think it’s a need that needs to be met. Then, you have a product owner who’s that one point of contact but they would bring into this up front discussion the right people they need to have in it from within the business. Even if we-

Carlos L Chacon:               I think even another critical point that you made there is that a lot of times, depending on the organization, and Steven mentioned some of the steps that Agile teams take to try to convey an idea, it still seems like a lot of times in the BI space they're kind of like, “I need this report,” right?

Lynn Winterboer:            Right.

Carlos L Chacon:               They rip off the Excel spreadsheet, “I need this.” The IT folks start thinking about how, “Okay, how's this going to look? Drop-downs, drill-downs,” all these kinds of things, when they probably ought to say, “Stop,” like you mentioned, “Let's go back. I'd like to see you calculate this.”

Lynn Winterboer:            Yeah. You know, I actually think Excel is an incredibly useful tool for data warehousing. I know there are lots of people out there who would slap me for saying that, but if I can get somebody in the business to do a manual export from their source system and show me, in Excel, without re-keying anything, using pure formulas and referencing of cells, what they need, and that meets their need, that is a great prototype and that goes a long way toward helping us understand what's needed. What I would suggest is that we limit the scope of the input data for acceptance criteria and acceptance tests, and again, it's something you want to be able to pre-calculate.

I had a great customer I worked with. Our product owner at that point was a senior director in finance and, just to make the story short, after lots and lots of testing and lots of going down rat holes and discovering that the rat hole was because of a data anomaly that's not likely to occur in the future, she finally said, “You know what? Our company has grown by acquisition for twenty years. We have a lot of weird data from the past. We might get weird data in the future, but in doing what we need to do to meet the company's needs right now, I don't want us spending time on stuff that's not going to happen in the future, because we're going to have a cutoff date and then we're only going to be looking at future stuff.”

She came up with 16 orders such that, if we ran these 16 orders through and applied all the revenue recognition rules, of which there were many and they tripped all over each other a lot, she said, “If you can get these 16 orders to come up with the revenue recognition that I have allocated in this spreadsheet purely by formula, I will know that you've got 80 to 90% of the need met, and then we can go down the rat holes of the corner cases and decide which ones are important enough to tackle before we go live.” She said, “I don't want anybody looking at any other data but these 16 orders until we've nailed them.” A big part of that, which is not uncommon in data warehousing, is that the business had a revenue recognition system that was 10 years old and had been built and maintained by a single individual who had been just taking orders from the business and doing the best she could, and it turned out there were very complicated rules and there were some conflicts.

There were lots of business decisions that needed to be made so this senior director of finance was able to say, “Once you get these 16 orders to flow through and give us this, then I’ll be happy and I’ll be confident I can tell the CFO we’re on track.” I think that was a great example and that was something we could automate. We were able to automate those tests so every time we put a new rule in, we could run the test suite and in a second know whether we had broken a prior rule or not.

Carlos L Chacon:               I was going to say that that's where the next kind of transition or difficult part is, at least when I'm working with data warehouse teams: then to limit them to a certain amount of data. It seems like they're always saying, “We want a refresh of this data so we can do our testing, because if we can't have all the data then we can't do our tests.” It sounds like if you can identify what that good data looks like, then you can go off of that and then again, like you said, look at other scenarios further down the line.

Lynn Winterboer:            We do … we still need that good set of data further down the line and so another challenge that I think data warehousing teams face is that our testing environments, typically there’s a couple of problems with them. The first one being they’re typically shared testing environments so if you’re doing an ERP implementation and you’re building reports for that ERP or off of the data in that ERP system, your testing environment from a data warehouse perspective will typically be the ERP’s testing environment.

If you think about people implementing an ERP, they're testing out workflow processes. They want to know that it can handle all these different workflow scenarios, and so they're basically creating a bunch of junk data to prove or disprove that the workflow works. That's not useful to us in data warehousing. We need what we'll call quote-unquote real data, and junk data doesn't do us any good. It's not junk to them; it serves a purpose for them, but we're at cross purposes, so that's the biggest problem I see. It's not uncommon for large ERP systems or CRM systems, again, to not want to refresh the data in their testing environment more than once every six months or once every year, because they've got all this testing in progress and a refresh is going to interrupt their testing.

First of all, I think we need separate testing environments. We need one that has a really good set of data that doesn't change and that is reliably realistic as to how the business data looks, and they need a different environment. They need a much smaller set of data; they just need to play around with creating records and pushing them through. The other thing is our testing environments tend to be much smaller and not configured like our production environments, and that causes us a big problem, because if they're much smaller, we can't put a whole lot of data in them. If they aren't configured like our production environment, then we have all these potential huge problems when we try to deploy to production, things we couldn't catch in test because it's a different configuration.

Then I think the third big problem for us is actually what kind of data we can pull into our test environment. With data breaches and data security being so critical these days, if your test environment is not under the same security configuration as your production environment, which it typically isn't, you're often not allowed to pull quote-unquote real data in to test because of PII data or financial data or HIPAA regulations. That makes it really hard for the data warehousing team to test on quote-unquote real data, because we are not allowed to move it.

Some teams are deciding to … The teams who get the magic wand and they can get whatever they really want to do this are getting basically a copy of their production data warehouse behind the same firewalls and in the same security standards as their production systems that they’re drawing from and … are able to have, whenever they want, a refresh of either a subset or a full refresh of data from production. That’s pretty hard to get, financially, but I’m hoping that the benefits of test automation, which is one of the key components of continuous integration, will start to resonate within the executive world in terms of if we’re going to support our software development teams in continuous integration and configuration management and code management and test automation, that we want to support our data teams as well. It might cost something we weren’t expecting but we’ll see where that goes.

Steve Stedman:                Okay. Then at that point you’ve got your test data, whether it’s a complete replica of production or some subset of close to production or whatever it may be there, and you’ve got your acceptance criteria that you talked about and then you have the 16 orders that you mentioned that if these orders go through, that’s representative of everything. Then what is the path from there to really automate that and get it into a process where it doesn’t take someone to manually go through and … I don’t know, manually test that?

Lynn Winterboer:            Manually test it … Actually, that part is the easy part, frankly. Right now you have a business person saying that, but the demonstration that you saw, Carlos, at SQL Saturday came from a group of my data warehousing friends. We were sitting around having coffee one day, talking about life and stuff, and I started asking them, “Come on, guys. Why is it so hard to do test automation?” They all looked at each other and said, “It's not that hard. We just don't do it.” I said, “Really? Okay.” One guy had done some test automation in the past and he's like, “The mechanism's pretty simple.” I can honestly say I think the reason a lot of data warehousing teams don't do it is we're a very vendor-driven community, and have been for years, in the sense that …

The example I give to demonstrate this is, if we're looking to hire an ETL developer and the software language that ETL developers typically use is SQL, we don't put an ad up for a SQL developer like other teams put an ad out for a Java developer. We don't do that. We don't even put an ad out for an ETL developer, typically. We put an ad out for an Informatica developer or a DataStage developer or an Ab Initio developer, which are products that these developers use to do ETL, which is extract, transform and load. If a company's looking for an Informatica developer and you're a DataStage developer, they don't want to talk to you. You won't even make the first cut to be able to talk.

I have several friends who do development in ETL tools across a variety of tools and they say, “It’s not that hard to cross tools. They’re not that different,” however the industry still isn’t there. I think our vendors have done a good job of becoming very sticky. They have differentiated themselves from each other. They each have their own semantics. It was by design and I’m not upset with the vendors. I think they’ve done their job well. I think we need, as an industry, to start challenging that and stepping back and abstracting it a bit and saying, “In the end it doesn’t matter what tool you use. It’s the skill to know how to use that tool to do something as efficiently and effectively as possible.”

That is why, since there are very few tools out there that anybody has delivered as vendors for testing the data warehouse or for doing test automation … There are tools out there that’ll talk about testing the data warehouse but they’re really where you store your tests so you can manually run them and record the results. For test automation there are really only a handful of tools created by vendors that are available to data warehousing teams. They’re really pretty new to the market. It’s really a blossoming market.

Carlos L Chacon:               Right.

Steve Stedman:                Then those tools you’re talking about there, those would be created by specific vendors like Informatica for their specific product. Is that what you’re saying?

Lynn Winterboer:            Yes, Informatica does have some testing capabilities built into its tool, which is great, and for teams who are using Informatica and buy that additional module, I think that's certainly worth looking into. I don't know a whole lot of teams that are actually using it; maybe that's just lack of exposure on my part to the right teams. Frankly, I don't know a whole lot of teams that are doing test automation, to be honest. The other tools that I think are worth looking into, one is called Ice DQ, I-C-E like ice cube, D-Q for “data quality”. Another one is called Tricentis. It's by a company called Tosca. Maybe the tool is called Tosca and the company is Tricentis. Anyway, they're out of Europe and I have European data warehousing friends who are using it to test their data warehouse, so I know for sure that it's being used and applied that way. Another company is called QuerySurge and then another one is called Zuzena, Z-U-Z-E-N-A. Those are a couple of tools that I've been learning about lately that teams might want to look into.

Back to your question, Steven, “What does it take to automate it?” The automation piece is not the complicated piece. You really just need to define the tests, store that definition in a database, store the test records, meaning which specific test records you want to run through for that test and what your expected result is for that test, and then have something do a comparison between what actually happened when you ran the test and what the expected result was, and then … Basically … Excuse me. Basically record the … Sorry, that totally stumped me.

You want to have … Then you have a tool that’s something that does a comparison. What actually happened when you ran the test versus what were expected results and typically we record all of the results, passes or fails. Typically it only notifies a human if it’s a fail and there may be certain tests you say, “I don’t really want to know right away if this one fails but I want to see it in the log,” or something like that. Other ones you say, “Yes,” you want the developer who last checked in the piece of development to get a notification immediately when something fails. It’s really not that complicated, the mechanism. The hard part in test automation, frankly, is deciding what’s that acceptance criteria that makes sense to the business, what are the examples of records that prove or disprove? Not only do you want to say, “If you can run these five records through and they all calculate correctly, I’m happy, but here are five others that shouldn’t be included in the query and we want to make sure they’re left out,” or something like that. You have to do your positive and negative testing.

That testing mindset and the skillset that testers bring into a team is something that we really need to embrace in the data warehousing world and get them to come into our team. We have to have patience with them because they don’t understand data versus GUI and they have to have patience with us because we don’t understand testing discipline and strategies. If you work together, you can really come up with some powerful things.
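
For teams rolling their own rather than buying one of the tools Lynn mentioned a moment ago, a minimal sketch of that mechanism might look like the following. Every name here is hypothetical; the point is simply: store the test and its expected result, run the comparison, and log a pass or fail that something can alert on.

    -- Hypothetical structures for a home-grown test log
    CREATE TABLE dbo.TestDefinition (
        TestId        INT IDENTITY PRIMARY KEY,
        TestName      NVARCHAR(200)  NOT NULL,
        ExpectedValue DECIMAL(18, 2) NOT NULL
    );

    CREATE TABLE dbo.TestResult (
        TestId      INT            NOT NULL,
        RunAt       DATETIME2      NOT NULL DEFAULT SYSDATETIME(),
        Outcome     VARCHAR(4)     NOT NULL,     -- 'PASS' or 'FAIL'
        ActualValue DECIMAL(18, 2) NULL
    );

    -- Run one test: compare the actual calculation against the stored expectation and record the outcome
    DECLARE @actual DECIMAL(18, 2) =
        (SELECT SUM(Revenue)
         FROM   dbo.Orders
         WHERE  OrderId IN (SELECT OrderId FROM dbo.AcceptanceTestOrders));

    INSERT dbo.TestResult (TestId, Outcome, ActualValue)
    SELECT  TestId,
            CASE WHEN ExpectedValue = @actual THEN 'PASS' ELSE 'FAIL' END,
            @actual
    FROM    dbo.TestDefinition
    WHERE   TestName = 'Revenue recognition - agreed order set';

Notification on failure can then be as simple as a scheduled job that alerts whenever the latest run contains any FAIL rows.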

Carlos L Chacon:               I think that's the key point. You said, “Oh, it's not that difficult.” I think it's that coming together, because the IT teams, they can definitely build the tests, but they need to know what the rules are, right? They're going to get the results from the business, and they can't get that rule from a, “Here's my report, I want you to build it.” They've got to understand what that … It's almost like back to the beginning of what we talked about. It's, again, to use I guess another Agile word, but those cross-functional teams need to kind of come together and share information. I think that is what will kind of get us headed in the right direction and, of course, the tools and how you go about that, you can fight about another day.

Lynn Winterboer:            Yeah.

Steve Stedman:                It’s almost like, at that point, it’s more of a culture shift for an organization to break down walls and have that cross-functional capability there.

Lynn Winterboer:            It is.

Steve Stedman:                That’s almost more difficult than the technical piece sometimes.

Lynn Winterboer:            Oh, I think it is. I really do, and when we talk about Agile, we talk about even a single data warehousing team or a BI team having cross-functional skills within a single team of, say, seven people, and what we mean by that is we'd really love to see these data teams evolve to where, within a single group of, say, seven plus or minus two people, you've got the abilities to do the analysis, the work with the business, then do the data analysis, then do the data modeling, do the ETL, do the BI metadata work or the cube work, do the BI front end and do the testing and quality assurance. It doesn't mean you need a human that does each of those things. It may mean that you grow your team so that a single human has skills across multiple categories there.

Then we could extend it even further and say a really agile organization is going to look at things from the point of view of a value chain, so if you look at a business value chain … Let's just say “supply chain management”, that is one of the typical value chains. What we would do there, if we were going to be a really agile organization, is we would have the data professionals, the people with these skills from the data perspective and the data warehousing perspective, be closely aligned with the people in the ERP system who are also closely aligned with the people in any other system that plays in that value chain.

What I'd like to see is where data warehousing and business intelligence and reporting and analytics are part of the value stream of a business process. They're not an afterthought of, “Oh, yeah, we also need reports.” They're not a red-headed stepchild of IT. They're really a valued part, so as you're building out the software to meet a business need you're also looking at building out the reporting very soon thereafter in small chunks.

Instead of an entire ERP implementation, for example, you might start … I'm just going to pick this one out of the blue. You might start with shipping data and say, “Okay, we're going to make sure our shipping components work correctly in the ERP and meet the business's need, and then we're going to do our shipping reporting before we move on to order management,” for example. However you order these things, it doesn't necessarily have to be the order in which the value chain flows. It might be the order in which the business rules are more clearly nailed down by the business. You tackle that first while the business is trying to figure out the business rules in the more complicated pieces of the value stream. That's where I'd really like to see it, and I'd like to see that sliced very thinly to run from a business need through front-end software through data warehousing to the end before they move on to the next little slice. That's my dream.

Steve Stedman:                Okay.

Carlos L Chacon:               We should all dream big there, right?

Lynn Winterboer:            Yeah, exactly.

Carlos L Chacon:               To kind of put it all together and to kind of see some of these steps, I think, when we look at implementing data warehouse test automation, the first component is cultural, right? Getting together with your analyst, or whoever's giving you the spreadsheets and telling you, “This is what I want this report to show,” and then they have to help you identify the rules and the process of how they got to what that report looks like. We can then take that set of data and put it into our test environment, if you will, and that's what I'm going to run my development against. It doesn't necessarily need to be a whole copy of my data warehouse, because I know the rules that are going to be in place and I can test against that core set of data. Do I come up with the same numbers I'm expecting? Yada, yada, yada, why not?

Then when I move on from there, I can do the automation and run it through the build. Then I go to the next step, and when I'm ready to actually present that to the user, let's just say we're going to the next quarter, so the data I have is … Because we're sitting here in the fourth quarter of 2016, I've got data for the third quarter of 2016. When I start looking at fourth quarter data, I'm going to apply those rules and show that back to the business, and then they say, “Well, this number looks a little bit odd.” Then I can go and say, “Well, I've applied these rules. My third quarter numbers still look good, so what is it? Do we have a change? Is there a new product? Is there some other component or rule that we didn't take into account?” Then we can kind of start from there. Is that a fair kind of-

Lynn Winterboer:            That’s exactly it.

Carlos L Chacon:               Yeah.

Lynn Winterboer:            Yeah, I think it is. I think what’s good about the process you just described is that if you’ve already validated you’ve got Q-3 running right and it looks good and then something looks funny in Q-4, you’re right. What you’re doing is you’re narrowing the range of areas you would look for an issue so it’s going to help with troubleshooting to say, “We’re confident the math works correctly. What else could be skewing this? Is it a change in the business process? Is it a change in our source system, how it records things?” Any of those things, it’s going to narrow down the scope of what you have to go troubleshoot and that is really, really useful to a data team and to the business people because we can have some confidence in certain things and then say, “Here are the unknowns that we’re going to go investigate.”

I think … Then, with regression testing as well, if you have regression testing at a very micro level … Not too micro but unit tests and then the next level up might be acceptance test. If your regression test suite is a combination of unit tests on specific transformations, for example, and then acceptance tests that might be a little bit broader and more business-facing but you’re automating those, you’re going to know pretty quickly what broke and why it broke because it’ll be down to … Let’s say even down to a single field level. “This field is no longer flowing through a system the way we thought it should continue to do so. Nobody has intended to change that but now this field isn’t behaving as we expected.” It’s going to save you a ton of time and troubleshooting and it also really …

I think test automation, eventually, Steven, you mentioned the cultural changes, test automation's going to help reduce the amount of finger-pointing, or the anxiety people might feel of, “Oh gosh, something's broken,” where suddenly everybody's covering their own bottoms over it. If you've got test automation and the tests you've automated are at a pretty micro level, you're going to know exactly what broke and there's no need to be guessing or speculating or finger-pointing. You're just going to go fix it. You're going to go solve the problem. I think it does help take the emotion out of the reaction to testing results.

Carlos L Chacon:               One last thought I wanted to ask here. Ultimately, this testing sounds a little easier when we are looking backwards but I know a lot of our listeners are going to be screaming at me and saying, “Well, what about, how do I do that for a new dimension? I want to create something that I don’t currently have rules for.” Does it still apply or …

Lynn Winterboer:            I think it does. I think you have to speculate on what … You have to create some, say in my 16 orders example, you might create order 17 and 18 that have the characteristics of the new type of order that’s going into production or you might create a whole nother 16 orders that add these characteristics and then see how that plays out. You may have to create some … You have to mock up the data, basically. I think in the software world they use terms like “mocking and scaffolding” and we can do that in data warehousing, as well, and in BI.

I do know some successful Agile teams where the BI team is separate from the data warehousing team but they’ll have a joint design session and the BI team will do some scaffolding. They will create structures that don’t yet exist in the data warehouse and populate those structures with data as they expect it to come from the data warehouse and then they’ll build from there. It takes some tight coordination between the two teams but at least the BI team can be moving ahead so that when the data warehouse does have quote-unquote real structures and real data for the BI team to pull into their tool set and into their environment, they’ve got a head start. It may not be perfect but then they tune it and tweak it to make it fit instead of starting from scratch.

Steve Stedman:                Interesting. Okay.

Carlos L Chacon:               Should we do SQL family?

Lynn Winterboer:            Yeah, let’s do SQL family.

Carlos L Chacon:               How do you keep up with technology changes now? I guess you mentioned you're kind of a business person, but obviously you've been doing a little bit of crossover here, and we've been able to get along, so we're going to-

Lynn Winterboer:            Yeah.

Carlos L Chacon:               How are you kind of keeping up with changes, I guess even in your Agile coaching as business methods and things change, how do you keep up with that and kind of stay on your toes?

Lynn Winterboer:            It actually takes a lot of energy to do it. I am looking basically at two industries and then trying to find their overlap or their synergies. I have my Agile world and I have my data warehousing world and it’s important to me to keep up with both of them. I go to a lot of local meetups because that is where not only do I learn new things and hear how people … Somebody might use a term that I don’t understand and I can just ask them and say, “Oh, what do you mean by that?” Then they define it and I go, “Oh, okay, that’s something new that I haven’t heard about.” I also go to national conferences like the one … I guess SQL Saturday Kansas City, for me it was national because I had to fly there from Denver. I try to go to those types of conferences.

On the data warehousing side, I'm an analyst with the Boulder BI Brain Trust, which is a group that meets on Fridays. Not necessarily every Friday, but some months it's every Friday from roughly nine to twelve-thirty, and we meet with vendors and they share their tools with us. It's anything related to data warehousing or BI or analytics. We get a portion of time where we're tweeting like crazy. It's a public portion. We record those and if you're a subscriber to that organization you can go in and view the recordings.

We also then go into an NDA section where the vendors will ask us questions about what we're seeing in the marketplace and what direction they should go, or they show us new stuff they're thinking about rolling out and we can give them feedback. The vendors get some good feedback from this group as well as us learning about them. I don't go to all of them because it's a big chunk of time, but I really try to go to as many as I can, especially if I think the tool or the vendor that's presenting has anything to do with Agile enablement for data warehousing. I think one of the most important things is I keep a list of experts when I meet them, so when I went to SQL Saturday in Kansas City, I met several people who I will now have in my own CRM as somebody who is really good with this type of thing or that type of thing. If I have a need or a question, I know who to reach out to, and I maintain those relationships, and I hope they will leverage my insight and knowledge as well and reach out to me when they have questions.

Steve Stedman:                Okay. Lynn, what is the best piece of career advice that you’ve ever received?

Lynn Winterboer:            I'm actually going to give you two, but I think they're related. The first one that comes to mind is really about trusting your higher power. The lady who shared it with me was a woman who was very successful in her data warehousing career and with her company, and I asked her at some point, “How do I get my career to be as good as yours? What is your secret to success?” Frankly, she said, “I pray. I pray a lot.” I would translate that to say: pray to your higher power, trust the universe, listen to the yearnings of your heart, because I think those are given from above and that leads to finding the work that makes you want to jump out of bed in the morning, excited to go to work.

I know, for me, that is what led me to bringing Agile and data warehousing together. I love the data world. I am hooked. I'm a true data geek and I love the people who work in that world. People don't go work in the data world if they're lazy or dumb. Data warehousing is full of some really smart, intelligent, hard-working people. The Agile piece is what brings the joy and the trust and the cultural changes that make it exciting to go to work there, so I really decided about five years ago I just wanted to bring the two of them together, and that combination, those synergies, bringing those together is what makes me want to jump out of bed every day, so sort of a two-part answer.

Steve Stedman:                Okay. Great.

Carlos L Chacon:               Very nice. Our last question for you today, Lynn, if you could have one super hero power, what would it be and why do you want it?

Lynn Winterboer:            Let's think. That's a great question. I should've known after meeting you guys at the SQL Saturday that that would be one of the questions. I would say I would have the ability to gift to another human being the gift of deep empathy, to really be able to understand somebody who's very different from you. Where are they coming from? What are their motivations? What are their fears? I think the world is really a scary and sad place right now, and I think empathy … I'll distinguish empathy from sympathy. Sympathy is saying, “There, there, I feel bad for you.” Empathy is saying, “Wow, that must be really hard. I'm imagining if that happened to me.” I really think if I could do that, it would be the ability to give the gift of deep empathy. Knowing myself, I would have to not have a superhero costume or a cape, because I get very distracted by that external thing. It would be better if I could walk around kind of invisibly gifting empathy to people.

Carlos L Chacon:               I was going to say, if that gets out and there's too much empathy going around, people might start running away and be like, “Ah, no!”

Lynn Winterboer:            I know. [inaudible 00:51:40]. Yeah, I’d maybe have to have the ability to be invisible and give empathy.

Steve Stedman:                Okay.

Carlos L Chacon:               Lynn, thanks so much for being on the show today.

Lynn Winterboer:            Thank you, guys. It’s been delightful.

Carlos L Chacon:               Yeah, we do appreciate it.

Episode 71: BI Tools of the Trade

It is easy to feel like the odd man (or gal) out in the SQL Community.  We are constantly hearing about new features, products, and architectures and our companies may not always have the same enthusiasm when we describe what could be.

This is probably even more true in the Analytics space as much of the marketing has been pointed in that direction.  Even PASS tried to create a conference just for Analytics.  While there are some amazing new technologies out there–we might–GASP–not need them in every scenario.

Our guest today is Kevin Wilkie, a BI architect with Innovative Architects and he talks to us about some of the tools he uses and spoiler–they aren’t all that fancy.  He also talks with us about how he manages the tech and business users so each feels comfortable and each can get their work done.

 Episode Quote

“Excel’s that tool that’s sort of magically hiding in front of everyone.”

Listen to Learn

  • Some of the tools Kevin uses in his analytics role
  • The criteria you should use to determine whether you should use a new tool
  • Why using simple tools might hold an advantage over ‘better’ tools
  • The data profiling task in SSIS

Data Profiling Task
VLOOKUP
Excel Solver Tool
Excel Data Analysis
Master Data Services

About Kevin Wilkie

Kevin Wilkie is a BI architect with Innovative Architects.  He is a community participant and helps organize SQL events in the greater Atlanta area.  He is a graduate of Southern Polytechnic State University.  He also gives the SQL Data Partners podcast two thumbs WAY up.

Transcription: BI Tools

Carlos:                  Okay, well Kevin, welcome to the show.

Kevin:                   Thank you Carlos. Good to be here.

Carlos:                  Thanks for coming on with us today. So compañeros, for those of you on the east coast, you might have run into Kevin; he's just out of the Atlanta area. A bit of a big supporter of the show, I'll say; getting some supportive feedback from Kevin early on was helpful to me, and I appreciate that we're now able to have him on and talk a little bit with him about some of the things that he's doing. Ultimately, I think our focus today is going to be that the analytics environments have started to take over more and more of the attention that the data world is getting. Lots of focus is being placed there, and I think as administrators we want to start looking there: how do we potentially move over into the analytics world, and what tools are those analytics folks using? Ultimately, I think that's where our focus is: tools, analytics, and how and when to appropriately apply them.

We talk about these tools, and ultimately we’re talking about the Microsoft stack, right? These are going to be tools you’ve heard of and probably installed before. Let’s just go ahead and start that conversation, Kevin, why don’t you talk about some of these tools that you’re using for data analysis.

Kevin:                   Thank you Carlos. I think one of my favorite tools, since I've been doing SQL for years, is SQL itself. I just get into the actual SQL itself to play with the database and write simple queries, just so that I can see the overall form of the database and many of the tables. I don't get really into the nitty-gritty as much, I try not to in SQL Server itself, but I like to at least see the overall form: things like the foreign keys, the primary keys, any constraints that the table will have, so you know kind of where the data will lie. That's where we're going to try to stay, just at that level with SQL Server itself, and for data analysis, things like how many distinct values there are and how far a field's values range in the table, like looking at the max, things like that.
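
As a rough sketch of that first pass, and not Kevin's exact queries, the catalog views will give you that overall shape quickly:

    -- One row per table: approximate row count, foreign keys, and primary key / unique constraints
    SELECT  t.name AS table_name,
            (SELECT SUM(p.rows) FROM sys.partitions AS p
              WHERE p.object_id = t.object_id AND p.index_id IN (0, 1)) AS approx_rows,
            (SELECT COUNT(*) FROM sys.foreign_keys AS fk
              WHERE fk.parent_object_id = t.object_id)                  AS foreign_keys,
            (SELECT COUNT(*) FROM sys.key_constraints AS kc
              WHERE kc.parent_object_id = t.object_id)                  AS key_constraints
    FROM    sys.tables AS t
    ORDER BY approx_rows DESC;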

Carlos:                  I think maybe, let's just step back for a moment. We're talking about analytics, and you mentioned you like to spend your time in SQL Server; I think most people would say that is true. Ultimately, when we need to start looking at tools, it is either because we have questions that we want to ask of the data that we can't answer currently, or we want to look at the data a different way, or present it to other people so that they can review it and ask questions of that data. Right?

It’s interesting, you went from the SQL Server, and then kind of into SSIS, but you used that as more of an ETL, it almost sounds like a data cleansing, or a data integrity operation, rather than moving it from one place to the other.

Kevin:                   In this case, you're right. For this one, SSIS is fairly good at just, like the task itself says, profiling data. You're just seeing the basic outlines: how far it goes from min to max, maybe a few averages and standard deviations, but it's not overly heavy on what you can do with it and how much you can see with it.

Especially, the biggest thing I see is how many nulls were in the table, because if you have a four hundred thousand row table, and three hundred and ninety eight thousand are nulls, usually the business needs to know that, we need to rethink why that field exists in the table.

Carlos:                  Sure. There’s not enough data there to validate, or use that as a valid option.

Steve:                   Just a question around that Kevin, I know the data profiling task in SSIS that you mentioned, that’s not something that I have myself used in the past, but the things you’re describing there, are things that I would normally have done in SQL Server and TSQL. I’m just curious, what are the benefits of getting it that with the data profiling task, rather than just running queries to do it?

Kevin:                   Because with the data profiling task, I can do it a little bit faster, and I don't have to do as much work … As much work to actually check each field, let me back up … For example, if I wanted to do the min/max, standard deviations, and averages, I have to do a little bit of work per field; I have to write different expressions in SQL for each one.

Steve:                   Okay.

Kevin:                   Whereas, data profiling task, I can do it in one shot.

Steve:                   Okay, that makes sense, sounds like something I need to go try (laughter).
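For readers following along at home, the per-column work Kevin describes hand-writing might look something like the sketch below. The table and column names are hypothetical; the point is simply that each field needs its own set of aggregates, which is what the Data Profiling Task rolls into a single pass.

    -- Hypothetical table dbo.Orders with a numeric Amount column and a nullable Region column.
    -- Min/max/average/standard deviation for one numeric field:
    SELECT
        MIN(Amount)   AS MinAmount,
        MAX(Amount)   AS MaxAmount,
        AVG(Amount)   AS AvgAmount,
        STDEV(Amount) AS StdevAmount
    FROM dbo.Orders;

    -- Null count and distinct count for one field; repeat per column:
    SELECT
        COUNT(*)                                        AS TotalRows,
        SUM(CASE WHEN Region IS NULL THEN 1 ELSE 0 END) AS NullRegions,
        COUNT(DISTINCT Region)                          AS DistinctRegions
    FROM dbo.Orders;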

Carlos:                  Now also in your role, you mentioned being able to touch SQL Server, and then also being able to use SSIS. Now generally, while that line is getting thinner and thinner, there has traditionally been kind of a border between the administrators, or the 'keepers of the data', and what we're going to allow the users access to. Now, when we think analytics, we're talking about end users, and I guess there are the Power BI folks we see in the community, but generally we're thinking of our business users. Any heartache with giving them access to SSIS? Or is that something more purely for the administrator in kind of a prep move?

Kevin:                   It's really a tool I use when I'm setting up, or doing the beginning phases of, a warehouse. I don't really like to, I wouldn't like to give a normal user access, because there obviously are multiple things that the SSIS world can do. And even SQL sometimes will intimidate the normal user so much that they just run away.

Carlos:                  Sure.

Kevin:                   I like to keep these two tools really for IT use. If you really know what you're doing, these two tools are great, but for the normal everyday user, not so much.

Carlos:                  Okay.

Steve:                   Okay, so then, if you’re considering SSIS and then TSQL, and SQL server for the IT side of things, and then the normal users come along and they want to be able to go and do some analysis, what are you recommending, or how are you going about doing that with other tools?

Kevin:                   Well, there are two tools, well, two branches of users, I like to call it. The more advanced users I would have in Power BI, Tableau, something that has a lot of features, almost IT, but not quite. IT can still have security on it so they can make sure they're not going crazy, and adding a bunch of stuff that they shouldn't be, or even looking at data they shouldn't be. In both those worlds, you can still do 98% of everything you'll ever want, at least for the power users. But the second class of users will mix in with the power users because I like to give them a tool that they already know rather well, and that's Excel.

Carlos:                  (Laughter). Oh boy, we’re going to have to wash your mouth out with soap now Kevin.

Steve:                   Excel’s that tool that’s sort of magically hiding in front of everyone.

Kevin:                   Yeah, but it has a lot of features that everybody's gotten used to, everybody can play with, but the sad thing is everybody likes to think they're an expert at it. With some of these tools, you really, really do have to know what you're doing, but at least you're in Excel … You can play around some other ways and still get the same data, but there are a lot of easier ways. Even VLOOKUP for some people.

Carlos:                  But what about performance, automation, right? Repeatability. Now all of a sudden, when we think about Excel, I don't think about any of those things (laughter), in those terms. Can you give us some thoughts, or under what circumstances should we be using Excel, versus all these automated things we like to be using?

Kevin:                   I like to use Excel, for more of proof of concept. To actually see, “oh yeah, we can do this. This makes more sense this way. I have only like twenty-thousand rows.” Excel is great for that.

Carlos:                  Sure.

Kevin:                   Especially if it's already in a spreadsheet, and 99.9% of our stuff is spreadsheets. Why would I take it out of Excel, put it in SQL Server, massage it a little bit, and then run through with it? Let's just go ahead and do what we need to in Excel. That's what people want, so let's give it to them in Excel.

Carlos:                  You know, one area I do think that we could probably, or we might, leverage some of those Excel users is in the data validation part. Because the business users are coming to us with these reports that they want written, they're going to be showing us that data in Excel, and I think we could potentially leverage that to say, "well look, this is the process that I went through; before I build it, let's validate that process with them. Right? I take this data, I do this to it, I apply this, and then I add that, and this special sauce, and then bam! There's my number." I think that could potentially help accelerate that report development process.

Kevin:                   Right. Heck, even Excel itself can do a lot of what-if scenarios. It can do the t-test, it can do a lot of the ANOVA tests that more of the data analysis sites will use. You can do all your tests there without having to go to R or whatever your tool of choice is.
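As an aside, if the data already lives in SQL Server, a basic two-sample t statistic is also not hard to compute directly in T-SQL. A minimal sketch, with a hypothetical dbo.Sales(Region, Amount) table, and leaving out the degrees-of-freedom and p-value lookup you would still do elsewhere:

    -- Welch's t statistic comparing the Amount column between two regions.
    WITH stats AS (
        SELECT Region,
               AVG(CAST(Amount AS float)) AS mean_amt,
               VAR(Amount)                AS var_amt,
               COUNT(*)                   AS n
        FROM dbo.Sales
        WHERE Region IN ('East', 'West')
        GROUP BY Region
    )
    SELECT (a.mean_amt - b.mean_amt)
           / SQRT(a.var_amt / a.n + b.var_amt / b.n) AS t_statistic
    FROM stats AS a
    JOIN stats AS b
      ON a.Region = 'East' AND b.Region = 'West';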

Steve:                   When you’re talking about Excel for data analysis at that point, do you find that the Excel Solver tool is being used quite a bit for that?

Kevin:                   I’ve used Solver, not as much as I’ve used the data analysis itself.

Steve:                   Okay.

Kevin:                   For those who don't know, both Data Analysis and Solver are just very simple Microsoft Excel add-ins. They're almost loaded straight out of the box when you install Excel; it's like one or two clicks and they're there.

Steve:                   Right, so for instance, I’ve seen the Solver used in data analysis to run the what-if scenarios, where it has to work through a number of variables and figure out, “well what if we adjusted this in this way, or in that way.” I’ve seen different times, trying to port that same kind of thing into SQL Server, once somebody’s figured that out in Excel and finding that that can be very challenging.

Kevin:                   Yeah, it's not simple, because Excel is made for statistics and numbers. It's made for "if this variable changes, and this variable changes, and this variable changes, what's the most I can get out of this other number?" Excel works great for that. If I tried to do that in SQL, it would take a little while to figure out, because it changes for each scenario.

Steve:                   Yup. As we go through this, we’ve talked about different tools, and we’ve mentioned SSIS, and Excel, and SQL, what else do you see being used for this analysis side of things?

Kevin:                   We've started to use SSRS, especially the new SSRS 2016. It has the great mapping capabilities that more and more of our clients are starting to really like, because they're not just seeing the data that's actually in their little area. These companies want to do more and more stuff outside of Georgia, the South, the United States; they want to do stuff all over the world, so it's great that they can see how their product is being bought in Atlanta, in Richmond, in Seattle, wherever. They can do all kinds of analysis, down to at least the city level; I'm sure some people have taken it even further down than that.

Carlos:                  Now, I know that you are in the architect role, at least that's one of those hats that you wear, but how much of a learning curve is there? Building SSRS reports, particularly ones that come from stored procedures, I think is kind of the very, I won't say basic, but a lot of people cut their teeth on those kinds of scenarios. Right?

Now going to a report that you mentioned that includes the geography components, what kind of learning curve is involved there?

Kevin:                   For SSRS, at least for me, there wasn’t a huge amount, because for the most part, once you have the city, most of the data will have cities, states, zip, at worst it will have … Most of our stuff will have that data in there …

Carlos:                  Mm-hmm (affirmative)

Kevin:                   You're good. Thankfully SSRS picks it up and runs with it.

Carlos:                  Oh wow, okay, so it’s really just, I like this type of map, here are my columns and there you go, I can see the data.

Kevin:                   Yup, because it's all built in and going to Bing; it's great for that.

Carlos:                  Got you.

Steve:                   So it’s basically using Bing as your presentation engine, wrapped in SSRS?

Kevin:                   Yes, that's exactly what it does. At least from what I've been able to tell, it is using Bing to just roll the data out there, and it figures out "oh, this is where Atlanta is, this is where Richmond is, this is where the counties around it are." It figures it out rather quickly, much more quickly than I would expect.

Carlos:                  Now another tool that I've heard talked about a lot, and I've heard scenarios, but it seems like only in very niche situations, or with people that really know their stuff, have folks been able to get Master Data Services to work well. Am I off base there, or why is there so little adoption of Master Data Services?

Kevin:                   I wish I could tell you (laughter). I think there’s a couple things going on with Master Data Services, that makes it where people don’t want to use it as much. First part is, it can be highly taxing on the actual business user themselves.

Carlos:                  Mm-hmm (affirmative)-

Kevin:                   Because somebody has to be on it every day making sure, "yes, this one is a real valid answer, let's put it into our data warehouse. No, this one really means this." These are the people that I don't think are actually pushing it as much as we should be. Because, well, let me back up: with a small ten-, fifteen-person shop, there's really not a big use for this, I wouldn't think. There are probably scenarios where yes, it would be great. But for the big, major companies, and some of the groups that are within them, it can be useful; I just don't think that the people who are in charge see the value as much as they should.

Carlos:                  Okay compañeros, I just want to ask a question to all of you, would you be interested in hearing more about Master Data Services? Again, I’ve talked with lots of people that are mostly product managers, so if you’re interested in us doing an episode on that, let me know. I’ve kind of shied away from it, just because, again, it kind of seems like you really have to spend quite a bit of time to get up to speed on it. But if it’s something you think we should cover, let us know, hashtag SQL podcasts.

Steve:                   It's interesting on that one: with Master Data Services, it seems like there are two categories of people I've seen. There are the people who swear by it, and it's the best thing they've ever used. And there are the people who swear at it (laughter), and it's the most painful thing they've ever used. Maybe there's some misunderstanding there, but I think there's certainly a lot of value that people could get with it if you have to do the data cleansing side of things.

Kevin:                   I agree. Especially since it's been out since 2008 R2, it's been available for a rather long time. More and more people should have at least heard of it, or liked it enough to at least try it, but it's just not happening, for whatever reason.

Carlos:                  But I think some of that, it needs to maybe get a little more love from Microsoft, or from an interface perspective, it still feels a little bit form-ish, and I think that may be a part of it as well.

Kevin:                   Now Carlos, you're not talking about the extremely early versions of it? That, I agree, was completely bad; 2008 R2 was horrible. 2012 was a little better. I haven't seen the 2016 version of it, but I believe they've updated it some.

Carlos:                  Okay, I admit I haven’t looked at 2016, I’ve heard that they’ve been doing some updating, but I haven’t seen any screenshots.

Carlos:                  Yep. We had Leblanc, who's also from the Atlanta area, on earlier, talking about Power BI, and of course that continues to change. I guess, thoughts about introducing that, or making that part of the repertoire of analytics?

Kevin:                   I think, just like everything else, you have to play with it, you have to work it in, once you see how, I don’t want to say life-shattering, but it can really help your world by using Power BI, or all the tools that are available, especially even MDS, just to see if it’s available. Test it, play with it, see if it works for you, some of these things just won’t …

Carlos:                  Mm-hmm (affirmative)

Kevin:                   But others will, and you'll see, it's much easier once you actually start playing with it and know what you can do with each of these tools. That's why I'm encouraging people: just try some of these things, try different scenarios, see what works, see what doesn't.

Carlos:                  I feel like some of the rub with the tools as well, is that, they don’t … It takes a cross functional team, right, to make that happen. We talk about wearing different hats, and some people can wear all the hats, but even in smaller organizations, there is some divide, you’re going to have to share that workload, and I think where we run into some heartburn is that, not a single person is going to know all those tools, or able to do all of those steps for the analytics. I think maybe, that’s the way sometimes, at least on the administrator side, we tend to balk at it a bit.

Kevin:                   Agreed. I think a lot of the DBAs I've talked to, and it's not just DBAs, it's any programmer at all, will try to say, "oh, I do this, I am good, I know what I'm doing, I can do it all." Especially in the data analysis world, you really can't; there is so much stuff out there. You may have a tool that can do the job, but it may not be the best tool. But you can do it, and you can get around to it. So, just like with SQL, you have to have friends out there who know different areas, who can help you with whatnot. That's one of the things: sometimes you just try talking it out with them, and sometimes that's the best help.

Carlos:                  Right.

Steve:                   Yeah, that's a good point. On the different tools, I kind of look at it as the tools that somebody might have in their toolbox. You've got, maybe, a hammer, and a drill, and a screwdriver, and Excel, and Power BI, and SQL are kind of the equivalent of those, where one of them might be the right tool to do one job, but maybe not the right tool to do another job. You use a screwdriver to screw in a screw; you could probably get that same screw screwed in with a hammer, or at least hammered in, but it might not be the best way of doing it. Sometimes with Excel versus Power BI, or the other tools, it's the same kind of comparison, where you might be able to do something really quick in Excel that might take you more time in Power BI, or vice versa.

Carlos:                  If the data's already there, just do it there and be done with it.

Steve:                   Exactly, and I think there are a lot of business users out there who may not have that SQL Server experience, but if they can pull the data into Excel, then they can use it in an area that's comfortable for them.

Kevin:                   Not even bother with the IT department which …

Steve:                   Yep.

Kevin:                   Somebody has thirty thousand things going on, one thing that’s easier to do will be helpful.

Steve:                   That’s a very good point.

Carlos:                  (Laughter) Okay, so last thoughts, or as we kind of wrap this up.

Kevin:                   I just want to encourage more people to actually play with the different tools that are available. Don't just say, "let's play with SQL Server, we can do it all in SQL Server." You now have thirty thousand tools out there, not just what is normally set up for IT. You have Excel, and you have a lot of open source things out there that you can get and use, that work with SQL Server or Bing, to get better. Great tools; a lot of people have put a lot of time into some of this, and some of them may be quicker than the regular tools you normally go for. I mean, at worst, you'll learn something new about yourself, or about the tools, or what scenarios they can be used in, and what they shouldn't be used for.

Carlos:                  Some of that I think though, has to be, it comes from a business culture perspective, because I think, like you mentioned, the business users are going to bring us an Excel file, or they’re going to bring us some handmade report, if you will, that came from who knows where, and are going to be like, “I want this.” So I think part of that is, maybe a little interpersonal, or personal relationship skills, to say, “you know what, let’s talk about this, let’s see if we can’t play with this other tool, or look at this in a slightly different way to get where we need to go.”

Kevin:                   Right, and in that case, you really have to work with your business team. Sometimes I have to work with teams where it's, "this is how it's always been, and this is how it'll always be, I need it copied over exactly as it is. I want a report every day, at such and such time, do not talk to me about anything else; this is what I want, and this is what I need. End of story."

Carlos:                  Right.

Kevin:                   And then there’s the one’s you can work with.

Carlos:                  And there is an advantage, at least, to having the users know what they want. Right?

Kevin:                   Yes, and know how to get there. That’s great help to all of us.

Carlos:                  Right. Okay, should we do SQL family?

Kevin:                   I like it.

Steve:                   Yes, let’s do SQL family.

Carlos:                  So Kevin, how do you go about keeping up with technology, and all the changes that are happening, especially, in the SQL Server area?

Kevin:                   I read lots of blogs. I'm a bit old school; I have an RSS feed, lots of those, and I listen to lots of webcasts. My favorite is just SQL Server user groups. I probably learn more through those than I will through any amount of reading, or anything else I can think of, to get all the information that I could ever want. Plus, it lets me see friends that I haven't seen in quite some time. User groups are the number one way, my way, of getting data. It's everything.

Carlos:                  If you could change one thing about SQL Server, what would it be?

Kevin:                   All right, this is where the DBA in me comes out. The auto close feature. That is, to me, one of the dumbest things I have ever seen out there in SQL Server. So many of the brand new DBAs will try to say, "oh, let's just hit that, and we'll be fine, we won't ever have to use the database again." Oh my God, it would drive me insane when I would start looking through the logs and start seeing "closed connection, closed connection, closed connection." What? No, let's just get rid of that feature once and for all. I don't want to see it at all; I've never even seen why it's there.

Steve:                   Good point. Okay, so can you tell us about the best piece of career advice you’ve ever received?

Kevin:                   This was from my wife, who listens to this podcast along with me every week. She actually said, early on, even when we were starting to date, she told me that, at night, how did she put it … No matter what happens, as long as the systems are still running and you're just doing performance tuning, it'll all still be there in the morning. Because, like all SQL developers, well, all developers period, we try to get two or three more seconds faster for all of our stored procedures. It can wait. Don't just stay in the office until nine, or even in your own work-at-home office, don't stay down there until midnight, or two in the morning. See your kids, see your family, see somebody. Don't just stay in the office and work yourself to death. Get out.

Carlos:                  Right.

Steve:                   That’s a really great point, and I think that, when people do get out, or have something else outside of just their work life, they end up doing a much better job in the time that they’re there doing their job.

Kevin:                   Amen.

Steve:                   Very good point.

Carlos:                  Our last question for you today Kevin, if you could have one superhero power, what would it be and why do you want it?

Kevin:                   Teleportation, yup. I notice y’all have been lately talking about “I want to be able to fly.” Oh heck no, I’m going to be teleporting because I’m going to be there now (laughter). So flying towards there, heck with it, I want to teleport.

Steve:                   (Laughter) I like that.

Carlos:                  There you go. Well Kevin thanks so much for being on the show with us today.

Kevin:                   Oh, thank y’all, that was great. That was a great time.

Carlos:                  Yeah, we do appreciate it, you taking a little time with us.

Episode 70: SQL Server on Linux

In the keynote of PASS Summit 2016, Rohan Kumar, the program manager for SQL Server, proudly displayed a database restore to a SQL Server instance installed on Linux.  What makes this all even more interesting is the database was sourced from a Windows server.  While it might not sound like much, the number of operating systems SQL Server runs on has now doubled.  This is no small feat, and our guest today, Travis Wright, is the program manager for the SQL Server on Linux migration.  While I have to admit I was not super excited about the news when it came out last year, I am definitely more interested now and think there is a huge opportunity for SQL Server administrators to get access to jobs they wouldn't have had before.  I know you will enjoy this interview with Travis.

Episode Quote

“Why are we doing this? . . .  It comes down to the customers have been asking us to do this for a long time.” Travis Wright

Listen to Learn

  • Why are they doing this to us? (come on, that is funny)
  • What features are going to be supported.
  • The version of Linux supported.
  • What type of integrations are now possible – think WordPress on SQL Server.
  • How do I kick the tires on this?

Travis on Twitter
SQL Server on Linux
Data Platform insider blog
SQL Server on Linux Preview

About Travis Wright

Travis Wright is the program manager for the SQL Server transition to SQL Server on Linux.  Travis is the father of 3 kids and is currently on his second tour with Microsoft.  He was heavily involved in the System Center team and is now putting his skills to use on the Linux team.  He describes himself this way: "In a room full of "suits" and "propeller heads", you can find me in the middle fluently talking about business strategies, priorities, requirements, financing, and schedules as easily as software architecture, algorithms, and database schemas. My specialty is bringing together the business flint and software steel to ignite the spark of innovation that creates something meaningful."

Transcription: SQL Server on Linux

Carlos L Chacon:               Travis, welcome to the program.

Travis Wright:                    Thank you. Glad to be here.

Carlos L Chacon:               Yes. Thanks for coming with us on this show … Thanks for coming on the show with us today. One of the very interesting things that happened in the keynote at the PASS Summit in Seattle was a demonstration of a restore of a Windows-sourced database to a SQL Server instance running on a Linux machine. Lots of chatter about that. Ultimately, we wanted to have you on the program to talk a little bit about why this move to SQL Server on Linux. I must admit I'm not a huge Linux person myself. We'll get into what Microsoft sees as the future, and I think there's going to be some analytics discussion in here. Let's go ahead and jump in; tell us a little bit about that history. Why the move to SQL Server on Linux?

Travis Wright:                    Yeah, this is one of the first questions we always get: why is Microsoft doing this? After all these years, you mentioned that you've been working on SQL for 26 years, why after all these years is Microsoft doing this? Really it comes down to the fact that customers have been asking us to do this for a long time. For various reasons, it's always been either a technical challenge that seemed insurmountable or there were political challenges to doing it within Microsoft, obviously with Windows being a huge part of Microsoft's culture.

Carlos L Chacon:               Sure.

Travis Wright:                    Everything. The stars aligned recently though, about a year and a half ago. I'd say we revisited the decision to do this, went and did a bunch of research talking to customers about what they'd like to see, talked to some developers. What we found is very interesting: Linux is very much a growing and vibrant ecosystem. We're getting up to the point now where Linux VMs in Azure account for about 25% of the total VMs in Azure, even. Lots of interesting things are happening on Linux. We've got lots of big data things happening, a lot of CI/CD types of things are happening on Linux. It's just a very interesting ecosystem. It's growing. Even if you look at the IDC numbers for the database market, you can see that Linux is growing faster than Windows as far as being a database platform. Over time those markets will be roughly equivalent in size, by about 2020-2021, somewhere in there.

Really it is just a matter of being responsive to our customers' requests, looking at the market opportunity ahead of us as well as for our partners, just the ecosystem in general. Just doing what customers want, and what customers want is to have a choice. SQL Server has really become a very mature database at this point. It's full featured. Recently we surpassed Oracle on the Gartner MQ for the first time.

Carlos L Chacon:               Hooray.

Travis Wright:                    We have retained that crown, if you will, for another year here recently. Now it's really just a matter of: okay, should we go and add yet another feature to SQL Server, which would of course be great, or should we maybe take a look at expanding the availability of SQL Server to other customers, developers, and industries in areas that we haven't been in before? Just take the goodness of SQL Server to those other areas. There are lots of other reasons too. We've talked to customers; sometimes, for whatever reason, customers want to run their database on Linux. The reasons range from that being what their database and systems administrators are experienced in. They have Linux people, they want to use SQL Server. "Okay, let's do it."

In other cases you have companies that have maybe been acquired by another company that happens to run all their database platform on Linux. That acquiring company is now requiring the acquired company to map to that.

Carlos L Chacon:               Sure.

Travis Wright:                    The acquired company wants to keep their SQL Servers. These types of scenarios come up where customers really just need the flexibility to be able to run SQL Server on other platforms besides just Windows.

Carlos L Chacon:               Am I naive in saying that that's more than just hosting companies? Because I could see the hosting companies doing that, obviously, as they consolidate and things like that. Are there … Are you still seeing other places, just to take some general verticals like healthcare or engineering or things like that, that are also wanting to make those same changes?

Travis Wright:                    Yeah, absolutely. Lots of enterprises that are running their infrastructure, whether it's on a private cloud, or a hosted product, or something along those lines where they're running their own SQL Server instances, are very interested in this. It's interesting, I took a trip down to Silicon Valley and visited with several customers down there where Linux is very big. They were interested in how they could use SQL Server in those situations as well. Definitely lots of enterprises. Then, as you mentioned, hosters are also very interested in this, because typically hosting providers are based on Linux. We've had a number of hosting providers approach us about now offering SQL Server as a database platform in their portfolio because we now have support for SQL Server running on Linux.

Carlos L Chacon:               Exactly.

Steve Stedman:                A question on that then, because I’ve heard rumors of Ubuntu support, or Red Hat support, is there anything that you can talk about with what Linux distributions that you will be supporting?

Travis Wright:                    We're starting out with support for Red Hat Enterprise Linux; the version that we're supporting there is 7.2. We also support Ubuntu 16.04. We're now working on support for SUSE Linux Enterprise Server version 12 SP2. Those will be the 3 main Linux distributions that will support SQL Server on Linux. We've had requests from customers to support some other distributions like CentOS, Oracle Enterprise Linux, Amazon Linux, and so on. I think as we wrap up the work to support those first 3, we'll start to look at other distributions that we may want to support and work out what that might look like.

Steve Stedman:                Okay, when you talk about Red Hat Enterprise Linux, for instance, is that something that a developer, or someone who's just trying a proof of concept, can try out on their own machine using just regular Red Hat, or does it have to be Enterprise for people to use it?

Travis Wright:                    Right, Red Hat has a developer program. People can go and get Red Hat licenses that are intended for use in a development environment. It doesn’t grant you the right to run things in production. You can still get all the documentation and the subscription access to be able to get the packages and that kind of thing as a developer.

Carlos L Chacon:               Okay. Great. I'll have to check that out. So then when we talk about the Linux edition, I've heard of this being like core engine features. Will this just be the SQL engine, or will it have other things like SSAS or SSRS or Integration Services or anything like that?

Travis Wright:                    Right. The first release we have of SQL Server on Linux will be focused on the relational database engine. That includes all the features around data being stored relationally. It's all the DDL, the schema, storing the data, that kind of thing. It also includes a lot of the advanced features that we've been introducing in the more recent versions of SQL Server, like In-Memory OLTP and columnstore, and lots of the other goodness of SQL Server, like compression, partitioning, and all those things.

It includes all of the new security features we've been adding as well: things like Always Encrypted, transparent data encryption, row-level security, dynamic data masking, auditing. All of those features are also included. Also things like SQL Server Agent; we can get into high availability as a specific topic here in a minute.

All those features are around the core relational engine. What's not included in the first release is things like Integration Services, Analysis Services, Reporting Services. Those are the big areas that we're not going after yet. Then some other miscellaneous items, things like Master Data Services, Data Quality Services, some of those kinds of things. Over time, we'll be responsive to customer feedback. We'll look at what it is that people want and go after those things that make sense. I think Integration Services is probably the one that pops out the most as something customers are looking to be able to run on Linux, so we've started looking into that and working on it. We'll see what the time frame will look like.
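To put one of the security features Travis mentions in concrete terms, dynamic data masking is just T-SQL and looks the same whichever operating system hosts the instance. A minimal sketch, with a made-up table:

    -- Hypothetical table with masked columns; users without UNMASK see obfuscated values.
    CREATE TABLE dbo.Customers
    (
        CustomerId int IDENTITY PRIMARY KEY,
        Email      nvarchar(100) MASKED WITH (FUNCTION = 'email()'),
        Phone      varchar(20)   MASKED WITH (FUNCTION = 'partial(0,"XXX-XXX-",4)')
    );

    -- Allow a specific user to see the real values if needed:
    -- GRANT UNMASK TO ReportingUser;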

Steve Stedman:                Okay. It really sounds like it’s the core engine but it’s going all the way with the core engine.

Travis Wright:                    Right.

Steve Stedman:                As far as the core SQL functionality there.

Travis Wright:                    Exactly, you should expect nearly all of the features that you would see in the core relational engine to be there.

Carlos L Chacon:               Yes, when you think about that SQL server service.

Steve Stedman:                Okay.

Carlos L Chacon:               Everything would be there. Let's go back; you mentioned you broke out the high availability solutions, should we dive into that for a second?

Travis Wright:                    Yeah, let's talk about that. There are different levels of high availability and disaster recovery, and lots of different features around that as well. We just finished up the work on SQL Server Agent to enable log shipping. That's one option. You can log ship from a SQL Server running on Linux to another SQL Server running on Linux, or from a SQL Server running on Linux to one running on Windows, so you have some flexibility in how you go about your log shipping. That's one option. I'm just going from the basics to the more advanced.

Carlos L Chacon:               Sure.

Travis Wright:                    You also have support for backup and restore. You mentioned at the top of the session the demo that Rohan did at the PASS Summit, where he showed taking a database that had been backed up on Windows and restoring it on SQL Server on Linux. It works in the other direction as well: you can take a database backup from SQL Server on Linux and put it on Windows. You have full backup, restore, attach, and detach capability because of the binary compatibility between the SQL Servers, regardless of where they're running.

That gives you compatibility there. Then we also demoed, at one of the sessions we did at PASS, failover clustering running on Linux. We had 2 different nodes in a cluster using Pacemaker, which is an open source clustering technology, to coordinate that cluster. Then we had a shared disk running on another VM. We showed how the SQL Servers were using that shared storage to have all their data files, the master database as well as user databases, all right there. Then we would use the failover command line utility for Pacemaker to fail over from one node to the other.

I had a little application that was running there. It was written in Node. It would just sit there and run a simple query against the database nonstop. I would kick off that application, and it would sit there and output the name of the server that it was connected to. I just output that to the terminal every second or so. I failed over and you could see how it'd lose the connection for a few seconds. It would pick up that connection again and output the name of the new server to the terminal. It was a cool demonstration of how failover clustering just works, very similar to how it does with Windows Server failover clustering, in this case running on Linux using Pacemaker.
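The cross-platform backup and restore Travis describes is just the ordinary T-SQL commands; only the file paths differ between operating systems. A rough sketch, with hypothetical database and path names (in practice you would check the logical file names first with RESTORE FILELISTONLY):

    -- On the Windows instance: take a full backup.
    BACKUP DATABASE SalesDb
    TO DISK = N'C:\Backups\SalesDb.bak'
    WITH COMPRESSION, CHECKSUM;

    -- Copy the .bak file to the Linux host, then on the Linux instance:
    RESTORE DATABASE SalesDb
    FROM DISK = N'/var/opt/mssql/backups/SalesDb.bak'
    WITH MOVE 'SalesDb'     TO '/var/opt/mssql/data/SalesDb.mdf',
         MOVE 'SalesDb_log' TO '/var/opt/mssql/data/SalesDb_log.ldf';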

Carlos L Chacon:               Sure. That's very cool because, again, I think sometimes maybe we forget until we get into the thick of it, particularly with availability groups even more so, that reliance on the Windows Server cluster service, right, to manage some of that. It's very cool that there are alternatives in the Linux world to do that.

Travis Wright:                    Yeah, absolutely. Going on to the best of all worlds is where you start to get into availability groups and having high availability that way, with replication-based HA. That we are also working on, and it will also be based on Pacemaker. We will have that available for people to test pretty soon as well. That will be pretty exciting, because then you're going to have replication-based HA using Always On availability groups, with leader election and failover detection, being able to detect a node being unavailable and automatically fail over, and a listener. All those things that you're used to in Always On availability groups on Windows will also exist on Linux, just using a different technology.

One of the most interesting things about the way that we're doing high availability on Linux is that it will be a more open architecture. What I mean by that is that on Windows today, Windows Server failover clustering is the main way of doing clustering and high availability with SQL Server. On the Linux side, there are lots of different technologies that are used for clustering in the Linux world. Pacemaker happens to be one of those, and the most commonly used. There are others, like ZooKeeper, and there are some other HA solutions. What we want to do is make this a more open architecture where people can introduce other high availability solutions that SQL Server can work with. People can write the scripts and talk over APIs to introduce other solutions for high availability beyond what we provide as a reference implementation with Pacemaker.

Carlos L Chacon:               Yeah. Wow.

Steve Stedman:                Interesting.

Carlos L Chacon:               You’re making it hard for your documentation folks there.

Travis Wright:                    Yeah, we just hired 10 more. It was starting to be a challenge to keep up with the documentation in general. You've got features that are going to work here and not there. You're going to have what's applicable to Linux, how you do things here versus there. There will be a ton of overlap. I don't know if you've had a chance to play with it yet or not, but most things just work, right?

Carlos L Chacon:               Sure.

Travis Wright:                    If you run an application that just sits on top of TDS, it just works. All your drivers work. Anything that you're doing through the front door, that all just works as you would expect it to. Your T-SQL statements, most of those things just work. DMVs just work. You can use SSMS, you can use SSDT, all those things just work. There will be some differences. Yeah, the documentation people have some good job security for the next couple of years keeping up with all this. Yeah.

Steve Stedman:                Wow. Impressive for the initial release on Linux, I'll tell you that. I know that for years it has been announced that mirroring is going to be deprecated. I would assume that mirroring is probably not included on Linux. Is that correct?

Travis Wright:                    Right. Exactly. We're not planning on bringing mirroring over to SQL Server on Linux.

Steve Stedman:                Okay. I know you mentioned replication around availability groups. You probably weren’t talking about replication in a publisher or subscriber mode, were you?

Travis Wright:                    I wasn’t when I was talking about earlier. Let’s hit on that briefly.

Steve Stedman:                Okay.

Travis Wright:                    With SQL Server Agent working now, we do have publications and subscriptions working as well. Most of that just works. I think the only thing that is left to do in that area is where you're doing some publishing and subscribing between different database engines, like SQL Server and Oracle, for example. We need to figure out how we host third party binaries in our platform on Linux before we can get some of those working.

Carlos L Chacon:               Sure.

Steve Stedman:                Okay.

Travis Wright:                    As long as it's SQL Server to SQL Server, it's looking good. We've been doing some testing with that. That's doing well.

Steve Stedman:                Great. Not that … I mean, it's one of those things where a lot of people are using it, but the whole publication and subscription piece is not a piece that a lot of people get excited about. It's being used in a lot of places, though.

Travis Wright:                    Yeah, it’s one of the things that are around for a long time. People depend on it. We’ve got to make sure that works.

Steve Stedman:                I know with the Windows version of SQL Server, there have generally been the Express, Standard, and Enterprise editions, although there have been a few other little flavors over the years. Will the Linux version follow a similar licensing model, or will it be something completely different?

Travis Wright:                    Yeah. The way we think about it is that SQL Server is the product. You license SQL Server; now you'll just have an option of where you run SQL Server. It could be on Windows, on Linux, or in Docker containers. To that end, we plan on having the same edition lineup on Linux as we have on Windows. You'll have a Developer edition, you'll have an Enterprise edition, Standard, Express, Web edition and so on. One thing that we don't currently have a plan for right now is LocalDB on Linux.

Steve Stedman:                Okay.

Carlos L Chacon:               You threw me a bit of a curveball all of a sudden. You just said LocalDB.

Travis Wright:                    What?

Carlos L Chacon:               I guess I’m drawing a blank. What does that mean?

Travis Wright:                    LocalDB is an optional component you can install as part of Express. It runs as a process instead of a service. It's just a lightweight database that can be used in the context of development. Typically people use it, like, in Visual Studio. It's a little lightweight DB that they use for development.

Carlos L Chacon:               Sure. Okay, I got you.

Travis Wright:                    Yeah.

Steve Stedman:                As far as the performance goes with this, one of the things that I'm wondering, I mean with an initial version the performance is always a big deal, are there any metrics out there on how it performs comparing SQL Server running on Windows versus SQL Server on Linux using the exact same hardware?

Travis Wright:                    Yeah, great question. We haven't published numbers on that yet. We've been spending the last couple of months really focused on making sure that SQL Server on Linux performs and scales, and that kind of thing. Of course, SQL Server on Windows is the most obvious benchmark that we can measure ourselves against, because that's really apples to apples. We also do some benchmarking relative to other database engines and the like. At this point I would say that we are at about 90% or so of the performance of SQL Server on Windows on a machine that's a 2-socket machine with somewhere around 256GB of RAM, something like that. We have lots of work still to do in this area. We need to add support for NUMA. We have some other improvements we want to do around network IO and some other code path improvements.

There's lots of work still to be done. Our goal is to get to the point where we have essentially the same performance and scalability that people are used to today with SQL Server on Windows. We want to make sure that SQL Server on Linux can support those same mission critical, tier one-type workloads that people use SQL Server for today.

Steve Stedman:                Okay, excellent.

Carlos L Chacon:               So is SQL Server going to be open sourced?

Travis Wright:                    Yeah, that’s maybe the next logical step, right? I think at this point, there’s no plans to do that. I think that would be a massive development effort on our part to get SQL Server ready to be open sourced even.

Carlos L Chacon:               Yeah.

Travis Wright:                    Let alone the will it would take to do something like that. First steps first: we'll get SQL Server on Linux out there, and we'll go from there.

Carlos L Chacon:               Very nice.

Steve Stedman:                Okay. One of the things today that I see is if somebody has a website they’re hosting with Word Press or something like that, the backend out there is generally a MySQL database.

Travis Wright:                    Yeah.

Steve Stedman:                Do you ever see SQL for Linux competing in that area with maybe an Express Edition or something?

Travis Wright:                    Yeah, absolutely. In fact, we’ve been working with some guys that started this project called Project Nami. Have you guys heard of this?

Steve Stedman:                I haven’t.

Carlos L Chacon:               I have not.

Travis Wright:                    Yeah. Project Nami, N-A-M-I, which, funnily enough, we'd been talking to them for a couple of months, then one day we're finally like, "What is this Nami thing? What does this mean?" They said it stands for 'Not Another MySQL Instance'. What they've done is they've created a compatibility layer for MySQL … I'm sorry, for WordPress to run on SQL Server.

Carlos L Chacon:               Interesting.

Travis Wright:                    Very.

Steve Stedman:                Wow.

Travis Wright:                    Their original intention was to create this such that it could run on top of an Azure SQL Database, which is where most of their customers today are using it. Because we have this compatibility between Azure SQL Database, SQL Server on Windows, and SQL Server on Linux, it happens to also work with SQL Server on Windows and SQL Server on Linux. They've taken it now a step further, to where they have the full stack of WordPress and SQL Server running on Linux. A very interesting scenario. As we hit public preview, we're looking at how we can put that out there and make people aware of it. That certainly could be a very interesting scenario for people to run a WordPress instance on top of a SQL Server Express or Standard, depending upon what size they need for their WordPress application or blog.

Carlos L Chacon:               Right now, I think that would be very attractive. I know I was with a client, a SQL Server shop, whose marketing department wanted to move to WordPress. Which meant that all of a sudden they inherited some MySQL databases that they now needed to care for and feed, in a pure Microsoft shop if you will. That was just another item that they were going to have to tackle.

Travis Wright:                    Sure.

Carlos L Chacon:               Being able to keep data all in the same family if you will, I think, would definitely be of benefit.

Travis Wright:                    Yeah, and definitely check out this Project Nami thing. It’s an open source project. I think it’s amazing actually. Very cool.

Steve Stedman:                Yeah. Definitely have to take a look at that. I know every time I try and use MySQL behind what I have in my blog, I just feel completely impaired trying to use it.

Travis Wright:                    Yeah. You want to know what you … You want to use what you know.

Steve Stedman:                Right, you know? Okay. Then for someone who wants to try it out; once you’re at that preview point, whenever that may be, how do you go about really someone getting started with that? Let’s say someone has a little bit of Linux experience. Are there going to be some ‘how to’ guides or anything like that that people could just walk through and say, “This is what it takes to get it going.”?

Travis Wright:                    Absolutely. It's actually super simple to get going with SQL Server on Linux. I think we've done several demos now at conferences of the installation experience. It's really quite refreshing for people that are used to installing SQL Server on Windows. With SQL Server on Windows, you go through a wizard. It's fairly simple and easy to do, but it does take some time to click through everything and answer all the questions. With SQL Server on Linux, we're leveraging the package management systems in Linux. On Ubuntu you have apt, on RHEL you have yum, on SUSE you have zypper. They have all these packages. Installing SQL Server on Linux is really a simple command that you run at the terminal. On RHEL, for example, you would say "yum install mssql-server", and it would go through the entire download and installation experience in about 40 seconds. That's all it takes to get SQL Server installed. Then you run a setup script. It prompts you to accept the EULA and provide the SA password and you're done. All in all, in less than a minute, typically, you can have SQL Server downloaded, installed, and running.

A very cool experience with that. Then I've recently just been falling in love with Docker. I don't know if you guys have had much experience with Docker. With Docker, again, it's a very simple acquisition experience. You just go to the terminal, assuming you already have Docker installed and configured, and you type in "docker pull" and the SQL Server image name. It will pull down the latest version of that SQL Server image. That takes a minute or 2, depending on how fast your network connection is. Then you just type a "docker run" command with the SQL Server image, and you pass a couple of parameters to it to specify the SA password and accept the EULA. You've got SQL Server running in a container on your MacBook or on your Linux development environment, or wherever you might be running Docker.

Then that also is just a beautiful thing when you start using some of these container management platforms, like Red Hat OpenShift, for example. We just did a demo at PASS where people can publish that image up into their catalog in OpenShift. Then they can just provision containers with basically a one-click experience in the OpenShift portal and push those containers out onto the container management platform.

Steve Stedman:                Very cool stuff.

Carlos L Chacon:               I can …

Steve Stedman:                I know … Go ahead Carlos.

Carlos L Chacon:               Travis, I think the folks who are familiar with Linux are going to be maybe a little bit earlier adopters of the Docker stuff. That may not be quite a fair assessment, maybe, in my lay of the land. I think that with this migration of SQL Server, at least those of us in the data space will have to become a little bit more familiar with Docker, or at least get more familiar with it than we currently are.

Travis Wright:                    Yeah, I would agree. Docker, to me, feels like VMs did, whatever, 15 years ago or so, right? It feels like we're on the verge of a big transformation in how people isolate their workloads and how they provision their workloads. Docker, I think, is going to be huge. We have Docker Linux containers for SQL Server; from the very beginning of the SQL Server on Linux project, there has been an emphasis on making sure that we get that working really seamlessly. We've also been recently working on having SQL Server running in Windows containers.

With Windows Server 2016 being generally available now, and having the option of running Windows-based containers, we want to make sure that SQL Server runs well in a Windows-based container. We've recently pushed out a SQL Server Express edition in a Windows container to Docker Hub. People can go grab that and use it. You could take that same Docker file that's available there, just change the location of the bits, and actually build a Docker container for other editions of SQL Server.

What we need to do now as far as Docker goes is just do some more testing around having SQL Server running in a Docker container. Get to the point where we can have that be a first class supported scenario of having people running SQL Server on Windows containers as well. That’s something that we’re working on.

Carlos L Chacon:               Good deal.

Steve Stedman:                I know one of the things that I end up working on a lot is database corruption, having to fix it. Is the typical, or are the typical DBCC, check DB, check table, and those type of commands going to be included and available as well on Linux?

Travis Wright:                    Yeah, all of that works. So do things like DMVs; those are also working. We've added a couple of additional DMVs there. We have tweaked a couple of the DMVs that expose system information so that, depending upon which system you're running on, whether it's Windows or Linux, we go and get the right information from the right place. Then we surface it up in the exact same result schema.
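For anyone who wants to confirm this for themselves, the checks and DMV queries Steve and Travis are talking about are the same T-SQL on either platform. A small sketch (the host-information DMV shown here shipped with SQL Server 2017, so treat the exact column list as approximate, and the database name is made up):

    -- Integrity check runs the same way on Linux as on Windows:
    DBCC CHECKDB (N'SalesDb') WITH NO_INFOMSGS, ALL_ERRORMSGS;

    -- DMVs return the same result schema regardless of host OS; this one
    -- reports which platform the instance is running on:
    SELECT host_platform, host_distribution, host_release
    FROM sys.dm_os_host_info;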

Travis Wright:                    This is a tool that'll only run on Linux for now. What it does is it creates a mounted file system using FUSE on Linux. It creates a bunch of virtual files, each of which represents one of the DMVs in SQL Server. What you can do is, you can just cd into this mounted directory, this FUSE directory. Then you can just ls that directory and you can see all the DMVs listed there as these virtual files. Then you read them as though they're files. You can just cat them to get the output, or you can pipe them to tools like 'cut' if you want to use just one column or whatever.

What happens is when you open and read that virtual file, we're actually initiating a call out to SQL Server, executing the DMV query, and bringing back the results as though they were the contents of that file. It sounds mind blowing, but this is what Linux administrators expect: that they can go in and look at system information in a file. We've created this virtualized experience using virtual files where they can go and get the DMV data and view it as though it was a file.

Carlos L Chacon:               Interesting.

Steve Stedman:                That's great. Then using your favorite shell, bash, whatever it may be, you can then take those files and pipe them into sed, or awk, or whatever you want to use there, and work in your normal Linux-type environment to get the output you want.

Travis Wright:                    Yeah, exactly. Because they’re just files there effectively, you could have that in a mounted directory even and share it out through a samba share, for example. People could remote into it. It’s just a file. Right? You can do all the things you could do on that.

The other thing about the DMV tool is you can actually set it up with a config file where you can have multiple SQL Servers in this config file. Then what you do is you cd into the mounted directory, and you'll see all of your SQL Servers there as directories, essentially. Then you can cd into whichever SQL Server you want to look at and look at all the DMVs for that SQL Server. Then you can do really interesting things, because these are all just files. You could run scripts over this. You could traverse over each of those SQL Servers and look at a particular DMV, compare the data to each other, or whatever. It's just files at that point. Very interesting possibilities there. I'm curious to see who will be the first person to build some monitoring tool based on this. That would be an interesting thing to see.

Steve Stedman:                Yeah.

Travis Wright:                    The other thing that we've done is the flip side of that, where we will now have a DMV that will expose … which will show you the process information from the proc directory. The proc directory, if you go into Linux, is where all the system information is about all the processes running on that particular Linux host, and we will expose that proc directory as a DMV. People can just get access to the proc directory by running a SQL query. They don't have to SSH into the box; they can just run a SQL query if that's what they prefer to do. That will be very interesting, I think, for any of the monitoring tools out there, to be able to access that information just through a T-SQL query.

Carlos L Chacon:               Exactly.

Steve Stedman:                That sounds great.

Travis Wright:                    The monitoring tools and monitoring tool vendors and that kind of thing out there don't ever have to SSH into the box to figure out what's going on. Everything can be exposed through T-SQL. Obviously, if the system goes down for some reason, then you're dark, but at least for the most part it should work.

Steve Stedman:                On the Windows version of SQL Server, one of the more controversial commands that's out there is xp_cmdshell.

Travis Wright:                    Yes. Yeah

Steve Stedman:                Is there a Linux equivalent of that where you'll be able to shell out and run local commands?

Travis Wright:                    Right. As of right now, we're thinking that we'll probably close that off for Linux. We're always interested in hearing people's feedback on that, but that's a security vulnerability that we're inclined to just close off for now. There are other options for executing those types of things, like having a SQL Server Agent job, for example, that can execute bash scripts; that's something we will probably enable.

Steve Stedman:                Okay.

Travis Wright:                    We’ll probably have that sort of thing through the agent.

Steve Stedman:                Then with the agent today, you can create agent jobs using the various stored procedures, and I assume a similar thing would just work. Will you be able to create agent jobs on the Linux version?

Travis Wright:                    Yeah.

Steve Stedman:                Okay. Excellent.
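
For reference, creating a job with those stored procedures from a Linux shell might look something like the sketch below. It sticks to a plain T-SQL job step, since whether a bash-script step type ships is still an open question per the conversation above; the server name, credentials, and job name are placeholders.

# A minimal sketch using the existing msdb job procedures via sqlcmd.
sqlcmd -S localhost -U sa -P "$SA_PASSWORD" <<'SQL'
EXEC msdb.dbo.sp_add_job       @job_name = N'nightly_integrity_check';
EXEC msdb.dbo.sp_add_jobstep   @job_name      = N'nightly_integrity_check',
                               @step_name     = N'run checkdb',
                               @subsystem     = N'TSQL',
                               @database_name = N'master',
                               @command       = N'DBCC CHECKDB (master);';
EXEC msdb.dbo.sp_add_jobserver @job_name = N'nightly_integrity_check';
GO
SQL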

Carlos L Chacon:               Travis, I have to admit, I wasn't all that excited, I'll use that word, about SQL Server on Linux. Now, after our conversation with you, I have to say I'm a little bit more enthused about it and interested to see where things go because of, I will say, the merger, if you will, of the two operating systems and what you're having to traverse there.

Travis Wright:                    Yeah. I think we ran into that sentiment a bit at SQL PASS. At the PASS Summit you'd expect to find people who are long-time SQL Server users, and for the most part they're happy with what they have today. Maybe they just want us to add more stuff to SQL Server, or improve performance and scale, whatever it might be. They live in this world of SQL Server as they know it today, right?

Steve Stedman:                Right.

Travis Wright:                    I think as people start to learn more about SQL Server on Linux, they see the new opportunities that this presents to everybody. Whether you're a consultant, a DBA, or a developer, if you think about it, it's really just expanding opportunities for everybody in the SQL Server ecosystem. We can now go and talk to our friends over in the other hallway about Linux and figure out what makes sense to do. It's not just a one-size-fits-all solution running on Windows anymore. If we need to run something on Linux, for whatever reason, we now can, and we can have that conversation. I think it presents some interesting opportunities for all of our partners in this space, whether they're monitoring tool vendors, backup vendors, or security vendors. It presents opportunities for all of our hardware partners, and we have brand new friends now with our operating system vendor partners like SUSE, Canonical, and Red Hat.

It really is just opening up the ecosystem and presenting a lot of new opportunities to everybody. As we all look at the landscape and how much Linux is growing, it's good for everybody in the SQL Server ecosystem to take a look and see it as a chance to expand your own opportunities. Learn about Linux, expand your skill set, and have more options as you look at what your next job might be. It might be running SQL Server on Linux, and there's huge demand for people who know how to do that.

Carlos L Chacon:               Exactly. Exactly.

Steve Stedman:                I know it's been a couple of years since I've really done much of anything with Linux, but I'm excited. I want to go spin up a Linux VM to get ready to give this a try whenever it's available.

Travis Wright:                    Awesome.

Carlos L Chacon:               Should we do SQL family?

Travis Wright:                    All right, let’s SQL family.

Carlos L Chacon:               SQL family is just an opportunity for us to get to know you a little bit better. I guess I should have made sure that you had those questions we added in the invite.

Travis Wright:                    I think I saw that in the invite. I just didn’t realize it was called SQL Family.

Carlos L Chacon:               Yeah.

Travis Wright:                    Okay.

Carlos L Chacon:               It’s just our little segment that we call SQL family.

Steve Stedman:                Should we jump into it then?

Carlos L Chacon:               Yeah, let’s do it.

Steve Stedman:                All right. With technology changing so rapidly, how do you go about keeping up with technology and all the changes that are happening every day?

Travis Wright:                    Well, I do a lot of reading. I read Techmeme incessantly; I can't stop reading it. I can't stop reading Reddit and all the cool stuff that's going on there. Then whatever I find that's interesting, I go and try it, you know? I've never really been a developer on anything other than .NET. Since I've been working on SQL Server on Linux, I've taught myself Ruby, Python, and Java, and one other language, Node; I've been working with Node. I just go and try it. I've been using other database systems as well, to see what people like and don't like about them. I'm a hands-on guy. I like to actually get down and try to use this stuff and see what I can do with it.

One of the most fun things I've done recently was when Microsoft announced the Bot Framework back at… earlier this year. I just sat down, stayed up all night, and built out this little bot that sat on top of SQL Server on Linux and got it working. That was pretty fun.

Carlos L Chacon:               Pretty cool.

Steve Stedman:                All right.

Carlos L Chacon:               Now, we’ve talked about lots of things that are happening with SQL Server, right? If you could make one additional change to SQL Server, what would it be?

Travis Wright:                    That’s a good question. I don’t know.

Carlos L Chacon:               Okay. We can … We’ll strip that question out.

Steve Stedman:                Yeah.

Travis Wright:                    Let me think about it while we go on some of the other questions. Then your editor can pick it up I guess.

Carlos L Chacon:               Yeah, if you think of an answer, we’ll come back to it.

Travis Wright:                    Okay.

Steve Stedman:                What is the best piece of career advice that you’ve ever received?

Travis Wright:                    I think probably the best career advice I ever got was from John Byce. He pulled me into his office one day and said, "Do you know who Terrell Owens is?" I said, "I've heard the name. I don't know who he is." For those that don't know, Terrell Owens was a wide receiver in the NFL, a flamboyant guy. I remember one time he caught a reception, ran it into the end zone, pulled a Sharpie out of his sock, signed the football, and threw it into the crowd. John's feedback to me was, "Don't be like Terrell Owens. Look at it more like you're part of the team. Make the people around you better. Don't be such a showboat." That was some of the best advice I've ever gotten in my career. I've always looked back on that and appreciated it.

The other thing that John Byce always preached to me as I transitioned into being more of a manager was, “Look for people that are humble, hungry, and smart.” I’ve always tried to surround myself with people that are hungry, humble, and smart.

Carlos L Chacon:               Yeah, that’s a good combination. Travis, our last question for you today. If you could have one superhero power, what would it be? Why do you want it?

Travis Wright:                    Yeah, I've always dreamed of having the superpower of a human-computer interface, where I can just sit down at a computer without any other input device and the computer can just mind-meld with my brain. I can make computers do whatever I want without any typing or any other input, just a brain-to-computer meld. That's what I would love.

Carlos L Chacon:               There you go.

Steve Stedman:                That would be awesome. I’ll take that too.

Carlos L Chacon:               Yeah. Once you're done with the SQL Server on Linux migration and you get the other one, the mind meld, figured out, Travis, come back and let's chat.

Travis Wright:                    Yeah. It's interesting, if you look at all the superhero characters out there in Marvel, DC, and everything, they all have these physical abilities or whatever. Rarely is it a technology thing that they leverage, yeah? I thought that was interesting.

Carlos L Chacon:               That is an interesting…

Travis Wright:                    Other than that, yeah. Very interesting.

Carlos L Chacon:               Travis, thanks so much for being on the show with us today. It’s been a blast.

Travis Wright:                    Thanks guys.