Part of what we like to do on podcast is to present topics and discuss different ways you might use features. In this episode, we go back to basics and discuss indexes. Like most topics, we made sure to explain all the important aspects. Because this is a such broad topic, we decided to carry this over into our next episode as well. Using the phone book as an example, we chat with our guest Randolph West and explain heaps, non-clustered indexes, and clustered indexes. We also explain best practices to use them when creating new tables.
We didn’t forget to cover the problems you might run into. For example, you might not think about indexes until your data grows very large. Sure, you can attempt to resolve the problem by moving to SSD’s and increasing memory, but this will push the issue further in the future. In this episode, we will discuss considerations for indexes and why database architecture is so important.
We would like to hear your opinions. What are your strategies when using indexes? Use the hashtag #sqlpodcast and let us know!
“The clustered index should be chosen very carefully because your entire table is going to be sorted according to that column or inside of columns and what you’re looking for is a unique value for every row.“
“All that an SSD is doing is making your problems go away until a later time. The problem is still there. It’s just being hidden by the fact that you’ve got faster drives. In fact, the same goes for adding more memory. It’s just that happens to be cheaper than trying to fix your indexes.”
Listen to Learn
- Heaps and indexes types explained
- Randolph’s thoughts on clustered indexes vs. non-clustered indexes
- Similarity between clustered indexes and primary keys
- The results of using Management Studio Designer for creating new tables
- How to create covering indexes
- The negatives of having too many indexes and what you can do about it
- How are default fill factors affecting indexes? Is faster storage good reason to avoid fixing the indexes?
About Randolph West
Randolph West solves technology problems with a focus on SQL Server and C#. He is a Microsoft Data Platform MVP who has worked with SQL Server since the late 1990s. When not consulting, he can be seen acting on the stage and screen or doing voices for independent video games.
Transcription: Indexes Part 1
*Untranscribed introductory portion*
Carlos: Ok, so let’s, I guess we’ll go ahead and kick it off. So Randolph, welcome to the show.
Randolph: Thank you very much for having me.
Carlos: Yeah, it’s great. I know that you and Steve have done a little bit of work together and some of your adventures have come up from time to time as we’ve talked about it, and so it’s great to have you here on the show with us today.
Randolph: Well, thank you! It’s been great fun working with Steve. He is a consummate professional, and I would hire him. I’m just saying.
Carlos: There you go.
Steve: And technically I would say I would hire you too because we’ve done that on projects.
Randolph: Yes we have.
Steve: I think we first met on the Database Corruption Challenge in 2015.
Randolph: Has been it where.
Steve: Oh yeah, yes, and then, yeah you did an amazing job on some of the corruption we worked on there. And then you and I have worked on fixing a couple of corruption issues for clients since then, and then we also worked together on another project sort of a longer term disaster recovery moving to Azure type project for our client.
Carlos: So today our topic is indexes and now this can be a pretty broad topic and just some of our pre-planning we thought, “Gosh, there’s no way we’re going to cover it all.” Some of the components are just a little complex to potentially discuss all of them over an audio format. So some of that will be pointing to our show notes and pointing people to additional documentation there. But let’s go ahead, I guess let’s start with that conversation and kind of maybe even defining an index and how that works. Now I think, so the first one, so we talked about indexes, we’re talking about copies of data, the way our data is structured, right. The very first example we have of that is a heap. Basically, a heap is a table with no indexes on it so there’s no structure or order into the way that the data is saved. Obviously, you have columns, right. You can have data types, and all those kind of constraints but the way the data is actually saved is not part of the definition if you will of the table. That makes it a heap.
Steve: Yup, so then really from there we take a look at how do we get to the data faster because with a heap to find any specific piece of data you’ve really got to search through the entire list, or the entire table and look at every single row. And say, “Is this what I’m looking for? Is that what I’m looking for?” Eventually after you’ve looked through the entire pile, heap in this case you come back with the result. But another way to this is by reordering some of the data. The way I like to think of it is sort of like a phonebook or a phone list. And I know growing up I’m used to seeing the yellow pages and the white pages. The yellow pages were the business listing and the white pages were the listing of every person and business with a phone number organized by name. Now, if you were just to take everyone on that phone list and just put them in a random order perhaps in the order that they signed up or purchased their phone service. That’s sort of the equivalent of a heap. There’s no specific order to it that’s useful for searching on it unless of course maybe you’re searching on the order that they got their phone service installed. But it would make it really impossible for anyone using that phonebook if they were just in that order, just random order basically to find anybodies phone number. So what they’ve done with the white pages is they’ve organized all of the people ordered by last name, then by, when there’s multiple people having the same last name you have the first name order, and when there’s multiple people with the first name you have the middle initial order. And that’s really the equivalent of creating a clustered index on last name, first name and middle initial. What that effectively does is it orders the entire table or phonebook in this case by those items so that when you needed to do a lookup and you’re looking for someone who’s name is Smith, you don’t have to start through the As and go all the way through to the Zs and confirm you found all the Smiths. You can just jump to the S section and then track down where Smith is from there really quickly. So then we look at the yellow pages, and the yellow pages are the business listings where the business is organized by business type. So if you’re looking for an accountant you can jump to the A section, find the section where all the accountants are and then lookup who all the accountants that are listed in the yellow pages. But if you’re looking for an exterminator, you need to go to the E section for exterminators and then you can find the list there. Now, with the yellow pages, it’s sort of like a filtered, non-clustered index in that it’s a copy of data that exist in the yellow pages, sorry it’s a copy of a data that exist in the white pages but it’s ordered differently and the reason it’s filtered is that it only includes businesses. So we’re kind of throwing two things in there, one it’s a non-clustered index and then it’s a copy of the data and two it’s filtered because just people have been thrown out and only businesses are shown.
Carlos: One thing that’s kind of important there is that that filter won’t necessarily change, right. That category of business, you’re not going to take that out of your system. While you can potentially filter on dates, it is a specific date. You’re not going to be like keeping an average for example of 90 days in an index. That wouldn’t be a good use of an index. But if you only wanted to see like from example, the beginning of 2017 forward then that might be an option, something like that.
Steve: Right, yup. So then next if you had what’s often referred to as a reverse phone directory. Typically not included with you white pages or yellow pages but it would be ordered by phone number so you can look up a name. That would be the equivalent of a non-clustered index or a copy of some of the data that’s been ordered in a different way to make it quicker to find something. So if you want to find out who has the phone number at 555-1212 well you can look it up by the phone number and then find out their name. And that would be just a plain old non-clustered index with no filter in that case because it would have the entire list. Not sure if the yellow pages and white pages are international or available in all countries but hopefully everyone has sort of understood where we’re going with that as an example on how those indexes work.
Randolph: Yeah, that makes sense.
Carlos: So let’s see, Randolph, what about thoughts on clustered indexes vs. non-clustered indexes?
Randolph: Well, the clustered index should be chosen very carefully because your entire table is going to be sorted according to that column or inside of columns and what you’re looking for is a unique value for every row. And you want to make sure that the clustering key is narrow so that it doesn’t take of that much space because that clustering key is really used in non-clustered indexes. Because you have to have a way for that copy of the data to match back to the original table so the way that the database works is it stores the clustering key, the clustered index inside the non-clustered index so that it can match back. Your clustered index, your clustering key has to be narrow. It has to be unique preferably and ever increasing. What you don’t want is a random value because for a number of reasons, which we can get into later. A random value causes something called page splits and it kills performance, and it kills the way the storage mechanism which just makes things ugly. So you want a number that is ever increasing like an integer or a big integer if you’re going to have more than 2 or 4 billion rows. It’s just you have to choose carefully.
Carlos: One of the things I like there about that idea, I guess we can kind of jump in there, so you mentioned page splits. Even Steve using the analogy of the phone book in a real and almost, I guess it’s an example. But if you think about taking pieces of paper right, and then basically writing on them and filling that up. That’s kind of what the way that SQL Server works and it has and these pages at least in SQL Server are 8K chunks right, so it has to sort them. When you talk about that sorting and the clustered key, clustered index rather it’s going to store them in that scenario and I think to your point is that when you’re doing that in kind of a random order it has to resort that every time, and those pages can get out of whack because, then all of a sudden I’ve got to fit in data that wasn’t there originally.
Randolph: Exactly, so what happens there is you have to wait for the index to be resorted and that can take time. It can kill your performance badly, so that’s why we try and stir clear of wide random columns for a clustering key or a clustered index. For example, GUID columns are not a good choice for a clustering key. That doesn’t mean that will make a bad index but they would make a bad clustered index because remember the entire table is sorted according to that index because the clustering key is how you’re data is sorted and that is your actual data. It’s not a copy of the data.
Carlos: So it’s worth asking at this point, right, to maybe like a small little poll here. So we talk about non-clustered indexes and then which are separate and different from but very, they kind of go hand in hand with primary keys most of the time, right?
Randolph: There’s an interesting conversation here. Microsoft for I think a good reason, it wasn’t the right choice but a good reason, decided that if you create a primary key inside the Management Studio Designer and in fact even before there in previous versions of the tool Enterprise Manager I think was called in the old days. If you create a primary key on a table it will automatically make that primary key a clustered index. I think the reason for that default was because they wanted you to have a clustered index at least. And because the primary key is unique by definition it made sense from a designer point of view and for people who are not intimately familiar with internals to have at least a unique value as your primary key and also as your clustering key. It makes sense from that point of view but it can run into trouble because let’s face it a GUID is unique. It’s a very good candidate for a primary key but it may not be a good candidate for a clustered index. In fact, it isn’t. There are reasons why you would not want to use the designer on particular tables. In fact, I prefer not to use the designer anymore when I’m creating tables because it has a bunch of defaults that may not be appropriate for the design that I’m implementing. It doesn’t mean I don’t use the designer. In fact, I use it a lot for creating relationships and stuff because it’s easier. But in terms of clustered indexes and primary keys the designer will make that the same set of columns so it’s something to watch out for.
Carlos: Well, now, having said that, kind of from a poll perspective. What percentage of time are your clustered indexes different from your primary key?
Randolph: Well in my case, 1%.
Steve: Yeah, I think I would put that really into two camps. One of them is if it’s a new database that I’m architecting from scratch or new tables that I’m building out where I have control over it. Oftentimes they may end up being different, the clustered index versus the primary key. But if it’s a, I don’t know, third-party application database that I’m jumping in to do some maintenance on or something. Like 99.9% of the time those are the same.
Carlos: Ok, and so I think, that’s been my experience as well most of the time, right, so that 99.9% of the time. It’s pretty much like when I’ve identified an issue or they have for an instance a GUID as their primary key and not in the clustered index, then we decide, “Hey, maybe we should make change here.” I got into a little bit of, so I’ve got some feedback. I’ve just put out this book “Zero to SQL”. And one of the things, actually I put a sentence in there. I said, when talking about non-clustered indexes, I say, “We don’t have to include the primary key of the table in the non-clustered index because this is automatically gets included in the index.” Now, somebody kind of took me to task, and they’re ultimately correct in the sense that, it’s the primary key that gets added, it’s the clustered index that gets added. I just thought they’re just curious I’m like how many times is your primary key not your clustered index. That’s kind of where I was going with that.
Randolph: Yeah, we’re designing from scratches as Steve said. You might have a reason for that.
Steve: So I guess you has just gone into a little bit talking about how you said in the book something about not putting the same columns that are in the primary key into your non-clustered indexes because they are included there as well. What was the feedback you got on that?
Carlos: So technically correct. It’s not the primary key that gets added, it’s the clustered index that gets added to the non-clustered index.
Randolph: And in fact there is a little bit deeper depth into that is it gets into the leaf node guaranteed. But if it’s a unique index then it’s not in the intermediate levels if that matters.
Carlos: Ok, and so here we start talking about and this is part of the index that will be a little bit hard to describe over the audio portion. But one of the things we kind of came up was that the idea of the game of the $10,000 pyramid or even pyramids, the card game, if your familiar with that. And that idea that you have a structured top and the data kind of goes the pyramid format, and that’s one of the things that Randolph mentions there. So they have leaf nodes and other types of names if you will for some of those sections so that as SQL Server is looking through the data it can find, do I need to keep going or not.
Steve: Yup, so really there when the index is being used, it’s that pyramid or like a tree type structure. And it uses that to be able to quickly jump to the pages that you’re looking for rather than scanning every leaf node. It goes through that tree to get to the data quickly. Randolph, did you want to talk about the B-Tree and how that part works?
Randolph: Well, it’s difficult to speak about with just audio. But if you think of the top of the pyramid there is only ever one page there and that is called the root node. Only ever one page and it’s also an 8K page. And it would be a type of page instead of being a normal data page. It’s called an index page. But otherwise it’s exactly the same as a table structure or a clustered index because remember your clustered index is your data. So your non-clustered index and your clustered index have a similar structure. You’ve got the single page at the top and then inside that page are references to other pages. And those pages contain the index information. So if you need to go to Stedman, you would look on the root page for where the S starts and then tell you which page to should go to where the actual data is and then inside there it will be able to find the data much quicker than having to read the entire table to find Steve Stedman’s number in there.
Carlos: Almost like an index of sorts, right, like a database.
Randolph: It’s exactly what it is, right? It’s just an implementation detail of exactly what you’re talking about. You flip to the back of the book, you look for S, there’s Stedman and you know which page to jump to. That’s exactly what’s happening.
Steve: So then one of the differences with that is the amount of time it takes to get there. Is it if you’re going through a heap or a clustered index that isn’t clustered on last name and you’re searching for me Stedman. If there’s a million pages in your table you have to touch all million of those to check and see, to find the rows with my last name on it. But with the index in that tree structure there you might have to read two, or three, or four pages depending on how deep that index is to get to the S section, and then scan through just a few pages once you’re there to find Stedman. And the difference is you’re touching maybe 8, maybe 10 pages rather than.
Randolph: Yeah, tends a lot for a seek. What you’ve just described, if you’re checking the entire white pages or the entire index that’s called a “scan”, and if you’re just looking for a particular record because you know how to get there through the index then that is called a “seek”. There’s a whole conversation here about which is better? The answer is none of them because it depends on what you’re doing. If you need 10,000 records scan is much more efficient than if you just need one. So a seek would be more efficient if you’re just looking for one or two records. But if you’re looking for a million out of your 10 million then a scan would be much more efficient because it would only need to read fewer pages in that respect.
Steve: Yup, good point there. I think that’s one of those that it comes back to really what is it you’re trying to do. Sometimes the query optimizer has to guess at what is it you’re attempting to do with your query. Usually it gets it pretty close but sometimes it’s not quite right and you end up with a seek when a scan may have been better and vice versa.
Randolph: Yeah, and that gets into conversations about are your indexes correct? Are they stale? Do they need to be updated? Do they need to be maintained? There’s a whole bunch of stuff here that we can talk about. There’s statistics as well.
Carlos: I guess we can jump into some of that. Before we do that I think one of those ideas is, are my indexes good enough? Because the optimizer can do some different things based on the information that it has, right? And then there’s this idea of a covering index. Take us through that.
Randolph: What happens there is we spoke about the structure of the index briefly were you’ve got your root node, you’ve got your leaf node where your data is, and then there’s this intermediate levels. What happens with an included column is that if you have a table where you need to bring back the last name, the first name, the telephone number but there’s a whole bunch of other data in there as well then your index would only need to contain those three columns. But let’s say that some of your queries had happen quite often would also require a middle name. You could add the middle name as an included column. That middle name would appear in the leaf level so the bottom level of the index. But it wouldn’t appear in the intermediate levels and that’s called an included column. Your index doesn’t have to be huge if it had four columns because you wouldn’t be using it for all of your queries but it would be a suitable covering query for the one that requires the middle name. I hope I haven’t confused everybody by saying it that way.
Steve: Just a little bit more detail on the covering index because sometimes when I talked about covering index is people say, “Oh, how do I create that? Do I create, is that a different create statement than a non-clustered or clustered index?” And the answer to that is really, No. A covering index is simply an index where all of the data that you’re asking for in your query happens to be in that non-clustered index. It can return it all back to the query without ever touching the heap or the clustered index. But if it’s not covering, what that means is it uses that non-clustered index to find the rows that you need, and then it jumps back to the heap or the clustered index to bring in the data that you’re looking for. So that last step to jump to the heap or non-clustered index that skipped in the covering index and that can save quite a bit of time too.
Carlos: And that’s in your execution plans you’ll see that lookup, right. And that’s what’s doing. It is basically, I don’t have the information I need so I need to jump and grab it, right. So whether you should then create a “covering index” will depend on how many times a query gets run, performance to that query and all those kind of things.
Randolph: And how many columns there are in the lookup and all sorts of stuff. Yeah, it’s an art. It’s not a science.
Steve: So then with that when you’re looking at the query plan and there is a suggested index. And I see this quite often where it says suggested index with two or three columns and then include, and it includes like 25 columns in it. That kind of thing would certainly give you a covering index but what are your thoughts when SQL Server is suggesting those types of indexes?
Randolph: Don’t do that.
Steve: And why?
Randolph: Ok so, when SQL recommends and index, a missing index, it is only specific to that query. If we know anything about query plans you can add a space to a query and it will suddenly be a different query plan maybe. Don’t just create missing indexes as they come up. What you need to do is to evaluate against other missing indexes or other recommendations for indexes on that table. And maybe combine the different recommendations into one or two new indexes or modifying existing indexes that try and accommodate as many of those recommendations as possible. An index that includes all of the columns of the table you might as well just be creating the clustered index. It’s just not efficient.
Steve: So one of the things that I see oftentimes when I’m looking at performance tuning and you’re looking at maybe indexes that are inefficient or not used. Often times you see them names something like missing index name which is a clue that somebody just copied and pasted from the suggested missing index and run it then they end up with an index that may not be really what they’re really looking for there.
Randolph: In fact, this goes back to Pinal’s interview that you did in December where he said his reference point is, “Do you have more than ten indexes on a table.” He’s going to start looking into that and see why. Because you can have on a legacy system or even a vendor system, third-party that you’re not allowed to touch. People have over the years gone and added indexes because it helps with the particular query. You can end up with duplicate indexes or indexes that are only different by say an included column. You’ve got all of that maintenance that’s going into keeping these indexes up to date. And statistics and all other stuffs that goes with that when in fact you can consolidate and reduce your indexes quite easily by just analyzing the system and how those queries are running.
Steve: Ok, so then let’s say I’ve got too many indexes like you’re talking about there and some of them are not efficient, or they’re not being used. They may help someday with some query but what are the negatives to having too many indexes specifically what’s going to slow down when you have too many of those?
Randolph: Well, there are two things that I would like to address. The first things is, if this index is useful once a month then create it once a month then drop it again. If you’ve got a reporting style query that has to run once month then generates a report then without the index it would take 10x as long, go ahead and build that index just before the report runs and then afterwards drop it because it’s not in use 28 days of the month or 29 days of the month. There’s nothing wrong with you creating and dropping index when you need it as long as it doesn’t obviously impact business hours. That’s the first thing. The second thing to answer the question is every time you modify the table it has to make sure that the indexes are up to date. So if you’ve got 10 indexes or 15 indexes on a table it’s going to have to, every time you insert or update a row, or delete in fact which is the same as an update because it’s changing the data you’ve got to make sure that that row is replicated or the change of the row is replicated in each one of those indexes. So it has to lock the table and it has to lock the indexes while it’s updating them and that can cause performance problems in of itself. It can also cause problems because you are locking data so you cannot make any changes, you cannot select from that unless you start using tricks that you shouldn’t be using query hence which I’m not even going to mention. Because you have too many indexes and you’re trying to satisfy a possibility that might happen in the future when in fact you’ve got statistics that tell you, DMVs that tell you exactly what’s going on with your database. Use the information available. It’s not SQL 2000 anymore. We have better ways of analyzing how your queries are running.
Steve: Ok, so then one of the things that often comes up, and it’s highly debatable, I don’t know, in places that I’ve been recently is fill factor. And the concept of what is a GUID default fill factor setting and what does that really mean to your indexes?
Randolph: Ok well, fill factor is in theory is about leaving enough space in the data pages themselves so that if you do need to add columns and it is a random type of column like we spoke about before with GUIDs. It leaves space in the data page so that it doesn’t have to split as often and it can go and insert data into those pages without having to resort the entire index every time. That’s kind of what’s it’s about. Do you want to add anything on that definition?
Steve: Well, I was just thinking, I heard a debate awhile back of about someone arguing that with SSDs and faster storage that your fill factor changes over spinning storage. I think that, I don’t know if I entirely agree with that.
Randolph: I completely disagree because all that an SSD is doing is making your problems go away until a later time. The problem is still there. It’s just being hidden by the fact that you’ve got faster drives. In fact, the same goes for adding more memory. It’s just that happens to be cheaper than trying to fix your indexes.
Steve: Yup. Ok then, so with fill factor I think what we end up seeing oftentimes is that people have a default fill factor setting on their database. They think that that’s going to cover what’s ever they need there it’s going to be right for all occasions. But really it comes down to that fill factor needs to vary depending on the data type, the frequency of change, whether the data going in is sequential or ever growing, or whether it’s completely random. Fill factor really depends on the data.
Carlos: Do you have different fill factor settings based on your clustered vs. non-clustered indexes? Because that kind of would be a good sorting factor in the sense that in theory we talked about our clustered indexes. Again, assuming that they’re unique, that they’re ordered, that you shouldn’t be seeing as much fragmentation there because again they’re kind of in that order. Kind of going back to the idea of them being in the primary key or unique, ascending, keep going, identity is a good example of that. It’s the clustered index is that or going to be a little bit more random if you will, and you could see a little bit more shuffling.
Randolph: The magic word you use there was fragmentation. This used to be a huge problem as Steve mentioned with spinning drives. While I disagree with the fact that I’m not worrying about fill factor at all is ok. I disagree, you should worry about fragmentation. You should worry about these things purely from the point of views of knowing that your database isn’t in a good condition is well maintained. At the same time though, index fragmentation with faster drives doesn’t matter as much as for example making sure that your statistics are up to date. In fact, I now say in my performance tuning session if you want to get the best benefit from your maintenance window. It is far better time spent by updating your statistics than to, for example, rebuild an entire index. Firstly, from an I/O point of view you need a size of data space for large tables if you’re going to be building an index rebuilt. That’s going to need, transaction log space is going to need space in the table itself, in the data file itself. It’s going to need a lot of resources, and memory, and all of those things. And if you’re replicating, it’s going to replicate all. It’s a big deal to rebuild a large table whereas a much more effective use of your time to do a statistics update. And the benefit from having an index rebuild versus your query plan being more effective fragmentation is less of a problem now. So from that point of view, Steve, yes I will agree with your statement before about fill factor But yeah, the one thing I wanted to mention on fragmentation is that there was a confusion a few years ago, fill factor of zero and fill factor of 100. What’s the difference? There is no difference. They’re exactly the same thing. Zero fill factor means there is zero space left in the page, fill factor of a hundred means it’s 100% used so it’s the same thing. I don’t know why they decided to distinguished but who knows it is Microsoft. A fill factor is a percentage from 1 to 100 so zero is a hundred. What you want to do there as you mentioned is it depends on the type of column you’re addressing. Less so of the type of index you’re using but more about the type of column. So a GUID as we spoken about about before is a random value so you would want to have a lower fill factor on that index for that type of data because then it reduces number of page splits which reduces the amount of fragmentation. However, you will still need to maintain that index from time to time to make sure that the fragmentation is reduced.
Steve: One of the interesting things I’ve come across there is that if you’re in a scenario where somebody has a GUID as their primary key and as their clustered index and it might be in a position where it’s too big in a given maintenance window to change that clustered index or too time consuming to change that. One of the work arounds that I found there is to use the new sequential ID rather than new ID so that you can turn GUIDs into sort of a pseudo ever incrementing value.
Randolph: Right, and that’s 16 bytes wide so.
Steve: Yup, it’s still big.
Randolph: It’s still big but it is sequential as you mentioned.
Steve: Yeah, it’s better than random at that point.
Randolph: It’s better than random but the reason people use GUIDs is usually from a development first perspective so the developers want to manage the index, sorry the primary key in the application as opposed to the database handling it. For number of legacy reasons that was a good idea 15 years ago. It’s not a good idea now. So there are a lot of habits and this is the way we do things happening from a development side that are not being, well the new stuff in SQL Server is not being adapted as effectively because there is 15 year old news on the Internet that says SQL is not good at this. The other thing is we have to educate people to say, listen there is a reason these RDBMS is in the top 3 in the world. There’s Oracle, there’s MySQL and there’s SQL Server, those are the top 3 RDBMSs on the world, Relational Database Management System. So SQL Server is in the top 3 for a reason. It’s a good product. It’s in fact one of the best products out there to handle your data so you should be using the newer features and the newer compatibility levels and things like that. And get away from the old habits of 15 years ago because they’re bad habits and things have gotten better in the last few years.
Steve: Yup, and that is oftentimes a challenging debate to have. And I found that to show examples or to come in and show what the performance difference is between one way of doing it versus the other way. It is a good way to show what’s better there. Ok, so then another one is talked about just the different options around indexing that we have is the concept of filtered indexes. And I know I threw this term out when we’re talking about the phonebook directory around the business names in the yellow pages. So filtered indexes, any thoughts on pros and cons and sort of how that works that you want to talk about here?
Randolph: Well, the idea of a filtered indexes is that if you have a very particular condition in your WHERE clause in your query. Let’s say that you want to grab all of the Stedman’s in the phonebook where there middle name equals J or starts with a J. If that WHERE condition happens often, then you could create a filtered index that uses that WHERE condition and then the index will only include those rows that satisfy that condition. The problem with the filtered index there is that if you use a parameter in your WHERE clause it’s not going to use the filtered index. So it’s solving a very specific problem in my opinion. It’s a good problem that it solves but it’s a very particular way of solving it. And you have to be very careful when you create your indexes and create your queries that you’re using them correctly.
Steve: Yup, so a real world example I came across there that filtered indexes were very handy was around a very large table and needing to purge data beyond a certain date. I found that when I just ran DELETE statement to delete where something that’s older than a given date it was forcing a large index again to find that. And it was like taking 40 minutes to delete a thousand rows. And then what I did instead was I went in and I added a filtered index and deleting one day at a time that was beyond the date range. I added a filtered index just for that one day. That filtered index gave me the shortcut to be able to delete a thousand rows at a time just split second like less than a second to delete those rather than extended amount of time to delete it. I found in that case I could add that filtered index do the work I needed to do and then drop it and then add a new one when I was working on a different day, and that was really a handy way to use it, to improve performance there.
Randolph: Yeah, that’s an excellent example of when a filtered index would make a lot of sense. Another example is if you have a very large table and you have a bit typed field where you’ve got a nullable values, it’s a nullable column, and you’ve only got a few of those rows that have a value of true or false and you could have an index, a filtered index on either of the null values or the populated values depending on what type of query you’re writing. And it can identify those columns or those rows that have that bit column set much more easily.
Steve: Yup, yup, another good example there.
Carlos: Exactly, yeah, that’s been my experience as well.
Steve: So then let’s jump a little bit more into the care and feeding side of things, and I know we touched a little bit on rebuild and a little bit on statistics but if we look at what are the general things that we needed to do to take care of indexes and keep them well structured on an ongoing basis?
Randolph: I’d be interested to hear your side because I’ve been speaking a lot. I’ll open if I had any feedback.
Steve: Ok, I mean, so really I think on that there’s the concept of reorganizing versus rebuilding as you get fragmentation over time. And then there’s statistics and I think that, or let me start with a very bad way of doing it that I see quite often that is if you go and use one of the default maintenance plans. And the maintenance plan you just go and click let’s rebuild my indexes, let’s reorganize my indexes, then let’s rebuilt statistics. One of the things you end up doing is rebuilding which basically rebuilds the entire index from scratch without, basically if you’re in Enterprise Edition you can do that on online mode but if you’re not in Enterprise Edition that’s an offline operation. But what the default maintenance plan people sometimes check is that they say, “Let’s rebuild them.” And that gives you the best structured index you’re probably going to end up with and it also rebuilds your statistics. And then they reorganized them and then they rebuilt statistics with like a 5% sampling or something like that. And in that example you end up with a whole lot of work that’s actually giving you sort of a negative approach or negative impact. Whereas if you just rebuilt them to begin with and then not done the reorg, and not done the statistics rebuild afterwards you would be in a lot better shape.
Randolph: So, you know, those defaults that you’re talking about, the reason that happens is that because the maintenance plan wizard has got rebuilt then reorg, then statistics in that order in the list.
Steve: Absolutely, yeah, people just click the checkboxes or select them and that’s what you get.
Randolph: So when I see that I blame Microsoft for that. That’s purely Microsoft’s fault because generally people here using the maintenance wizard are using it because they don’t know how to do it with say, Ola Hallengren or MinionWare tools so just to name two, there are many. So that’s Microsoft’s fault and every time I see that I do not blame the DBA, or the accidental DBA, or whoever is managing it. It’s not their fault. But it is causing a whole bunch of work that is unnecessary.
Steve: Yes, it may not be their fault but it’s their pain they have to deal with. When I see that I generally going a little bit of education to explain why that’s bad and turn those off. And like you said, replace it with MinionWare or Ola Hallengren scripts. Both which are far better solutions at doing at than the default maintenance plans.
Randolph: So the one thing that I wanted to mention, Steve, is that an index rebuild is, as I mentioned before is size of data operations. So if your table is 10GB and you need to rebuild it, if you’re doing a clustered index rebuild it’s going to require 10GB of space in the data file and then it’s going to require at least 10GB in the transaction log. And if you’re rebuilding a clustered index it is going to by default rebuild all of the non-clustered indexes on that table that referred to that clustered index and that means all of them. So if you’ve got a 10GB table and you’ve got 30GB of non-clustered indexes on that table it’s going to require at least 40GB in the data file and 40GB in the transaction log because it’s going to rebuild everything. So if you’ve got a maintenance plan that says let’s do a clustered index rebuild and then before that you had the non-clustered indexes rebuilding it’s going to do them again anyway so it’s a nightmare. This is why I say it is statistic updates more regularly are going to be a lot more effective than doing index rebuilds over the long term. You still do need index maintenance but let’s not do it every night.
Steve: And that’s where I like the third party solutions that we mentioned there for the ability to rebuild those where you have options to say given a certain level of fragmentation I want to choose to reorganize this or if it’s even more fragmented then we’re going to choose to rebuild it. And some of them even have the option of only do this for a certain amount of time.
Randolph: Can you go into detail what’s the difference between a rebuild and a reorg is?
Steve: Ok, so with the rebuild it’s basically, technically it’s like the equivalent if you drop the index then recreate it, and then it’s going to be completely rebuilt from scratch and it will take up the log and disk I/O like you talked about a moment ago. But with the reorg, what that’s going to do is going to the index and basically restructure or compact the index pages as needed. It will go through and shuffle them around to compact them down to a smaller amount of space. And it will do that page by page as it goes through there and that can be done while things are online in Standard Edition. But it’s slow and the end outcome is not as good as a complete rebuild. It might get you real close but it’s not going to be good as complete rebuild but the benefit is you can keep it online.
Randolph: And you still need to update the statistics after you’ve done a reorganization which is something that people tend to forget as well. The statistics need to be up to date so that the query optimizer knows where the data is and how much of it is in there, and what the distribution of the data is in that index so that it can make the right choice about what type of query to create because a lot of queries is coming down to SORTS and JOINS and that’s where you find your biggest performance bottlenecks. A SORT is a blocking operation so everything in the query has to stop until the sort is finished. So if your index is up to date and it’s sorted correctly then it doesn’t have to do that operation it’s already done. Whole bunch of things about loops, we can talk about here as well which I’m not going to. An up to date statistic is much more valuable to the query optimizer than a defragmented index.
Carlos: There’s another point there even that, we’ve talked about current feeding of the indexes, right, rebuilds vs. reorgs. However, we can also have statistics that are outside of the indexes, right, on the table. Those are also providing feedback to SQL Server to the optimizer. And if we neglect those we’re still kind of the same peril because the optimizer maybe using the statistics but we’re not giving them that current feeding.
Randolph: Correct so, it’s valuable to have a, I hate this word, holistic approach to your current feeding is that you need to look at not just indexes, and not just fragmentation, and not just statistics but all of them. And in a way that is effective and efficient for your maintenance window. You don’t want to go and rebuild indexes every night because it doesn’t make any sense. But you also don’t want to stop doing statistics updates.
Steve: Yup. One thing I commonly see with that is rebuilding of the statistics or updating the statistics where somebody will have a job that goes through and blindly update every single statistic in the entire database every night. And that can have a pretty big performance impact on things depending on the size of your database. And a better approach to do that is to look at which tables or indexes actually have changes and only update those that have a fair amount of changes to them on that frequent basis. And then maybe on a regular basis of once a month or once a week you do an update on across all of them.
Randolph: Yeah, that’s a very good point. The amount of data that changes in a table if you’ve got automatic statistics enabled, we’re going to go a little bit of a segway now, by default automatic statistics are enabled. There are other settings that you can choose in 2016. There’s a whole bunch of new stuff per database which you can choose. Let’s talk about everything before 2016 because that’s what most people are still using. Statistics will only update themselves if a certain percentage of the table has changed. The value has changed across versions recently but the fact is that statistics will only update themselves if a certain percentage has been changed. So if you’ve got a really large table with tens of millions of records or rows and you have only one million that have changed recently statistics will not be updated automatically. So it does pay you to use one of the third party tools we’ve mentioned like MinionWare or Ola Hallengren’s maintenance solution where it will go and check, has any number changed and then you can update the statistics there and that will help your query optimizer as well.
Steve: The other side effect you can get with that is that if you do hit that threshold where it decides it’s time to rebuild those statistics that might be in the middle of your peak load during the day. And right when you’ve got a lot of traffic coming to your database and index or statistics rebuild occurred that can have impact on things too.
Randolph: That in fact is one of the recommendations for SharePoint style databases. There’s a lot of things I don’t like about SharePoint but the fact is a lot of people use it so one of their recommendations is to turn off automatic statistics updates on any SharePoint style database. That includes CRM, it includes GreatPlains, all of those even. What’s the one that I used? Whatever it is, there’s a whole bunch of them where you should turn off statistics updates automatically and then include statistics rebuilds in your maintenance plans. So it’s just to keep in mind each database is different.
Steve: It will probably apply to Navision and Dynamics as well.
Randolph: Yes that’s the one. Yeah, Dynamics is the one I’m trying to remember. Thank you! So any SharePoint style database they do recommend turning off statistics updates automatically and to do them in a separate maintenance window. So it pays to do your research to make sure that you’re doing the right kind of maintenance plans for your database and your instance.
Steve: Yup, very good point.
Carlos: So I guess a couple of different things we talked about all of these components. And I guess let’s now talk about some of the benefits right, so all of these things we have to go in, right, kind of the holistic approach, kind of having to know our data, getting more familiar with it. Ultimately to what end are we going to do that? I think, so we’ve talked a little bit about performance. I guess we should probably talk about how that performance gets into the system if you will or the mechanisms that cause the improvements?
Randolph: Ok, so when I do my performance session, what I say is, “You’ve got expensive items like your CPU. You’ve got less expensive items like your RAM, and you’ve got even less expensive items like your hard drives.” So your CPU is the most valuable thing to you because the SQL license is per CPU core and you want to make sure you’re using the most efficiencies of your CPU and memory as you can. What an index does is, we spoke about this before, it is a copy of the data so you want to keep your indexes as efficient as possible so that if you’ve got a large table you don’t want your index to be large as well. You want it to be smaller so that less of it is in memory because that’s what this game is about. SQL Server is all about being in memory as much data as possible in memory. So for Standard Edition up to 2016 even you’ve only got a certain amount of memory that you can access. 2016 Service Pack 1, the limit is still there but that’s your buffer pool that is in memory as opposed to everything. But the fact is that there are limits to the amount of memory you can use for SQL Server specifically with Standard Edition because not everybody can afford Enterprise. So you have to manage the amount of data that’s going into memory as much as you can and that is most effectively done by the right kind of indexes for your queries. And that’s also why you don’t want to have duplicate indexes because it will also be in memory. Also, you don’t want to have wide indexes because they will take up more memory than they need to. And that’s why included columns are very handy way to reduce the size of your indexes that’s why we have filtered indexes. All of these things to try and reduce the amount of data that is in memory so that we can do a lot more with what’s in the buffer pool.
Carlos: So then how do we go about or what’s the best ways to determine, we talked about looking at execution plan that kind of gives the recommended index there. So what are some good strategies to go about deciding which index is get included and which ones we need to ignore?
Steve: One approach I take on that is if I’m looking at an execution plan and it suggests an index. If it’s a development environment one of the things I’ll try initially is just create the index and see how it helps and I’ll drop it, and then go look and see is there a similar index. I mean, if the index that it suggested was beneficial I’ll go and see if there’s a similar index that could be modified or added to that would give the similar and a fact that the suggested index was doing. And sometimes that means you’re taking existing index and just add in an include to it or taking an existing index and add in another column into it.
Carlos: Ok, so that’s kind of a dev environment, one query at a time.
Steve: Yup, and the other approach that I see there is I can even go and look at some of the missing index DMVs to go and figure out. Actually you look at DMVs for a handful of things, one is to figure out what are the missing indexes and figure out the overall cause associated with those missing indexes. And then come up with what are some good ones to add that are not going to be duplicates of other indexes and then see how that improves performance. You can also, using DMVs, go in and look, you can figure out what are your duplicate indexes. If you have duplicate indexes oftentimes you’ll see that one of them might be not being used at all and the other one is being used a lot. You can go in and drop one of those. However, you need to be careful when you’re looking at unique indexes. You don’t want to drop a unique index versus a clustered index or non-clustered index you want to look at and drop the right one there. Another thing to look at is big clustered indexes, that’s another thing that you track down is what are the clustered indexes that are really wide. And that really wide means it’s going to ripple through all the non-clustered indexes with that extra size. You can also track down unused indexes. What are the indexes that are on your system that are not being used? Now when you look at that you need to be very careful because that’s only going to be the ones that haven’t been used since the SQL Server instance restarted. Also, if you rebuild and index I believe it clears those index usage stats.
Steve: Sometimes, exactly. So it’s something that if you see there’s an index that is not being used you might want to track it over a few weeks to a month and confirm it, yeah it really isn’t being used and then go and take a look dropping those.
Randolph: That goes back to one of the best practices is to have a baseline. So know what your system is doing and track that overtime and then you can refer back to that baseline and say, “Well, this is exactly what’s happening. This is different.” And go from there. Yeah, that’s a good point.
Steve: Yup, and that reminds me of a project, Carlos, that you and I worked on. Were we setup a monitoring component that ran for a month and kept track of unused index details, and log them every few hours, and then we went back at the end of the month and reviewed the ones that over the entire month had no utilization and then suggest those as possible candidates to be dropped.
Randolph: That reminded me of Pinal’s comment as well that he’ll only look at the database that’s been running for a week. The one thing I wanted to mention is the DMVs that you’re referring to there’s one in particular, there’s a group of diagnostic queries that Glenn Berry, Glenn Alan Berry, from sqlskills, he uses and maintains them. He’s got a couple of, and so does Brent Ozar and a whole bunch of other people. But Brent Ozar’s one called as sp_BlitzIndex and Glenn Berry’s one is included in his diagnostic scripts. It’s a very similar query and what it does it waits your missing indexes as according to number of scans, number of seeks and something called Average User Impact. Now that Average User Impact number can be, I think it’s unit less, I’m not quite sure how does that number is calculated but if it’s really in a high impact like tens and hundreds of thousands then usually I will go and look at that first and say, “Well, how many times has this index been used in terms of number of seek and scans.” And if it’s a high usage index that is missing or a high impact then I will usually create that without too many modifications if it’s not too wide or doesn’t have too many include columns.
Carlos: I think it’s the number of times requested multiplied by the number of hits.
Steve: Yup, and just to jump in there I think I’d feel a little bit left out I didn’t have the chance to mention Database Health Monitor on that. In Database Health Monitor there are many indexing reports similar to what Randolph has described there.
Randolph: Certainly, from Steve Stedman’s solutions has a magnificent and free tool called Database Health Monitor which I have used. I don’t even get paid for this statement. It’s a good product. It’s free which is even more amazing. This is the great thing about the SQL Server community. There are a lot of free tools that are out there that are adding value all the time. And all that people asked is that you recognized them and I recognized Steve as a valuable member of our community. That ends the sponsorship message.
Steve: Thank you for the shameless plug.
Carlos: So it’s interesting, I guess, and maybe I’m being a little bit paranoid but using, so when I’m looking at that impact and I will take that and also using the ones that I’m interested in adding. Of course I want to do what Steve mentioned looking for duplicates or kind of what’s there, right? Then am I going to get over my threshold of ten or whatever just kind of taking into consideration what additional load am I putting on this table by creating the index. And while you want to test that out mostly the environments that I worked on they just don’t have a good mechanism for creating the similar load as I have in production. So when I go to implement that index one of the first things I’ll start to look at is those usage stats, right. Because I want the usage stats in the index that I just created to be going up because I know that, “Ok well, wait a second what did I do? Is this still a really good candidate?”
Randolph: Yeah, that’s a very good point.
Carlos: So I guess, other that’s how are going to go out and creating them, couple of other things that we didn’t really get into some of the other new indexes like column store or even XML indexes. But I guess other thoughts about when to start looking at these other, we even try to approach that now. Maybe I feel like we should punt on some of those.
Randolph: I can briefly mention about XML indexes that the way they work is they’re going to be larger than your table or your XML column. The reason being is that it will create an internal table according to the structure the XML document or XML field that you’re indexing. So it actually expands out the XML data into an internal table and then indexes that so you could end up with a column that is say 100kb maximum. That’s a small one and you can end up with an index that is 500kb, or gigabyte, or tens of gigabyte because it’s creating an internal table under the covers. A system table that it is then indexing so be wary of XML columns in that if you’re going to index then make sure that there’s sensible indexes that they’re only indexing certain nodes inside that XML document and be aware of that. And also I’d like to add here that if you use the XML data type, the data going into that XML row or column is not going be the same as what you get out. It does modify the XML data going in for whatever reasons.
Carlos: Say that one more time.
Randolph: If you have XML data that you put in for auditing reasons for example. If you pull it out it’s going to have been modified somehow. Either the tags will be slightly different or the spacing will be slightly different so do not use an XML data type to store audited records of XML types. If you want to keep an absolute record of what your XML looked like as it came out of a web service for whatever reason store it in a varchar or nvarchar column instead because in it it is identical. If you put it into an XML data type it will modify the data. It is still the same data but it will slightly modify. The tags will be slight different or whatever.
Steve: So then as far as column store indexes I think that’s something we could probably dedicate an entire episode just to talk about.
Randolph: We definitely can.
Carlos: I think you’re right.
Randolph: A column store index is neither an index nor clustered so it’s so complicated. In fact, there is a series of post. I think it’s Nico who has done it over one hundred posts on how column store indexes work. We could talk for days on that. They are fascinating and completely subverts what you think you know about indexes, and data, and row level storage and all that kind of stuff. It’s fascinating stuff.
Steve: Yeah, and just a few notes on that I think that it’s one of those things that is great for data warehousing or OLAP type things. And may not always be the best option for your OLTP side.
Randolph: At the same time you could have reporting style queries in your database and with 2016’s optimizations for column store you could have which can change and can be clustered and all sorts of stuff. You could have some stuff that in your OLTP environment that could be OLAP style indexes. Yeah, there’s so much to think about there.
Carlos: We saw that in Episode 78 with Brian Carrig. They talked about using their column store index in a transactional environment.
Steve: Yup, and then one of the things that I’ve seen that is one of those misunderstandings around column stores is that if you, because column store does the compression on each column, is that people think of it as I don’t have to have any non-clustered indexes if I have a column store index. And that’s not true at all. And that if you’re looking at a bigger table and you’re using column store you may still need to have some non-clustered indexes on it as well.
Randolph: Oh yeah, so do you research. Read all one hundred and whatever post. Become an expert then implement them.
Carlos: So we’ve talked a little bit about, so implications, how do we know what to use. So space considerations we talked a little bit about more in the rebuild process, right? That we’re going to need additional space in our data file and our log file things like that. I think we’ve had a pretty good conversation there. I guess one last idea I’ll throw out that we can bat around a little bit. We talked a lot about having multiple TempDB files and so one of the thoughts that I’ve seen out there is if you’re potentially, I won’t say, so what’s the value there is that if you’re looking to spread out or move your indexes from where your data might reside. One way to do that is to create another data file potentially on a separate storage and then rebuild those indexes.
Randolph: Ok, there are two things that I can talk about there. The first thing I wanted to mention is if you do need to move your table into a different file or file group all you have to do is do a clustered index rebuild and target that file group as the new location for the clustered index because remember the clustered index is you data. So that’s a handy way if you need to move your table into a different file group that’s how you will do it. The other thing is by virtue of that definition there is that you could have your non-clustered indexes in a different file as your data and you might want that for improved I/O performance or if you’ve got your read-only stuff in a different file group or all sorts of reasons for that. It’s very handy for splitting your load on the I/O level. Less of a problem these days but it’s a nice thing for large tables to split your non-clustered indexes from your clustered index so that’s not reading from the same portion of the disk or the I/O subsystem or both.
Steve: Just to note on that index rebuild for a clustered index to move it to a different file group although that will move the table and the entire index. One of the things that could be left behind when you do that is any of the varchar max or nvarchar max or other large items that are located outside of the data pages for that table.
Randolph: Yeah, off row data will be affected. Steve, I don’t know because I’ve never tried. What happens if you tell it to do the index rebuild with the log compaction enabled.
Steve: You know, that’s a good question. I have not tried that for a long time.
Randolph: I smell a blog post.
Carlos: Ok, very good. Well, awesome. I think great conversation and obviously there’s a lot more that we could talk about indexes but I think this was a noble attempt at covering some of the basics and getting into some of the nitty gritty as well.
Randolph: Yeah, the fact is that indexes don’t stand by themselves. They are a very important part of everything including statistics and everything else so don’t think that once you became an index expert you become an index performance tuning expert because that’s not true. You have to have a very broad knowledge of how things work in a number of different fields used upon to get the best performance out of your system. And there’s nothing wrong with good enough. You don’t have to have 100% defrag indexes. You don’t have to have indexes rebuilt. You can have them reorganized. Don’t have to have them reorganized at all if your statistics are up to date in certain contexts. There are a lot of tradeoffs that you have to think about when doing your maintenance plans and indexes form just a small part of that.
Steve: Very good point.
Carlos: So shall we do SQL Family?
Steve: So how did you first get started with SQL Server, Randolph?
Randolph: Steve, that’s a good question because when I originally got this question I had to think very hard about it. I have been fascinated from a personal level about organizing data. Back in my youth I would create lists and lists of things CDs, books all that kind of stuff and I would do them in the paper equivalent of Microsoft Excel. And then when Lotus 123 came along I started playing with that because my dad worked at the bank and he was a 123 expert. If I’d know about VisiCalc I probably would have used it. And then I got into Microsoft Access because it was there. It was a very old version of Access. I think it was version 2 that I started with and then I started working at a PeopleSoft implementation partner in South Africa where I am from. And that’s where I first work with SQL Server and Oracle at the same time, and I was not a fan of SQL Server. I have to be honest. At that time it was version 6.5 and version 7 that’s around the time I joined so there were issues because as some of your listeners may know SQL Server’s engine was rebuilt around 7 time, 6.5, 7 time. In fact, if you look on MSDN you can download 6.5 and you can download 2000 but you cannot download version 7. It is very difficult to come by. There are maybe good reasons for that. I got exposed to SQL Server that would have been in 1997, around there. So yeah, that’s where I first got exposed but then I didn’t really play with it until I got to the bank. I was more of an Oracle guy. Got to the bank in 2006 so there was a large gap of not playing in SQL Server and then I couldn’t figure out why DESCRIBE, the keyword DESCRIBE wasn’t working in SQL Server. For all of you Oracle office you notice there I prefer SQL Server now because as I said before a lot has changed. It is a much better product than it was in 1997. In fact, I think it has surpassed Oracle. I think it’s better than anything else as well and that’s because I’m trying to keep abreast of all the new stuff. I don’t want to be stuck in the past and have assumptions about the product. I want to play around with the new stuff so. That was a long way of saying 20 years or so.
Steve: Ok. Well I know that in the late 90’s Oracle was a much better product than SQL Server.
Randolph: It really was.
Steve: Yup, and that has changed. That has changed significantly in the last 17 years.
Randolph: Yeah, it has.
Carlos: If you could change one thing about SQL Server what would it be?
Randolph: I had a pot for this but I’m going to change my mind. The pot answer is in-memory processing on SQL Server, in-memory OLTP which was called Hackathon, that was the code name for it. There is a transaction log operation that happens even for stuff that you don’t want kept. It’s still being backed by disks somewhere. It may not be in the transaction log itself but it is still being backed by disk somewhere. I would like to have in-memory objects that do not have anything on disk backing them at all. They must just be there, and if the server goes down, oh well tough luck.
Carlos: Alright. Yes, I remember seeing you in the Bob Ward presentation at SQL PASS talking about the in-memory OLTP.
Randolph: There’s some very very cool stuff with in-memory OLTP that I’ve been playing with. I’ve been blogging about it as well on my blog bornsql.ca. That speaks about some in-memory ops and I found. Interestingly I found another issue that I’ve spoken to Jonathan Kehayias about where Temp tables are slower than Temp variables in certain cases. So I’m investigating that at the moment and it had to do with my in-memory investigations.
Steve: Oh interesting, I’d like to see what you find there.
Randolph: If I could change, that was my pot answer about SQL Server what I would change. What I would change is having a TempDB per user database as a configurable extra.
Steve: That would be nice.
Randolph: In other words it doesn’t have to be default but if I need a TempDB for a particular database I would like to have that as a separate entity so that I could manage it separately and keep track of the objects that are happening for that particular database. The architecture for that is going to be really difficult to do so I don’t know if Microsoft is going to do that but that’s what I would like.
Steve: Alright, so what’s the best piece of career advice that you’ve received.
Randolph: Oh, it’s going to be along answer as well, sorry. “Let your work speak for itself” is the first and foremost one. So it doesn’t matter what people say about you if your work can speak for itself then you don’t have to say anything. That’s the best piece of advice that I had. But the other is “Everything you say and everything you do is being recorded somewhere” so please treat people with respect. Treat people as you would have them treat you and don’t say things on voicemails that can be played back to you at meetings with the CEO and saying Randolph don’t do that.
Steve: I like that too.
Carlos: We won’t ask for the personal experience that led you to that advice.
Randolph: I have at the end of the year or every now and then when it’s a quiet time on my blog I say career limiting moves and that was one of them. So there are a bunch of career limiting moves that I can recommend to you. Don’t do because I’ve done them not because it’s good advice but it’s because I’ve done them so don’t do that because I’ve done it, and I will be one of them.
Carlos: Our last question for you today. If you could have one superhero power what would it be and why do you want it?
Randolph: I thought about this and I honestly don’t know. I liked Pinal’s when where he could touch a server and know what’s wrong with it. The problem is whenever I touch as server if it’s going to break it’s going to break then and there. Maybe I want the reverse of that superpower.
Steve: To just know without touching the server.
Randolph: Yeah, you know per cost of maintenance is one thing. But if I’m in a room and it is going to break it is going to break while I’m there which is good I guess in way because you then you know that we can figure it out then and there. But I’d like the reverse of that that would be cool.
Carlos: Well, awesome. Randolph thanks so much for being with us today.
Randolph: Well, thanks for having me it’s a topic near and dear to my heart indexing and maintenance and stuff like that so I’m glad I got an opportunity to speak to you guys about it.