Do you find you have tables and indexes that have grown really large? Are you moving around a lot more data than you used to? Are your windows for jobs and maintenance getting tighter? If you answered yes to any of these questions, partitioning may be for you!
Partitioning allows us to break a table or index down into smaller more manageable chunks. Partitioning can enable us to perform maintenance on just part of a table or index. We can even move data in and out of tables with quick and easy metadata only operations.
In this episode, we’ll go over basic partitioning concepts such as horizontal vs. vertical partitioning, how to identify if a table is ready for partitioning, and what you need to know about your applications before you implement partitioning.
Carlos L. Chacon: This is the “SQL Data Partners” Podcast. My name is Carlos L. Chacon, your host, and this is Episode 17. Today we are talking about partitioning, could be a bit of a niche topic since you have to have SQL Server Enterprise Edition. However, if you are planning for growth or maybe you were already using partitioning and maybe not seeing the results that you’d expect, then of course this episode will be for you.
I do want to say, we have a great community, and Brandon is a great of example of that. Lots of people willing to devote their time and energy to help others along the SQL trail. Sometimes, these podcast recordings don’t go quite as planned, so I’m grateful to Brandon for giving me a little extra time with him to make sure that we had a good recording.
If you were at the Summit last month in October, in Seattle, I hope that I had the chance to meet you. I know I’ve made many new compañeros, always interesting to see the different environment that people come from. I hope that we had a chance to talk.
While at Summit I attended a session where we talked about branding words. I thought it would be appropriate just to review some of the thoughts that I had about why I created the podcast and what I hope that everyone gets out of it.
I think if I were to use three words to describe what I was hoping to do with the podcast it would be community, fun, and learning. We’re trying to build a sense of community. We want to have a little fun on the way, and of course provide some learning opportunities or some things that at least you think about that you can further your learning along.
I hope that you’re getting some value out of the podcast. I know I enjoy putting them together. Talking with many different people, they provide such valuable insights and things that I wasn’t necessarily thinking about and I hope that you feel the same way.
We do always, however, welcome your feedback. We want to hear from you. You can hit me up at [email protected] or you can hit me up on Twitter @CarlosLChacon. Of course, you can leave your feedback on iTunes or on Stitchers. With that compañeros, welcome to the show.
Recording: SQL Data Partners.[intro music]
Carlos: Today we’re talking about partitioning. Of course I have with me Brandon Leach. Brandon, welcome to the show.
Brandon Leach: Thank you very much for having me. Thanks for taking a little bit of time out. We find ourselves in beautiful Portland and it’s always good to catch up with you.
Carlos: So we’re talking about partitioning. A specialized topic because it’s only available on SQL Server Enterprise Edition. It’s fairly straightforward to set up, but can quickly go sideways if we’re not careful with what we do.Now when we’re talking about partitioning there are two options available to us, right? There is vertical partitioning and then horizontal partitioning.
Brandon: Yeah, vertical partitioning being when you are essentially splitting out columns into separate tables versus horizontal partitioning which is what the table partitioning feature that SQL Server deals with.
Carlos: Right. So vertical partitioning, I have a table that’s 10 columns long or 10 columns wide. I want to take five of those columns, make table one. 6 through 10, make table two.So horizontal positioning, I’ve got 10 billion records, right? I want to group those. I want to level those. I want to put a billion records into 10 different tables. But the columns are all still the same.
Brandon: Well, 10 different parts of tables. In table partitioning, we tend to think the table is mostly a collection of rows. But every table already has a least one partition, so a collection of partitions actually forms our table and our indexes.
Carlos: Fair enough, fair enough. So you’re talking about basically a subset of that table?
Brandon: There’s essentially a hidden, additional layer or tables that aren’t officially partitioned. They have partition one, and that’s it.
Carlos: And that’s it. Right. So when talking about SQL Server partitioning, we’re talking about this horizontal partitioning?
Carlos: So why do we want to partition?
Brandon: Well, so often times it’s for maintenance and there’s also a performance indication there. But let’s talk about the maintenance part first. As tables grow larger, it can take us longer to, say, defragment those indexes.
Carlos: Archive data might be another example?
Brandon: Yeah, archiving of data is definitely another one. Being able to move data around from one place to another within the same data file even.If you’ve got staging tables, you can actually switch partitions out and put them into either their own table or into another table that’s very common as a key component of the sliding windows technique, where you have a table that is simply a heap.
It’s a heap table, no indexes or anything. You dump your incoming data into that as fast as possible. You then apply your indexes on that staging table and you can use the ALTER TABLE switch command and say this table is now a partitioned of that one.
Carlos: You prepped it, it’s on stage, it’s in the wings. You pat it on the back and you are like, “OK.” You toss it on stage. Now you are ready to go.
Brandon: There are some caveats with that though. Everything, the schema, the design has to be the same. Same constraints, same foreign keys, same primary keys, everything has to be the same.
Carlos: Right, so from the object perspective the database is going to look at those and it’s going to say, hm, these aren’t quite the same and it will reject it.
Brandon: You can essentially move a billion rows from your staging table into your production table in a millisecond, because it’s a data operation. It’s just checking to make sure they match and that nothing is going to be crazy and then it’s going to update the metadata.
Carlos: There you go. That’s therefore the beauty of partitioning.
Brandon: You can use that for archiving data out too.
Carlos: The opposite would be true, right? Now I have all this stuff. OK, you guys are ready to come out.
Brandon: Pull out the partition you don’t want into a new table. Run your ETL process against that new table. Now you can move that data elsewhere without blocking the original table.
Carlos: Very nice. You have all these different pieces that you can put together.
Brandon: That kind of gives you both the performance piece too.
Carlos: You talked about index maintenance. If we can separate from that we can decrease the number of reads that are going against our table because we are only going against part of it like a filing cabinet.
Brandon: Correct. Say I’ve got an index on a large partitioned table. I can analyze the fragmentation levels of different partitions. So now I can go through and I can say, well this partition, it doesn’t need to be touched. That second one needs to be re-orged. That third one needs to be rebuilt. So I’m able to speed up my index maintenance. I’m able to be more proactive.
Brandon: I don’t know if you ever tried to rebuild or even re-org a terabyte-plus sized table.[laughter]
Brandon: If you have you know it’s painful.
Carlos: A funny story. We’re in two different projects, there are two DBAs where we were both doing something different. The other DBA was actually in England at the time. We were talking and we decided, let’s just rebuild it.We rebuilt the index. Of course it rebuilds the index when the other one was still in play. And we ran out of disk space. [laughs]As the result of the rebuild we put in the request to have it come back. Then we had to roll that whole thing back, and it took longer. People were waiting and not good. [laughs]
Brandon: I’ve seen tables where it would take you days to rebuild the whole thing. Even as an online operation, it’s not just the time. It’s the amount of log generated. Any sort of change like this has to be tracked.
Carlos: Shannon Fawler told me once he was down in Charlotte and he was working with a large warehouse environment. They had a table that needed to be defragmented. Now luckily this wasn’t a “production.” Outside users weren’t using it. It was internal to the business.So he kicked off the re-org, the rebuild on Friday at 5:00 pm before he left. Monday he came back in and it was still chugging away. [laughs]This gets down, everybody’s like, “Hey, sign me up. I want partitioning.” But it may or may not be for you. Not only because you don’t have Enterprise Edition.
Brandon: There are some good and bads with partitioning. The real question is why do you want to partition?
Carlos: I have a quote from Michelle Uffer and she says, “‘I only partition tables that I expect to go quickly.” She says her rule of thumb is greater than 10 million records a week or tables that have regular archiving needs. That’s a lot of data, 10 million records.
Brandon: Well, it could be and it might not be. Here’s the thing.
Carlos: It is in a week.
Brandon: Sure. With 10 million records, what’s the row size? How much space does each row take? Sure, I’ll look at row counts, but it’s not my primary decision-maker when partitioning.
Carlos: It sounds like row size is?
Brandon: Well, it’s not row size, it’s table size.
Carlos: It’s table size.
Brandon: There are at certain points where I know it’s going to be more painful for me to start with that management. I will partition and I want to partition when I know that table is going to become a problem in terms of maintenance, or there’s a clear, defined performance reason. We didn’t talk about performance before, so let me bring it up.There are certain benefits to partitioning — there’s partition elimination, first and foremost. Where SQL Server is smart enough to look at the table, its schema and say, “Well, based on this query…” I’m searching for, say, a date range, it’s the most common partitioning column you’ll find, or partitioning key, I should say. It’s going to be a date column.
It looks at this and says, “Well, I’m looking for data within these range of dates. I know these partitions.”
Say we have five partitions. These two partitions can contain that data. I know the others can’t. Instead, I’m going to focus on those two partitions and forget about the rest of them. This is a wonderful thing if you know you’re going to scan. [laughs]
Carlos: Sure. Then it goes back to the reads, we are bringing in less data and it’s eventually, going to end up in memory because SQL wants to read from there, so we have less resources that we’re…[crosstalk]
Brandon: And your index can be partitioned. When you partition a table, you rebuild the table on the clustered index, you have to have a clustered index there. You have these objects underneath that partition scheme and your partition function, you rebuild on that partition scheme.That has the partition function linked to it, so it knows how to create the buckets and how many recreates and setup, and where to put those partitions. Are we sending it somewhere else, to different file groups or are we putting it on the same? You do both scenarios for different reasons. But you get another benefit, there’s partition locks.
I’m sure every DBA out there has seen an issue at some point where they’ve suddenly escalated from a row lock to a table lock. That’s painful.
Carlos: Now my system has hang up now, right?[laughter]
Brandon: Now that table is unusable.
Carlos: We’re going to wait for the transactions to finish.
Brandon: But when you have partitioning, we essentially have another lock level in place that we can turn on and use.
Carlos: Another option. An escalation path, from row to partition and then…
Brandon: From row to partition and then to table.
Carlos: If we’re using our partitionings well, hopefully we don’t have to go there. Luckily we’re using our keys and our queries and things like that which will be to refer to again, but yes, that’s how we setup.
Brandon: One of the things that you have to consider is that, back to the original question is that, yeah, you want to rebuild that table before it’s a problem.
Carlos: This goes to the planning, right? Partitioning should be in part of the planning process…
Carlos: …not a reactionary process.
Brandon: Yeah. If you’ve got a terabyte-size table, that’s going to take a lot of time to rebuild. Can you take the hit of partitioning that table?
Carlos: Take it offline, make it unavailable. You may not be able to. We’ve talked about situations where that’s not possible.
Brandon: Maybe you get by that. Maybe you can do some hybrid table partitioning, partition view to create a new table.
Carlos: So more complexity. Poor planning just leads to more complexity.
Brandon: Ideally, you want to identify these tables before they become a problem, these tables that have the potential. I would say, generally, when a table size is about 40 gigs. that’s when I start to look and I say, “Is this a good candidate for partitioning?” Sometimes it is, sometimes it’s not.
Carlos: Let’s delve to it a little bit further, identifying. You’ve given us a size metric there, so now it’s kind of on radar, partitioning radar. What would be the next step in selecting whether partitioning is a match?Now, you think something maybe we should probably step back and talk about and that is the partitioning key. This is basically the column that we’re going to slice that information by.
Brandon: Yes, and it can only be one column.
Carlos: It can only be one column. This column then becomes very important.
Brandon: It does. Ideally, it will be one in which it’s going to be part of your queries in there, one of your search arguments, essentially.
Carlos: Does that mean the WHERE clause?
Carlos: Has to be in the WHERE clause.
Brandon: Ideally, yes.
Brandon: Because if it’s not in the WHERE clause, SQL Server is not going to be able to take advantage of partition elimination.
Carlos: To interpret it, the optimizer will be able to say, “Oh, hey, WHERE this? Oh, hey, you know what? Before you should mention that, I’ve got a partition that matches that.”
Brandon: Yeah. If it’s not there, SQL Server can’t use that. It’s the same reason you try to build your indexes around how you are going to search through the data, how you’re going to SELECT that one.
Carlos: To continue on there, it’s not just going back to using DMVs and finding what the most common queries are? Are you going back and talking to your developers and saying, “Let’s look at our big reports.”
Brandon: It depends. I would say that you should understand and see the code. I really want to see the SELECTs.
Carlos: The queries that are coming in.
Brandon: Yes, that are coming in against that table. I’ll pull the numbers off the DMVs, but I also want to talk to [inaudible 18:40] , because I want to know if that’s likely the change.Because if I partition one way and then six months down the road, they’re already planning on doing something X, I want to know about that because maybe there’s an opportunity. Maybe they’re going to change things and it might be better to use a different column for your partition key.
Carlos: I cannot tell a lie here. I haven’t had a ton of experience with partitioning, but it seems like a lot of the problems people run in to is they haven’t had those discussions. The DBAs, the data stewards if you will, they say, “Oh, we have a performance problem let’s partition,” and they do it in a vacuum. Then they are like, “Well wait a second, how come things didn’t get better or minimal improvement?” kind of thing.
Brandon: There is a stereotype of your typical DBA as more reclusive, the bearer of no. Always saying, “No.” Doesn’t really get along with anyone. The problem is in order to be a really good DBA, you have to have those conversations. You have to understand your environment, your developers, how they work, their code. You have to be part of their meetings.Some companies like mine, we break it down. So you have data base operations and data base engineering. Operations is more infrastructure. That’s where I am. But everyone in operations still works directly with developers too. We are getting that information either in person or from the database engineers who are SQL developers.
Carlos: How do you get integration mix there?
Brandon: You have those discussions like communication. You will end up with better systems and less problems.
Carlos: Ultimately that’s what we want. Data base administrators who are responsible for the data and we can create less problems then that’s a win for us.
Brandon: Yeah. Also, one thing partitioning does show you how constraints help your query plans. Because the old way of doing it was that partition views, where you had separate tables and a view where you would union them all together and then you would get one view.As long as you had constraints, the planning what data could be within each “partition,” you could actively get partition elimination. So SQL server looks at the constraints you have.
Carlos: It takes all that stuff, all those little pieces, tries to put together a picture that says, “OK here’s what I think you want.”
Brandon: When I present on this stuff in person like I’m doing today, I always spend time. I actually show partition elimination using partition views so that even the people who don’t have Enterprise have a tool and everyone can see partition elimination.
Carlos: Very good.
Brandon: It’s much easier to see.
Carlos: We will put a link up on the show notes SQLpartners.com/podcast and you can go there to look up today’s episode on partitioning. We’ll make sure we have a link to Brandon’s slide deck if you want to review that.
Brandon: It will have queries in it. You can actually go and see partition elimination. A solved sample data base. Everything you need.
Carlos: Very cool, very cool. Yes, that would be a big help for those who want to try to play around with it a little bit. Again, the most important thing is knowing your data. Know where it’s going, planning for the future and then because you know that you’ll be able to choose a key where your queries are in the WHERE clause, you create that partition and then you are the hero.
Brandon: Yeah, you’re done. You don’t have to modify your queries. Table partitioning is transparent to anyone running a query.
Carlos: Very cool. Well, Brandon we do appreciate it. Thanks for being here. Before we let you go we do want to talk about something.You don’t have it on now, but I know earlier you had your SQL Cruise jacket on. So you are a former SQL Cruiser, right, on the show? I’d like to talk about SQL Cruise, so tell us about your experience on SQL Cruise.
Brandon: I love SQL Cruise. SQL Cruise was a turning point in my career. It really was. I went on SQL Cruise and it’s not just the sessions. The sessions are great, but the people you meet, they change your life.
Carlos: Yep, I couldn’t have said it better. For those who want to check out more on SQL Cruise you can go to SQLCruise.com. Of course, Tim has extended us a hundred dollar discount if you want to go to sqldatapartners.com/sqlcruise we’ll give you some information on how to get that information there.So we do have a couple of standard questions we like to ask here at the end. We want to provide some additional value. Of course the partitioning conversation was very, very good. But we want to know what’s your favorite SQL tool? Paid tool, free tool, but why do you like it and why you had to use?
Brandon: There are so many wonderful tools out there it’s really hard to pick. I was thinking about that question a lot [laughs] throughout the whole conversation. Let me give you some of my favorites. I’ll try to keep it short.
Carlos: Can you give us one and we’ll talk about the others? Is there a top one?
Brandon: Yeah, from a DBA perspective it’s definitely SQL Sentry.
Carlos: The Plan Explorer or the Monitor?
Brandon: It’s wonderful. Plan Explorer is amazing. In terms of free tools, yeah, Plan Explorer is a go-to, easy-to-use tool that you can get completely free. SQL Sentry’s Performance Advisor is awesome.It has pulled my butt out of the fire so many times. Just being able to go down and see all the information you need right in front of you and you go oh, that correlates to that and this is why. I mean I had a CTO actually look at that product at one point and we had an actual production issue.
We were pointing at production. It was once after we implemented. He literally drilled down and he found the problem. He’s not a DBA, he’s not even close to a DBA. He found the problem in five minutes.
Brandon: [laughs] Made a phone call, they stopped running that report, which wasn’t supposed to run then anyways.
Carlos: Life went on.
Brandon: After that, he said, “I now no longer need to know anything about this tool.”
Brandon: He says this was a good buy.
Carlos: This is a good buy. Very good.
Brandon: SQL Pro from Redgate is another necessity if you are doing any sort of T-SQL development, design. It’s a huge plus.
Carlos: Very good. So, our last question. If you could have one superhero power, what would it be and why would you want it?
Brandon: One superhero power. A lot of people would say something like flying.
Carlos: Hey, that’s mine![laughter]
Carlos: Don’t go too harsh on flying.
Brandon: No, no. You know, I don’t need to fly. That’s what planes are for. What I would love is just to be able to just know. For example, when I’m troubleshooting an issue, it would be wonderful to just know about things I don’t have access in to.Knowing that’s in admin, we’ll give you access to his reports. I’d want those. Just knowing stuff like. I’d also probably win the lottery by day two.[laughter]
Carlos: [inaudible 27:58] that’s interesting. Would you know after things happened or do you have some insight into the future?
Brandon: To be honest, to be able to just see and know at that time, which is why tools are wonderful things because they help assuming you have access to them.
Carlos: There you go.
Brandon: Even just to understand and feel what other people are kind of feeling at that point in time, because that can help you in communications so much. I don’t need flying. I want knowledge.[laughter]
Carlos: You are much nobler than I am Brandon.
Brandon: It’s not so noble, trust me.[laughter]
Carlos: Well we do appreciate you being on the show today. It was great to have you.
Brandon: I’m honored.[music]
Carlos: Thanks again and compañeros, be sure to check out the show notes sqldatapartners.com/podcast for all updates and what we talked about today and we’ll see you on the SQL trail.