Are you experiencing challenges managing Very Large Databases (VLDBs), or do you anticipate challenges with future growth? In this episode, we are joined by Aaron Hayes to discuss practical advice on managing very large databases in SQL Server, focusing on common problem areas we have found, along with our own experiences. Topics include storage layout decisions, unique tuning challenges, HA/DR implications, and database restores, along with the challenges of fitting backups, re-indexing, and DBCC checks into your maintenance windows. No comparisons here — we are sure you will enjoy the discussion regardless of your database size.
“It was said to me probably in the best way, that anything that has to deal with size of data operations is going to affect you and can be considered a large database.”
“We ship the logs over to the local standby. We schedule [our apply process] every 12 hours or however long we want. If something comes up, we can stop that apply process, restore to a point in time, grab the table and then copy that over.”
“Once you get to being a large database, you really have to cater around the business value that database adds, the case you can make for your SLAs on it, and then with that, you can adapt it to be the best solution.”
“Just because you think you have a large database, until you feel you’re going to exceed your maintenance window, keep all of your maintenance plans standardized across your Enterprise. Don’t try to overdo it until you need to, and at that point, have a good strategy for it.”
Listen to Learn
00:40 Intro to the guest and topic
01:52 Compañero Shout-Outs
02:27 Public Service Announcement
03:00 What constitutes a very large database?
07:32 Things you need to keep in mind when you have a very large database
10:29 Object level restores without third-party products
12:22 Having a full copy of your very large database means paying for lots of storage
14:29 Pros and cons to SAN snapshots and third-party backups
19:35 Maintenance changes you might have to make with a very large database
22:51 A little bit about statistics
24:00 Knowing your data is the most critical piece
26:55 Just how big are Aaron’s databases?
27:56 Performance changes that occur with large databases
31:27 Migrating and upgrading that very large database – it’s a team effort
33:45 Last thoughts and advice about very large databases
34:58 SQL Family Questions
39:32 Closing Thoughts
About Aaron Hayes
Aaron is a DBA manager who spends his days furiously googling answers to Oracle, SQL Server, and Hadoop questions. After spending all his time in college learning Java, he applied for a developer job and was immediately handed a SQL book. He’s worked as an Oracle developer, lone DBA, system integrator, and data architect. When he isn’t trying to figure out how to stay up to date with technology, he can be found running on the trails around Chicago or trying out the new restaurants in his neighborhood.
Music for SQL Server in the News by Mansardian
Carlos: Compañeros, welcome to another edition of the SQL Data Partners Podcast. My name is Carlos L Chacon and thanks for being here, wherever you might be. Maybe you’re walking the dog, you’re headed to work, headed home from work, that’s putting in a little extra time, so thanks for bringing us along. We do appreciate it. So today, we’re talking about Very Large Databases, and our guest today is Aaron Hayes. Welcome!
Aaron: Thanks for having me.
Carlos: Yeah, all the way from the Windy City, I know we were talking and we were trying to plan this, you guys did not have any temperature, you guys were in the negative. Now, hopefully the sun’s come out and you’ve thawed out from up there, so thanks for surviving.
Aaron: Yes, we are barely above.
Carlos: And being on the program. And I’m also joined by Eugene Meidinger. Welcome, Eugene.
Eugene: I am down in Florida this week, enjoying multiple temperatures, lots of temperatures.
Carlos: Oh, that’s right.
Eugene: I’ve got a working vacation, visiting some family and getting some sunlight, because Pittsburgh is not known for its sunlight.
Carlos: That’s right, there you go. So there you go, compañeros, we have a hard-hitting podcast panel, willing to come in on vacation to do this podcast.
Carlos: So yeah, this is going to be interesting. So again, we’re talking about Very Large Databases. Before we do that, however, we have a couple of shout-outs and you’ll forgive me, as always, compañeros, for not getting these quite right, but I am going to try to do the best I can. I’ve got Antoni Nowak, April Smith, Ajit Chaudary, Umesh Yadava, Debabrat Mohanty, and Graves Kilsgaard. He’s giving the podcast a little bit of love. Wants to connect, and hopefully, Graves, we can connect with you sometime on the SQL Trail, in real life, in the future as well. So thanks for reaching out, and glad to know that you enjoy the podcast and hope you continue to do so. Traditionally, we do a little SQL Server in the News, but today, we’re actually just going to do the SQL Server Public Service Announcement. So for all of you compañeros out there who may be tuning in, maybe you’re new, who knows, right? SQL Server 2008 End of Life is this July. I haven’t calculated the days yet, but we’re almost to that point where we can calculate in days. So if you haven’t had that conversation with your boss just yet, now might be the time. As soon as this episode is over, obviously, then go talk. Okay, so for the show notes for today’s episode, we are going to be at sqldatapartners.com/vldb, for very large database, or at sqldatapartners.com/163. Okay, so Aaron, let’s go ahead and jump into this conversation. I guess before we do, I do want to attribute this quote to Todd Kleinhans, I hope I said his name right. I apologize, Todd, if that’s not correct, but he has this great quote, and he says, “everything just takes longer than you expect it to, with very large databases.”
Aaron: Yeah, I think that’s true.
Carlos: Yeah. Now, earlier, we were chatting and the question was, what are we going to call a very large database? In the Size Matters category, lots of people have different opinions about what constitutes a very large database. Let’s go ahead and hear your definition, Aaron, of what your parameters are.
Aaron: Yeah, so I think it does vary, and I think it is specific to your use case. If you look online, you'll find it's anything greater than one terabyte, or something you can't fit in your home lab. I think that's one part of it, but beyond that, it's really anything where you have to take extra care with your backup and restore, your maintenance, or your performance monitoring. It was said to me probably in the best way, that anything that has to deal with size of data operations is going to affect you and can be considered a large database.
Carlos: Sure. You hear a lot that one terabyte is kind of the threshold now. Of course, now with database sizes getting larger and larger, more and more people are getting to that point. But I also think, you know, another thought that was there is if you have to change the way you do things for a certain database, that’s also kind of a measurement of, well, this one’s different, for whatever reason, right?
Aaron: Yes, I think definitely. Some of the stuff you might ask yourself is how many drive letters you can put on your machine, or when your storage admin starts telling you what's happening when you run your backups. Those things all start to fit in that category of a very large database. And I think another metric is any time you have a really large database, you're going to find yourself talking to other teams and really having to understand how it interacts with everything around it. Because when it gets large, it does impact other systems: your IO subsystem, your network, and at some point you'll probably find yourself talking to the finance people about how much space you're using.
Carlos: Sure, so almost a noisy neighbor type idea, right?
Aaron: Yeah, I think it’s critical to the success of a large database to be good friends and understanding of how the things around it work.
Carlos: Sure, now it may go without saying, but I think we'll point it out: when we're talking about very large databases, we are not talking about big data. Now data science is the new catch term everybody's chasing, but I know when big data was at its prime, organizations would ask, "do we have big data?" For our purposes, we are talking about data in SQL Server; that's our focus. Generally when we think about big data, we're talking about unstructured data, and the Hadoops and the Mongos of the world, and those types of processes. So while those data sizes can also get large, that is not the conversation we're having today.
Aaron: Yes, no I definitely agree. I also think with the large database that there are kind of two types. One is going to be a large database with high throughput, but you also have this other category of just an archive database where some of the things we talk about will still apply, but things like performance aren’t going to apply as much.
Carlos: Sure, yeah, because you’re just looking to save it for whatever reason. So regular maintenance requirements or audit purposes, what have you, but the everyday transactions aren’t going through that system.
Aaron: Yes, right.
Carlos: Okay, so we talked about size and volume; variety and velocity could also be factors? I guess you kind of touched on that, like how frequently this database is getting updated or read from. Sometimes the storage space may not be very big, but for whatever reason, I have larger data sets that I have to deal with, and that could also potentially impact or get into some of the pieces we're going to be talking about today, as well, right?
Aaron: Yeah, definitely.
Carlos: So then let’s go ahead and jump into a little bit of that. How am I affected from an administration perspective when I have a very large database? So what kinds of things do I have to think about or keep in mind?
Aaron: Yeah, so I think one of the core responsibilities of any DBA managing any database is keeping the data safe and keeping it available. So keeping it safe is, of course there’s a security aspect to it, but really focusing on backing it up and then more importantly, and I think what changes the most with a large database is the restore process. So, you know, I think the backup is actually easier and still fits into some of the same categories as a smaller database, but the restore I think is really where it gets challenging.
Carlos: Right, and another challenge with that process is that the backup times, I can monitor those, or I can measure that. The restore process is something that generally has to be executed on demand. A decision has to be made, "hey, I need to restore this," and so capturing those processes, I'm not saying it's hard, necessarily, but you have to take that into account as well. So, knowing how long it takes to back up is fairly straightforward, but knowing how long it takes to restore can be more challenging.
Aaron: Yeah, I think it goes back to that idea of the size of data operations. So you’ll find most companies will have an extra 500GB sandbox or VM sitting around that you can do your restore on, but if you get to a 15TB database, suddenly having one of those around just to run a restore on or just to recover a single table, those things change the conversation.
Carlos: Right. Now, I guess we mentioned at the beginning of the program, the need to change operations due to the size of the database, and I think one of those restores and we’ll see if it makes it in, but I know currently you have to go to a third party, for example, if you wanted to restore a single table, as an example. So that might be a reason, if you have a need, or if you’ve found that you can’t do the whole restore, but for whatever reason this high transaction table or whatnot, somebody makes a mistake, you’re having an issue and you want to be able to restore that one, that might be one small example of having to do something slightly different for that database.
Aaron: Yeah, so I think there are utilities, of course, that can read from the logs or read from a backup file and try to pull things out. And I think it comes down to the value to the business: how important it is, and what the SLA is to get that table restored. Because on a restore, the questions that are usually asked are, "hey, I want to see what the data looked like yesterday" or "I want to know what permissions this table had in the database." Not, "I need to completely restore the database," because as you get to high transactions and high throughput, just to restore a table, you can't wipe out an entire day's worth of transactions that might be valid.
Carlos: Right, right.
Eugene: So have you used any third-party utilities to be able to do those object level restores?
Aaron: I don't have any experience with the third-party utilities. I've used two approaches, and I think everything gets specific again when you get to larger databases. With our large databases, we've had a local standby, and we've done delayed log shipping.
Eugene: Oh, that’s interesting.
Aaron: So we ship the logs over to the local standby, and we schedule our apply process every 12 hours or however long we want. If something comes up, then we can stop that apply process, restore to a point in time, grab the table, and then copy that over.
Eugene: Oh, that's really smart. So since you're shipping the logs regularly and you're just delaying the applies, you still have a decent recovery point objective. You're not risking a lot of data loss, but you still have that kind of lagging snapshot, so to speak, a rolling snapshot of what the database was 12 hours ago? Am I understanding that correctly?
Aaron: Yeah, yeah.
Eugene: That’s awesome.
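The delayed-apply recovery flow Aaron describes might look roughly like this in T-SQL. This is a minimal sketch, not his exact scripts; the database name, file paths, server name, and table are all hypothetical.

```sql
-- Minimal sketch of a delayed-log-shipping table recovery.
-- All names and paths below are hypothetical.

-- 1. First, pause the scheduled log-apply job on the standby
--    (in practice, disable the SQL Agent restore job).

-- 2. Roll the standby forward only to the moment before the mistake.
--    WITH STANDBY leaves the database readable between log restores.
RESTORE LOG BigDB
    FROM DISK = N'\\standby\logship\BigDB_20190301_0800.trn'
    WITH STANDBY = N'D:\standby\BigDB_undo.tuf',
         STOPAT = '2019-03-01T07:55:00';   -- just before the bad update

-- 3. Copy the needed rows back to production, e.g. over a linked server:
INSERT INTO [PROD01].BigDB.dbo.ImportantTable (Id, Col1, Col2)
SELECT Id, Col1, Col2
FROM BigDB.dbo.ImportantTable
WHERE Id = 42;   -- the rows that were damaged

-- 4. Re-enable the apply job so the standby resumes catching up.
```

The key point is step 2: because the applies are deliberately delayed, the standby still holds a pre-mistake image, and STOPAT lets you stop exactly where you need.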
Aaron: Yes, so we can do the restore, and then we’ve actually kind of evolved that process, so we have recently switched now to include that in our availability group. And we’re actually taking database snapshots at increments and then we can query those to get the table data back. So we find that that uses a lot less storage, and we can now offload our full backups from our primary server.
Carlos: Sure, yeah, cause the snapshot’s only containing the delta from when you initiate it.
Aaron: Right, so very similar strategy, but just different type of implementation. One using the database snapshots with kind of like a rolling increment and then one using transaction log.
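The database snapshot variant Aaron mentions can be sketched as follows, assuming a readable secondary hosting a database called BigDB (names and paths are hypothetical; a multi-file database needs one ON entry per data file).

```sql
-- Take a sparse snapshot at an interval; the snapshot file stores only
-- pages changed in the source since the snapshot was created.
CREATE DATABASE BigDB_snap_0800
ON (NAME = BigDB_data, FILENAME = N'E:\snaps\BigDB_snap_0800.ss')
AS SNAPSHOT OF BigDB;

-- To recover earlier table contents, just query the snapshot directly:
SELECT Id, Col1, Col2
FROM BigDB_snap_0800.dbo.ImportantTable
WHERE Id = 42;

-- Drop old snapshots as the rolling window advances:
DROP DATABASE BigDB_snap_0800;
```

Compared with keeping a second full copy, the sparse snapshot files only grow with the delta, which is the storage saving Aaron refers to.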
Carlos: There you go. See, that’s why we have you on the program tonight, Aaron, because you’re coming up with way better solutions than “yeah, go buy a third-party product.”
Eugene: Well, I wouldn’t have thought of that, honestly.
Aaron: Yeah, I mean you do need another full copy of the database.
Carlos: Right, there is that.
Aaron: But the downside is, if we have a 15TB database, we now have two of those. But again, with a large database, you will find that it isn't one-size-fits-all, where these are things you do for a large database and these are things you do for a small database. Once you get to being a large database, I think you really have to cater around the business value that database adds, the case you can make for your SLAs on it, and then with that, you can adapt it to be the best solution.
Carlos: Sure, and it's easy for me to say, in the comforts of my office here, hypothesizing about what others' infrastructure might look like. But at a certain size, when you think about aging hardware, not that they want to keep the old SANs around forever, there might be an opportunity, potentially, to use some of those older, slower disks for some of that storage as they replace them, or what have you. Assuming you have the space for it and all of that.
Aaron: Yeah, absolutely. And I think with that, you really think more about the restore process with a large database than the backup, because generally I've always thought, oh, I'll keep 7 or 14 days, and when you get to large backups, you find that your retention period will be much shorter, because the answer is kind of, "well, I'll never go back 5 days. I'd have lost 5 days of transactions, and it's a huge database, so I can't afford that." Really having those creative restore processes is, I think, critical to feeling comfortable with it.
Carlos: Oh, what’s that email you’re getting? Flash sale for Azure Storage? You know, it’s terabytes of– available to you, you know? Yeah, I mean it’s the management on it.
Aaron: Yeah, and with the cloud strategy, I think that’s also a viable option. So we have some other systems that we’re not willing to pay for the compute of having a similar sized, but we are willing to pay for the storage. And if you can get the storage into the cloud before you need it restored, so if you can stream the log files up there, then you can use the on-demand feature of the cloud to spin things up, grab the table data and then shut down the VMs. So if you have a cloud footprint that is another option, if you’re willing to pay for the storage but not the compute.
Carlos: Now, one of the typical things that we get asked about or we talk about with very large databases is the SAN administrators, they always want to talk about SAN snapshots or doing it that way. So pros and cons to the SAN snapshots?
Aaron: Yeah, so people do look to the storage arrays. Storage arrays are good at storage, and arguably better than a Windows server, which has still kind of been catered towards having physical disks. And I think one of the key questions when looking at that, really, is responsibility. If everybody is comfortable when someone says, "can you restore this," and your answer is "go talk to the storage team," and everybody agrees with you, then it's something you can pass the responsibility off for. However, if it still relies on the DBA team to keep the backup safe, then hopefully you have a close relationship with your storage team, because you don't have much visibility into a storage array snapshot or backup.
Carlos: If you guys can't get along while you eat lunch together, then you have a problem.
Aaron: Yes, definitely. So, a few of the pros and cons. A pro is that it can be instantaneous. A lot of storage arrays will do their backups asynchronously, using their copy-on-write technology, so you can say "run a backup," it'll happen in a few seconds, and you're done. All the processing has happened on the storage array and not on the server itself. A negative is that it only knows about the point in time when you take that snapshot, so there's no SQL Server point-in-time restore. There's only a true "once a day, at this time."
Carlos: Right, so whatever time the snapshot occurs, let’s just say to throw out an example, 11am, but what we’re trying to get to is 10am or 8am, then we’re out of luck.
Eugene: More than that, I feel like I've had run-ins with SAN snapshot backups, but I very well could be wrong on this piece: does it insert itself into the backup chain, recorded as a full backup? Is that correct, or am I confusing it with other third-party backup operations?
Aaron: I can't answer definitively, but snapshotting the storage LUNs and copying them off is a feature that's been around for a while; it's a native part of what any Enterprise storage array will give you. I do know there have been conversations, and it's worth checking with your storage team, that a lot of the vendors now are becoming more closely integrated with SQL Server. They're using agents on SQL Server so they can coordinate it: SQL Server thinks it's a full backup, and the storage array knows where that is and that it's a pointer. So you see it as a full backup on the server, and the storage array knows what it points to. There definitely is getting to be more association between SQL Server and the storage arrays to make those backups more seamless.
Eugene: Okay, because I know something that’s definitely a problem sometimes with general third-party backups, if they’re not configured correctly, is that you’re taking a full backup and it’s recorded as such, and now you’ve broken the backup chain because that full backup is off on some virtual drive. So the file location starts with some long GUID and then you want to be able to do a point-in-time restore and not only does the third-party backup not necessarily support that, but because you’ve broken the chain, you can’t take advantage of the log backups that you’re doing. So yeah, I was curious about how that handles it, but I’m glad to see they’re getting better with the integration for that.
Aaron: Yeah, it really, I think, comes down to a question of responsibility of if you’re responsible for the backup and you’re comfortable with it and have it fully tested, then I think it’s an option. But if your storage array is managed by a third-party vendor and you’re not exactly sure what they’re doing, and you’re not positive you can restore it, then I think it’s going to be a risky endeavor.
Aaron: The storage array backups that I've generally worked with have the entire SQL Server on shared storage, and they will actually just swap out the LUNs underneath it. So you'll stop SQL Server, swap the LUNs, start it back up, and suddenly SQL Server thinks it's back in time.
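One way to check how these integrated backups are actually being recorded, and whether the log chain Eugene worries about is intact, is to query the backup history in msdb. The columns below are documented msdb catalog columns; only the database name is hypothetical.

```sql
-- Inspect recent backup history for a database (name hypothetical) to see
-- whether "full" backups are array/VSS snapshots and whether the LSN chain
-- of log backups is unbroken.
SELECT bs.backup_start_date,
       bs.type,                 -- D = full, I = differential, L = log
       bs.is_snapshot,          -- 1 = snapshot-based (e.g. VSS/array) backup
       bs.is_copy_only,         -- copy-only backups don't affect the chain
       bs.first_lsn,
       bs.last_lsn,
       bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
      ON bmf.media_set_id = bs.media_set_id
WHERE bs.database_name = N'BigDB'
ORDER BY bs.backup_start_date DESC;
```

If a third-party full shows up with an unfamiliar GUID-style device name and is not copy-only, that is the scenario where subsequent differentials are based on a backup you may not control.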
Carlos: Yeah, and so it is interesting, and this can be, I guess, to your point about working together: one of the problems we generally see is that for whatever reason, the storage team and the DBA team are on different teams, like they have different managers and whatnot, and so I think a lot of it depends on buy-in from the groups, and almost politics, a little bit, so those are definitely things you want to have. But I think it's also important, because this now is available and data's only getting bigger, we're getting more and more of it, that it does make sense to have that conversation, to be like, "well, hey, maybe it makes sense to shift this over to the other team and get sign-off there."
Aaron: Yeah, definitely, and I think storage arrays have a lot of benefits that you don’t see at the application or the SQL Server level. So things like dedup and ability to take it asynchronously in a few seconds are things that SQL Server, at the application level just isn’t able to do, because it doesn’t have that visibility.
Carlos: Right. Okay, well, I think that’s a pretty good primer on backup and restore processes. Another area that admins have responsibility for is in maintenance. What kind of maintenance changes might we have to make with very large databases?
Aaron: I think maintenance is an area that is probably the second largest differentiator between a large database and a smaller one and at the end it comes down to probably what your maintenance window is. So I think just because you think you have a large database, until you feel you’re going to exceed your maintenance window, keep all of your maintenance plans standardized across your Enterprise. Don’t try to overdo it until you need to, and at that point, then have a good strategy for it. So if you can still fit it within your normal maintenance window, and it meets the requirements, my recommendation is always, “keep it simple, keep it in line with all your other maintenance plans and go for it.” Once you get outside that maintenance window is when you have to start piecemealing it and looking at different options to get it all completed.
Carlos: Okay, so one of the big pieces that, obviously backup is kind of in the maintenance window, that’s one piece we already have touched on. The other piece would be index maintenance. So what kinds of strategies do you recommend, or have you had to look at when we think about index maintenance? Or statistics maintenance? I kind of combined those two.
Aaron: And many times, they are. I think indexes, backups, and then DBCC are probably the three big things you're looking at in maintenance. With indexes, I think what you're going to find yourself doing is some kind of pattern, so you'll probably find yourself doing more reorgs than rebuilds. Some of it will come down to what version and edition of SQL Server you're running. With rebuilds, online versus offline plays pretty heavily into that; rebuilding offline for a few minutes is different than rebuilding offline for a few hours on a really large table. One of the strategies I like to use, and again it varies with business requirements, is to use my maintenance window every night that I have it and actually do reorgs, which keep their progress as they go, and then just terminate the reorg when the maintenance window is over. So I like to split the maintenance up into smaller chunks that happen more frequently across the days. And then on the weekends is when I'll try to get a larger maintenance window to run any rebuilds or larger tables that have been pending.
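A time-boxed nightly reorg like Aaron describes can be sketched roughly as below. This is an illustrative pattern, not his actual script; the 10% fragmentation cutoff and the 2-hour window are arbitrary assumptions. REORGANIZE is the natural fit here because it is interruptible and keeps whatever defragmentation it has already done.

```sql
-- Sketch: reorganize the most fragmented indexes in the current database,
-- stopping new work once the maintenance window expires.
DECLARE @stop_at datetime = DATEADD(HOUR, 2, GETDATE());  -- 2-hour window
DECLARE @sql nvarchar(max);

DECLARE idx CURSOR FAST_FORWARD FOR
    SELECT N'ALTER INDEX ' + QUOTENAME(i.name)
         + N' ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name)
         + N' REORGANIZE;'
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    JOIN sys.objects AS o ON o.object_id = ps.object_id
    JOIN sys.schemas AS s ON s.schema_id = o.schema_id
    WHERE ps.avg_fragmentation_in_percent > 10
      AND i.name IS NOT NULL                 -- skip heaps
    ORDER BY ps.avg_fragmentation_in_percent DESC;  -- worst first

OPEN idx;
FETCH NEXT FROM idx INTO @sql;
WHILE @@FETCH_STATUS = 0 AND GETDATE() < @stop_at
BEGIN
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM idx INTO @sql;
END
CLOSE idx;
DEALLOCATE idx;
```

Because the worst indexes are handled first, terminating at the window boundary still leaves the highest-value work done, and anything pending simply rolls forward to the weekend window.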
Carlos: Right. Now I feel like at this point we’ve started the transition to “hey, this is what you’re paying the Enterprise licenses for, some of this flexibility” and this is indeed where that feature set can shine. And so some of these options, being able to have online processes and breaking up those reorgs and things like that. Yeah, I think we are kind of getting into Enterprise features at this point.
Aaron: Yeah, and I think if you have a large database that's an archive one, your box itself might not be that large, but if you have a large database that's very high throughput, you might even run into things like, I believe, Standard still has a limit on the number of cores, among other items.
Carlos: And memory.
Eugene: Memory, yeah.
Aaron: Yeah, so I think you’re definitely in Enterprise depending on the size that you’re in.
Carlos: Right. So that's on the index side. So then let's talk a little bit about statistics. Now again, as we get into newer versions of SQL Server, there was a trace flag that was added whose behavior is now the default, but if you're still on 2012 or 2014, you may have to think about it.
Aaron: Yes, so of course, in SQL Server, you have Auto Update Statistics, which, as with any feature, has varied discussions around it. But I think the general consensus with the auto updates was: originally, changes to 20% of the table would trigger an auto update on stats. There was a trace flag (2371) that changed that, and in 2016 its behavior became the default. Now the threshold varies, I believe, into kind of different tiers and buckets with that behavior, but it comes into play with a large database, because 20% of a 10,000 row table is a lot different than 20% of a 10 billion row table.
Carlos: Right, right.
Aaron: And so you have to go through testing to see what works best for you, but I’ve seen both where companies will rely on the auto updates and some will say, “I don’t want any auto updates because I know the database well enough and I’ll run them in my maintenance window.”
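To put numbers on why this matters at VLDB scale, here is a rough comparison of the two thresholds. The fixed rule was approximately 500 + 20% of rows; the dynamic behavior (trace flag 2371, default from SQL Server 2016) is commonly approximated for large tables as SQRT(1000 * rows). The formula below is that approximation, not the exact engine logic.

```sql
-- Approximate modification counts needed to trigger an auto stats update,
-- old fixed rule vs. the dynamic (TF 2371 / 2016-default) behavior.
SELECT rows_in_table,
       CAST(500 + 0.20 * rows_in_table AS bigint)  AS old_fixed_threshold,
       CAST(SQRT(1000.0 * rows_in_table) AS bigint) AS approx_dynamic_threshold
FROM (VALUES (CAST(10000 AS bigint)),
             (CAST(10000000000 AS bigint))) AS t(rows_in_table);
-- 10,000 rows:     old ~2,500 changes;       dynamic ~3,162
-- 10 billion rows: old ~2,000,000,500;       dynamic ~3,162,277
```

Under the old rule, a 10-billion-row table would effectively never auto-update its stats; the dynamic threshold brings that down to a few million changes, which is why the behavior change is most visible on very large databases.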
Carlos: Right. So there, we come back to knowing your data is still the most critical piece of all of this. That ability to know your data can help make your decisions much easier, because you know the ‘why’ behind what you’re doing or how you’re trying to attack it.
Aaron: Yeah, I think absolutely it does, and I think it is also a challenge with large databases that they're tougher to figure out, and they're generally inherited. A large database is usually built over time, and it's something you'll hear around the office or around the team: "oh, it's that database" or "oh yeah, we have to run the maintenance on this one" or "oh, the backups are taking longer on that database". It takes longer to understand the data. I think it's still always important as a DBA to do that, but with most large databases, people don't start out saying, "this is going to be 10 or 15TB". So I think a lot of times, the stuff you read online is great when you're architecting from scratch, but you'll find that some of these piecemeal restores and items that are great, and good things to work towards, you won't find in a lot of companies' large databases.
Carlos: Sure, we were talking about Enterprise features, so even partitioning, to name one. I know of an organization that basically the database had kind of this hockey stick growth, and all of a sudden, the database was just too large and too important, that they couldn’t do the necessary partitioning, because it was already running. They didn’t have the downtime window to be able to do that. That’s kind of an example of something like that, where it’s, “oops. Would have been nice to have, but we don’t have that, now.”
Aaron: Yes, absolutely, and I think that’s where getting creative with the maintenance windows and always trying to learn as much as you can, but kind of understanding that it is tougher to learn a large database from scratch than it is one with a few hundred tables.
Carlos: Sure. Hopefully, hopefully nobody’s out there cutting their teeth on a multiple terabyte database. That would be, yikes. If you are, let me know. Call us. If for no other reason than to commiserate, come talk to us. Now this is years and years ago, but I remember working on a team and that very large database, they actually had a single DBA really for that database. I mean, he’d help out and whatnot, but that was kind of his main responsibility, and then there were two or three of us and then we kind of took care of everything else. I can’t remember how many instances there were, I’m just going to throw out 50. One was under the watch of one specific DBA and then the rest of them we kind of all shared.
Aaron: Maybe it's not one dedicated DBA, but any time you have a large database, it's always kind of set aside from an administrative standpoint: "okay, things are a little different on this one". And I think that's when you know you have a large enough database.
Carlos: Sure. Now, I guess we’ve been kind of keeping this a secret and I know everyone at home listening is wondering, “well, gosh, Aaron, how big are your databases?” So Eugene, I didn’t think that was bad until you snickered. I’m like, “oh, did I ask that in the wrong way?”
Eugene: I’m just saying, it’s just a comparison contest, huh?
Aaron: Yeah, there’s always someone bigger, Carlos.
Carlos: Everybody wants to compare their biggest database, that’s right.
Aaron: The largest database that I have to manage is about 15TB, and it's a single database. And yeah, it's definitely presented a lot of the challenges that I've talked about. Some of the numbers are weird: our log file, for instance, is 3TB on its own, and we have almost 600 gigs dedicated to temp space, on an instance that only has the one database. Even our data file growths are, I think last I looked, 15 gigs per data file. So you're talking larger numbers like that when you have a large database, for sure.
Carlos: Right, so then from a performance perspective, other implications? We’ve touched on SAN, that’s probably one indicator or one piece of it. Your hardware changes at that scale, but maybe let’s talk about some other performance-related changes, or pieces you’ve had to implement as a result.
Aaron: Yeah, I've always been on the fence. I think from a performance standpoint, a lot of the same stuff you would set up normally applies just as equally to a large database: sizing your tempdb files correctly, making sure nothing else is running on the box, sizing your memory correctly. All of those things that are kind of common sense still apply; they're no different. If I had to sum up something that makes performance more difficult, it's that with a smaller database, you might get to run full maintenance for the whole database in one night, so you'd know that every day, stats were updated the night before, after all the transactions came in. I see more performance issues in a large database where the stats maybe only run once a week in your maintenance plan, so you'll see great performance the day after that table's stats were gathered, and deteriorating performance after that. And that's where really knowing your data comes in, and knowing how to manage it: "okay, this is a high-changing table, this one needs to be in the maintenance window every night, and this one isn't, so we can do it only on weekends."
Carlos: Right, again, going back to knowing that data, and then making the decision so like we talked about putting variations on that administration piece. Yeah, interesting.
Aaron: Yeah, and I think the other piece of advice or thing I’ve found with a large database that has a lot of activity is that things will happen faster. So if you have a session that’s at the root of a blocking chain, a lot more sessions will queue up behind it a lot more quickly. So it helps to have a handle on who is active and to have a third-party tool for monitoring, Redgate, SolarWinds, whatever it is. Having those baselines and knowing when something is off, because there are so many tables and procedures and things running, it’s not small enough to just know, “oh, it’s that one table”. You’ll generally compare baseline to baseline.
Carlos: Sure. Yes, so getting a feel for, “yeah, she’s running a little hot, but that’s okay,” to “oh, now I’ve crossed a line. Now it’s time to do something.”
Aaron: Yeah, so I would say I monitor performance more like at the database level, saying, “okay, everything looks like it was against the baseline,” versus query by query.
Carlos: Okay, that’s interesting. Maybe let’s dive into that just a little bit more. I guess we mentioned third parties already, but then maybe look to the DMVs or some other things. You’re saying you’re not necessarily looking at execution plans, per se, so to put you on the spot here, when you talk about monitoring at the database level, can you give me a couple of other examples?
Aaron: Yeah, so this probably depends primarily on what tool you use, but it’s baselining: okay, CPU should always be around this level at this time, based on the past three weeks historically. Or okay, this is how many log flushes we get, or this is how many compilations or batch requests we get. So really looking at metrics around the database to first identify what has changed, and then starting to dig in to which query has gone up or down in executions, or changed.
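For listeners without a third-party tool, the kind of instance-level metrics Aaron mentions are exposed through the DMVs. A minimal sketch, assuming you sample on a schedule and compare against your own history:

```sql
-- Pull a few of the counters worth baselining. Note that the
-- "/sec" counters in this DMV are cumulative since instance startup:
-- to get a true rate, capture two samples and divide the delta in
-- cntr_value by the elapsed seconds.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
    N'Batch Requests/sec',
    N'SQL Compilations/sec',
    N'Log Flushes/sec'
);
```

Logging these samples to a table on an Agent schedule gives you the “three weeks of history” to compare against, which is essentially what the monitoring products automate.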
Carlos: Right, right, right. Okay, yeah, good deal. The last piece that I guess we should talk about, which is a bit of a smattering of all of this is migrations and upgrades. So it’s only getting faster, the versions that SQL Server’s putting out. How do you handle your upgrades? We just had a previous episode not too long ago with Andy Levy talking about migrating a single instance with a lot, a lot of databases. So this is, I don’t know, maybe a cousin of that problem. Now I have a single database that’s very, very large. Yeah, so thoughts about moving that around?
Aaron: Yeah, so I think you know what Andy talked about, a lot of that with using a storage team and kind of looking at what’s around you to understand what are my network speeds, what are my storage speeds, do I have extra hardware to do a side-by-side? All of those things generally require more people and more communication. And so I think what you’ll find when you’re migrating or moving a large database is that it isn’t a DBA task, it’s really more of a project. It’s gathering people around, saying, “okay, we’re leading this one, but here’s what I’m going to need. I’m going to need this much more storage. I’m going to need this hardware procured. Do we have budget for this?” Because all of these things now become substantial. They’re not just, “okay, here’s your standard build that we can give you.”
Carlos: Sure, sure. We didn’t ask Andy about his pizza money. How big his pizza fund had to be for that migration project. But yeah, no, no that’s right. It comes back to we’ve said it before, getting along with the other team, because it does become a team effort at that point.
Aaron: Yeah, so I think you rally everybody around, you come up with a strategy, and then you look at how you want to implement it. The storage team can probably give you some options: “hey, we can snapshot your LUNs, move them over, and present them on the new server.” As DBAs, if you have a low number of databases (in my case, we have just one database on the instance), you can do a side-by-side migration and use transaction log backups to minimize your maintenance window. Then on the Windows team side, they can do cloning and potentially stream it across. So each team has different tools available to them, and I think it’s just pro-ing and con-ing each option against the maintenance window, the risk levels, and who’s going to own which pieces of it.
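The side-by-side approach Aaron describes can be sketched in T-SQL; the idea is to seed the new server from a full backup well ahead of time, then roll log backups forward so the cutover only has to apply the last small log. Database name and paths below are placeholders:

```sql
-- On the new server: restore the full backup days in advance,
-- leaving the database restoring so more logs can be applied.
RESTORE DATABASE BigDb
FROM DISK = N'\\backupshare\BigDb_full.bak'
WITH NORECOVERY;

-- Keep applying log backups as they are taken on the old server:
RESTORE LOG BigDb
FROM DISK = N'\\backupshare\BigDb_log_001.trn'
WITH NORECOVERY;

-- At cutover: take a final tail-log backup on the old server,
-- apply it, and bring the new copy online.
RESTORE LOG BigDb
FROM DISK = N'\\backupshare\BigDb_log_tail.trn'
WITH RECOVERY;
```

The maintenance window shrinks to the time needed for the tail-log backup, the final restore, and repointing the applications, regardless of how large the data files are.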
Carlos: Right. Okay, awesome. Eugene, other thoughts?
Eugene: No, I think this is definitely very interesting and it’s really interesting how there’s a lot of stuff that is kind of normal and just rudimentary database administrative work, that once you get to a certain size, just starts to break down and you really have to think about things differently. So it’s a very interesting subject.
Carlos: Sure. So Aaron, thanks for this conversation. Last parting words of wisdom when it comes to very large databases?
Aaron: Yeah, probably my last piece of advice is to always think about what you do. Normal things like, “oh, I’m going to rebuild this quick,” that might work fine on other databases, and suddenly you find yourself with a giant index rebuild happening offline and you’re not sure whether to roll it back or let it finish.
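One way to avoid the trap Aaron describes: on Enterprise Edition, and from SQL Server 2017 onward, a rebuild can be run online and resumable, so it can be paused rather than rolled back if it overruns the window. The index and table names here are hypothetical:

```sql
-- Online, resumable rebuild that will automatically pause itself
-- if it runs longer than 60 minutes, instead of holding the table
-- offline or forcing a long rollback.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60 MINUTES);

-- If the maintenance window closes early, pause it manually
-- and resume in the next window:
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders PAUSE;
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders RESUME;
```

For editions or versions without online rebuilds, reorganizing (`ALTER INDEX ... REORGANIZE`) is interruptible at any point without a rollback, which is often the safer default on a very large table.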
Carlos: Finish, yeah.
Aaron: So you know, just plan things out and make sure you have things in line.
Carlos: Right, there you go. Good advice. Or, you can do that and come back on this show and talk about the bad choice you made. How many transactions did you block by making that choice? How many VPs did you have in your office while you were waiting for that rollback? Awesome, should we go ahead and do SQL Family?
Carlos: So Aaron, your all-time favorite movie?
Aaron: Yeah, so this is maybe a lesser known movie, but it’s called Fantastic Mr. Fox.
Carlos: Oh, and so where’s Kevin when we need him?
Aaron: Yeah, but I think I found it on Netflix one night and I don’t know, just enjoyed it.
Carlos: There you go. Very nice, okay. So, the city or place you most want to visit?
Aaron: I think Seattle has always been a favorite of mine. I’ve been there before and have always enjoyed going back. Not quite sure what it is, but I’ve always been a big fan of it.
Carlos: Maybe all the Starbucks aroma?
Aaron: Sure, and I get more work done when I’m there.
Carlos: Yeah, there you go. So, a food that reminds you of your childhood?
Aaron: It’d probably have to be lasagne. You know, I just remember like weekend meals and mom making lasagne.
Carlos: There we go. Now is this like a certain kind? Meat? No meat? Lots of veggies?
Aaron: I don’t know, meat, lots of meat.
Carlos: Lots of cheese.
Aaron: Lots of cheese.
Carlos: There you go. Very nice. Yes, good deal. Okay, so now we saved this to the end and I didn’t tell the listeners here at the beginning, because you know how sometimes my compañeros get, but you’ve worked quite a bit with Oracle as well as very large databases. So you’ll forgive me for ratting you out there, here at the very, very end, but my question is how did you get started with SQL Server?
Aaron: Yeah, so you know I actually started with Oracle and SQL Server at the same time, so I’ve always been working in mixed shop environments.
Aaron: And yeah, I went to college to be another mediocre Java programmer in the field, did my first interview, they asked if I knew SQL and I said, “no” and they said, “you’re hired”. And then from there on out it was SQL and Oracle and databases.
Carlos: Wow, very good. So now, did you have to you know, learn your chops on a very large database or did they give you smaller databases to get started?
Aaron: It was not that large. Fortunately, it was always newer. I think maybe 2008 was the first edition I had to work on.
Aaron: It had a year in it. It wasn’t just a decimal, so that was good.
Carlos: There we go, so now again, having someone who you know, is a bit stronger with other database technologies, I know you’ve compared and contrasted a bit more. We’ve talked offline about this. So, if there is one thing you could change about SQL Server what would it be?
Aaron: Yeah, so I think probably one of the frustrations I run into when I’m kind of architecting or looking at new systems with SQL Server is that I wish that it was more separated now, since I think each of the components of the SQL Server SKU have gotten so much stronger. So I’d like to see like a separate SSRS licensing model and maybe an SSIS. I think right now the majority of them are still packaged under the same agreement.
Carlos: Sure, sure. Yeah and I think slowly but surely, we are going to get there. So, with 2017 now being a separate install at least with Reporting Services, we could start to see that a little bit, yeah. What’s the best piece of career advice you have received?
Aaron: This came pretty early in my career, I guess. Like I said, I had spent my whole time in college doing Java and object-oriented programming, and I was pretty overwhelmed with this idea of SQL and what it was, and all of these smart people you read about online. And the advice I was given was that no one great was always great, and that everybody kind of starts from the bottom at some point.
Carlos: There you go, yes, that’s right. And some of us are still at the bottom.
Carlos: Luckily, we get to rub shoulders with people like you, Aaron, right?
Aaron: It might be the other way.
Carlos: Our last question for you today, if you could have one superhero power what would it be and why do you want it?
Aaron: That’s a tough one. It’s kind of everything you always think about, you know, and I went with teleport. I spend so much time in airports and just getting places that if I could remove that, it would just be that much easier.
Carlos: There you go. There you go, compañeros. And see, we’ve managed to get through this whole episode. We mentioned at the top of the program that you’re from Chicago, and I thought for sure a Cubs reference was going to make it into the program today, and it didn’t, so I have to bring it in here at the very end.
Aaron: Yeah. No (unintelligible).
Carlos: Anyway, so Aaron, thank you so much for being with us today. We do appreciate it. Great conversation.
Aaron: Yeah, thanks for having me.
Carlos: Okay, compañeros, that’s going to do it for today’s episode. Our music for SQL Server in the News is by Mansardian used under Creative Commons and of course, we’re always interested in what you have to say and maybe we’ll even give you a shout-out. You can connect with us on social media. Aaron, how can folks connect with you?
Aaron: Yeah, you can find me on LinkedIn. I’m Aaron Hayes.
Eugene: Yeah, you can find me on Twitter at sqlgene and I blog at sqlgene.com.
Carlos: And compañeros, you can connect with me on LinkedIn at CarlosLChacon and we’ll see you on the SQL Trail.