Episode 03: Backup and Recovery


I chat with MVP Tim Radney about backup and recovery options in SQL Server. We chatted for quite a while and I ended up making two episodes out of our conversation. This episode focuses on why we back up and some core ideas; advanced features will be in episode 04. You can register to win a copy of Tim’s book on backup and recovery by visiting sqldatapartners.com/podcast. Also be sure to check out the very cool offer from SQL Skills at the end of the session.

Show Notes

Tim on Twitter
SQL Server 2014 Backup Recovery Book
SQL Cruise

Carlos Chacon: This is the SQL Data Partners Podcast and my name is Carlos Chacon, your host. Today we wanted to talk about backups and restores. My guest Tim Radney and I had a great conversation about this. Our conversation went a little bit long so we’re going to break this conversation up into two episodes.

Today’s episode is going to focus mostly on the reasons for backups, the different types, and why we do them, of course. Episode four is going to focus on some of the more advanced concepts, such as encryption, compression, and backing up into Azure.

The folks at SQL Skills have made us a great offer, which we will let you know about at the end of the session, so we hope that you’ll check that out. We also have a special offer going: we’re going to be raffling off the book that Tim wrote about backup and restore options in SQL Server. Again, all that information will be coming at the end of both episodes. We hope you check those out and, again, welcome to the show.

Carlos: Tim is a consultant with SQL Skills and has been awarded the Microsoft MVP Award. He has written a few books, and if you fancy a Twitter chat with him you can find him at @tradney. He hails from Fortson, Georgia, where he serves as the past chapter leader of the Columbus SQL Server Users Group. Tim, welcome to the program.

Tim Radney: Thank you, Carlos.

Carlos: Thanks for being here. Ultimately, today we wanted to talk a little bit about backup and restore operations, primarily in SQL Server, as this is going to be SQL Server focused. One of the reasons I wanted to have you on the show today, Tim, was because you have written not just one but actually a couple of books on backup and recovery. Your most recent book is entitled SQL Server 2014 Backup and Recovery Techniques, and it is available from Amazon. I’m also super stoked to announce on our podcast that we are going to give away 5 copies of the book. If you would like to register for your chance to win, you can go to sqldatapartners.com/podcast, and under the information for today’s session you can fill out the form with your email address for a chance to win. We will have the drawing on August 31st. Again, that URL is sqldatapartners.com/podcast, and you can register under today’s program information for a copy of Tim’s latest book.

Today our discussion will focus on backups. One of the things we need to do is keep the end in mind. It would make no sense for us to talk about backups without first talking about recovery.

Tim: That’s exactly right.

Carlos: While we do take backups, we take them so we can recover. There are two little acronyms we’ll go over quickly about recovery: RTO and RPO. I’m curious, Tim, when we talk about these acronyms; first let’s define them. RTO is your recovery time objective: how long you think you’re going to take to get back up. Then your RPO is your recovery point objective: what point you are going to be able to get to in the event of a disaster or what have you.

Tim: Right. I like to tell people to think about recovery point objective as how much data can you afford to lose. Your point in time, is that 5 minutes, 10 minutes, an hour? To put it in perspective: unfortunately, when we talk about recovery time objective and recovery point objective, everybody immediately says zero downtime, zero data loss. Fine, give me $10 million and we’ll start talking. That’s really hard to hit, but you can usually land on something reasonable and build your recovery strategy from there.

Carlos: That’s an interesting point, that you will need to set up your backups so that, again, you can recover, which we’ve mentioned. You want to talk about all the things that will be needed for you to be able to get to those points. Lots of different features are available to give you highly available data, but they don’t necessarily come free. Having those discussions and being able to understand what your business requirements are, what the business is willing to pay for, or whether they will potentially even change their business processes so that they can recover, I think is also in play. Any thoughts on how to approach those discussions with the business?

Tim: Absolutely. It all starts with determining what the recovery point objective and recovery time objective are. Knowing what the business requirements are, and what they need in the event of a disaster, or in the event of some “oops” type situation. An update statement without a WHERE clause, that could be considered a disaster. If you just trash the entire sales table, your business is down and that is a critical point. Depending on your HA technology, if you are doing an availability group or a mirroring type situation, that update statement was run against both nodes. What can you do at that point? A restore operation.

For disaster recovery, if your data center were to go dark or become a smoking hole in the ground, how quickly do you need to be up in a secondary location? What is your appetite for having that secondary location and the technologies there and how are we going to get the data there, type thing.

Those business requirements will dictate what your backup strategy should be, because your backup strategy mimics your recovery strategy. It all comes back to what the business requires…I say require, not want. What is the regulatory requirement for that?

Carlos: Hopefully, part of our discussion is that we want to be able to provide the ability to get back. They may not necessarily be happy with the time it takes to get there, but we always want to be able to provide a way to get back. This is Lesson 101. It’s funny, actually his name is escaping me, but the fellow who started Code School, which is now part of Pluralsight, he tells a story. He was a developer and still is. He tells a story about a script that he was going to run on the database for a company he was working for. He knew it was going to take a little while. It was an update to the entire contact table for this application.

The organization was mp3.com at the time. They had several million contacts in there. He didn’t have time to do it during the day, so he thought, “OK, I’ll just let the script run at the end of the day and I will check it when I get home.” He didn’t check it that Friday evening. Saturday evening rolls around, he logs in and says, “Hmm, I don’t see my changes. My script says that it ran, but I don’t see the changes that I thought would be in there. Let me run it again.”

He runs it again. Again he forgets to check back in on it. He comes in on Monday morning and they have kind of an all-hands meeting. The manager says, “So, someone ran a script this weekend that updated every single email address in the contacts table on Friday. The DBAs were in here all day Saturday restoring the database.” They had some issues with the restore process. It wasn’t quite there, so they lost a little bit of data. Then, somebody ran it again on Saturday night.

[laughs]

Tim: Oh Wow!!

Carlos: “Can somebody tell me who did that?” He was like, “Please not me, please not me, please not me.” He goes back to look at his code and sure enough, no WHERE clause, and so [laughs] that was an example of always having a fallback plan there.

Tim: That’s right. We always like to say it’s not a matter of if it’s going to happen to you, it’s a matter of when you’re going to do it, when you are going to drop something you shouldn’t. You think you are in test when you are in production, and we try to segregate access and rights and things. You get one of those drive-by or walk-by-your-cube distractions and you turn around and you execute and you are on the wrong screen. Stuff happens. Even in the real world, in the ETL process, you get a bad file. The file is structured correctly, so all your checks pass. All your little SSIS widgets turn green and it ingests this data and it’s really bad data. It’s just a mess.

[crosstalk]

Tim: If you don’t have an easy way of identifying that this was the data that was new, if it was a merge into a whole bunch of different tables and you don’t have an insert date or something you can trigger off of, how do you get back? You have to restore back to a point in time. So having the ability to do that is crucial.
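
[Editor’s note: a minimal T-SQL sketch of the point-in-time restore Tim describes. The database name, file paths, and stop time are hypothetical, and a real restore would walk your actual backup chain.]

```sql
-- Restore the last full backup, leaving the database ready for more restores.
RESTORE DATABASE Sales
    FROM DISK = N'D:\Backups\Sales_Full.bak'
    WITH NORECOVERY;

-- Replay the transaction log, stopping just before the bad data arrived.
RESTORE LOG Sales
    FROM DISK = N'D:\Backups\Sales_Log.trn'
    WITH STOPAT = '2015-08-01 14:55:00', RECOVERY;
```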

Carlos: I have to tell a story on myself. I was working with some healthcare data in a scenario where they wanted me to help scrub the data. All the females become Mollys, all the males become Matthews. Change the address information, for example, so that they can present this in their demos when they are looking at other clients. Just like you said, I was developing this in test, and everything was fine. Somehow, my connection switched. To this day I have no idea how that happened, and I ended up going against production. I ran the data manipulation and all of a sudden, there in the database, all the women were Mollys and all the men were Matthews. So yeah, talk about embarrassing.

Luckily, one tip I will give there is that once I had identified what I had done, I immediately found out how many instances were affected, which luckily in this case was just one, and then I immediately took action by notifying the team. The worst thing you could do is try to cover that up somehow or wait for somebody else to notice. That would not be good.

Tim: That’s a CLM, a Career Limiting Move, trying to cover it up. Data professionals are supposed to have certain levels of integrity and honesty, and to be able to raise a hand and say, “It was me. This is what I did. It was an honest mistake. Let’s get this fixed.” Once people like myself and you and others have had that moment, that’s when we’re so much more alert and aware of what we’re doing in production. That’s like a coming-of-age type moment. If you’re going to hire a junior DBA and they’ve never had that moment, guess what, they’re going to have it with you.

Carlos: You want to be prepared.

Tim: That’s right.

Carlos: That takes us into, I guess, the different steps that we can take to be prepared. Let’s talk a little bit about some of the backup types. I’m counting seven options that we’ve got: full, partial, differential, file, filegroup, log, and copy-only. Let’s go through this. Question: how often are you normally taking full backups of the databases?

Tim: I typically do weekly full and daily differential.

Carlos: That’s what you’ll do out of the gate?

Tim: Yes.

Carlos: Interesting. I guess I tend to be a little bit pessimistic. I have been fortunate to have lots of space. Normally, in the beginning, I’m taking full backups every night. When do you decide to do something else? I think some of that may be a matter of style, and the size of the database, because that plays in there.

Tim: Yes. Size definitely plays into it. Also, a lot of my clients are 24/7 type shops, and backing up across the wire to a network device, doing the same full backup every night, you’re backing up so much data that has not changed. You can decrease your backup size by doing differentials during the week, so we’re doing a weekly full, daily differentials, and log backups at some smaller interval, anywhere from 5-15 minutes. Doing that, you drastically cut down your backup time and cut down the size of the backups. Your restore may take slightly longer, since you do have to restore a full and then a differential, but in most cases these differentials are just gigabytes in size, so you’re talking minutes. And you save, during the week, the overhead, the IO, of having to do your constant full backups.
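
[Editor’s note: a sketch of the cadence Tim describes, shown as plain T-SQL. The database name and paths are hypothetical; in practice each command would run from a scheduled SQL Agent job.]

```sql
-- Sunday: weekly full backup (this is what resets the differential bitmap).
BACKUP DATABASE Sales TO DISK = N'D:\Backups\Sales_Full.bak';

-- Monday through Saturday: daily differential (everything changed since the full).
BACKUP DATABASE Sales TO DISK = N'D:\Backups\Sales_Diff.bak' WITH DIFFERENTIAL;

-- Every 5-15 minutes: log backup (requires the FULL or BULK_LOGGED recovery model).
BACKUP LOG Sales TO DISK = N'D:\Backups\Sales_Log.trn';
```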

If I’m dealing with a client and their largest database is 10, 15, or 20 gigs, then OK, a nightly full would be OK. With a lot of my clients, we’re dealing with hundreds of gigs to terabyte databases, and doing a nightly full of a multi-terabyte database is just not efficient.

Carlos: Right. You mentioned the size of the database; let me break that up. For me it’s normally about the time that it takes to do that backup. When I look to change from doing fulls, if I am generally doing one file, then I’ll try to do a full backup to multiple files. Then, if I can’t fit it within the window of time that I want those backups to take place, that’s when I start looking at the differentials. One of the pieces about that differential you mentioned is that the differentials will take backups of everything that’s happened since the last full.

Tim: Yeah. The differential does not reset the differential bitmap; only a full backup does. It’s not an incremental; it’s not everything that has changed since the last differential. It’s everything that has changed since the last full. Over the course of the week, as data continually changes, your differential will increase in size until you take that next scheduled full backup.
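
[Editor’s note: because each differential contains everything since the last full, a restore needs only the full plus the most recent differential, then any subsequent log backups. The file names here are hypothetical.]

```sql
-- Sunday's full, then Thursday's differential, then the latest log backup.
RESTORE DATABASE Sales FROM DISK = N'D:\Backups\Sales_Full_Sun.bak' WITH NORECOVERY;
RESTORE DATABASE Sales FROM DISK = N'D:\Backups\Sales_Diff_Thu.bak' WITH NORECOVERY;
RESTORE LOG Sales FROM DISK = N'D:\Backups\Sales_Log_Latest.trn' WITH RECOVERY;
```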

Carlos: We mentioned the log backups. Interesting that you mentioned smaller increments. With fulls and differentials we talk about doing those once daily, generally, right? Now we get into our logs, and you mentioned 5 to 15 minutes. It’s interesting, I run into a lot of places where they will set up a log backup and they tend to do it in hours; there’s a time I’ve seen it at four hours. Again, some of this goes back to the recovery objectives that we talked about. What I want to say is that I want to err on the side of caution and take my log backups more frequently than every couple of hours.

Tim: More frequent log backups give you a smaller granularity, decreasing the potential for data loss. It’s also maintaining the log more efficiently. The more frequently you do the log backups, the more frequently the transaction log is truncated. You get the reuse of the VLFs. The log can stay smaller; if you are doing it every four hours, you are maintaining four hours’ worth of your logging activity. If you’re doing it every 15 minutes and you have no transactions that span longer than 15 minutes, then the log should be relatively smaller. There are additional benefits, beyond just your recovery time or recovery point objective, to having more frequent log backups.
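
[Editor’s note: one quick way to watch the effect Tim describes. DBCC SQLPERF(LOGSPACE) reports log size and percent used per database; after a log backup truncates the log, the used percentage should drop as VLFs become reusable.]

```sql
-- Log file size and percent used for every database on the instance.
DBCC SQLPERF(LOGSPACE);

-- Per-VLF detail for the current database; Status = 0 means the VLF is reusable.
DBCC LOGINFO;
```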

Carlos: Sure. My experience has been that I don’t save any disk space by increasing my log backup interval. If I do four backups an hour, every 15 minutes, or one every hour, the total log backup sizes are basically the same, right?

Tim: Yes, within reason. If you take a log backup every four hours, then over eight hours you’ve used X number of megabytes or gigabytes, versus eight hours’ worth of log backups every 15 minutes; it’s still roughly the same amount of data churn. With more frequent backups you still have a little bit of header information and things like that, bytes’ worth of redundant or additional data, but for the most part the logging activity should be, within reason, about the same.

Carlos: With our disk space today, those bytes are insignificant. The lesson there is, if you’re unsure about how often to take log backups, go with more frequently.

Tim: Right. If you ask the customer, “How much data can you afford to lose?” they’re going to say, “None.” Unless you’ve negotiated, and it’s some special case with the system where they can go and reload files, because the database is really just loaded from some ETL process and they can redo that work, there’s really no reason to have very wide, long durations in between log backups. From your perspective, if you’ve got a standard, and your standard is every 15 minutes and they’re telling you every hour would be OK, why change it? Be the hero, and do more frequent backups.

Carlos: That’s right. I agree there. The last one is the copy-only. The copy-only is ultimately a full backup where we don’t want to break the chain.

Tim: That’s typically for when you’re taking a backup outside of your regular schedule. A copy-only with a differential is still just a differential backup, because it still isn’t going to reset the differential bitmap. Where the copy-only is crucial is when you’re doing fulls with differentials. If you were to make a full backup without copy-only, every differential after that full belongs to that full. That one full backup you just made is its base, so if somebody made a full on Tuesday, and then you go to restore your full from Sunday and apply the differential from Thursday against the full from Sunday, it’s not going to work. If you’d made that full on Tuesday with copy-only, it’s not going to reset the bitmaps.

It’s as if that full never really happened. A good practice to get into, if you’re in an organization that does differentials and you’re making a one-off backup that is not part of your recovery plan or the normal set of backups that you are going to use to restore, is to always use copy-only.
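
[Editor’s note: the one-off backup Tim recommends, sketched in T-SQL with a hypothetical path. COPY_ONLY leaves the differential bitmap (and the log chain) untouched, so the regular full/differential sets still restore together.]

```sql
-- An ad hoc full backup that does not become the new differential base.
BACKUP DATABASE Sales
    TO DISK = N'D:\Backups\Sales_OneOff.bak'
    WITH COPY_ONLY;
```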

Carlos: Yes. Is that fair? You talk about your backup strategy. I think I see that more if you’re using a third-party tool to do your backups. I think if you’re using SQL Server through maintenance plans or a job, and you are storing this copy-only backup in the same place, you may be OK. But it’s usually if you’re using a VERITAS or Symantec product or something to do those backups; it doesn’t know about that backup, and that’s why we use a copy-only. Is that a fair assessment?

Tim: Yeah. I think where you’re going there is folks that are doing VM snapshots, or something going on with a virtual machine where it’s also taking backups. I’ve seen some of those systems use copy-only so that they don’t interrupt the DBA’s process. If you’re using native SQL Server backups or a database maintenance plan to do yours, some of these third parties will use copy-only so that they don’t disrupt your recovery strategy.

Carlos: OK. Those are the backup types your databases can have. Any special considerations for backing up the system databases? We’ve got Master, MSDB, Model, and tempdb.

Tim: Yes. For Master, Model, MSDB…a lot of folks don’t think about the system databases. They think they can just reinstall SQL Server, restore the user databases, and they’re good to go. If you’re not backing up MSDB, you just lost all your DTS or SSIS packages, if you store those there. I don’t know how many people memorize, or take screenshots, or have everything fully documented for every single job that’s running, as well as those complex SSIS packages.

Carlos: Right. And all the security that is associated with them.

Tim: Right. I’ve had people say, “Well, we have those in source control.” Fantastic. Good for you. Are you sure the version that’s running on your server was the last one checked in? Things like that. Play it safe. Back up MSDB. Master is another one. That’s where all your users and permissions and things are. If you were to restore your databases to a new, clean instance, your applications aren’t going to be able to connect, because the users aren’t there. The last thing you want to be doing is trying to scrounge and figure out what AD groups had rights, what the levels of permissions were, and the local SQL accounts. Having Master backed up is a good thing, but you can also script out all your users and permissions so that you don’t really have to worry about restoring Master.
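
[Editor’s note: backing up the system databases uses the same BACKUP DATABASE command as any user database; the paths here are hypothetical. TempDB is deliberately absent, for the reason discussed below.]

```sql
BACKUP DATABASE master TO DISK = N'D:\Backups\master.bak';
BACKUP DATABASE msdb   TO DISK = N'D:\Backups\msdb.bak';
BACKUP DATABASE model  TO DISK = N'D:\Backups\model.bak';
```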

In order to restore Master, you have to have the exact same revision of SQL Server. If you’re running 2008 R2, “10.50.”…all four digits, you have to be at the same level to restore Master. I always encourage folks: still have it backed up, but also script out your users and permissions so that you’re not having to fight to get to the same patch level to restore Master just to get those logins.
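
[Editor’s note: two standard ways to check the build you would need to match before restoring Master.]

```sql
-- Four-part build number, e.g. 10.50.xxxx on a SQL Server 2008 R2 instance.
SELECT SERVERPROPERTY('ProductVersion');

-- Full version string, including edition and patch level.
SELECT @@VERSION;
```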

Model…SQL Server can’t start up without Model. That leads to my favorite interview question: “How often should you be backing up TempDB?”

Carlos: [laughs] Your book threw me for a little curve there. There’s a little section heading on backing up TempDB, and I thought, “What, what, what could be there?” [laughs]

Tim: Right. The thing is, you can’t, even though it looks like an option. You can go and try to back it up; it’s going to throw an error. That’s always a fun one. In interviews, we talk about how critical TempDB is and some settings around it. Then I say, “So, since it’s such an incredibly useful database, how often should you be backing that up?” “Oh, every 15 minutes, and we replicate it, and we put it in an availability group.”
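
[Editor’s note: trying it yourself shows the punch line. On recent versions the command below fails with error 3147, “Backup and restore operations are not allowed on database tempdb”; the exact wording may vary by version.]

```sql
-- This does not work: tempdb is rebuilt at every startup and cannot be backed up.
BACKUP DATABASE tempdb TO DISK = N'D:\Backups\tempdb.bak';
```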

Carlos: [sarcastically] That would be interesting to see. Along with that, the Master database…it being the brain of the instance, you mentioned scripting those out. I think that if you’re in a fairly static environment, that may work for you. In most places, however, the documentation around the instances and any changes that have happened since the last script, it’s just not there; it’s not a strong infrastructure. Yeah, go ahead and take that opportunity to back it up. It can be a little cumbersome to get it back, but it’s always better to be able to go back and know you’ve got it than not.

Tim: Absolutely. There are things where you can repair Master, but with some of those emergency repair options you’re going to end up trashing it. It’s going to rebuild Master. You’re still going to lose those logins. At an organization I worked at in the past, a large regional bank, on all of our instances, every single night we would script out the logins and permissions. One person might have changed a local SQL account password during the day, but that’s one or two that might have done that out of the hundreds that were on the server. OK, I can triage that. I don’t want to be triaging a thousand people.

Carlos: Good point.

Tim: A SQL Agent job: output it, save the results of the sp_help_revlogin procedure to a file, and store it in the same location that you store your database backups. That’s part of your recovery plan. Auditors, examiners, they love that stuff.

Carlos: There you go. You made an interesting observation there: sp_help_revlogin. Just to take a second, that stored procedure is available through Microsoft support. What it enables you to do is script out, with the hashed password, all the information that you need about a login, so that when you re-create it on a second instance, for example, it will be just like it was on the first instance. All of that database security information will be inherited, if you will. You don’t have to worry about orphaned users and things like that, which can be a pain. I use it more often when moving environments, upgrading, or things like that, but it can be helpful in your recovery. You mentioned you use that in your nightly processes or in your recovery processes?
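
[Editor’s note: sp_help_revlogin is not shipped with SQL Server; it is installed in master from a Microsoft support article. Once installed, a single call scripts every login, as a sketch:]

```sql
-- Prints CREATE LOGIN statements with hashed passwords and original SIDs,
-- ready to be saved to a file (e.g. from a SQL Agent job step with an
-- output file, as Tim describes).
EXEC master.dbo.sp_help_revlogin;
```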

Tim: Yeah. We’d have a scheduled job that would run, that would output that to a text file and store it in the location where we kept our database backups, which was all encrypted and protected. Yes, a nightly process. If we had to stand up a server at our DR site from the backup device that was replicated to our DR, we would have instances ready, and we wouldn’t have to worry about restoring Master. We’d just run the script. The process that would run would first drop all logins and then run the script to add all the logins back, just in case passwords had changed in between the DR tests; if the user already exists, it wouldn’t be updating the passwords. We’d just drop logins, add the ones back, restore user databases, restore MSDB. Restoring MSDB is like any other database. There are no tricks or catches.

Carlos: Yes. It’s very straightforward.

Tim: Restore the user databases and MSDB, and we’re up in DR with minimal DBA interference. Actually, we had restore procedures where an admin who had access to the server could run the PowerShell script that would do all this stuff for us. It was almost 100 percent automated. There wasn’t too much to it.

Carlos: Yeah, very good. I think that you want to get there over time, to be able to do that: starting off with maintenance plans, go there first; then look at scripts. You even mentioned the PowerShell scripts. OK, I think we should mention here that ultimately this is all about a chain, a backup chain. We want to be able to get all the pieces or sections of all the transactions that happen against our database; we want to be able to collect all of those and keep them together to be able to restore back to the point that we want. We use these different methods to help ensure we have all of the pieces, all the links in the chain, together.

Tim: Yeah. The [inaudible 00:31:47].

Carlos: OK, that’s going to do it for us today. We are going to share the offer from SQL Skills and then wrap up. Of course, next week, next Wednesday, episode four will be available, where we’ll finish up the conversation with Tim about backups. Thanks again for tuning in. The SQL Skills team has been very generous in offering a code to each one of you for 30 days of free training videos from their team. What you need to do is email Paul Randal at [email protected] with the subject line “User Group Pluralsight Code” to get a free trial. You get 30 days, and you’ll have access to all of the SQL Skills content that is on Pluralsight.

Again, you can mention in your email that Tim sent you his way, and he’ll be happy to get that information out to you. Now, Tim, you were mentioning, how many hours of video are available?

Tim: I want to say it’s like 170 to 180 hours. There’s close to 40 courses, lots and lots of good content.

Carlos: Wow! Yeah, I can imagine that’s lots of deep-dive stuff, particularly since the SQL Skills folks are known for their deep-dive training. Do you have any courses on Pluralsight? I failed to check.

Tim: I’m working on one now on consolidation best practices and techniques, so hoping to have that wrapped up in the next 30 to 60 days.

Carlos: OK. Very good. I know those can take a long time.

Tim: It’s amazing how much time goes into putting together your slide deck and then doing all the recording. If you thought user group sessions, SQL Saturday sessions, and past Summit sessions were intense, a Pluralsight course, because you want to have it perfect and just right, is a labor of love.

[music]
