Episode 28: How Do You Know There Is A Problem? Baselines!


1400Have you ever had a situation where performance got worse and you were sure why?  Do you keep records of when changes happen to your system?  How do we know there’s a problem?  If we have a baseline, this can help us out.  More often than not, tribal knowledge rules the day and if you weren’t part of the tribe when the on-call pager goes off, things can be tough to figure out.

My guest this episode is Erin Stellato of SQLskills and we discuss what your baseline should consist of and how you might go about capturing some of that data.  I am always interested to see how people monitor their servers and I know this episode will give you a few things to think about in your baselines.

Transcription: baseline

Carlos L Chacon: This is “SQL Data Partners Podcast.” My name is Carlos L Chacon, your host.

Today, we are talking about baseline. I have with me Erin Stellato from SQLskills. She’s with us today. Of course, this is a real treat, always enjoy talking with those folks. We have talked about finding your people or people that resonate with you. Erin is one of those people for me. I first met her at a SQL Saturday in Philadelphia several years ago.

I really like her teaching style. The way she lays out information makes it very easy for even a knuckle-dragging Neanderthal like me to understand.

I think there’s a little bit of confusion around what we as database professionals can do to be proactive in our environments. This episode is definitely going to go and provide some thoughts around that area. Ultimately, today, we’re talking about baselining.

Today’s episode is brought to you by SQL Cruise. Learn. Network. Relax. Grow. Just add water. Check out sqlcruise.com for more information.

If you have something we should be talking about on the show, you can reach out to me on Twitter @carloslchacon or by email at [email protected]. Compañeros, it’s always good to have you on the show. I do appreciate you tuning in. Let’s get to it. Welcome to the program.

Erin, thanks so much for being here.

Erin Stellato: No problem. Thank you for having me.

Carlos: As I mentioned, I attended your session in Philadelphia of 2012, that SQL Saturday up there. I know you’ve been talking. That session didn’t happen to be about baselining. In a recent post, you mentioned your first ever session was on baselining and you’ve done several posts on baselining in several of the popular SQL Server blogs.When I thought about having this discussion, I knew I wanted to have you on the show. Take us through some of your experience with baselining. Why do you think baselining has played such a large part in what you teach? What do you see as the benefits of baselining?

Erin: Sure. My previous position, before I joined SQL Skills, was at a software company, a company that produced software that thousands of customers across the US, and across the world really, ran. The back end for that application was SQL Server. I started out at that company in technical support and moved over to the database team over time and really fell in love with SQL Server and databases and how it works.I would get pulled into tech support cases and even when I was in tech support, because of issues related to performance. It was always about, “The application is slow. We’re seeing these issues.” One of the first things that people tend to blame is the database.

Carlos: Of course.

Erin: The question was always, “Tell me what slow means.””Well, it’s just slow,” was often the answer. “How do you know?” “It used to be fast, and now it’s slow.”


Erin: What was fast and what is slow? Trying to quantify fast and slow became a real challenge, because people didn’t have baselines. They couldn’t tell me what was good. They could just say, “It’s bad.”That was where the interest came from because there were ways to measure performance in terms of how long did it take to execute a particular action within the application. People weren’t doing that, nor were they collecting any information on the SQL Server side, not just about how quickly something happened but even what was the configuration of their environment.

It wasn’t uncommon for someone to go in and make a change within the SQL Server configuration and have that adversely affect performance. Nobody was keeping track of what those settings were. No one knew when anything changed, unless it would get logged in the error log or maybe would happen to notice something, because also people don’t use change tracking. Not change tracking for SQL Server, but for their processes, when someone goes in and makes a change.

Carlos: Yeah, exactly.

Erin: That’s really where it started for me.

Carlos: OK. One of the interesting thoughts that you have again in these posts, which we’ll put up in the show notes for today’s episode, and that is the idea of taking a mini health check of your environment every month. That’s spawned out of the same idea as well?

Erin: That’s something that we provide as a service from SQLskills. One of the services that we provide is a health audit. Let me come in and let me look at your environment. We use the SQLdiag tool to do this, which checks with SQL Server.People have it in their environment. They don’t have to download a tool. They just download a set of scripts. They use SQLdiag to run that and capture the information that we want. We get a snapshot of their environment. Let me see what this looks like in terms of the server configuration, the instance configuration, database configuration.

What do we see in the logs, the default tracer, extended events, system health session. We pull all that information and review it to see what does the health of this environment look like right now. That is a baseline, even if things are not good.

If things are not good and they’ve not been great for a while, that’s a baseline. We can improve that. If things are great, then that’s the information where we know, “Hey, this is what things look like when they’re “normal.”

Once we’ve done that for a customer, if they’re interested in remote DBA services, then every month one of the things that we do is we repeat that audit. We compare the information each month to what we’ve seen previously, looking for trends, looking for problems, to proactively identify changes.

Carlos: You’ll actually go through that PSSdiag process each month?

Erin: We will. We’ll run SQLdiag every single month. We’re consistent in that we’re always capturing the same information and then comparing it to the previous month. In some cases, I’ll end up going across three or four months to look at the data, because you’ll see subtle changes sometimes in something like wait statistics or virtual file stats.It increments maybe a few milliseconds each month. When you look at that month against month, it’s only a few milliseconds of increase. It doesn’t necessarily get flagged in your brain as, “This is a bad thing.” If you compare January down to May, and those few seconds add up to a hundred milliseconds, then that’s something to note.

Carlos: Yeah, you see a trend there.

Erin: Right, we see a trend in that increasing. We’re doing that with our tool. There are third-party tools that make that a lot easier to see.

Carlos: That’s an interesting thought. In fact, again, this is a couple of years ago. Red-gate was having one of those SQL in the City events. I’m from Richmond, Virginia. I actually traveled up to Boston. I wanted to see Adam Machanic. Grant Fritchey was there, Steve Jones. At that time, I hadn’t met those guys. I wanted to go up and see them. At Adam’s session, he actually mentioned using a monitoring tool.My jaw hit the floor there. I thought, “I don’t like this guy’s.” [laughs] Probably forgotten more about SQL Server than I know. He’s like, “Use a monitoring tool.” I thought, “OK. We’re at a Red-gate event. Maybe he’s just saying that.” In talking with other people, they were like, “No, no. You should be using a tool.” It’s ultimately for that exact purpose of keeping that history and being able to review it.

I think when we look at baselining, that’s what we need. We need something to be able to look. I think a lot of times we may fall into that trap of taking a single number and then basing some decision on that versus, as you’ve recommended, looking over time to say, “Hey, what’s changed,” or “What’s different about my environment?”

Erin: Right. In the very first presentation that I did on baselines, I had a slide. The very first [laughs] slide was someone out in the snow, which is appropriate for this time of the year, shoveling, shoveling his big driveway and they were shoveling it manually. I said, “This is what you could do with SQL Server,” meaning you manually have to go in and set all of this up.You have to either pull some scripts from the Internet, some of which I provide, and implement those, and set everything up, and manage that and watch that yourself. That’s a lot of work. You can do it. It’s a lot of work. Then I had a picture of a snow plow or a truck with a plow on the front that did it all for you. That was my analogy of a third-party tool.

The difference is money, because if you can prove to the business the return on investment cost, then in the end, I believe, as much as I know that people can do this themselves, you’ve got lots of things to do as a database administrator. Implementing this and trying to make sure you’re covering all your bases can take a lot of time. A third-party utility makes it so much easier for you.

Management balks at the cost. What I recommend is, if you don’t have a tool and you really want one, take one set of scripts to capture information that’s most useful for you, that’s most relevant, that’s going to help you solve some problem. Track your time. Track how long it takes for you to put those into place, for you to get everything set up.

Then, figure out how many other things would be really important to you. Estimate that it’s going to take you the same amount of time for each one. Then take that and determine what your, assuming you know your hourly rate, what’s the cost for that? Present that to your management to say, “Look. This is something we really need. This is why.”

You need to give an example. “We had this problem. Here was what the root cause was. I could have probably solved that faster if I’d had this tool. I can do all these scripts and I can put all of this in, and it’s going to take me this amount of time. Or, we could purchase a third-party tool for this amount, which doesn’t take me time and lets me do something else, and I’m going to get way more information.”

Because if you try personally to replicate what any of those third-party tools do, that is all you will do. They have developers who are constantly working on that product and improving it. A DBA doesn’t have time to do all of that.

Carlos: Sure. The other value add is then the charting or the display graphics. You can take a look at that T-SQL output and make heads or tails of it, but then trying to convince other people that there’s a problem can sometimes be a little problematic.

Erin: The visuals are far superior.

Carlos: Exactly. It’s one of the things I liked about the PAL tool, where you’re doing this individually. You can then graph that up and you’ll say, “Oh, we experienced high CPU.” They’re like “OK, whatever.” Then you show them a graph over a week where their CPU is pegged, and they’re like, “Oh, now I get it.”[laughter]

Erin: Exactly. Those resonate much, much better than anything in text.

Carlos: Right. Again, maybe making a little bit of a jump, you want to use the tool but even then we want to start the process of collecting some of this information, as you mentioned. I guess let’s take a look at some of the common things that we might want to take a look at in our baseline. Ultimately as database administrators, the first area you want to start is with disk space.

Erin: That’s a good one, I think, because if you run out of disk space, you’ve got a serious problem.[laughter]

Carlos: Yeah. All eyes are on you on that one.

Erin: Right. I’ve got some scripts that you can use to look at how much disk space is available and also for the files that you have within a database, what size are they and how full are they, so you can proactively address that size and not let it grow automatically. We always want to leave auto growth enabled but ideally I’m presizing those files and monitoring the use of them and how full they are so that I can manually increase the size if needed.

Carlos: With that, how often are you checking, and logs I think is a common one, because they can grow regularly. Are you checking that once a week, every night? How often do you normally do that?

Erin: I usually have a job set up that checks that data for me. It tells me if it exceeds a certain threshold. Let’s say that I’ve got a file and I want to know when it gets above 90 percent full. Then I have a job that might run every hour, every four hours, maybe only once a day. Sometimes it depends on the application for that database and the volatility of it.Some are pretty slow in terms of how quickly that space gets used. Others can have random fluctuations because of processing that occurs at different times of the month. I might have that job run very frequently or just once a day, that checks to see how full those files are and sends a notification if they’re greater than a certain percentage.

Carlos: That threshold. Yeah, I think that’s a good point. We want to start taking those scripts, get comfortable with what they look like and then putting them into a job so that it can do those checks for us.

Erin: Right. It’s two pieces. It’s snapshotting that information so you have the history, and it’s looking for that change and getting a notification, because what I don’t want to do is snapshot everything and then have to manually go look at that data, as you said, every week or every month. I’m going to automate as much as I can.

Carlos: Right. Very good. To finish that thought, when you look at growing your logs, because you’re going to induce some conflict into the log there, as you’re growing it, you’re going to tie that up a little, do you have a schedule? Is like a maintenance window thing that you’re doing those? If you see that it’s going to grow before the maintenance window, you just let auto grow at that point or thoughts around that?

Erin: It’s a little bit harder to know when the log is going to grow. I can guess that it’s definitely going to be used during a maintenance task. Ideally I’ve presized it to handle that. You can also do event notification to let you know when file growth has occurred. Jonathan has a great post on that for either data files or log files. One of the questions that we often get asked in our Accidental DBA course is “What size is the transaction log supposed to be?” For that, there is no cut and dried answer.You have to set up some monitoring to check it. You can use DBCC SQLPERF to look at the size and the percent used. I’ll capture that information over time and auto growth, of course, always enabled. In that case, I’m probably not going to be able to react quickly enough to grow the file if I need to, because sometimes it can get up to 90 percent and then we might have a backup and it will drop right back down.

I’m not in the habit for the log, especially in a newer environment, of monitoring that. I still will monitor it and I’ll get a notification. I will tend to let that go to sort itself out. Then you’re running through a normal workload, which may take a week, which may take a couple of weeks. You want to make sure you go through your maintenance and then maybe make some adjustments. If I increase the frequency of my transactional backups instead of once an hour to every 15 minutes, does that handle the size a little bit better?

Once I’ve figured out what that size should be, then I would go shrink it, size it back out to make sure I’ve got an adequate number of virtual log files and then let it be. Still continue with the monitoring. Still continue snapshotting that data on a regular basis and still getting the notifications if it becomes full above a certain threshold and see if the auto growths occur. There I would probably put in event notifications. I do have this for a customer who seems to run some crazy processing at random times and just blow out the log.

Then I’ve got a notification so that if the log does grow because of something that someone did, either on purpose or accidentally, I can see that the log has grown and then I can go in and make an adjustment. Either the log needs to be that size, or I’m going to resize it because that was just a one-time thing and it doesn’t need to be that big, because I want to keep that transaction log as small as I can and still manage everything in there, because if I’ve ever got to do recovery, it’s got to write out that entire log file and it’s got to zero it out. You can’t use if either.

Carlos: Right. I think taking a peek at when those logs have grown could be a good starting place just so you can see what the growth looks like there.

Erin: Right.

Carlos: A couple of other options we have are, of course, backups, right? Database administrators, we have to make sure we have good backups. Now, one of the things you talk about is actually getting estimates for the space that you’ll need. Does that ring a bell?

Erin: Where we’re trending backup size?

Carlos: Yeah, you’re trending backup size, that’s correct.

Erin: Yep. I don’t know if I have that linked in my summary post. Yes, one of the things that I like to do, the client with the log growth. One of the other fun things is that their database, it’s an over terabyte database. Sometimes they do some fun things that increase the size of the database by 100 gigs because they’ve got some really wide tables that have LAV data and they’ll create indexes that include those LAV columns and all of a sudden we’ve added 100 gigs.One time they had an issue where they implemented some new error handling and it started adding something like a hundred million rows to a table every day and that table just grew. I like to monitor not just the space used but also the backup size, because I’m writing notes to a specific location and I’m looking to keep a certain number of days online.

If that backup file really grows like by 100 gigs, then that’s something I need to pay attention to because my job might fail, my backup job, because it doesn’t have enough space. That’s definitely happened with this customer because all of a sudden, within a day or two, the database has increased significantly in size.

I like to trend that over time, and I’ve been doing that for this customer, I give them updates every quarter, “Hey, look. Backup size has gone, it used to be a year ago less than 200 gigs and now we’re up to 350 gigs. I can’t keep as many copies of the database online, as many backups online as you would like. As we move forward to get new storage, because they’re running out and we’re about to play the shell game here pretty soon, I can tell them, “If we want to keep this many days of backups online, here’s how much space we need.” They have to make that decision of what’s important for them.

Carlos: I would think that information, as you mentioned, you were just talking to the owners there. It would help with the SAN admins as well. Say, “Look, yes, when I originally asked for the space a year ago, my backups were X. Now they’re Y.” You can show them. Having that data would make, “OK, I get it now.” Again, being able to prove your point, rather than just, “I’m too lazy to do maintenance on my backups.”

Erin: Having that data to back up either how much space you need for your databases in general or for your backups is fabulous. You never know when your lucky day might come along where the SAN admin or the storage admin comes over and says, “Hey, we’re getting new storage,” and you’re like, “Score.”They’re like, “How much space do you need?” They never ask about what performance you need, what do you need in terms of latency, they ask how much space you need. When you say, “You know what? I need three terabytes.” They’re going to be like, “Dude, right now you’re only using 1.5. Why would I give you three?”

You can come back with data, and you can say, “Well, look, we’re trending here. This is what it looks like. This is why I need three.” Anytime you’ve got that data to back it up, you have a much valid case when you’re asking for something.

Carlos: No question, no question. Much harder to go against that.

Erin: Right.

Carlos: Another option. Integrity checks are very important. You are getting this at the database level.

Erin: Capturing the run times you mean?

Carlos: Yeah, you’re capturing the run times of your integrity checks.

Erin: Exactly. Typically for our integrity checks we’re running those using Ola Hallengren script, which I recommend to everyone, and which we use for any of our clients, in terms of running your maintenance, your backups, your integrity checks, and your index and statistics maintenance.With his job, by default, it’s running the integrity check for all user databases. We don’t get the different checks at the individual level. If I wanted to do that, I would need to set up a different job for that.

You can see that. You can get a pretty good estimate of that looking at the error log to see how long each one takes. In general, it’s great to know how long the integrity check takes for a database, because if you run into an issue and you’ve got to run one, someone is going to be asking you, “When are you going to know?”

“How long is this going to take and when are you going to know?” If it takes longer than normal, then that’s a red flag and you’ll be able to say that. You can say, “Look, it normally takes an hour for this database. It’s taking an hour and a half.” That usually means it found something. We need to let it finish.

Carlos: Right. Interesting, good point.

Erin: Always let that check finish. There’s no ifs, ands, and buts about it. You’re never going to know what’s wrong unless you let that check finish.

Carlos: Let it finish, yes. Being able to provide that answer for, “How much longer will it take,” just puts you, again, in the driver’s seat. It gives them a little bit of comfort so that, in theory, they’ll leave you alone or give you a little more leeway to finish what it is that you want to do.

Erin: Right.

Carlos: The last piece I want to talk about is maintaining your log. Ultimately, you talk about taking a peek into your log. Now there’s some PowerShell scripts that can go and do that for you if you’re looking for something specific. Thoughts around what it is that you look for in your log.

Erin: The error log, you mean?

Carlos: The error log, yes, sorry.

Erin: No, that’s OK. I just want to make sure we’re talking about the same one. I don’t think that I have any scripts that interrogate it directly. I have a script on sqlperformance.com about using the error log proactively looking for things. Part of the audit, anytime we do an audit, whether it’s a health check or whether it’s part of our monthly audit, I am looking in the log to see what’s interesting there.There’s a lot of great stuff in there. When people use trace flags, when people change the memory allocation. One of my favorite things is when, if you’re running at a virtual environment, sometimes the VM administrators will make a change to your VM configuration. You may or may not be notified.

They may just say, “Hey, look, we’re doing some maintenance for the VM and you’re going to experience an outage.” It’s really good to go back and look to see that your memory allocation is still the same and that your cores are still the same. I also look for lock pages in memory and SPN information, instant file initialization. You have to do that through a trace flag.

If any, what SQL traces are being run, when CHECKDB finished, if there’s anything that was changed at the instance level or the database level. All kinds of fantastic information you can get from the log. I admit I don’t have anything that queries that. You don’t have to use PowerShell. You can use CSQL. I typically tend to look at that, and hopefully in just looking for errors in there, looking for issues.

I always use trace flag 3226 which disables the writing of backup information, write full backup disks or transaction log information to the error log, because that’s just information. It’s not errors. I really want to see only errors in the log.

Carlos: Right. People logging successful logins as well, that’s a pet peeve of mine.

Erin: Right, that’s so hard.

Carlos: I’m like, “Really? Why do you have to keep this log?”

Erin: Right. For some people, they have to do it for compliance reasons. I’m saying, “Well, you should be using server audits.” That’s a better way to track that information than through the error log.

Carlos: Exactly. How long do you normally keep your error logs around? Grooming or maintenance thoughts around that.

Erin: I cycle that usually weekly. You can keep up to 99. That’s a lot. I don’t need that many. I like to keep about two months’ worth just because you never know. I usually have about 60 files. I know some people only tend to keep a month’s worth. I don’t like to cycle them every day, because I don’t know that in most environments you need that, especially if you’re not writing any of the backup information there.A week, to me, is what I generally implement. Again, keeping two months online.

Carlos: I like the day, mostly because if there is an error or inevitably somebody will come to me, “Hey, last Friday I would experience this.””Oh, man, really? [laughs] I have to take a peek at it. At least I can, “It was Friday? OK, now I know which log to look at.” I get a little bit of filtering that way. Interesting.

Erin, always a great conversation. We do appreciate this information. Of course, we’re going to have the links that you mentioned in the show notes at sqldatapartners.com/podcast. Compañeros, you can check that out there. Again, I’ll mention SQL Cruise.

Erin, are you a SQL Cruiser?

Erin: I went on the very first SQL Cruise, yes.

Carlos: Look at that, the very first one.

Erin: The very first one back in, oh, geez, 2010. Is that when that was?

Carlos: Wow.

Erin: That seems like forever ago. It was a game changer for me.

Carlos: Yes, I’m a big fan as well. Tim and Amy Ford have put together a great training opportunity. They’re allowing us to give $100 off the price of admission. You can check out sqldatapartners.com/sqlcruise for details about that and get on board. I think you will have missed the 2016 voyage this year, but there’s always next year.

Erin: Right. The first one, I don’t know if there’s one or two this year, but I know the first one is coming up in a couple weeks. My friend Jes Borland will be there for some of the training. It’s one of those things. It’s not necessarily what everyone thinks of in terms of training. It’s more than just technical training. There’s a lot of professional, not training but mentoring and networking that goes on there that you don’t find at any conference, at any SQL Saturday.It’s just you and that crew. You create a great bond during that trip. Again, technical knowledge, yes, definitely, but it’s so much more. So many of those folks from that cruise I still keep in touch with, I’m really good friends with, I love to see even though we’re all doing way different things than we were then. It’s really cool to see how far people have come since going on the cruise.

Carlos: I was talking with Melody Zacharias. She mentioned, “The technical components you can get on the Internet on your off time. Those people’s time, you can’t get back.”

Erin: Right. The ability to ask those questions during that session and talk about it as a group. In a SQL Saturday or even a user group where you have smaller crowds, that doesn’t happen. You don’t raise your hand and say, “Look, here’s something that I’ve seen. What would you do?” Someone else might chime in, “We had something really similar, and this is what we did.”You don’t get that same kind of interaction at a conference or anything else. On the cruise, you do. Even though you’ve got particular time frames where you’re supposed to be talking about a topic, if it goes beyond that, you all go we went to this deck. We had this deck that we basically took over the whole time we were there. We’d all head back to the deck after class and we’d grab something to drink or to eat. We’d sit there and continue talking.

Carlos: Hash it out.

Erin: Exactly.

Carlos: Very good. Erin, we’ve arrived to the portion of the program I call SQL Family.[laughter]

Carlos: Here we get to know a little bit more about you and some of your work experience. One of the things we like to talk about is your favorite SQL tool. What’s your favorite tool? Why do you like it? How do you use it?

Erin: I mentioned Ola Hallengren script. That’s one. I have two. That’s one of them. Then Adam Machanic’s sp_WhoIsActive scripts. Those are two things that have been staples for me for a very long time. In fact, in that first baseline session, I ended by demoing Adam’s WhoIsActive. It was awesome. I spent time with him at either the 2010 or the 2011 summit, where he was talking about different ways to use it.He hadn’t written that sp_WhoIsActive post-a-day series that he did, right, about how to use it. He gave me some awesome tips. I incorporated that. The last 10 to 15 minutes of that session was spent using that.

One of my favorite things to do with that tool is snapshot that information to a table. Kendra Little has a great blog on how you actually do that. I remember finding that blog and being like, “Oh, my goodness” and pulled that out.

I have people use that all the time to capture information. I remember Michelle Ufford, who works at GoDaddy, telling me that she would use that. She’d snapshot information to a table and then if something happened, she could go in and look at that. Typically if something happened in her environment, it meant that something ran for longer than five minutes.

She was doing a lot of data warehouse and big data stuff then. She would retain that information. If someone said something to her, she had that and went back to it. If I’m troubleshooting, I might snapshot it more quickly. That’s one of my favorite things you can do with that tool. Ola’s stuff makes management of those database maintenance tasks so much easier.

Carlos: So much easier. I agree. On SP_WhoIsActive, how often are you snapshotting that?

Erin: Depends on what the issue is.

Carlos: This is while you’re having the issue. You’ll go ahead and initiate that and say, “Start taking this every hour.”

Erin: Right.

Carlos: Very good. Yes, he was kind enough to be on the show earlier. Very interesting stuff with that. Eight years, it will be nine years, that that’s been around. It’s hard to believe.

Erin: Crazy, and they’re both free.

Carlos: That’s right, even better. Now you’re with SQLskills. Before you’d mentioned you were with another organization. Lots of SQL installations. You’ve had some diverse experience. What’s one of the best pieces of career advice you’ve received along the way?

Erin: That’s a really good question. I have to say one of the best pieces I got from Brent Ozar. Way back when I did the first cruise it was Brent and Tim. Brent was on that cruise. At some point after that, Brent did one of his FreeCons. I remember him talking about how much time you spent at work.If you hear any noise in the background, that’s my dog. I apologize. Hopefully, she won’t start barking.

Anyway, he was talking about how much time you spent at work and overtime beyond that 40 hours. He said, “If you’re spending more than 40 hours at work, what are you getting from that? Why are you doing that to get a three percent or a five percent raise?”

It made me stop and think. I’m like, “Oh, that’s a really good question.” At that point, I stopped. I continued to work hard for my company and work my 40 hours per week, but the extra time that I was working I turned into my personal growth.

That’s when I started writing the blog. That’s when I started spending my own time understanding how SQL Server works. It was hard for my family to understand that at first, because if you’re working more than 40 hours, you can say, “Oh, it’s work. I have to get this done. I’m supposed to get this done.”

When you voluntarily choose to spend that time doing something that looks like work but isn’t and it’s for you, I had to say, “Look, this is for my career. This isn’t for my career at my job. This is for my career as a whole. ”

That was a suggestion, a piece of advice that, I think, once I did that, again, helped me get to where I am now.

Carlos: Very cool. Yeah. I think you are your biggest investment, and you should invest in yourself.

Erin: Exactly.

Carlos: Speaking of investing, our next question, and I’m trying this as a new question that I’ve just introduced here.

Erin: Sure.

Carlos: You’ve just inherited a million dollars. What are you going to do with it?

Erin: Invest it. Pay off my house.

Carlos: [laughs] Hey, there you go. That’s one big house there, Erin.[laughter]

Erin: I’m very ridiculously practical. What is the biggest debt that I have is the house. I would pay that off, take the rest of it and invest it, which I’m really not going to have that much left after taxes. Not a ton would change, but it would be nice to get that out of the monthly rotation.

Carlos: There you go. That’s right. I hear you. I’m with you on that one.[laughter]

Carlos: Our last but not least question. If you could have one Super girl power, what would it be and why would you want it?

Erin: [laughs] There’s so many to choose from. I don’t know if you consider apparition from “Harry Potter” a super power.

Carlos: Of course.

Erin: I really think that’s cool. I’d really like to just be a wizard and have a wand and be able to do all that stuff.

Carlos: That sounds like more than one super power.

Erin: Yeah, I know.

Carlos: I would take the apparition.[laughter]

Erin: You’re going to cut me off there. That’s fine. I get it. I got it.

Carlos: No, very good. Again, Erin, thanks for being here. I’ve enjoyed the conversation. I think Compañeros are going to get some value out of it as well.

Erin: Yeah, thank you. Thanks for having me. I appreciate it.

Carlos: Compañeros, again, sqldatapartners.com/podcast for the show notes today. If you found today’s episode interesting, I invite you to leave a comment on iTunes or Stitcher. Of course, you can check out [dropped audio] . You can contact me on Twitter. I’m @carloslchacon. I’ll see you on the SQL trail.

Episode 27: Testing Changes Before You Break Production

1400Have you ever wanted a feature to help you assess the impact of future SQL Server upgrades, the impact of hardware or operating system upgrades, and for SQL Server tuning?  Well that tool has been around since 2012, but I have found that not many people have used Distributed Replay.  The level of difficulty to get this feature set up lends to the small adoption rate, but I chat with Melody Zacharias about her experience with the tools and I think it is time to give this feature a spin.

Are you using Distribute Replay?  What do you think the best feature is?  What is the worst thing about it?  Leave your thoughts in the comments below.  I look forward to hearing from you.


Transcription: Distributed Replay

Carlos L Chacon: This is the “SQL Data Partners” podcast and this is Episode 27. I am Carlos L. Chacon, your host. Today we’re talking about Distributed Query, or maybe, better titled, “How you can test changes in production before you actually break production.”

In a recent Twitter conversation, this tool was mentioned. If you’ve ever installed SQL Server, you’ve come across this feature, otherwise I do believe it’s 2012 feature, we’ll get into that in just a minute. I’m guessing that the majority of your compañeros haven’t used it. I know I haven’t fully set it up but what’s new, right? If you’re a regular listener to this program, you might have asked yourself more than once already, “What has he in our SQL Server?”

At times I wonder this myself but today we’re saved because I have Melody Zacharias on the show today and I know she has used it, so I’m safe there.

Melody is a partner with ClearSight Solutions, a technology firm in Canada dedicated to the banking industry. She, as a matter of fact, is the first Canadian I’ve had on the program. I should clarify, first Canadian resident on the show.

Chris Bell, a previous interviewee was on the show. He is also from Canada, but he settled in the States some time ago.

Melody is a great example of why the SQL community is so great. Distributed query is something I wanted to tackle, but wasn’t quite sure how to get started. But after sitting through Melody’s session, I feel a little more equipped to take it on. She’s put up some blog post and other material to get us started. The same holds true for you, compañeros.

Thanks for tuning it today. I’m glad you decided to tune in and I know it’s going to be a great episode. Compañeros, welcome to the show.

Children: SQL data partners.[music]

Carlos: Melody, thanks for being here. Welcome to the program.

Melody Zacharias: Hey, thanks. Great to be here.

Carlos: Earlier, we were actually in a Twitter conversation with Argenis. He happened to mention people were talking about this idea of testing their performance environments. They were looking to make changes in their environments, and this feature of distributed relay came up. I guess this is a 2012 feature?

Melody: Yes. Distributed replay came on the scene for Microsoft in SQL Server 2012. It’s continued from there through to 2016, but it didn’t show up until 2012.

Carlos: I feel like it’s been around a lot longer. I guess maybe I just done that many SQL installs that I’ve seen it. Compañeros, if that’s ringing your bell from you, that’s maybe where you have seen it is there is that install. We’ll talk a little bit about that and over that controller piece, I guess, that you can get with the SQL Server install.Why do you think distributed query is not getting a whole lot of love?

Melody: Distributed replay is not a common tool. It’s a tool that will be very useful in certain specific situations, but not, and because it takes a fair amount to set it up. The documentation is, I have to say, not the best that I’ve seen.

Carlos: Not so good.

Melody: Yeah. That can be part of the problem. It’s also hard for people to change. There’s still a lot of people who aren’t up to 2012, so…

Carlos: There is that.

Melody: Yeah. It is backward compatible in terms of acquiring your data load, but it’s not backward compatible in the ability to run it. You’d have to have, at the very minimum, a developer edition of SQL Server 2012 running in a test system so that you can actually do the replay part.

Carlos: Thank you. It’s called distributed relay. I think I said distributed query, but that’s not right, so distributed relay, replay. I’ll get it right. Distributed replay. Ultimately, I guess that if we’re going to define this, we would define distributed replay as, as what?

Melody: The ability to test your data your way. One of the key things about it is the ability to use your own data. Often, when you’re doing testing, it’s either difficult to acquire your own data, or you want to create specific situations that are very custom to either your industry, or your particular workload. It’s very handy when you’re checking things like indexes for example, because the indexes are specific to your data, right?

Carlos: Right.

Melody: You’re not just testing general workload. You want to test a specific workload.

Carlos: To do that testing, we could set up Distributed Replay. Now, you mentioned certain scenarios that you would want to set this up. Are there some known scenarios where it wouldn’t be of the greatest use?

Melody: That’s a good question. Anywhere you want to test something that you require specific parameters for, it’s really great for it. But just general volume, if you want to do that. I can’t really think of anything that it wouldn’t work for. It may not be the best tool, but it would work for almost everything.

Carlos: I thought you were going with that is something like so testing the input or the throughput of your disks. Before SQL Server even gets installed, you could still do some of those SQLIO or whatever the new tool is that has replaced that to use that, because that’s something in and of itself.I guess maybe once you get into SQL Server, you’ll be talking about changes inside of SQL Server. That’s when Distributed Replay might be a handy feature.

Melody: Yeah, it’s definitely data-dependent. When you’re talking about data, but I can’t see any reason why it wouldn’t be useful, if whatever you were testing was related to data. Even on occasion if that data’s related to disks.Distributed Replay can have up to 16 different servers that are hitting your SQL Server. If you wanted to test workload, you can increase the number of clients that you have, that are hitting your SQL Server. You can do some performance testing that way as well.

Carlos: OK. There you go. Let’s go ahead and get into the setting up components. There’s some vocabulary or some terms that we should become familiar with. Let’s talk about some of those, now.When we go to set this up, some of the terms that I think about, we have on display here, the administration tool, there’s a controller. You already mentioned clients and then the target server. Are those the main moving pieces as far as architecture are concerned?

Melody: Yeah. Those are definitely all the pieces.

Carlos: When I go to grab my load on my server, I’m ultimately using Profiler, right?

Melody: Yeah. Distributed Replay is built on top of Profiler. You get that basic look and feel. Most people, nowadays, have actually used Profiler for one thing or another. The problem with Profiler is making sure you use it on server site trace.As long as you got that covered off, you can only run it in parallel or in serial. You’re not getting the parallel workload out of it. The sort of really what Distributed Replay brings to the table with it. It allows you to do the parallelism and really hit your server and change things up.

Carlos: If I understand what you’re saying, I could take a Profiler on a single server. Then I could replay through that. What you’re saying is that I’m just going to hit one courier at the time, basically. That’s not how real life works.We have many different queries coming. Distributed Replay allows me to separate that. It feels like I’m getting hammered by multiple connections or something.

Melody: Yeah. Think of it like a web connection. Your web servers are going to get hit from all sorts of different places. That information has to go into your SQL Server. If you have one web server versus you want to expand to four web servers, how is that going to affect your SQL Server? You can do that test.

Carlos: That’s an interesting thought. Is it only helpful if you have multiple clients? If you’re just going to do one client, is there any benefit to going through the trouble of setting up Distributed Replay? This is well off, just replaying the Profiler trace.

Melody: The single client is a standard feature. Unfortunately, that’s all you get in standard. It’s still useful to be able to use your own data. Even if you can only do it that way, there are a number of features within Distributed Replay that allow you to change parameters.So, it gives you a lot more flexibility than just your basic Profiler, anyway. Those parameters can be helpful in testing and tuning.

You can also multithread that single client. Even if you only have one client, you can multithread that client. That will change up your workload, too.

Carlos: There you go. That something we should point out is that if I’m going to use more than one client, I have to get Enterprise version. Is that what you’re telling me there?

Melody: Unfortunately, yes. [laughs]

Carlos: OK.

Melody: We keep telling people that, but it’s true.

Carlos: That’s good to know. The first part, which might seem, at least to me, the most straight forward as I go out to my production database, I use a server site trace, a Profiler trace. I grab this workload. Then we’re going to set up the administration tool, we talked about, and a controller. What’s the function of the controller?

Melody: I would liken it to a project manager. A controller just keeps everybody on task, tells everybody what to do, when to do it, it’s like a scheduler. I think of it as the project manager of the whole situation.

Carlos: OK. It’s going to schedule the clients, which then point to the database.

Melody: Yeah. It coordinates that flow of information in data to make sure things are happening when you tell it to make them happen. One of the key things, before we get too much further on with that, when we’re talking about the data collection, there are specific things that you need with your Profiler.In 2012 tools, there’s a template that you can use within Profiler that is specific for Distributed Replay. That will make sure that you gather the key components that are required for Distributed Replay to work. That’s pretty important.

Carlos: OK, very good.

Melody: I just wanted to mention that to make sure that people are aware that you have to have specific pieces of Profiler to make sure that they can be replayed properly. That tool, although it’s a 2012 tool to allow it to collect data from other older machines can be installed on older SQL Servers, you can do that if you need older servers.

Carlos: OK, good deal. We want to be able to collect the right information…

Melody: That’s right.

Carlos: To go through all that trouble, particularly if we’re going to collect it off production, we know there’s a little bit of a watcher effect there, so we’re going to impacting that. We don’t want to be doing that too many times when I get all those things lined up.

Melody: Yeah.

Carlos: Once we have that, then each of those compete those pieces, the administration tool, the controller, the client then uses an XML file to do all those connections?

Melody: Yes. Unfortunately, there’s no pretty GUI, and that might be another reason people don’t use it.

Carlos: Yeah.

Melody: It’s a lot easier to just follow through GUI, and know what to do. Because there are the need for the XML files and the need to do a command line prompt to get things running and working. That could be part of the reason why it’s not been as popular as I would think it would be otherwise.

Carlos: Sure. Now, we’ve gone to that. Let’s say we’re going to invest that time in setting that up. Can my admin tool, my controller, and my client, all be the same server?

Melody: Yeah, they can be. They don’t need to be, they certainly can be. In some situations, it would make it easier.

Carlos: Sure.

Melody: You definitely going to want to run this in a test system anyway.

Carlos: OK. All those components are…

Melody: It doesn’t matter.

Carlos: That’s right. I would think test systems…

Melody: It doesn’t matter if they’re all in the same machine. Again, it’s going to depend on what you’re trying to accomplish and what you’re trying to test.

Carlos: Sure.

Melody: A lot of that is quite specific to what you’re trying to accomplish. At the very least, you want to separate your SQL Server from the other components, I would think.

Carlos: Yeah, that’s right because that’s what you’re trying to test. You want to mimic your production environment, hopefully to see, “OK, if I change this, what’s that going to do to my environment.”

Melody: Yes, but honestly if you’re doing it in a test system and you’re only trying to see what the impact of the certain indexes are, there’s no reason why you would need more than two machines. They don’t have to be physical machines. You can just spin up a couple of VMs and get this going quite quickly.

Carlos: There you go. Now, I guess is that typical? I guess your typical setup, are you each one separate using Enterprise. That’s what your, I know you engage in lots of different environments.

Melody: Generally, it depends on what the client has available and what their bandwidth is.

Carlos: Sure.

Melody: Unfortunately, at home, that’s what I use. I use just a single SQL Server and a single machine VM for each.

Carlos: For each?

Melody: Yeah, while I do one for basically the admin aspects of it and then one SQL Server. I only have two machines when I do it in my home lab, but when I’m doing at a client site, I’ll expand that out depending on what they have available and what they’re trying to test.

Carlos: Now, we talked about gathering the trace on the production environment already, we are using Profiler. There’s no way to use extended events to capture that?

Melody: Unfortunately, not yet. Not in 2016 either. I have not seen anything coming for 2016 that includes that either, unfortunately. No love for Distributed Replay in 2016 yet.

Carlos: [laughs] I know all these other nuggets and candies they have for us in 2016, but no love for Distributed Replay. I have to get the program manager on here to see if we can figure out what’s going on there.

Melody: I’d love to be on that call.

Carlos: There you go. Now that we’ve done that and we have that trace, and again how long…Ultimately, I guess it’s going to be test specific. What is it that we were thinking about changing, but how long do you normally keep that workload around? I guess let’s just use the example for testing some indexes? Once I have implemented those index, is my workload, is it garbage or can it be reused?

Melody: It can definitely be reused. Once you have everything all set up, everything can be reused. Probably the data is the one thing that will change the most often. Because once you’ve got all the pieces set up because honestly that is by far the hardest part is setting up all the pieces.

Carlos: Right.

Melody: Once that’s all set up in your test environment, you really don’t have to change it.

Carlos: OK.

Melody: Once it’s all installed and set up, that’s the hard part. After that, there’s so many different pieces that can be fine-tuned and changed that you can rerun that same workload multiple times and get multiple results.

Carlos: Sure.

Melody: That said, it’s also because it’s Profiler and you have a template for collecting the information. It’s super easy to go and get more information. One of the things that happens in banking quite often is, at the end of a month, there’s a lot of processes that run.

Carlos: Sure.

Melody: A fair amount of your data changes at month end. That’s when a lot of my clients will gather a new data set.

Carlos: OK. Now, is that just like at that point because it’s regular like that? Do you just have your template, your Profiler or server site job, if you will set to run, “Hey I’m just going to collect for…” What’s an adequate time there or what’s normal time? An hour, four hours, maybe just a couple of minutes?

Melody: That depends on your volume really and depends on what you’re trying to accomplish. What I try to do in banking is do two separate runs. In banking, they have specific workloads that are typical for daytime operations.You’ll have a lot of teller activity, you’ll have a lot of ATM activity, that type of thing, which is specific to daytime. Then, they also do a completely different type of workload with reporting and mass updates at the end of their day. They have this sort of more like what we would consider years ago as batch processing that happens overnight.

Carlos: Right.

Melody: I’ll often have two different types of workloads and I’ll collect one during peak hours, during the day. When I know most of their clients are cashing checks, I’ll usually pick that morning between 9:00 and 11:00. Then, I’ll also do when they do their overnight processing, I’ll pick the peak time of their overnight processing depending on what they’re trying to test, so I’ll pick a couple of hours in there. I’ll use those two different data sets depending on what we’re testing.

Carlos: What components have to change once I have that new profiler, is that just something that I’m taking on to the client to the run. What’s changing thereabout my environment? Does that make sense?

Melody: I think so. On the controller and on the client, there are specific things that can be determined like how many threads for that particular client, for example, and whether you’re running your update synchronously or asynchronously — those type of things.A lot of it is around performance, to allow you to scale out or scale back. There is time intervals, those type of things that can be set up.

Carlos: I see.

Melody: Just running your workload, there’s a huge number of performance tuning or performance related, either customized queries or things that you can get from other people that our whole SQL community has tons of in terms of performance tuning to check out what’s going on, on your system that can actually be run on your SQL Server, during the replay of that data.Even though you haven’t actually changed anything within your Distributed Replay Setup, there’s a huge number of things that you can actually test. I know Paul Randall, for example, has a number of queries that you can run to check your IO, your wait stats, all of that type of thing.

Carlos: You’re referring more to when I have this set up, and I want to see the effects of my change on my workload, I would use these DMVs or the queries to capture that information.

Melody: Yes.

Carlos: Right. I apologize. My question refers to — I have the profiler trace. Where does the profiler trace live and when I change that, what would I have to change?

Melody: It’s just a file that will live on your controller and the controller determines how it gets replayed.

Carlos: Just override the other one and then your good to go?

Melody: Yeah.

Carlos: OK.

Melody: It’s a simple file copy, so it’s a really easy thing to do.

Carlos: Very good. You have mentioned also, kind of this setup component, or setup process. That there are some firewalls settings or some things that you will need to set up so that all of the components can talk to one another. Particularly in banking, anytime you have to deal with firewall rules, it seems like there is always a big to do. Any blowback or issues you’ve had to resolve with those firewall rules?

Melody: The nice thing about Distributed Replay is, when you’re actually running it, you’re going to run in a test system, so you don’t usually have any blowback. What I warn people about is often whether it’s firewall or component services. You’re not always the one who has the authority to make those changes even in a test environment, depending on how large your shop is.

Carlos: Sure.

Melody: That is sometimes where the issue comes. I’ve had a couple people ask me about that and they’ll take my slides and they’ll show them to their sysadmin and then he’ll say, “Oh, oh, now I understand what you’re talking about.”It’s important to understand it enough that you can explain to the person who can actually make the change what needs to be done. You don’t necessarily have to know how to set up a firewall or change the components’ services settings. But it’s important to know, so you can have that conversation.

Carlos: Now if they’re in the same domain, in the same network, there’s no issues there, right? All the ports are generally open? That kind of thing.

Melody: In a test environment generally, but I actually ran into it when I was doing my test setup at a bank and maybe it’s because I do so much with banking, but I actually run into it a lot. And there wasn’t a lot of information online, so I make sure I mention it.

Carlos: Let’s just dive into that, just for a second. Besides 1433, the default SQL Server port, what other firewall things have you had to set up?

Melody: It’s to let Distributed Replay itself through.

Carlos: OK. So it had its own ports that it needs.

Melody: Yeah. Distributed Replay has its own pieces that it needs and making sure that the passwords and such that are being used are able to get through.

Carlos: There you go. Ultimately, it’s acting like an application, right?

Melody: Right. Exactly.

Carlos: It would need to connect that way.

Melody: Yeah.

Carlos: Good. One of the things you talk about, we’ve gone through all this part to set it up, we grab our Profiler, we’ve engaged our infrastructure team to help us set this up, potentially. One of the common issues you talk about is an empty result set. We’ve gone through all this work to set this up.Is there any way we can detect why we won’t be collecting this information before we coordinate and look like a goofball because we don’t have anything to show for what we’ve done?

Melody: One of the things, particularly if you have individual usernames and such…If the data that you collect is so old that the database you’re replaying it against doesn’t have those users in it, then you can’t actually do anything. You’ll have this profiler you’ll play it against and you’ll expect transactions to show up, but if those users aren’t there to create those transactions, then nothing really happens.It’s not so much that things won’t work, it’s just an empty result set that means you’ve missed something, you missed a step or you’re missing something in the coordination of your data between your replay and your database.

Carlos: OK. So you may need to take a look at that. Maybe that’s worth diving into for a minute. Setting up those log-ins. For using something besides an application account to connect to the database, we just need to make sure that exists on our test system as well?

Melody: That’s right. Usually, what I recommend people to do is take a recent backup of your production system around the time that you’re going to replay or collect your data, so that the two are in sync as much as possible.

Carlos: Sure.

Melody: When I did it, I took the backup that we normally do overnight. Every night at the bank I restore that in my test environment and then the next morning I did my data collection, so that everything was fairly recent and that I knew that the majority of everything would work fine. It’s not that it will error out, it will just not do anything.

Carlos: Sure. You will have anything to compare and collect against. It will act like the server is not doing anything.

Melody: That’s right. You will look really efficient, because it had nothing to do.[laughter]

Carlos: All these metrics. I see pure is really low. Those indexes are really good.

Melody: That’s right. You’ll look good for about three minutes.

Carlos: Well, Melody, that’s great information. Again, compañeros. We will have on the website — sqldatapartners.com/podcast. We’ll have some links and Melody has some blogging information. We’ll make sure that her slide deck from her presentation is up there as well. You can check that out.So Melody, it’s now time for the SQL Family portion from our program.


Carlos: Here, we’d like to talk with you more about you and your life experience. The first thing we always like to talk about is tools. What’s your favorite tool? This can be a free tool or paid tool, but why do you like it and how do you use it?

Melody: I’d have to say that Paul Randall and Kimberly Tripp’s website would probably be the tool that I like the most.[laughter]

Melody: It’s not a standard tool…

Carlos: That’s true.

Melody: …but there are a lot of queries there that will get you information that you’d never thought you could get. When I was doing the Distributed Replay and trying to get metrics and information, and making sure that I got the information I needed, I found it really useful and well-explained.It’s still one of my favorite go-to places. It’s like my little library, so for me that’s a good tool. Every client I go to is different all the time. I just need basic information and be able to help my client get done with what they need to get done, without telling them to buy something more.

Carlos: Now you’re a partner in your own firm, done lots of different things. As you’ve gone through your career, what’s one of the best pieces of career advice that you received?

Melody: I would have to say networking at Summit instead of making sure that you hit all the sessions you want. And make sure you buy the sessions afterwards. There’s always so much going on at Summit. The learning aspect of that is phenomenal. The nice thing is you can buy that learning and take it home with you and do it on your own time…

Carlos: In your own pace.

Melody: …when all those people are not there to network with. They’re honestly some of the nicest people in the world. One of the reasons I love Microsoft’s data platform is because the people are so generous in their time and knowledge.

Carlos: That’s right. I think even this interview is a small example of that. Ultimately, we met at Summit. You had come out to DC as well and we chatted there. Us getting together and connecting was the result of a little bit of networking at the Summit event.

Melody: Absolutely.

Carlos: I have a new question that hasn’t been in the SQL Family list of questions before. I’m looking to try this one out. You inherit a million dollars, what are you going to do with it?

Melody: That’s interesting. I would probably create a new tool to use in banking.

Carlos: There you go.

Melody: That would be so much fun.

Carlos: Start a little SAS or, or an Install tool. You’ll have to be careful with the regulators that will allow there.

Melody: More of an Add-on for banking that uses the Microsoft stack, but something specific that banking needs. There’s lot of things out there that they could really use that they don’t have right now. I just love working with great minds.

Carlos: It’s always nice to work with people that challenge you and want to make you learn more. Melody, our last question for today — if you could have one superhero power, what would it be and why do you want it?

Melody: I don’t know if you call it a superpower, but I’d really love to not ever have to sleep.[laughter]

Carlos: Interesting.

Melody: There are so many things to do in a day and so many things I would love to do that I just sometimes feel I don’t have time for. A lot of people will say, “I’ll do it when I retire.”, but I want to do it now.

Carlos: That is right.

Melody: Whether it’s learning or spending time with my family, whatever it is, I want to do more of it, so it’d be really nice to not have to sleep.

Carlos: Somebody was telling me the other day that one of the two deficiencies of human beings are either the need to eat and the need to sleep. We could get over with those and we’d be in business, right?

Melody: I’m OK with the eating part, because I can multi-task. I can eat and do other things, but I can’t sleep and do other things.

Carlos: That’s true. Well, Melody, thanks again for being here, I do appreciate your time.

Melody: It was a pleasure. Thanks for having me.

Carlos: Compañeros, if you liked this episode I invite you to leave a comment or review on our iTunes or Stitcher. This will allow others to find us, so they can more easily enjoy the program.It’s always good to have you, as always our show notes will be up at sqldatapartners.com/podcast. You can reach me on Twitter @CarlosLChacon or by email at [email protected]. And we’ll see you, the SQL trail.


Children: SQL Data Partners

Episode 26: Devops for the Database–Everyone is Doing it, Right?

1400DevOps for the database–you have probably heard the term if you work in an organization that is deploying code on a regular basis.  The developers have this down to a science, and now it is the database’s turn.  The idea of deploying code and having a certainty it will run correctly instead of crossing your fingers and hoping you don’t see the red error messages has a certain appeal.  There are lots of tools out there now to help with this, but it seems like we still drag our feet.

I know I did.  When I chatted with Cindy Bean from DBMaestro, I didn’t have much database change automation experience.  I had exposure to source control, but DevOps seemed more than that.  After speaking with Cindy I created my first database project in Visual Studio.  I definitely get the idea and hope to implement it fully in my environment.  This episode is an introduction to the idea of database change automation and how you might get started.

I am on the trail, and I hope you will join me as well.  Are you using database automation?  I would love to hear your comments/concerns in the comments below.  This won’t be the last time we discuss this and I hope you see your progress along the path.

[smart_track_player url=”http://traffic.libsyn.com/sqldatapartners/Episode26DatabaseAutomation.mp3″ title=”Database Change Automation” artist=”An Interview with Cindy Bean” image=”https://media.licdn.com/media/p/3/000/0c6/1ee/12e3abd.jpg” color=”#3176B5″ download=”false” social=”true” social_twitter=”true” social_facebook=”true” social_gplus=”true” social_linkedin=”true” ]

Transcription: Coming Soon!


Episode 25: Stretch Database

1400SQL Server 2016 has several new features and this episode explores the stretch database feature.  One of the unique attributes of this feature is that it uses both the on premise instance and the Azure platform.

I chat with Anthony van Gemert, one of the product managers for stretch, about how the features works and under what scenarios organizations might use stretch technology.

Are you planning to use stretch database?  Let me know via twitter.  I am @CarlosLChacon.

Transcription: Stretch Database

Carlos L Chacon: This is a SQL Data Partners podcast. My name is Carlos L Chacon, your host, and this is episode 25. Today, we’re talking about Stretch Database, a new feature in SQL Server 2016. I have with me the product manager for the Stretch Database, Anthony Van Gemert.

Now, the Stretch Database feature is unique in the fact that it is the only feature that I am aware of that uses both the on-premise instance and the Azure technology in a kind of combination way. So, we’re going to be talking with Anthony about how that works.

I think it may be a niche solution, but I think it’s definitely one that for organizations that want to try out some of the storage options that have large storage requirements, this might be something could take advantage of fairly easily. The integration actually looks pretty, pretty cool.

As this is the first episode of the new year, Happy New Year to you, if you haven’t heard that a million times already. I hope you’ve made some good resolutions this year which includes listening to a few new podcasts. If you’re new, we welcome you, my new compañeros, to the program and I hope you enjoy the content we share here.

As always, I’m interested in getting your feedback, on iTunes or on Stitcher. You can leave your comments also at sqldatapartners.com/podcast. It’s always good to have you compañeros. If there’s something you want to hear, you can reach out to me on Twitter. I am @carloslchacon, I’d love to hear from you and compañeros, welcome to the program.

Children: SQL data partners.[music]

Carlos: Sir Anthony, welcome to the program.Anthony Van Gemert: Thanks for inviting me.

Carlos: Thanks for taking a little bit of time. I know that lots going on there in the Microsoft campus and a windstorm to boot today, so thanks for chatting with us a little bit.Ultimately, today we do want to talk about Stretch. What Stretch Database is and some scenarios around that. Help me understand what Stretch Database is. Why would I want to consider that?

Anthony: Well ultimately, just the basic overview of Stretch is, Stretch is a feature of SQL Server 2016, and what Stretch allows you to do is selectively migrate your existing historical data, your call data, from your on-premise storage solution.It could be a SQL server or your SAN network, and we allow you to selectively migrate this data to the cloud. It’s transparent, your applications, your queries, and your tools and be aware of it.

We don’t impact your operations at all. It’s secure. We secure the data during transit and of course, you have all the additional encryption that always encrypted as you that we pretty support as well. The idea is that once you’ve moved this data to the cloud, it’s always online, always available but what it does is allows free up existing disc space.

This call data, this data that you used to have. A kind of your archive data, the historical data, the data that you didn’t really use very often but important data that you like to access when you need it. We take this off of your network and we store it in the cloud so that obviously it reduces that space you need.

It reduces your costs on maintaining your existing, on-premise data. It essentially reduces cost all around as far as managing storage and actually maintaining your data.

Carlos: I think one of the common scenarios is this I have some large historical data. Maybe this is sales information archive data. I just wanted to keep around that there are users who will query from time to time but it has a large footprint in my environment.Maybe takes up a large chunk of the database. I think the idea here is that I’m going to take one of those tables. I’m going to create an SQL Azure database, and I’m going to push that over there to Azure.

I’m going to get the benefits that you talked about and reduce storage cost and some of the administrative pains that I could potentially avoid there.

Anthony: That’s essentially it. The call data that we talk about is you’re actually right. This is large data. This is sales records. It could be that something patient that may have a medical history. It could be public record if you sort of local government things like that.This is data that is building up in the network. We all know that the data grows and grows. Your IT budget is always shrinking. You can always guarantee that but your data, you can also guarantee this is just going to continue to grow and grow.

We give you the ability. We build in that functionalities that allows you to take this data, identify this data, and push it to the cloud. The beauty of it, you don’t need to…You don’t know have multiple locations to gather this data. This data is still accessed from the local source for local instants.

Your data your user still access it through using same applications, hitting the same source service. Nothing changes. Your policies remain the same. Everything is managed locally.

We just simply give you the ability to go from that local server to the cloud, fetch the data bring it back.

Carlos: That is super impressive, and in that my application is now to change. I’m going to be able to migrate that. Recalling its stretch database but I’m really migrating table by table. I’m going to specify this migration.

Anthony: Actually, table by table. You can migrate your table. Also, we’re going to give you the ability to actually migrate just the individual rows of the table. Yes, you can migrate an entire historical table if that’s how your system is set up.If, for example, you have data that ages out after a certain period, we can give you the ability that you just migrate those rows that you care about so that you can actually keep your hard and your call data in the exact same table.

Carlos: It’s a word of thanks that they will notice [inaudible 06:34] that these things continue to evolve and change. You talked about specifying that row by row data. It’s not currently in the CTP released. Will it be there in the final release?

Anthony: It will definitely be there by RTM. Obviously we will update our users as we release our CTP releases.

Carlos: Versions?

Anthony: Yeah, versions. That will definitely be available by RTM next year.

Carlos: OK. Yes, very cool. That will give a lot of control, that ability for the users to be able to specify what they want to move over. You talked about individual tables. Ultimately, I am choosing the tables, then the data within that table that I want to move over to Azure.

Anthony: Correct. The choice of that data, we do provide you with tools that help guide you on which data would we consider Stretch Ready, the data that you don’t access very often. Ultimately the decision is yours. If you’ve decided this is call data and it’s Stretch Ready, you make that decision. You make the decision to migrate it to Azure.

Carlos: It will be interesting to see the 2016 Upgrade Advisor. Lots of need little features back in there, lots of Azure integration that I think this is one of the tools that folks will have to be able to use to understand if their data is Stretch Ready.

Anthony: Absolutely. What we are building with the Stretch Advisor you will get a set of rules built into it that we’ll run against your database and we’ll make a recommendation based on the data that you run through the tool. We’ll make a recommendation on the specific tables and will give guidance on those tables. If there is a table that is Stretch Ready, obviously we’ll identify that.The fewer tables that we consider Medium Stretch Ready will provide guidance on why that’s the case. Obviously again the decision is up to you then to whether you want to migrate that or not.

With each subsequent, a release we’ll be updating those rules as we support more and more functionality in Stretch.

Carlos: Now we’re chatting here in November of 2015. RTM, obviously it’s SQL Server 2016. We know we are still few months out. Things are subject to change obviously. Do you have any guidelines or any thoughts on what some of those Stretch Ready requirements would be at this point?

Anthony: A Stretch Ready table would be a table that you don’t access very often. If this is the table that you’re not making not a lot of traffic on this particular table, not a lot of inserts, deletes, updates, data that is essentially very infrequently accessed.They are type of data that auditors would be looking at historical data where essentially if you order to come, it comes knocking. This is the data that would be running queries against to order your business.

If this is data that your day-to-day applications are hitting on a regular basis, update, insert, delete, then that would not constitute a Stretch Ready set of data.

Carlos: You mentioned auditors. Occasionally, they are going to come and select that. That’s hopefully not more than once a year. If this data was occasionally used even like on a quarterly basis, for example some reports that needed to hit this for whatever reason, would that still be candidate eligible or…?

Anthony: Definitely, that would be candidate eligible. If this is historical data that would need to be accessed as part of your day-to-day business, it’s a historical data like old customer debts you are following up on a customer history, that would be a perfect candidate. You would access this data as part of your day-to-day operations. You would access it from the same applications.You would be making the same updates, deletes, inserts, whatever into your hard data. At the same time, you would be reading data from these cold table and it would be seamless.

Carlos: Let’s go into that and talk a little bit more about the changes that my database would experience one such Stretch. I’ve decided to take some of that, identify that eligible data. I pushed it there into the archive.Some of the things that are going to change is that, now the size of my on-premises database – I guess, I shouldn’t use that word on-premises, because the database is still one and the same. It’s still connected. I can only access it through my on-premise tools. I’m thinking of, if I take a backup of that database…

Anthony: Yes. Right.

Carlos: …that archived data that is now in Azure is no longer part of that backup, correct?

Anthony: Correct. This is how it reduces your maintenance costs significantly. You’ve gone from backing up and restoring potentially a massive database.Now, all you really need to care about is the data that you have on your source database, which is considered to be yours’. This is your active data, your active day-to-day data. You only need to backup and restore that data and manage that data, reduce your perf tuning, index building on that data.

Carlos: Right.

Anthony: A Stretch instance on the Azure side will take care of all the backups and restores and re-indexing and performance tuning in Azure. You don’t have to worry about that. It significantly reduces the amount of efforts to manage that same database.

Carlos: The other change that will happen, should you want to look at your execution plans, once the data is over there, is that, if it’s going to access that data, you’ll see a remote query option or a parameter there and because that data is now going over to SQL DBAs and the database system that’s running and grabbing that?

Anthony: When you’re clearing your data, obviously you’re clearing it from the local source. The feature is coming but we don’t have the feature to allow you to differentiate between, just give me the remote stuff or just give me a local stuff.There is a feature that we’re building in. That’s a functionality we’re building in. At the moment you send your query to the source database.

We package up that query and we send it to Azure. Whatever remote query is done in Azure, we do all the compute there and just send you back the results. The idea is that again, it has minimal resource reduction on your local server when running queries and maintaining a database.

Carlos: All right. You talked a little about what Stretch database is. Under what scenarios we would use it, some of the benefits. But it might be helpful to talk about what Stretch database is not. Right. So it’s not a replication tool?

Anthony: It’s not a replication tool, no.

Carlos: …or a DR tool?

Anthony: No.

Carlos: Right. In this instance, if the data still exists in only one place? It just happens to be in the cloud and all of the processes or all of the things you’d have to do to ensure consistency and availability for that data you still have to implement?

Anthony: Correct. The tables themselves still exist on your source server. Essentially, we take a mirror of the tables and your indexes and create a mirror in Azure. We’re literally just moving the raw data to Azure. This is not for replication or anything like that. Effectively, it’s a large database but just significantly easier to manage.

Carlos: Sure. Some of that might be helpful as we talk about the migration strategy of that data and I think, again things are changing all the time but there are two migration options. One is the trickle migration, where you can specify where the data will go and then the Stretch database will take care of actually moving those rows slowly, in small chunks at a time versus a, “Yes this table is ready to go over, how about it?”

Anthony: Yes. The current CTP 3 functionality that we have and this is the standard functionality. We have a set trickle rate for getting your data to Azure. We have a set number of rows that get moved at a time and it’s very small. The idea is that once you set the migration going, it shouldn’t impact your local performance. It shouldn’t impact your operations at all.We try to make it as small as possible so it has as minimal impact as possible. The idea is that as soon as you activate and enable Stretch and begin migrating data, it does literally just trickle. Large databases could take a significant amount of time. Obviously, there are cases in countries where the telecom systems aren’t as robust as is for example here in Redmond.

What we’re looking at is a functionality that allows you to, in fact determine what that packet size is and schedule your migrations to be during a particular time where there is significantly more bandwidth available and potentially cheaper. So that you could create a larger package sizes and migrate those during a specific period.

Carlos: Sure. So during the overnight hours potentially I can crank that up a little bit more than one might in daytime.

Anthony: Exactly. The idea is, this isn’t a cloud archiving solution where you create a massive big package and then migrate and push that to the cloud. This is specifically designed to not impact your day-to-day operations. This should run in the background and note it should be utterly transparent to you and you users.

Carlos: Now that I’ve got that data over, we did talk a little about backups. I backup the database locally. I’m not going to read all that. I have lets just say a billion rows, that table is being Stretched, those billion rows no longer exist locally. I back it up. I have a backup minus a billion rows. On the Azure side, I’m giving the redundancy, the reliability of having the multiple copies and things like that.

Anthony: Correct.

Carlos: In the event that I needed to restore that database to, let’s say, a test system. What’s that process going to look like?

Anthony: Very simply you would restore your local backup. Restore it to a different system. A part of that backup is the metadata that contains the connection information to your Azure instance. What you do is when you restore that backup to your local instance, you would send command that says create a copy of that cloud data.We create a copy at the same time and you could initiate a connection between the new instance that you’ve just deployed and a copy of that data. It doesn’t impact your production data.

Carlos: Sure.

Anthony: You can test against that particular copy. We will support snapshot by RTM. We don’t currently support snapshots the idea being that you’ll be able to deploy multiple systems, for testing using snapshots as opposed to creating a copy. A copy obviously is just more space.We don’t have that just yet, but you can create if you wanted to, set up a parallel system for testing or developing. Restore the database to a different instance. Create a copy at the same time, then do the handshake. Connect the two and your system will be up and running.

Obviously, you can test against that, and when everything is…We’ve done the reconciliation, because what we will do is reconciliation between the data that’s local and the data that’s in Azure. You’ve confirmed that everything’s as you’d expect. Everything’s reconciled. Obviously you could connect to your production.

Carlos: That reconciliation, that will take place all in the background?

Anthony: Correct.

Carlos: That’s going to be pretty nice. I had a thought as you were describing that process. What would happen if I were to issue an update command against my archive data? In the test environment I’m doing something silly, and an update statement gets loose. What would happen?

Anthony: Essentially, if you did that, and specifically on the CTP3, the current CTP3 release, we wouldn’t allow that. You wouldn’t be able to actually make that update to the Stretch table.

Carlos: That’s probably pretty smart. Going into that is, again in the case of the backup, is what would you do then if you wanted to take a backup? How would you get that information back to before it was updated? Lots of implications there.

Anthony: You would lose that moment in time.

Carlos: Keeping it frozen or solid helps me make sure that it’s consistent whenever I want to get it back. What kind of testing would you recommend for a company interested in looking at migration?

Anthony: Basically to see if Stretch is the right product for them.

Carlos: Right. It would be more like testing the waters, like, “Hey, we want to try this out, but, you know, we don’t have copies of you know of our terabyte systems laying around.”

Anthony: Essentially all you need is a large set of data that you don’t touch very often. A large set of what you would call your historical data, something that you’re not updating or changing regularly. That would just be a perfect test bed. Essentially install 2016 SQL server and Stretch comes as part of 2016 SQL server and test it on that table.If you’re on MSDN, we’ve just released a new version of the AdventureWorks database to support SQL server 2016. Within the actual package that you download from the MSDN download site, you’ll find a set of sample scripts that have been created specifically for Stretch database.

We’ve created a new table in AdventureWorks called Order tracking. It allows you to insert new rows into that particular table. That table is set up for Stretch. It’s what we’d call a fantastic candidate for Stretch. It allows you to test all the functionality on Stretch.

Once you’ve enabled your Stretch on that particular table, we have a sample script that’s been added to that download package. That can give you step-by-step instructions on how to test Stretch, the Stretch functionality, against that Stretched database.

You’ve got your pre-stretch testing, testing performance pre-stretch, and then stretching the database. You can run multiple queries against the local and the remote data. You can get an idea of the performance and you can get an idea of what…a feel for the performance on that particular table.

Carlos: That would give you an option, it’s set up, you can kick your tires there, get a feel for the process and then once you feel comfortable you begin moving that over.

Anthony: What we would recommend if you’re seriously thinking of trying out Stretch, what we would recommend is running the Stretch adviser over a production database. That will give you a very accurate idea of how Stretch-ready your actual production data is.

Carlos: We know that the Stretch implementation is baked into SQL server management studio now, in the 2016 version. Are there going to be add-ons to SQL server management studio 2012 or 2014 to allow us to Stretch?Let me clarify. I’m assuming that I’m running a SQL server 2016 database, that’s why I need to be able to stretch. Let’s say for whatever reason I haven’t updated my local client just yet. Am I still going to be able to manage those components from 2012 or 2014?

Anthony: With the current implementation, no, you wouldn’t be able to do that. This is specific to 2016. That may be different closer to RTM. I don’t have visibility of that just yet. Currently you’d only be expected to be able to use the functionality in 2016.

Carlos: We’ve taken the time, we’ve kicked the tires, we’ve used the AdventureWorks database, we think we’re ready to go. We’ve moved our data over into Azure, now our database is Stretched. A couple of things, which probably should have gone into testing. Is there best ways to estimate cost, of what it’s going to cost me to have my data over there?

Anthony: There will be a fix. I don’t have the fixed pricing model yet. That will come out closer to RTM, but there will obviously be a compute core cost for setting up your Stretch instance. In the upgraded adviser we will be building in a local storage cost estimate.You’ll be able to, when you run your upgrade adviser, we analyze the databases that you identify and we will give you a…you plug in how much it’s going to cost you per month per gig for your local storage. We give you a total of the amount of data that you have on those databases. You can just plug that into the Azure cost pricing estimator. That will give you an idea of what your storage costs will be in Azure.

As we get closer to RTM, we will have a clearer pricing model where you can actually calculate both the compute and storage costs together, but today it’s storage costs that we have. We can help you get the storage costs, but the actual compute costs unfortunately I can’t, because I don’t have the pricing model yet. I can’t give you that just yet.

Carlos: Sure. We are still in CTP so things are obviously subject to change. Once we were to migrate the database over there and for whatever reason let’s just say that our time has expired. In the case of some government regulation I have to keep my data for seven years. A year has passed. I no longer need what is now the eighth year of data.Do I go and get that back? How am I going to be moving that data back to premise?

Anthony: You will have the ability to un-migrate data. You’d be able to, using the tools, be able to un-migrate that data. You’d also be able to using the normal Azure functionality, be able to actually purge that data from Azure.You can still do that. You can use the Azure functionality for that. If required, you can un-migrate that data. Bring it back from Azure, bring it back to your local system and delete it from there if you want.

Carlos: Probably best practice is going to be to use the Azure tools to purge it from there. Wrapping up a little bit this discussion, we’ve migrated our data, are there any other implications? Some of the things that I either can or can’t use in my system, now that I have Stretch enabled. Availability groups, for example?

Anthony: Availability groups will be supported. Log shipping, yes. That will be supported. We don’t currently support replication, but we’re investigating it. It’s in the discussion, but we don’t currently support it. As we chart towards RTM I may be able to give you a further update on the replication. As far as partitioning, yeah. We support partitioning no problem.Encryption, absolutely. The encryption as far as if you’ve always encrypted, if you’ve encrypted. Whatever you’ve encrypted on your source remains encrypted, obviously. Fully supported there. The big thing that a lot of our users have been talking about is temporal tables. We will fully support that as well.

Carlos: Actually, help me understand that a little bit. When you say temporal tables, what does that mean?

Anthony: These historical tables, you have might data at a certain age of this particular data, this historical data, you could just essentially identify this data to automatically migrate at a certain period, certain time. It’s complete, hands-off, you identify the parameters. It would just take care of it for you.The data itself will still be accessible, you access it the same way, but we would have moved this historical data to Azure.

Carlos: It’s almost like a sliding window type thing. Once my data is X age, I’m going to move it over. You mentioned partitioning. One of the features that will be available in RTM which we talked about, will be the ability to define which rows, or based on a certain age, something like that. Say, “Yes, go ahead and move those over.”If I have a currently partitioned table, will that table be able to be stretched?

Anthony: Yes, it will be. That will be fully supported.

Carlos: That will be very nice. I think actually probably more likely is that there will be tables that got a little too big, a little too fast, maybe? That weren’t partitioned. [laughs]

Anthony: We have customers who have, I think the largest one was 1.3 trillion rows, obviously partitioned. I believe the majority of that is call data. We would certainly…That is a perfect candidate for Stretch.

Carlos: Very good. Well, compañeros, there you have it. A little bit of information about Stretch database. As we mentioned, as things get a little bit closer to CTP, we’ll bring Anthony back in, have a little chat, and get some of the updates on the team, what they’ve been doing, and make sure that you have the latest information.Anthony, before we let you go, we want to switch gears here just a little bit. Now we’ve gotten to the SQL family part of the program. I’d like to talk a little bit more about what makes you tick, the things that you enjoy. One of the things we enjoy talking about is different tools.

What tool do you enjoy using? It could be a free tool, paid tool, how do you use it and why do you like it?

Anthony: I suppose in a SQL sense, I would say definitely would be my upgrade adviser tool. I love woodwork, so my favorite tool is actually a table saw. [laughs]

Carlos: Very nice.

Anthony: That’s how I relax.

Carlos: What’s your current woodworking project?

Anthony: Nothing at the moment because it’s too wet at the moment. The woodworking season has passed.

Carlos: If you could have one superhero power, what would it be and why would you want it?

Anthony: Breathing underwater.

Carlos: Breathing underwater, Aquaman. OK, very good. Are you a big scuba diver?

Anthony: I do like scuba diving, yes. The idea, there’s so much unexplored planet underwater. Imagine being able to go wherever you want to explore that. The rest of the world that other people haven’t been to yet.

Carlos: Right, all that buried treasure down there.[laughter]

Anthony: There’s that, of course. There’s that as well. You’ve got to fund this somehow.

Carlos: That’s true. All that time you’ll be taking there. Anthony, thank you so much for being on the show again, I do appreciate it.

Anthony: Thanks a lot, Carlos. It was a pleasure being here.

Carlos: Compañeros, you can check out sqldatapartners.com/podcast for the show notes today. We will have information available as to the things get a little closer to CTP. We’ll see you on the SQL trail.[background music]

Children: SQL data partners.