Have you ever wanted a feature to help you assess the impact of future SQL Server upgrades, the impact of hardware or operating system upgrades, and for SQL Server tuning? Well that tool has been around since 2012, but I have found that not many people have used Distributed Replay. The level of difficulty to get this feature set up lends to the small adoption rate, but I chat with Melody Zacharias about her experience with the tools and I think it is time to give this feature a spin.
Are you using Distribute Replay? What do you think the best feature is? What is the worst thing about it? Leave your thoughts in the comments below. I look forward to hearing from you.
Transcription: Distributed Replay
Carlos L Chacon: This is the “SQL Data Partners” podcast and this is Episode 27. I am Carlos L. Chacon, your host. Today we’re talking about Distributed Query, or maybe, better titled, “How you can test changes in production before you actually break production.”
In a recent Twitter conversation, this tool was mentioned. If you’ve ever installed SQL Server, you’ve come across this feature, otherwise I do believe it’s 2012 feature, we’ll get into that in just a minute. I’m guessing that the majority of your compañeros haven’t used it. I know I haven’t fully set it up but what’s new, right? If you’re a regular listener to this program, you might have asked yourself more than once already, “What has he in our SQL Server?”
At times I wonder this myself but today we’re saved because I have Melody Zacharias on the show today and I know she has used it, so I’m safe there.
Melody is a partner with ClearSight Solutions, a technology firm in Canada dedicated to the banking industry. She, as a matter of fact, is the first Canadian I’ve had on the program. I should clarify, first Canadian resident on the show.
Chris Bell, a previous interviewee was on the show. He is also from Canada, but he settled in the States some time ago.
Melody is a great example of why the SQL community is so great. Distributed query is something I wanted to tackle, but wasn’t quite sure how to get started. But after sitting through Melody’s session, I feel a little more equipped to take it on. She’s put up some blog post and other material to get us started. The same holds true for you, compañeros.
Thanks for tuning it today. I’m glad you decided to tune in and I know it’s going to be a great episode. Compañeros, welcome to the show.
Children: SQL data partners.[music]
Carlos: Melody, thanks for being here. Welcome to the program.
Melody Zacharias: Hey, thanks. Great to be here.
Carlos: Earlier, we were actually in a Twitter conversation with Argenis. He happened to mention people were talking about this idea of testing their performance environments. They were looking to make changes in their environments, and this feature of distributed relay came up. I guess this is a 2012 feature?
Melody: Yes. Distributed replay came on the scene for Microsoft in SQL Server 2012. It’s continued from there through to 2016, but it didn’t show up until 2012.
Carlos: I feel like it’s been around a lot longer. I guess maybe I just done that many SQL installs that I’ve seen it. Compañeros, if that’s ringing your bell from you, that’s maybe where you have seen it is there is that install. We’ll talk a little bit about that and over that controller piece, I guess, that you can get with the SQL Server install.Why do you think distributed query is not getting a whole lot of love?
Melody: Distributed replay is not a common tool. It’s a tool that will be very useful in certain specific situations, but not, and because it takes a fair amount to set it up. The documentation is, I have to say, not the best that I’ve seen.
Carlos: Not so good.
Melody: Yeah. That can be part of the problem. It’s also hard for people to change. There’s still a lot of people who aren’t up to 2012, so…
Carlos: There is that.
Melody: Yeah. It is backward compatible in terms of acquiring your data load, but it’s not backward compatible in the ability to run it. You’d have to have, at the very minimum, a developer edition of SQL Server 2012 running in a test system so that you can actually do the replay part.
Carlos: Thank you. It’s called distributed relay. I think I said distributed query, but that’s not right, so distributed relay, replay. I’ll get it right. Distributed replay. Ultimately, I guess that if we’re going to define this, we would define distributed replay as, as what?
Melody: The ability to test your data your way. One of the key things about it is the ability to use your own data. Often, when you’re doing testing, it’s either difficult to acquire your own data, or you want to create specific situations that are very custom to either your industry, or your particular workload. It’s very handy when you’re checking things like indexes for example, because the indexes are specific to your data, right?
Melody: You’re not just testing general workload. You want to test a specific workload.
Carlos: To do that testing, we could set up Distributed Replay. Now, you mentioned certain scenarios that you would want to set this up. Are there some known scenarios where it wouldn’t be of the greatest use?
Melody: That’s a good question. Anywhere you want to test something that you require specific parameters for, it’s really great for it. But just general volume, if you want to do that. I can’t really think of anything that it wouldn’t work for. It may not be the best tool, but it would work for almost everything.
Carlos: I thought you were going with that is something like so testing the input or the throughput of your disks. Before SQL Server even gets installed, you could still do some of those SQLIO or whatever the new tool is that has replaced that to use that, because that’s something in and of itself.I guess maybe once you get into SQL Server, you’ll be talking about changes inside of SQL Server. That’s when Distributed Replay might be a handy feature.
Melody: Yeah, it’s definitely data-dependent. When you’re talking about data, but I can’t see any reason why it wouldn’t be useful, if whatever you were testing was related to data. Even on occasion if that data’s related to disks.Distributed Replay can have up to 16 different servers that are hitting your SQL Server. If you wanted to test workload, you can increase the number of clients that you have, that are hitting your SQL Server. You can do some performance testing that way as well.
Carlos: OK. There you go. Let’s go ahead and get into the setting up components. There’s some vocabulary or some terms that we should become familiar with. Let’s talk about some of those, now.When we go to set this up, some of the terms that I think about, we have on display here, the administration tool, there’s a controller. You already mentioned clients and then the target server. Are those the main moving pieces as far as architecture are concerned?
Melody: Yeah. Those are definitely all the pieces.
Carlos: When I go to grab my load on my server, I’m ultimately using Profiler, right?
Melody: Yeah. Distributed Replay is built on top of Profiler. You get that basic look and feel. Most people, nowadays, have actually used Profiler for one thing or another. The problem with Profiler is making sure you use it on server site trace.As long as you got that covered off, you can only run it in parallel or in serial. You’re not getting the parallel workload out of it. The sort of really what Distributed Replay brings to the table with it. It allows you to do the parallelism and really hit your server and change things up.
Carlos: If I understand what you’re saying, I could take a Profiler on a single server. Then I could replay through that. What you’re saying is that I’m just going to hit one courier at the time, basically. That’s not how real life works.We have many different queries coming. Distributed Replay allows me to separate that. It feels like I’m getting hammered by multiple connections or something.
Melody: Yeah. Think of it like a web connection. Your web servers are going to get hit from all sorts of different places. That information has to go into your SQL Server. If you have one web server versus you want to expand to four web servers, how is that going to affect your SQL Server? You can do that test.
Carlos: That’s an interesting thought. Is it only helpful if you have multiple clients? If you’re just going to do one client, is there any benefit to going through the trouble of setting up Distributed Replay? This is well off, just replaying the Profiler trace.
Melody: The single client is a standard feature. Unfortunately, that’s all you get in standard. It’s still useful to be able to use your own data. Even if you can only do it that way, there are a number of features within Distributed Replay that allow you to change parameters.So, it gives you a lot more flexibility than just your basic Profiler, anyway. Those parameters can be helpful in testing and tuning.
You can also multithread that single client. Even if you only have one client, you can multithread that client. That will change up your workload, too.
Carlos: There you go. That something we should point out is that if I’m going to use more than one client, I have to get Enterprise version. Is that what you’re telling me there?
Melody: Unfortunately, yes. [laughs]
Melody: We keep telling people that, but it’s true.
Carlos: That’s good to know. The first part, which might seem, at least to me, the most straight forward as I go out to my production database, I use a server site trace, a Profiler trace. I grab this workload. Then we’re going to set up the administration tool, we talked about, and a controller. What’s the function of the controller?
Melody: I would liken it to a project manager. A controller just keeps everybody on task, tells everybody what to do, when to do it, it’s like a scheduler. I think of it as the project manager of the whole situation.
Carlos: OK. It’s going to schedule the clients, which then point to the database.
Melody: Yeah. It coordinates that flow of information in data to make sure things are happening when you tell it to make them happen. One of the key things, before we get too much further on with that, when we’re talking about the data collection, there are specific things that you need with your Profiler.In 2012 tools, there’s a template that you can use within Profiler that is specific for Distributed Replay. That will make sure that you gather the key components that are required for Distributed Replay to work. That’s pretty important.
Carlos: OK, very good.
Melody: I just wanted to mention that to make sure that people are aware that you have to have specific pieces of Profiler to make sure that they can be replayed properly. That tool, although it’s a 2012 tool to allow it to collect data from other older machines can be installed on older SQL Servers, you can do that if you need older servers.
Carlos: OK, good deal. We want to be able to collect the right information…
Melody: That’s right.
Carlos: To go through all that trouble, particularly if we’re going to collect it off production, we know there’s a little bit of a watcher effect there, so we’re going to impacting that. We don’t want to be doing that too many times when I get all those things lined up.
Carlos: Once we have that, then each of those compete those pieces, the administration tool, the controller, the client then uses an XML file to do all those connections?
Melody: Yes. Unfortunately, there’s no pretty GUI, and that might be another reason people don’t use it.
Melody: It’s a lot easier to just follow through GUI, and know what to do. Because there are the need for the XML files and the need to do a command line prompt to get things running and working. That could be part of the reason why it’s not been as popular as I would think it would be otherwise.
Carlos: Sure. Now, we’ve gone to that. Let’s say we’re going to invest that time in setting that up. Can my admin tool, my controller, and my client, all be the same server?
Melody: Yeah, they can be. They don’t need to be, they certainly can be. In some situations, it would make it easier.
Melody: You definitely going to want to run this in a test system anyway.
Carlos: OK. All those components are…
Melody: It doesn’t matter.
Carlos: That’s right. I would think test systems…
Melody: It doesn’t matter if they’re all in the same machine. Again, it’s going to depend on what you’re trying to accomplish and what you’re trying to test.
Melody: A lot of that is quite specific to what you’re trying to accomplish. At the very least, you want to separate your SQL Server from the other components, I would think.
Carlos: Yeah, that’s right because that’s what you’re trying to test. You want to mimic your production environment, hopefully to see, “OK, if I change this, what’s that going to do to my environment.”
Melody: Yes, but honestly if you’re doing it in a test system and you’re only trying to see what the impact of the certain indexes are, there’s no reason why you would need more than two machines. They don’t have to be physical machines. You can just spin up a couple of VMs and get this going quite quickly.
Carlos: There you go. Now, I guess is that typical? I guess your typical setup, are you each one separate using Enterprise. That’s what your, I know you engage in lots of different environments.
Melody: Generally, it depends on what the client has available and what their bandwidth is.
Melody: Unfortunately, at home, that’s what I use. I use just a single SQL Server and a single machine VM for each.
Carlos: For each?
Melody: Yeah, while I do one for basically the admin aspects of it and then one SQL Server. I only have two machines when I do it in my home lab, but when I’m doing at a client site, I’ll expand that out depending on what they have available and what they’re trying to test.
Carlos: Now, we talked about gathering the trace on the production environment already, we are using Profiler. There’s no way to use extended events to capture that?
Melody: Unfortunately, not yet. Not in 2016 either. I have not seen anything coming for 2016 that includes that either, unfortunately. No love for Distributed Replay in 2016 yet.
Carlos: [laughs] I know all these other nuggets and candies they have for us in 2016, but no love for Distributed Replay. I have to get the program manager on here to see if we can figure out what’s going on there.
Melody: I’d love to be on that call.
Carlos: There you go. Now that we’ve done that and we have that trace, and again how long…Ultimately, I guess it’s going to be test specific. What is it that we were thinking about changing, but how long do you normally keep that workload around? I guess let’s just use the example for testing some indexes? Once I have implemented those index, is my workload, is it garbage or can it be reused?
Melody: It can definitely be reused. Once you have everything all set up, everything can be reused. Probably the data is the one thing that will change the most often. Because once you’ve got all the pieces set up because honestly that is by far the hardest part is setting up all the pieces.
Melody: Once that’s all set up in your test environment, you really don’t have to change it.
Melody: Once it’s all installed and set up, that’s the hard part. After that, there’s so many different pieces that can be fine-tuned and changed that you can rerun that same workload multiple times and get multiple results.
Melody: That said, it’s also because it’s Profiler and you have a template for collecting the information. It’s super easy to go and get more information. One of the things that happens in banking quite often is, at the end of a month, there’s a lot of processes that run.
Melody: A fair amount of your data changes at month end. That’s when a lot of my clients will gather a new data set.
Carlos: OK. Now, is that just like at that point because it’s regular like that? Do you just have your template, your Profiler or server site job, if you will set to run, “Hey I’m just going to collect for…” What’s an adequate time there or what’s normal time? An hour, four hours, maybe just a couple of minutes?
Melody: That depends on your volume really and depends on what you’re trying to accomplish. What I try to do in banking is do two separate runs. In banking, they have specific workloads that are typical for daytime operations.You’ll have a lot of teller activity, you’ll have a lot of ATM activity, that type of thing, which is specific to daytime. Then, they also do a completely different type of workload with reporting and mass updates at the end of their day. They have this sort of more like what we would consider years ago as batch processing that happens overnight.
Melody: I’ll often have two different types of workloads and I’ll collect one during peak hours, during the day. When I know most of their clients are cashing checks, I’ll usually pick that morning between 9:00 and 11:00. Then, I’ll also do when they do their overnight processing, I’ll pick the peak time of their overnight processing depending on what they’re trying to test, so I’ll pick a couple of hours in there. I’ll use those two different data sets depending on what we’re testing.
Carlos: What components have to change once I have that new profiler, is that just something that I’m taking on to the client to the run. What’s changing thereabout my environment? Does that make sense?
Melody: I think so. On the controller and on the client, there are specific things that can be determined like how many threads for that particular client, for example, and whether you’re running your update synchronously or asynchronously — those type of things.A lot of it is around performance, to allow you to scale out or scale back. There is time intervals, those type of things that can be set up.
Carlos: I see.
Melody: Just running your workload, there’s a huge number of performance tuning or performance related, either customized queries or things that you can get from other people that our whole SQL community has tons of in terms of performance tuning to check out what’s going on, on your system that can actually be run on your SQL Server, during the replay of that data.Even though you haven’t actually changed anything within your Distributed Replay Setup, there’s a huge number of things that you can actually test. I know Paul Randall, for example, has a number of queries that you can run to check your IO, your wait stats, all of that type of thing.
Carlos: You’re referring more to when I have this set up, and I want to see the effects of my change on my workload, I would use these DMVs or the queries to capture that information.
Carlos: Right. I apologize. My question refers to — I have the profiler trace. Where does the profiler trace live and when I change that, what would I have to change?
Melody: It’s just a file that will live on your controller and the controller determines how it gets replayed.
Carlos: Just override the other one and then your good to go?
Melody: It’s a simple file copy, so it’s a really easy thing to do.
Carlos: Very good. You have mentioned also, kind of this setup component, or setup process. That there are some firewalls settings or some things that you will need to set up so that all of the components can talk to one another. Particularly in banking, anytime you have to deal with firewall rules, it seems like there is always a big to do. Any blowback or issues you’ve had to resolve with those firewall rules?
Melody: The nice thing about Distributed Replay is, when you’re actually running it, you’re going to run in a test system, so you don’t usually have any blowback. What I warn people about is often whether it’s firewall or component services. You’re not always the one who has the authority to make those changes even in a test environment, depending on how large your shop is.
Melody: That is sometimes where the issue comes. I’ve had a couple people ask me about that and they’ll take my slides and they’ll show them to their sysadmin and then he’ll say, “Oh, oh, now I understand what you’re talking about.”It’s important to understand it enough that you can explain to the person who can actually make the change what needs to be done. You don’t necessarily have to know how to set up a firewall or change the components’ services settings. But it’s important to know, so you can have that conversation.
Carlos: Now if they’re in the same domain, in the same network, there’s no issues there, right? All the ports are generally open? That kind of thing.
Melody: In a test environment generally, but I actually ran into it when I was doing my test setup at a bank and maybe it’s because I do so much with banking, but I actually run into it a lot. And there wasn’t a lot of information online, so I make sure I mention it.
Carlos: Let’s just dive into that, just for a second. Besides 1433, the default SQL Server port, what other firewall things have you had to set up?
Melody: It’s to let Distributed Replay itself through.
Carlos: OK. So it had its own ports that it needs.
Melody: Yeah. Distributed Replay has its own pieces that it needs and making sure that the passwords and such that are being used are able to get through.
Carlos: There you go. Ultimately, it’s acting like an application, right?
Melody: Right. Exactly.
Carlos: It would need to connect that way.
Carlos: Good. One of the things you talk about, we’ve gone through all this part to set it up, we grab our Profiler, we’ve engaged our infrastructure team to help us set this up, potentially. One of the common issues you talk about is an empty result set. We’ve gone through all this work to set this up.Is there any way we can detect why we won’t be collecting this information before we coordinate and look like a goofball because we don’t have anything to show for what we’ve done?
Melody: One of the things, particularly if you have individual usernames and such…If the data that you collect is so old that the database you’re replaying it against doesn’t have those users in it, then you can’t actually do anything. You’ll have this profiler you’ll play it against and you’ll expect transactions to show up, but if those users aren’t there to create those transactions, then nothing really happens.It’s not so much that things won’t work, it’s just an empty result set that means you’ve missed something, you missed a step or you’re missing something in the coordination of your data between your replay and your database.
Carlos: OK. So you may need to take a look at that. Maybe that’s worth diving into for a minute. Setting up those log-ins. For using something besides an application account to connect to the database, we just need to make sure that exists on our test system as well?
Melody: That’s right. Usually, what I recommend people to do is take a recent backup of your production system around the time that you’re going to replay or collect your data, so that the two are in sync as much as possible.
Melody: When I did it, I took the backup that we normally do overnight. Every night at the bank I restore that in my test environment and then the next morning I did my data collection, so that everything was fairly recent and that I knew that the majority of everything would work fine. It’s not that it will error out, it will just not do anything.
Carlos: Sure. You will have anything to compare and collect against. It will act like the server is not doing anything.
Melody: That’s right. You will look really efficient, because it had nothing to do.[laughter]
Carlos: All these metrics. I see pure is really low. Those indexes are really good.
Melody: That’s right. You’ll look good for about three minutes.
Carlos: Well, Melody, that’s great information. Again, compañeros. We will have on the website — sqldatapartners.com/podcast. We’ll have some links and Melody has some blogging information. We’ll make sure that her slide deck from her presentation is up there as well. You can check that out.So Melody, it’s now time for the SQL Family portion from our program.[music]
Carlos: Here, we’d like to talk with you more about you and your life experience. The first thing we always like to talk about is tools. What’s your favorite tool? This can be a free tool or paid tool, but why do you like it and how do you use it?
Melody: I’d have to say that Paul Randall and Kimberly Tripp’s website would probably be the tool that I like the most.[laughter]
Melody: It’s not a standard tool…
Carlos: That’s true.
Melody: …but there are a lot of queries there that will get you information that you’d never thought you could get. When I was doing the Distributed Replay and trying to get metrics and information, and making sure that I got the information I needed, I found it really useful and well-explained.It’s still one of my favorite go-to places. It’s like my little library, so for me that’s a good tool. Every client I go to is different all the time. I just need basic information and be able to help my client get done with what they need to get done, without telling them to buy something more.
Carlos: Now you’re a partner in your own firm, done lots of different things. As you’ve gone through your career, what’s one of the best pieces of career advice that you received?
Melody: I would have to say networking at Summit instead of making sure that you hit all the sessions you want. And make sure you buy the sessions afterwards. There’s always so much going on at Summit. The learning aspect of that is phenomenal. The nice thing is you can buy that learning and take it home with you and do it on your own time…
Carlos: In your own pace.
Melody: …when all those people are not there to network with. They’re honestly some of the nicest people in the world. One of the reasons I love Microsoft’s data platform is because the people are so generous in their time and knowledge.
Carlos: That’s right. I think even this interview is a small example of that. Ultimately, we met at Summit. You had come out to DC as well and we chatted there. Us getting together and connecting was the result of a little bit of networking at the Summit event.
Carlos: I have a new question that hasn’t been in the SQL Family list of questions before. I’m looking to try this one out. You inherit a million dollars, what are you going to do with it?
Melody: That’s interesting. I would probably create a new tool to use in banking.
Carlos: There you go.
Melody: That would be so much fun.
Carlos: Start a little SAS or, or an Install tool. You’ll have to be careful with the regulators that will allow there.
Melody: More of an Add-on for banking that uses the Microsoft stack, but something specific that banking needs. There’s lot of things out there that they could really use that they don’t have right now. I just love working with great minds.
Carlos: It’s always nice to work with people that challenge you and want to make you learn more. Melody, our last question for today — if you could have one superhero power, what would it be and why do you want it?
Melody: I don’t know if you call it a superpower, but I’d really love to not ever have to sleep.[laughter]
Melody: There are so many things to do in a day and so many things I would love to do that I just sometimes feel I don’t have time for. A lot of people will say, “I’ll do it when I retire.”, but I want to do it now.
Carlos: That is right.
Melody: Whether it’s learning or spending time with my family, whatever it is, I want to do more of it, so it’d be really nice to not have to sleep.
Carlos: Somebody was telling me the other day that one of the two deficiencies of human beings are either the need to eat and the need to sleep. We could get over with those and we’d be in business, right?
Melody: I’m OK with the eating part, because I can multi-task. I can eat and do other things, but I can’t sleep and do other things.
Carlos: That’s true. Well, Melody, thanks again for being here, I do appreciate your time.
Melody: It was a pleasure. Thanks for having me.
Carlos: Compañeros, if you liked this episode I invite you to leave a comment or review on our iTunes or Stitcher. This will allow others to find us, so they can more easily enjoy the program.It’s always good to have you, as always our show notes will be up at sqldatapartners.com/podcast. You can reach me on Twitter @CarlosLChacon or by email at [email protected]. And we’ll see you, the SQL trail.[music]
Children: SQL Data Partners