How do you migrate 8,000 databases on a single instance in an environment that doesn’t tolerate downtime? Today’s episode focuses on migration strategy and how our guest Andy Levy faced the challenge of moving all those databases within the time constraints the business gave him. We also talk about some of the strategies Andy employs to administer and monitor all those databases.
“When people say, ‘I’ve got a lot of databases on my server,’ somebody will say, ‘Andy, you’re up.’ I SELECT COUNT(*) FROM sys.databases and just screenshot that, paste it and say, ‘is this how many you’ve got?’”
“Our organization had never done a major upgrade of SQL Server like this before, so there was some nerves, there was some trepidation.”
“I wrote this whole process for doing this whole thing, and when it started not working the way it should have, I started taking it really personally.”
“Even though the meat of the process took way, way longer than we anticipated, because our developers and our testers could pivot and get me the information I needed to support them, they were able to test in parallel with the rest of my tasks, and we hit our schedule at the end of the night.”
Listen to Learn
00:38 Intro to the guest and topic
01:54 Compañero Shout-Outs
02:37 SQL Server in the News
04:27 With just one instance, Andy had to migrate over 8000 databases from SQL Server 2008 to 2016
07:55 Their process of deciding what method to use – the methods that wouldn’t work
14:20 The method they settled on using
18:30 Andy’s Thanksgiving to upgrade day job: clean up the supporting cast
19:40 The implementation day plan and how it went
23:43 What happened at 5 minutes past 11pm?
25:34 Processes and steps that Andy would do differently
29:53 Do you use SSMS to get into that many databases?
31:16 Things that you see with large numbers of databases
34:42 DBATools was a lifesaver on this project – you’ve got to be automating things
35:22 SQL Family Questions
44:30 Closing Thoughts
About Andy Levy
Andy is a SQL Server DBA, PowerShell fan, former developer, SQL Saturday organizer and user group leader. He’s been in IT for nearly twenty years, doing web development, system integration, and presently database administration. When he isn’t picking queries apart and wrangling an unruly herd of databases, he can be found helping out with his son’s Boy Scout troop, making BBQ ribs for his daughter, or planning the next family vacation with his wife.
Music for SQL Server in the News by Mansardian
Carlos: Compañeros, welcome to another edition of the SQL Data Partners Podcast. I am Carlos L Chacon, your host, and this is Episode 161, and today on the show, we’re happy to have Andy Levy with us. Andy, welcome to the program.
Andy: Hi, Carlos, thanks for having me.
Carlos: And, my trusty sidekick, or pod–
Kevin: Sidekick? Sidekick?
Carlos: Podcast Extraordinaire? Friend of the podcast?
Kevin: Well, not anymore.
Carlos: Kevin Feasel is with us.
Kevin: Hello everybody.
Carlos: For how much longer, we don’t know, but for right now–
Kevin: I’m going to go on strike.
Carlos: Well, you did miss the last recording.
Andy: Hold out for more money.
Carlos: Okay, so ultimately, our subject today is Andy’s experience migrating 8000, that’s an 8 with three zeros, compañeros, 8000 databases on a single instance, and we’re going to let him tee up a little bit what that machine looked like. Not a small server, compañeros. But ultimately, we’re going to be talking a little bit about that and what he encountered there, and maybe why they had to go that route, a little bit of that. But first, before we do that, we have a couple shout-outs we want to give, so, I’ll give mine. I want to give a shout-out to Gordana Belic, Sanjeev Kumar, and Kris Adant. Thanks for connecting with me and reaching out. And Andy, you’ve got a couple.
Andy: I wanted to give a shout-out to Gail Shaw, Cathrine Wilhelmsen, Kevin Hill, and I’m sorry, John, I’m probably going to mispronounce this, John Hohengarten, also known as “Shoe” in the SQL Server Community Slack.
Carlos: Very nice, awesome. Well, good deal, yes, so we will be mentioning the DBATools a time or two, and maybe comparing and contrasting a little bit with the route that you ultimately decided to take. Before we do that, a little SQL Server in the News. So, now, these things change, we have to keep up with them. Anybody who’s doing database administration knows that we have to keep up with service packs, or CUs, now, and so Andy has one that has been released and wants to talk a little bit about it.
Andy: Yes, so as we record this, just last week, SQL Server 2016 Service Pack 2 Cumulative Update 5 was released. What it does not appear to have, which a number of us were hoping for, is support for trace flag 460, which provides a more detailed error message for “String or binary data would be truncated.” We were told a number of months ago that that would be getting backported to 2017 and 2016. It’s already in the 2019 pre-releases. It does not appear that it made it into this cumulative update, but hopefully another one will be coming soon. The other big fix that’s of interest to me is there was a note about fixing a high CPU usage condition on high-end servers with many SQL batches being run, which kind of dovetails into what I’m going to be talking about tonight. We seem to be experiencing some of that, so I’m hopeful that this cumulative update will resolve that for us. We can talk about that later.
Carlos: Sure. Yeah, so Andy has several posts on his site where he tells about his experience and gives all the gory details, which we will link to in our show notes. For today’s episode, the show notes are going to be sqldatapartners.com/migratemany, so all one word, migrate many, or sqldatapartners.com/161, for the episode number. Okay, and so with that, let’s dive into this conversation. So, it was after a conversation we had with Tracy Boggiano about administering lots of instances of SQL Server that Andy approached me and he’s like, “well, I don’t have that many instances, but my instance has 8000 databases.” He’s like, “you want to talk about that? And oh, by the way, I’m going to migrate it.” Which is now complete, but he was in the works of doing that, and he’s like, “hey, are you interested in talking about that?” And I’m like, “uh, yes, that sounds like a very interesting conversation to have.” So Andy, tee this up for us. Let’s talk about the server. Let’s give the physical specs, here, and why 8000 databases?
Andy: So yeah, when we spoke back in November about this, I didn’t want to really set anything up, because I wasn’t sure yet if we were going to be successful, and I wanted to make sure we were successful before I went out and talked about it on the internet.
Kevin: Smart idea.
Andy: And actually, I had a number of conversations at Summit with people that gave me a lot more reassurance that we were on the right path with our migration. So, our server, it’s actually a two-node failover clustered instance. We have 16 physical cores on each server, so two distinct boxes, each one with 16 cores in it, 768 gigs of RAM, three quarters of a terabyte of RAM, about 4 terabytes of storage, 3 terabytes of which is our databases. And I say 8000 databases, it’s over 8000. About 6 months ago, we were actually over 9000, and I was able to clean up a whole bunch over the course of the past few months. So, when people say in the Slack channel, “I’ve got a lot of databases on my server,” every now and then somebody will say, “Andy, you’re up.” And so I just SELECT COUNT(*) FROM sys.databases and just screenshot that, paste it and say, “is this how many you’ve got?” So yeah, it’s kind of interesting. When I started this job about two years ago, just seeing how many databases there were was something, because in the past, my experience had been a dozen, maybe two dozen databases on an instance, and I got dropped into this new environment in my new job, and at that point it wasn’t even 8000 databases, we’ve grown since then. And I was, “wow, how am I going to handle all of this?”
Andy: As far as the why, why this many databases, we’re a software-as-a-service platform and each customer gets their own website, and each website is backed by its own database. Now, the number of databases we have does not directly map to the number of customers that we have, but just to give you an idea, that’s why we have so many databases there, because we have a lot of customers.
Carlos: Sure. This seems primed almost like an Azure SQL Database type environment, so 8000, at three terabytes. I guess there’s some pretty good sizes in there, I imagine.
Andy: Our average size is around half a gig per database. But we have some that are very, very large, compared to that. We have some that are quite a bit smaller, maybe we set up the shell of something and it hasn’t been finished, yet. It’s got a pretty wide range on the sizes.
Carlos: Okay, so when you think about migration, ultimately, the precipitating event for this is end of life for SQL Server 2008. I think you were on 2008 R2?
Andy: 2008 R2, yes.
Carlos: Yes, and so for anybody who might be listening, that end of life is coming up in July, so the extended support will go away. Windows Server 2008 somehow gets extended until January of next year. There’s a little more time on the OS for whatever reason, but SQL Server is going to get the plug pulled this July. So if that’s something that’s of concern to you, compañeros, of course we’d be very interested in chatting with you about that. But yeah, so this is the conundrum that Andy finds himself in. He’s got 8000 databases and so now the question becomes, how do you move it? And so there’s a couple of tried and true methods and so, let’s go through those and maybe talk about why that would be a challenge. So you talked about DBATools and their claim to fame is the migration process. That’s kind of what started the whole DBATools thing in the beginning. And so, yeah, let’s start there and talk about why you didn’t want to use that, per se, to migrate?
Andy: Yeah, so like you said, when Chrissy originally started DBATools, it was really just one script, and it was Start-DbaMigration, actually I think she called it Start-SqlMigration at that point. I would say probably the most common use for that is to do a backup and restore. You back up from the old instance and you restore to the new instance. The challenge that I face there, and you’ll hear this over and over again, is it’s single-threaded, and the time that it takes to back up that many databases and then restore them is just prohibitive. We’d be offline all weekend long. Just as a comparison point, we use Ola Hallengren’s maintenance solution to do our backups, and we actually have that multi-threaded. We have six concurrent jobs that run for that, and it takes us, right now, between 3 and 4.5 hours depending on the day, and we’re not sure why it’s so inconsistent. But it takes 3 to 4.5 hours across six threads to back up. So if we look at using Start-DbaMigration, that’s single-threaded, it’s going to take us the better part of a day just to do the backups. So, because a lot of our customers have operations on the weekends, we can’t be offline for an entire weekend to do backups and restores for migration. So that was right out. The next thought was, “well, maybe we can do log shipping.”
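For reference, the single-threaded backup-and-restore path Andy ruled out is roughly a one-liner with dbatools. This is a sketch only; the server names and network share are placeholders, not their actual environment, and parameter names have varied across dbatools versions.

```powershell
# Sketch of the backup/restore migration Andy ruled out at this scale.
# Source, destination, and the share path are hypothetical placeholders.
# -SharedPath must be a network location both instances can reach.
Start-DbaMigration -Source OldSql2008R2 -Destination NewSql2016 `
    -BackupRestore -SharedPath '\\fileserver\sqlmigration'
```

Because each database is backed up and restored in sequence, the wall-clock time grows linearly with the database count, which is exactly the problem at 8000 databases.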
Carlos: Sure, get it all set up and, yep.
Andy: Get it all set up and then just be copying stuff over and again–
Kevin: I see where this is going.
Andy: How long does it take to cycle through 8000 databases just doing a transaction log backup on all of them? And it turns out, we didn’t even test that out. We actually have almost all of our databases in the simple recovery model, because of what our SLAs and our RPOs are. So if we were to switch over to use the full recovery model, it’s certainly possible, but again, we’d have to restore to the destination server, the new server, taking these log backups, copying them and restoring them. We wouldn’t be able to do it every 15 minutes. We’d probably have to do it every couple of hours. The other issue that we faced was, we didn’t have enough space allocated to us on the new server to stand up a complete clone of every database. We were pinched for space, pinched for time.
Carlos: Yeah, but you were ultimately moving disks over, some of that disk coming with you.
Andy: In order to do it, we would have to arrange for another four or five terabytes to be allocated to us by our hosting provider, and if we could have gotten more disks added to the device, I don’t recall how much we would have had to add. But ultimately, we just decided, because of the volume, that log shipping was just not going to work in our favor. So, the other thoughts that we kicked around were either transactional replication or mirroring. The trouble with transactional replication, there’s actually two issues. First, my understanding is that you cannot do replication with more than a two major version difference for SQL Server, and going from 2008 R2 to 2016, that’s three major versions.
Kevin: You would have to republish it, yeah.
Andy: The other issue is that there’s a hard limit of 32,767 articles that you can have in a transactional replication setup and we have way more than 32,000 tables across our 8000 databases.
Carlos: Sure, okay, wow. Yeah, so just the size perspective is a little daunting.
Kevin: And the idea of creating hundreds of publications just made me nauseous.
Carlos: Oh yeah, that’s right. I mean dealing with one sometimes makes me nauseous.
Andy: Yeah, so everything that we could come up with and actually I had a good hour, hour and a half long discussion in the DBATools Slack channel one morning, just brainstorming, every idea, and I talked to my colleague and I, we talked it out quite a bit as well. There’s two of us in our company, two of us managing 8000+ databases. Talked to other people in my local user group, and every idea that we had was, “it would work great for a dozen databases. I don’t have a dozen databases.” Then the last option is just doing an in-place upgrade, which our hosting provider doesn’t support, and I don’t have a good feeling about doing that anyway. I know it’s a lot better than it used to be, but you don’t have much of a parachute if something goes wrong.
Kevin: Going from 2008 R2, that’s a pretty big jump.
Andy: It is.
Carlos: That’s right.
Kevin: If you’re talking ’16 to ’17, I’d be much more willing to say, “ahh, go for it”.
Carlos: Yeah, especially, when you’re changing operating systems as well, right?
Andy: No, we were already on Windows Server 2012 R2.
Carlos: Oh, okay.
Andy: So February of 2018, we had to upgrade the servers, and at that point, we also upgraded the operating system, and we actually learned a lot about our environment in doing that upgrade.
Carlos: Oh, okay.
Andy: So, we hashed through all of these possibilities, and with all of them, we kept coming back to the other challenge of how do we just move three terabytes of data around. And so, what we settled on, ultimately, was, “let’s just go ahead and stand up a new cluster and take the LUN off the old cluster, detach it, attach it to the new cluster. So we’ve moved the data by just rewiring it, and then just zip,” I say zip, it ended up taking a little more time than it sounds, “go through all the databases and attach them to the new instance.”
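Stripped down to a single database, the detach/re-attach pattern they settled on looks roughly like this with dbatools. The instance and database names are placeholders, and this sketch assumes the file paths are identical once the LUN is presented to the new cluster.

```powershell
# Sketch of the per-database move. Names are hypothetical; the real
# process batched thousands of these.
# 1. Capture the physical file layout before detaching.
$files = Get-DbaDbFile -SqlInstance OldSql2008R2 -Database CustomerDb |
    Select-Object -ExpandProperty PhysicalName

# 2. Detach from the old instance.
Dismount-DbaDatabase -SqlInstance OldSql2008R2 -Database CustomerDb

# ...the LUN is detached from the old cluster and attached to the new one...

# 3. Attach on the new instance; SQL Server upgrades the database's
#    internal version on attach.
Mount-DbaDatabase -SqlInstance NewSql2016 -Database CustomerDb `
    -FileStructure $files
```

The key property of this approach is that no data pages ever cross the network; only the storage wiring and a small amount of metadata change.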
Carlos: Right, so you’ve neutralized the data movement, per se, and then are then making, in essence, some metadata changes to the SQL Server to get those up. Now, the one trick there, which I’m sure we’ll get into here is, so then you actually have the SQL Server upgrade piece. SQL Server’s going to then upgrade all those databases. Yeah, so I guess what was that like? I guess, you know, continue the story.
Andy: Well, yeah, so what we did was we moved the LUN over and attached all of the databases, and what I skipped over in the middle there was once we had the LUN detached from the old server, we also took a SAN snapshot. And the reason for that was our organization had never done a major upgrade of SQL Server like this before, so there was some nerves, there was some trepidation. I’d been through one or two upgrades at my previous job, but never anything on this scale. We migrated a couple of databases with just a backup and restore, and unfortunately, I learned about DBATools in the middle of that project and I couldn’t rewind the clock. It would have made it so much easier! So there were some nerves there, and I wanted to make sure that we had a parachute. So, our parachute was we’re taking a SAN snapshot, and if something goes wrong on 2016 after we get everything moved over, we just swing the LUN back, revert the snapshot, bring up 2008 R2, and it’s like nothing ever happened. The backup to that parachute was all of the regular backups that we take, because we’re not crazy DBAs who fly by the seat of our pants. Yeah, we’ve got our snapshot, we’ve got backups, so we’re covered, there. So, yeah, getting into the actual attachment process, again, DBATools has a function for that. It’s called Mount-DbaDatabase and it works really well. The trouble is, again, I’ve got so many databases, it’s going to take a long time, because even though it only took maybe 5 or 10 seconds per database to attach and do the metadata upgrade, the scale gets me. So, what I ended up doing was writing another PowerShell function as a wrapper for Mount-DbaDatabase, and I’ve actually got the whole function in one of my blog posts, and what it does is it mounts the database, there’s a logical rename on the files. I’m actually looking at the code right now to remind myself of everything that it did.
Optionally rebuild the indexes, because I learned from a blog post by Erin Stellato that if you’re upgrading from anything pre-2012 to 2012 or later, there is a change to the on-disk format of the non-clustered indexes, so it’s a good idea to rebuild those indexes. It’s not required, but it’s beneficial to rebuild those indexes when you do that upgrade. So optionally, rebuild the indexes, upgrade the compatibility level, and then the last one was making sure that the database owner was set properly. So I wrote this function, and I tested it with about 50, I think it was 50 or 60 databases, and it worked really, really well. I would run it and, just extrapolating, I figured, okay, for me to do 800 databases, it would take about 10 minutes, so that’s pretty good. If we can get through attaching all these databases in an hour and a half, we’re in pretty good shape, here.
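Andy’s actual function is in his blog post; as a heavily simplified, hypothetical sketch, that kind of wrapper around Mount-DbaDatabase might look like this (the function name, parameters, and owner login are illustrative, not his real code):

```powershell
# Hypothetical, simplified version of the wrapper Andy describes:
# attach the database, bump the compatibility level, optionally
# rebuild indexes, and set the owner.
function Invoke-DbAttachUpgrade {
    param(
        [string]$SqlInstance,
        [string]$Database,
        [System.Collections.Specialized.StringCollection]$FileStructure,
        [switch]$RebuildIndexes
    )
    # Attach; SQL Server performs the internal version upgrade here.
    Mount-DbaDatabase -SqlInstance $SqlInstance -Database $Database `
        -FileStructure $FileStructure

    # Raise the compatibility level to match the new instance.
    Set-DbaDbCompatibility -SqlInstance $SqlInstance -Database $Database

    if ($RebuildIndexes) {
        # Pick up the post-2012 on-disk format changes Andy mentions.
        # (His advice targets nonclustered indexes; this simplified
        # sketch rebuilds all of them.)
        Invoke-DbaQuery -SqlInstance $SqlInstance -Database $Database `
            -Query "EXEC sp_MSforeachtable 'ALTER INDEX ALL ON ? REBUILD';"
    }

    # Make sure the owner is set consistently.
    Set-DbaDbOwner -SqlInstance $SqlInstance -Database $Database -TargetLogin sa
}
```

Run per database in batches, as Andy describes, each step is quick in isolation; it is the per-connection overhead at scale that bites.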
Carlos: Right, right, everybody’s happy.
Andy: That was the idea, and so we did it. I should take a step back, actually. So I said I did some testing. We spun up the new cluster right after Thanksgiving. We did this upgrade December 8th. So, right around Thanksgiving we got the new cluster turned over to us. It had a couple of small drives attached to it, so we had all the drive letters and the paths ready to go for when we did the switch-over. And I was able to use Start-DbaMigration to bring over my server configuration, my logins, my linked servers, all of that kind of stuff, so I didn’t have to manually recreate all of that.
Carlos: Yeah, yeah, the supporting cast, if you will.
Andy: Yeah, exactly, the supporting cast. And it was also a good opportunity to clean out some cruft, so we’d had logins that we didn’t need anymore, so let’s just bring over the logins that haven’t been disabled, so now we’ve got a clean slate. Same thing for agent jobs. We had a whole bunch of agent jobs that were decommissioned. We disabled them, we didn’t really need them anymore. They hadn’t been run in 6 months, we don’t need them, so I bring over just the active jobs. So we were able to check out a whole bunch of the system without having everything live, yet. So that was my Thanksgiving to upgrade day job. So, on Saturday I get in there late-morning and kick off the final backups we need to kick off. We start preparing some other data to do some reports in case our customers need some information, they need some other data while we’re offline they can call in and we can then email them a PDF. We shut down 2008, and coordinate with our hosting provider. They did all the moving of the LUN, brought up the server, and I kicked off my first batch of 800 databases, and it ran in about 10 minutes. I said, “great, my predictions were right!”
Carlos: Your numbers have lined up so far.
Andy: Everything lines up! Then I run the next batch, and it takes 15 minutes. And I say, “okay, well, maybe I had some larger databases in the mix, here. Maybe there’s something else going on there.” I run the next batch and it’s 20 minutes.
Kevin: Uh oh.
Andy: We had this all planned out in Excel for our whole implementation day plan. We had developers and testers coming in at particular times based on our initial predictions of when things were going to be attached and ready to go. We had communicated to people outside IT, “this is what our timeline looks like it’s going to be”, and I’m looking at these numbers and thinking, “oh boy, I just blew the whole schedule. I just wrecked the whole schedule.” But you know, we’ve got to keep going, keep pushing through, and each batch progressively takes longer and longer and longer. And I haven’t dug in far enough to really understand why, but I suspect what was happening is when you connect to an instance with SMO, which is Server Management Objects, the .NET classes that we can use to manipulate SQL Server through code that isn’t SQL code, my thinking is that it does a whole bunch of enumeration of items when you establish your connection, and as I was putting more and more databases on, it had to enumerate more and more things. So, in each one of these batches of 800 or so, I was running 12 concurrent threads, so I’m just throwing work at the server like you wouldn’t believe. It was handling it like a champ, it was just slowing down because of, I think, this enumeration. So, we start getting close to dinnertime and I’m only 75% of the way through, and our testers are supposed to be testing now.
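For what it’s worth, one known mitigation for SMO’s eager enumeration is to restrict which properties it initializes by default when it walks a collection. This is a hedged sketch with a placeholder instance name, not necessarily something Andy’s wrapper did:

```powershell
# Sketch: limit the properties SMO fetches when enumerating databases,
# so each connection to an instance with thousands of databases pulls
# back far less metadata. Instance name is hypothetical.
$server = Connect-DbaInstance -SqlInstance NewSql2016  # returns an SMO Server object
$server.SetDefaultInitFields(
    [Microsoft.SqlServer.Management.Smo.Database], 'Name', 'Status')
# Accessing $server.Databases now loads only Name and Status up front;
# other properties are fetched lazily, per database, on demand.
```

Whether this would have fixed the slowdown in Andy’s batches is speculation, but it is a common first lever when SMO gets slow at scale.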
Carlos: It’s always that pressure, you know, like, “are we ready, yet?”
Andy: The hard thing for me is, I used to be a developer, and I wrote this code, I wrote this whole process for doing this whole thing, and when it started not working the way it should have, I started taking it really personally.
Carlos: Oh man.
Andy: I’m thinking, “I’m letting the whole team down. It’s my code, it’s not working right, this is all on me.” And it’s a really bad place to get to in your head. Especially after the whole week saying, “yeah, all my tests say it’ll be an hour, hour and a half, tops,” and now, here we are at four hours and we’re still waiting.
Carlos: So, Andy learned a valuable lesson about giving yourself a little extra padding.
Andy: Yeah, yeah, that was tough. We learn from our mistakes, though.
Carlos: That’s right, there you go.
Andy: So, our QA team and our developers, they kind of got together and said, “well, you know, what databases do you have already done? We need these databases. If we just have these 12 databases already set up, we can start testing while you’re attaching the rest of the databases.” I said, “oh hey, that’s a great idea, why didn’t I think of that?” So, we kind of pivoted a little bit and said, “okay, here’s the list. I’m going to do just those databases, get them done, get them attached.” And so, once I had that done, they were off to the races. They were testing at the same time that I was attaching all the databases. And in hindsight, yeah, there were probably other better ways we could have done the attachment. You’re always thinking about that, even down to the last couple of days before implementation, “oh, maybe I can do it this way”, and I have to keep reminding myself, “nope, I’ve tested this way a dozen times, I’m not going to change gears, now.”
Carlos: That’s right, that’s right.
Andy: Because, we know this process works, we’ve tested it, we know it’s going to work, so let’s just stick with that plan. So yeah, we got through, we started attaching databases about 2pm, a little before 2pm. We finished at about quarter of 10.
Carlos: Oh wow, okay. Your hour and a half was just a little off.
Andy: Yeah, it was off by what, a factor of four, four or five.
Kevin: Not even an order of magnitude, you’re fine.
Andy: No big deal. The QA guys and the developers, they kept going through their test plans and I was starting to get real nervous because they weren’t finding anything. Nothing was popping, you know?
Carlos: Yeah, isn’t that weird? Yeah.
Andy: It’s not that I want things to be broken, but I’m always afraid that if we don’t find at least one little thing, we must have missed something.
Andy: You know? It’s just that paranoia sitting in the back of your mind. But yeah, they got through it, they didn’t find anything that night, and on our project plan we’d said, “our final go-no-go decision is going to be 11pm. We’re going to make the call at that point. Do we stay with 2016 or do we fall back to 2008?” We had that conversation, and that was the target time based on where we were with the testing and everything. We had that conversation at 5 minutes past 11. So even though the meat of the process took way, way longer than we anticipated, because our developers and our testers could pivot and get me the information I needed to support them, they were able to test in parallel with the rest of my tasks, and we hit our schedule at the end of the night.
Kevin: And you could have known that if you’d just had a Gantt chart. There went our last viewer.
Andy: Well, it’s a question that I should have asked up front, at the very beginning of this whole process, which databases do you need just to start to test it? That’s something that I’ve learned for the next time we do this sort of project.
Kevin: Speaking of that, let’s say tomorrow your boss comes to you and says, “Andy, we’re going to go to 2017.”
Andy: Well, first I’ll hug him.
Kevin: What would you do different?
Andy: Yeah, I would love to be on 2017, because I really, really need that string aggregate function.
Kevin: It’s a good function. But are there any processes that you would change? Any steps that you would change along the way to do this again?
Andy: Yeah, definitely. I think number one, I would probably not do the attaching the databases through SMO and DBATools. DBATools is phenomenal, I cannot imagine my day-to-day life without it, but at that kind of scale, unfortunately you don’t run as quickly as maybe we’d like to. And I fully acknowledge, we are an edge case. We are way out on the edge of edge cases. So, yeah, what I would probably do instead is I’d still use PowerShell to do it, but I would just generate the attach database SQL scripts and just run those directly, rather than going through SMO. I think that would probably run a lot faster. That’s number one. Number two, I’d like to try the log shipping process, just to see what it would look like, and how quickly we could get through things with that.
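Generating plain T-SQL attach statements instead of going through SMO, as Andy suggests, might look something like this sketch (paths, instance name, and the assumption of one data file plus one log file per database are all hypothetical):

```powershell
# Sketch: skip SMO and emit CREATE DATABASE ... FOR ATTACH statements,
# then run them with a lightweight query call. Paths are placeholders,
# and real code would handle databases with more than two files.
$mdfFiles = Get-ChildItem 'D:\Data' -Filter '*.mdf'
foreach ($mdf in $mdfFiles) {
    $name = $mdf.BaseName
    $sql = "CREATE DATABASE [$name] ON
        (FILENAME = N'D:\Data\$name.mdf'),
        (FILENAME = N'L:\Logs\${name}_log.ldf')
        FOR ATTACH;"
    Invoke-DbaQuery -SqlInstance NewSql2016 -Database master -Query $sql
}
```

Because each statement is a single round trip with no object enumeration, this should sidestep the per-connection SMO overhead Andy describes, though he notes he has not tested it at their scale.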
Carlos: Maybe if they had a certain scale, so databases over X size, potentially, instead of doing all of them?
Andy: Yeah, something like that. Try maybe a hybrid approach, do some of the databases this way, other databases, by detach, reattach. Those are probably the two big ones. I think I would also try a lot harder to get more space on the new instance, even if it’s just temporary, to be allocated so we can test a larger scale of the process. We can’t do everything all at once in a test environment, but we can get closer than 50 databases, which is what I tested with.
Kevin: True. Now, are there any non-technical process-style changes that you might have made, other than talk to QA and say, “hey, what do you need upfront to begin testing?”
Andy: That’s a big one. The other thing, this may be considered technical, we did have one problem and it took us several days to find it. What we didn’t do was shut off the old server, so we had shut down SQL Server, but we had not powered down the physical machine. We had one place in our code where we had a connection string that we missed, that was still pointing to the old server, so it was still tied to it. And the embarrassing part is that it used that connection string every time somebody logged into one of the websites. And so, because the server was still powered up, it responded to a ping, it would fire off a query and then wait 30 seconds for it to time out. It took us several days to find that, and once we did, dude, it was kind of one of those face-palm moments, “why didn’t we think of that?” So yeah, I would definitely power down the old server when we’re doing our testing and everything, because then those sorts of things would pop up immediately. The login process would break, and that’s going to be pretty noticeable when you start testing.
Carlos: Right, it’s almost like negative testing. Like, “I shouldn’t be able to do this anymore.”
Andy: Yeah, so those were the two maybe non-technical changes to the whole thing that I would do. We were really constrained for time, as far as when we could start and when we had to be back online because of the nature of the business, so anything that we could do to shorten up our downtime would be beneficial, there.
Carlos: Yeah, that’s pretty cool. Another one of my takeaways, as you mentioned, is not meddling with the process. I know I’ve been guilty of that in my younger days, you keep tinkering. You mentioned like, “hey, I have this process, but what if I were to do X?” And you’re like, “okay, the testing’s done. That bridge has kind of been crossed,” and even though your times didn’t go your way, necessarily, the fact that you were able to move forward is huge. That’s way better than getting halfway through and the thing kind of blowing up on you or whatever, and then what do you do? You’re kind of thinking on your feet, so to speak.
Andy: Yeah, and funny you say thinking on your feet. I have a standing desk and I never sit. I think I sat for maybe 5, 10 minutes that entire day. It was you know, just kind of standing at your desk pacing back and forth, “is it finished yet? Is it finished yet?”
Carlos: Off topic from the migration itself, one of the things, and maybe DBATools is the answer.
Andy: DBATools is always the answer.
Carlos: Yes. Do you ever use SQL Server Management Studio to get into the individual databases, and if so, does that mean you’re always filtering?
Andy: That’s a great question.
Kevin: Never use the dropdown.
Carlos: Yeah, I was going to say.
Andy: You’re absolutely right. Yeah, that’s right, Kevin. I learned that the hard way.
Kevin: Yeah, so did I.
Andy: My first couple days on the job, I went up to the drop-down and hit it, and then SSMS just kind of froze up for 5 minutes. I said, “well, that’s not going to work.” I still use Object Explorer, and once you get past the initial expanding of the tree of databases, which can take a minute or two, or refreshing the list, it’s actually not that bad. I’m not sure if there have been improvements under the hood there, or what exactly is going on, but it’s not too bad. I don’t use the filter on the databases in Object Explorer, because the majority of our databases just have numeric names, just integers. So I click anywhere in the database list and start typing the number and it jumps me right there.
Andy: And then any of the alphanumeric databases, the ones that start with a letter, all sort to the bottom, because that’s just how the list gets sorted. So I just scroll all the way to the bottom and boom, there they are.
Carlos: Bada bing, bada boom. Very nice.
Kevin: A couple other questions on the topic of administering a very large number of databases, what types of things have you run into that you haven’t seen before back when you were working with a much smaller database set? I’m thinking things like threadpool exhaustion or any other oddities?
Carlos: Oh, sure.
Andy: I think we did have threadpool exhaustion, I want to say in 2017; we were having issues there. We used to have a problem, and it has not happened to us in the past 6 months, knock on wood, actually, no, it’s been more like 9 months, where it seemed like something was going haywire on the webservers. They would start throwing a lot of requests, the same query against many databases. And the combination of the query being just a SELECT * against a 100+ field table, and most of the fields on that table being either NVARCHAR 8000 or NVARCHAR MAX, means you get very large memory grants.
Andy: And so the webserver would execute the query, it would throw the query at us, we’d throw the results back, and the webserver would not disconnect. It’s not using disconnected recordsets, if you remember those from the classic ASP days. So what would happen was the webserver would be churning through those results and not telling the database server, “hey, I’m done with these results. You can close the connection,” so we’d get ASYNC_NETWORK_IO waits. So what happens is, we have a query come in with a very large memory grant, I’m talking on the order of 5, 10, 15 gigs, and it gets held open for two, three, four minutes. Meanwhile, we’re getting more of those same queries thrown at the database, and if we didn’t catch them and kill them, we’d exhaust our memory and get RESOURCE_SEMAPHORE waits. So I’ve got three quarters of a terabyte of memory and SQL Server is saying, “hey, I’m out of memory,” which seems kind of crazy on the surface. That has somehow settled down. We have not had that problem since, I want to say, March of last year.
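For a rough sense of where multi-gigabyte grants come from, here is back-of-the-envelope arithmetic, using illustrative numbers and the common rule of thumb that the optimizer sizes variable-length columns at 50% of their declared length. This is a simplification, not SQL Server's exact grant algorithm, and the column width and row count below are assumptions, not figures from the episode:

```python
# Rough memory-grant estimate for SELECT * against a wide NVARCHAR table.
# All numbers are illustrative; this is not SQL Server's exact formula.
n_columns = 100          # a "100+ field" table
declared_chars = 4000    # assumed declared column width, in characters
fill_assumption = 0.5    # rule of thumb: optimizer assumes columns half full
bytes_per_char = 2       # NVARCHAR stores UTF-16: 2 bytes per character
est_rows = 25_000        # hypothetical estimated row count

est_row_bytes = n_columns * declared_chars * fill_assumption * bytes_per_char
grant_gib = est_row_bytes * est_rows / 2**30
print(f"estimated grant: ~{grant_gib:.1f} GiB")  # ~9.3 GiB
```

With a few of those held open for minutes at a time while the webserver churns through results, even 768 GB of memory runs out, which is exactly the RESOURCE_SEMAPHORE pile-up Andy describes.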
Carlos: But the solution there is through the code, right? You’re like, “hey, I’ve got to cut these connections off”?
Andy: Yeah, what we ended up doing, as an emergency solution, was we just get in there and kill these processes, because we know that the webserver has the data; it just hasn’t released the connection yet. So we just kill off those queries as they’re coming in, and we start playing Whack-A-Mole with it, or maybe it’s more like the Hydra: we kill one query and two more pop in.
Kevin: Because they were all waiting for resources.
Carlos: Yeah, that’s right.
Andy: Exactly, it was just a huge logjam, and once you give a little bit of room, more come in. But it happened quite a bit in 2017, and then we implemented SentryOne at the end of 2017, and then we were actually able to see, “hey, this is what’s happening, this is where these are coming from,” and start attacking them very specifically, tackling that issue right there. What’s great about having a monitoring suite, whichever one you pick, is that it shows you all the ugly parts of your code. What really stinks about having a great monitoring suite is that it shows you all the ugly parts of your code. But what it’s done is really given us places to target. These are the places where we can optimize our queries, optimize our schema, optimize how our applications are interacting with the database, and through that, we’ve been able to significantly reduce the frequency of any kind of significant problem.
Andy: I gotta say it again. DBATools was a total lifesaver on this project. Without it, the migration would have been much, much more difficult end to end: setting up the new server, getting things moved over to stage it, and then doing the actual implementation day. It’s been such a productivity booster for me that I can’t imagine not having it now.
Kevin: This episode brought to you by DBATools.
Carlos: There we go, happy to give them a plug, there.
Andy: Oh yeah, and I’m not just saying that because I’ve worked on the project, too. At this scale, you have to be scripting things. You have to be automating things.
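Andy's point about scripting at this scale comes down to fanning one task out across thousands of databases instead of touching them one at a time. A minimal sketch of that idea, where `check_database` is a hypothetical stand-in for real per-database work (in a PowerShell shop, dbatools cmdlets fill this role):

```python
# Illustrative sketch: run one administrative task across 8,000 databases
# in parallel. check_database is a placeholder, not a real admin command.
from concurrent.futures import ThreadPoolExecutor

def check_database(name: str) -> tuple:
    # Stand-in for real per-database work: a backup check, DBCC, etc.
    return (name, "ok")

databases = [str(n) for n in range(1, 8001)]  # 8,000 numerically named DBs

with ThreadPoolExecutor(max_workers=16) as pool:
    results = dict(pool.map(check_database, databases))

print(f"{len(results)} databases checked")  # 8000 databases checked
```

The structure is the point: once the per-database step is a function, running it against 8 or 8,000 databases is the same script.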
Carlos: Okay, so shall we do SQL Family?
Andy: Yeah, let’s do it.
Carlos: All-time favorite movie?
Andy: All-time favorite movie? Well, I actually have 5 listed here.
Carlos: Oh, here we go.
Andy: Top of the list for me is always Raiders of the Lost Ark.
Kevin: All right, I’ll give you that.
Andy: I will watch that movie anywhere, any time.
Kevin: The melting sequence was really well done, special effects-wise, for the era.
Carlos: For sure.
Andy: Really, the whole movie. I mean, the opening scene, where you don’t even see who Indiana Jones is for several minutes. He’s this mysterious figure and he’s navigating through the jungle like he’s always lived there. Spielberg just did an amazing job with that movie. I absolutely love it. And I could not wait until my son was old enough to watch it so I could show it to him.
Carlos: That’s funny. Okay.
Andy: There’s another one on the list that he’s almost old enough to watch, just because it’s a little gory, and that’s Jurassic Park.
Kevin: The original?
Andy: Of course, the original.
Kevin: Yeah, okay, cause there’s like 14 of these movies, now, so you know.
Andy: I think I saw the first three and I don’t remember the second two, and I haven’t seen the new iteration of it, but the original Jurassic Park, that’s a classic for me. But Raiders of the Lost Ark, I mean, I know they’re both Spielberg, but yeah, Raiders of the Lost Ark is number one for me.
Carlos: Well, sometimes I’m sitting in the passenger side of the vehicle and I look in the sideview mirror and I see that little etching, you know, “objects in mirror are closer than they appear.” I mean, I think about that movie every time.
Andy: Oh yeah, oh yeah.
Kevin: Every time I get a glass of water.
Carlos: Okay, so the city or place you most want to visit?
Andy: That I most want to visit? So this is really tough for me, because the three that I really wrote down are all places that I’ve been to before, but I want the time to really explore them, which I didn’t have a good opportunity to. But yeah, I think the top of that list is San Francisco.
Carlos: There you go.
Andy: I’ve been there; I was there for a conference in 2001, and I got to hang out for a little bit with a friend who lives there. I spent the weekend there. The conference was Monday through Wednesday or Thursday, so I actually flew out before the weekend and played tourist. But I’d love to tour the city with a little bit more time and take the whole family around, take the kids on the cable cars and do all of that.
Carlos: That’s a fun town.
Andy: Yeah, and you know, take them out to Alcatraz and maybe we bring both kids back with us, maybe we leave one in a cell. You know, it depends on how the trip’s going.
Carlos: There you go. Food that reminds you of your childhood?
Andy: This was kind of a tough one for me, but I think I’m going with meatloaf.
Carlos: There you go.
Andy: I remember my mother would always make us meatloaf and mashed potatoes, and then for a day or two afterwards, she’d have a meatloaf sandwich for lunch. So any time we make a meatloaf at home, I still picture the meatloaf sandwiches, the leftovers the day after.
Kevin: So is that with ketchup or catsup?
Andy: It’s all ketchup up here.
Carlos: It’s all ketchup, right.
Andy: I’ve never seen catsup in New York.
Carlos: Yeah, so we’ve been talking about SQL Server tonight. That’s what we talk about on this program. Now tell us, how did you first get started with SQL Server?
Andy: So this is kind of an interesting story. I kind of had two starts with SQL Server.
Carlos: There you go.
Kevin: Had a false start, was fined a five-yard penalty and had to redo it.
Andy: Yeah, the receiver jumped. So my first job out of college was at an insurance company, and the first couple of applications I worked on were using DB2 on a mainframe that I had to connect to from classic ASP. But I ended up moving over to a project that was using a web-content platform called Vignette StoryServer. If you remember the old CNET.com URLs, at the beginning of the URL it was a number, comma, a four-digit number, comma, comma, another number?
Andy: That was Vignette’s StoryServer, it was their CMS/web platform. And for the company I was working for, we were running that on SQL Server 6.5 and then upgraded to 7. So back in the good old days of Enterprise Manager and DTS. So I didn’t really work with SQL Server a lot. The most vivid memory I have of that is when I was spinning up a new application, I was writing from scratch and I had to go to the DBAs to say, “okay, these are the tables and fields that I want created in a database for this application.” And they came back and said, “okay, here’s your final schema,” and it looked nothing like what I had originally asked them for. And it was because they had a naming convention for DB2 that was built around the limitations of DB2, eight column names–
Kevin: Eight characters, yeah.
Andy: Eight-character column names and all of that kind of stuff.
Kevin: All capital letters.
Andy: Yep, and they applied the same standard to SQL Server, where those limits didn’t exist, so that was very strange to me. So yeah, that was my initial start. I didn’t really get too far into SQL Server at that point. And then at my second job, I was working for a real estate company, and we migrated from a DOS-based application for managing the properties. This was in 2006; they were still running a DOS application. We moved to a web-based application, and that was running originally on SQL Server 2000, and then when we upgraded to 2008, that was when I actually got my first exposure to the SQL Server community, through Allen White and Kendal Van Dyke. They were working as consultants and they worked with us through that upgrade. That was where I really got interested in SQL Server and said, “oh, this is really cool stuff,” and started getting further and further into understanding SQL Server, learning how to write good SQL and good code for databases, before I started setting my sights on becoming a DBA.
Carlos: Now, in that time you’ve been working with SQL Server, if there’s one thing you could change, and I guess we talked about it at the top of the show, so I won’t hold you to that, necessarily, but if there’s one thing you could change about SQL Server, what would it be?
Andy: The biggest one actually to me is the licensing. The cost and the complexity of licensing SQL Server.
Carlos: Interesting, and see it’s only going to get worse.
Andy: It seems to get more confusing every year. So yeah, at our scale, we’re talking pretty serious money. And I understand Microsoft has an Enterprise-level product here, they put vast resources into it, and I’m not saying that it should be 50 bucks per CPU or anything like that. I mean, you definitely get your money’s worth; it’s just really complicated, to me, to explain, “this is why we need this many licenses for this, and this is how much it’s going to cost,” you know, navigating that whole world.
Carlos: Yeah, and unfortunately, I don’t think it’s going to get easier any time soon.
Andy: In some ways, Azure SQL Database makes it a little bit easier, because it’s just, “okay, this is how big your database is, this is how fast you want it to go, this is what it’s going to cost you every month.” Right?
Carlos: That’s true, I suppose.
Andy: It’s the gist of it, anyway.
Carlos: Right, right. “And I want eight thousand of them,” and you’re like, “whoa, gosh, that is a little pricey.”
Andy: Imagine the size of that invoice.
Carlos: Yeah, yeah, that’s right.
Andy: If they’re itemized, ouch.
Kevin: That’s not a boat payment, that’s a boat.
Carlos: Yeah, that’s right.
Andy: Yeah. It’s a good thing they’re PDFs and not paper.
Carlos: What is the best piece of career advice you have received?
Andy: That really goes back to my first job. I worked with someone and he would say repeatedly, “perception is everything”. And you don’t have to have all of the answers, just give people the confidence that you really do know what you’re talking about. You may be nervous as all get out when you’re doing a big implementation, but if people perceive that you are calm, cool and collected, and you’ve got this thing taken care of, they will trust you more. It’s all about your communication, both the style and the frequency. Make sure that people have the right perception of you and what you’re doing, and it will make your life a lot easier and you’ll be trusted a lot more.
Carlos: I like it, yeah. Sometimes hard to do, but still good advice.
Andy: Yeah, you gotta keep your cool and sometimes it’s more important to know when not to say anything than it is to know what to say.
Carlos: Sure, sure, there you go. The old Kenny Rogers song, “you gotta know when to hold ‘em”.
Carlos: Okay, Andy, our last question for you today. If you could have one superhero power what would it be and why do you want it?
Andy: So, I don’t know that I’ve heard this one on the podcast, yet, but it’s not really a po– it’s kind of a power, I guess, but Captain America’s super soldier serum.
Carlos: Oh, there you go.
Andy: People say flying, I gotta think about the heights. Iron Man and Batman, their super power is their money, but yeah, super soldier serum that Captain America has. Just to not have to worry. You can run as fast as you want, there’s no recovery time, you’ve got the strength.
Andy: I admit, I never really read comic books, so what I know of the Marvel universe is just what’s been on screen. I kind of feel like they’re overdoing it now, with Captain America’s strength, that he’s almost infinitely strong, which seems a bit much to me. But just to have the physical abilities without having to put in 15 years of time in the gym. You know, nobody’s got time for that.
Carlos: That’s right, that’s right. Here, where’s my go-faster button?
Andy: Exactly. It’s hidden behind the NOLOCK button.
Carlos: That’s right, that’s right. Well, awesome, well Andy, thanks so much for being on the program with us today.
Andy: Well, thank you for having me, it was a pleasure being here.
Carlos: That’s going to do it for today’s episode, compañeros, and as always, we are interested in what you have to say. You can reach out to us on social media. Of course, we’re interested in connecting with you.
Andy: You can find me on Twitter at alevyinroc. That’s a-l-e-v-y-i-n-r-o-c and my blog, which Carlos will have the links in the show notes, but it’s flxsql.com.
Kevin: You can find me on Google by Googling my name. Don’t try Bing, you won’t find any results. At least not what you’re looking for.
Carlos: Oh boy. And compañeros, you can reach out to me on LinkedIn. I am at CarlosLChacon and we’ll see you on the SQL Trail!