Clustering is a now tried and true technology to help increase the availability of your SQL Server environment. I talk with Paul Popovich about his experiences with clustering and how his approach has changed as the technology has evolved. If you are looking for a good introduction to some concepts of clustering, this episode is for you!
It should be noted this episode does not include the new feature of Availability Groups.
Transcription: SQL Server Clustering
Carlos L. Chacon: This is the “SQL Data Partners” podcast, my name is Carlos L Chacon, your host and this is Episode 14.
Today, we are talking about clustering, with Paul Popovich. Just to make sure that we’re on the same page, this is the clustering that has been around for a while. I think at least 2005, it was available. This is not the availability groups that have become a little bit more popular in SQL Server 2012. Clustering is our topic today, and we hope that you’ll listen in to the session today.
For those of you who are headed to Seattle, that will just be in another week, probably another couple of days you guys will be heading out. We will have ribbons available this year that will say SQL Compañeros. I hope we can connect at Summit. Be sure to get your ribbon from me, and it would be nice to chat with you.
As always, if you have feedback or if we missed something in today’s session, maybe it wasn’t quite right, we welcome that feedback. You can hit me up at [email protected] or on Twitter, @CarlosLChacon.
Super excited that I was able to be here, in Pittsburgh, and that’s where I ran into Paul. I’m excited to have him on the show today. As always, compañeros, welcome to the show.
Children: SQL Data Partners.[music]
Carlos: Paul, welcome to the program. It’s nice to have you. How are you doing today?
Paul Popovich: Hi, Carlos. I’m doing well. Thanks for having me on.
Carlos: Not a problem. Now, remind me. Where are you working for and what are you doing these days?
Paul: I recently started a new job at a large northeast Ohio hospital system, University Hospitals, with a title, for whatever that’s worth, of Senior SQL Server Database Administrator, or Senior SQL DBA.
Carlos: OK. Very good. I take it you have a handful of clusters over there in your environment.
Paul: Yeah, we have somewhere in the 20 to 30 range of clustered instances, and a grand total of 300 plus SQL instances, in general, with mirroring, replication, clustering. Then, as the university chain grows, we end up buying hospitals and having to absorb 50, 60, 70, who knows how many instances per hospital, into our collection.It’s an ever growing thing of cleanup, standardizing, best practices, getting our scripts into all the instances. We have an offshoot of Ola’s maintenance to do all of the things and all of our backup jobs and standardizations, best practices, stuff like that.
Carlos: Yes. Thank goodness for Ola’s scripts. Right?
Paul: We love Ola.
Carlos: That’s right, Ola, if you’re listening, you’re still welcome to come on to the show.
Carlos: I’d like to come back to that actually. That idea of replication or mirroring outside of clustering, or I guess on top of.Let’s make sure we come back to that at the end, but ultimately today we are talking about clustering. Why do we want to cluster? Why would we want to build a cluster?
Paul: Well, the main argument that I had to my boss when I built my first cluster was, “I like sleep.”When you have single standing SQL Server instances, if things go bump in the night and it doesn’t come back up and it’s not there and you have all sorts of things go wrong. Having a cluster allows you to take the hardware aspect out and let the SQL instance fail over from one node to another.
That way, when the motherboard fries or whatever may happen, say you lose power in a rack or something really silly happens. SQL just starts back up on the other node, goes through its check recovery on start-up, and your instance is back up in a few minutes.
Carlos: You’re off and running. Yes, we should note that clustering is more about the operating system or the server components, as far as being available?
Paul: Yeah, you’re trying to take away hardware failure. Because the SQL instance is only alive on any single node, in the cluster, at any single point in time, so it provides a higher level of availability, I like to say.
Carlos: Right, so the idea is you have two operating systems, two servers, each of those servers connected…
Paul: Ideally identical.
Carlos: Ideally identical, yes, you know at least they have to have the same operating system. That I do know.
Paul: You don’t have to have the same hardware, but don’t cheap out on your secondaries, because then if you ever have to use it, your users are not going to be happy that their performance is halved.
Carlos: No, that’s right, even I would say from a patching perspective. A lot of times we’ve seen nodes just swing over and stay there to help with that patching perspective. They should be the same.
Paul: Same patch levels. Should be the same hardware, buy in pairs. Or if you’re buying multiple boxes buy whatever you’re buying all at the same time. Build it all up at the same time, same OS patching.
Carlos: But they have a shared repository so there is only one database on these servers.
Paul: Yes. You need shared storage for this. And actually that’s half kind of a lie because if you buy some tool called SIOS DataKeeper you can get away with not needing shared storage because it does block level replication for you.But, this is not an ad or anything for SIOS, I just want to make it aware that if you want to do some crazy things in Azure, you need SIOS DataKeeper to make a cluster in Azure. That is actually supported by Microsoft and you can do all that in the cloud, since you can’t get shared storage in the cloud.
Carlos: Very interesting. I might have to chat with those SIOS folks. I did not realize, that it was an Azure ARK ready component if you will.We’ll make sure that we have that post, and some of the other information in the show notes, which you can find at sqldatapartners.com/podcast. You’ll be able to see today’s episode and other information out there.
Ultimately, about availability. That’s where we want to be able to keep those up, so in a hardware failure we can get back more quickly.
What do we need in order to build a cluster?
Paul: We’re doing this on-prem…
Paul: I need minimal two boxes. You want to build them on Server 12 or Server 12 R2. In fact, I would argue Server 12 R2 because it has nice clustering enhancements with, “Last Man Standing”, and some waited voting. Then you need SQL12, SQL14 and you can build clusters in standard edition, which is nice.There is some licensing implications, they change things around in SQL14, to get the passive note. It used to be free. SQL8 R2 12 you could just buy. You could spend eight or ten grand on SQL Standard, put it on a box, make it a cluster, that’s great.
You didn’t need to license the other note, because you weren’t physically using it were for anything, it was just sitting there waiting. In ’14, they changed the licensing around, where you now need software insurance. Thank you, Microsoft.
Carlos: Thank you for that.We need the hardware, but what about some of the other objects that we need in order to get the actual physical cluster up and running?
Paul: If you don’t have a SAND, you’re going to need a SAND. Assuming, we’re not going the SIOS path, so you need shared storage. You can either do it over a fiber channel or you can do it over iSCSI. It all depends on what your infrastructure team is going to support.If you’re a large invested environment, that already has a lot of things up and standing, you probably already have a SAND or many SANDs with all the plumbing needed for fiber or iSCSI in place.
Carlos: Talk to your SAND administrators, right? Make friends with them and they’re going to need to give you some shared storage. You’re also going to need to talk to your network folks.
Paul: When you start to build clusters, first your Windows cluster gets an object. That’s an AD object, that’s going to go somewhere in an OU. That’s going to need an IP.When you install SQL Server, there’s four different parts of ways to think of this. You have a physical Windows Server with a name on your domain, so that has a DNS entry.
Carlos: Server 1?
Paul: Right. You have server 1 and server 2. Then you have a Windows cluster that sits between the two. That’s cluster 1. That’s a OU object and a DNS item.
Carlos: So the outside things can ping that name and depending on which one is active, it would respond.
Paul: The Windows cluster will return with an IP address just like anything else, and that’s a look-up just as you would expect. That’s 1 and 2, node 1 and node 2, on cluster 1. Then you have your SQL Server, so you’re going to have a named instance out there.It will be, “SQL\” instant cluster. The front half of the “\” is a DNS entry, so that’s a pingable object too. That needs an IP. So we have 4 IPs, minimum, to get one instance out there.
In my current job, we have an enterprise cluster with eight nodes and eight instances of SQL. We can drop, shift things around, patch, do upgrades, do whatever we want.
If you’re already running out of IPs on a certain block, or if your CISCO guy is not exactly your best friend. [laughs]People love booze. Nice bottle of Jack, or whatever your friends are into. Make friends with your SAND guy, your AD guy, and your network guy.
Carlos: Grease those wheels a little bit.
Paul: Yeah. Whatever you got to do.Once we have those objects and we talked about the active directory components that we’re actually going to need to create some objects in the active directory. You’re going to need some permissions to be able to write.
The permissions, I’ve only ever gotten it to work with domain administration, and I believe they have changed that. To make the Windows Cluster, you need to be a Domain Admin.
A lot of SQL Server DBAs will have to get everything presented to them afterwards. Their server team will stand up to servers. Their networking team will lay the ground work. The server team will build the Windows cluster itself, IP everything, set it all up and then you’ll go ahead and install SQL on it with the IPs they give you for your cluster instances.
Carlos: Now I have used it. The example I have is that the objects by default got created on a specific OU.
Paul: They do. They end up in the default computer’s OU. So you’re server 1 and server 2, cluster 1 and instance 1 will all end up in default old computers. You can move those around.That’s the way, we got around the security components. I didn’t have a domain administrator. The user I was using to install had rights to create on those default containers and then by policy, they begin to move into where they should be.
You can re-stage them, but I haven’t had a lot of…success. I don’t know if I would bother.
Carlos: I’m on the same opinion. You get someone with rights or give them a place where you can create them when you have no rights.
Paul: I would feel better about just having them just show up as you want them created. Deal with moving them around, making sure you have proper OU policies or proper rights to the OUs afterwards. Clustering will tell you through warnings if it can’t update something, there’s a cluster error log that’ll tell you some of that stuff.
Carlos: There you go. It creates its own little world there in the cluster.
Paul: Cluster manager.
Carlos: There you go, cluster manager.
Paul: It’s actually the failover cluster manager.
Carlos: There we go. I think some of the different server additions have changed the [inaudible 11:40] a little bit.
Paul: Yeah, they’ve changed the name and changed the look a little bit. There’s one 2003 cluster hosting SQL 05 at UHN and the server, they’ve totally changed the look of it from ’03 to ’08, but they’ve done that with everything, so that’s to be expected.
Carlos: The only constant is changed, right? Ultimately, this is about availability. Now we have our cluster setup. We have all the objects that we need. How does that work from an availability perspective?We talked about that only one server is going to see the disks, for example, that shared disk and then be available. How do I make that a little more available so when things go bump in the night, as you mentioned, I’d have to be woken up too?
Paul: There’s actually two ways you can go ahead and present your disks. You present the disks to both servers and clustering handles the on/off button. If you go into disk management, you’ll see them, enable them with one node and disable or offline on the other.That’s the easy traditional way, because you only have to worry about storages on a node. I know that the other node has it disabled, I don’t need to worry about that other node doing anything silly to the disks like querying it or running NTFS check line and nothing. It’s on or it’s off.
Carlos: The SQL services are stopped on that other node.
Paul: The SQL services on the other node are stopped, so everything…even if you wanted to turn them on, they couldn’t because there’s no drive, so life is easy.
Carlos: [laughs] Don’t do that, I have…should I admit this? I have made the mistake of going into a cluster and thinking, “Hmm…”
Paul: Always know which node you’re on.
Carlos: “…The services are set to manual. That’s not right.” [laughs]
Paul: You want to handle everything, all the on/offs from inside cluster manager, because it will do all the work for you. It’s an RSAT tool snapping, you can manage them from your PC remotely, everything is great and wonderful. It is a little annoying with the way they’ve changed RSAT tools, where if you are on Windows 7, you can’t manage RSAT on server…
Carlos: There’s that word, I apologize. RSAT, what is that?
Paul: The Remote System Administration Tools. It’s for AD, DNS, failover clustering, there’s a whole bunch of other…file system resource manager.
Paul: Those are free download from Microsoft, but if you’re on Windows 7, you can only manage Server 2008 or 2, I believe. If you’re on Windows 8, you can do 12. Windows 10, you can do I think 12 or 2, and then Server 16, which isn’t still on public availability, or CPT. Whatever it’s on nowadays.It gets annoying with having the right version of Windows and the right RSAT tools installed for the right server you want to manage.
Carlos: That breaks up this idea of…in clustering, there is a number of items that we need. We know that we’re going to need some disks available for SQL Server, but the other thing that we have is we have quorum.
Paul: Yes. Quorum is long and involved topic. To keep your cluster up, you need votes, and quorum contains votes. Each node in the cluster gets a vote.
Carlos: You have to have a majority.
Paul: To have your cluster be up, you have to have a majority. But if you have a two-node cluster, you have…
Paul: …equaling votes, yes 50-50. There’s two different ways to add a third vote. It’s a file share or disk witness. Depending on your setup, it depends on the way you want to make your file share or disk witness.I guess the easiest way I could probably think to describe it is if you are doing a Windows failover cluster with a SQL instance and you’re serving up via fiber channel, I would do it via disk witness because if your fiber channel drops, you’re going to lose SQL, you’re going to lose the quorum drive. It’s all going to go away.
Carlos: SQL Server will not be available.
Paul: SQL Server will not be available, and you know you’ll have an issue. If you do SQL Server on fiber, OK, that’s great, then you do a file share witness. If your disks drop, your cluster is still going to be up because you’re going to have networking. You’re going to have a vote for Node A, and a vote of Node B, and a vote from your file share quorum witness.It all depends on the situation, what is best for you and the best way to configure it. You could talk ad nauseam just on quorum itself. But the easiest way to think of it is, it’s votes. If you don’t have enough votes, you don’t have a cluster. You have a cluster, you have problems.
Paul: That’s right. The other comment or the other idea with that file share is that in certain cases, I think in test and again I would say in other versions, which we’re not talking about today, the file share can be helpful in…the file share quorum can be helpful in test situations where you maybe don’t want to be pinging for disk space from your SAND guys or whatever. You can use that [inaudible 16:42].
Paul: That is true. There is actually another way to present disks to your cluster. Starting in SQL 14, you can use cluster shared volumes. Hypervisor has been using it for years. That’s how they do live migration, so each node in the cluster has all the disks on it mounted, which it would anyways, but they’re all live.Cluster manager tells the VM it’s living on Node A, then you do a live migration to over it in Node B. It doesn’t want to drop the disks because then your VM would go offline and then remount them on Node B. Since the disks are both alive and up on both nodes, it just does magic Microsoft stuff in the background where it just shifts the workload over.
SQL Server now has that option for failover cluster instances in 2014. I avoided it in my last job only because it was Version 1 implementation in SQL. I’m not a big fan of anything that’s Version 1 and new in SQL. I would prefer not to have to worry about those sorts of things, given that if you’re doing the traditional way of just storage on both nodes and it’s on and off. There’s really nothing to go wrong. There is just, “It’s on or it’s off.”
There’s Knowledge Base articles in Windows Updates for cluster shared volumes. I don’t want to have another thing that I have to worry about to patch in and read the release notes, and add another layer of complexity there. It just makes your life so much simpler to go, “I have storage 2 in B, it’s on A, I don’t have to worry about B. It only goes on to B when I do whatever I do, be it maintenance or a fiber failover so.”
Carlos: Yeah. Ultimately we create these clusters for availability. We talked about a scenario in patching. I can patch server 2, reboot that, and then during the maintenance…outside of the maintenance window and the maintenance window, I move it over…
Paul: Outside of the two to three minutes you should take for your SQL Server instance, and that depends on number of drives and number of databases on startup and what’s…
Carlos: The size…
Paul: …the size.
Carlos: …all that stuff.
Paul: Ideally, that makes your patching window somewhat simpler. Your downtown would have so much shorter. You patch Node 2 on whatever your schedule is, then you roll over the instance to it. Patch Node 1 because you want to keep them the same, because you don’t want them to get out of sync with patches. Then you cut your window in half.
Carlos: You’re done. That is an interesting point that we probably need to specify, because I have ran into situations where the managers are like, “We put in clustering. Our downtime should be zero.”
Paul: It’s a higher level of availability, not highly available. That is…
Carlos: Even two minutes, I’ve had my comeback even a little bit faster than that. When you failover, there will be some downtime.
Paul: When you hit the failover button, SQL stops, it moves the disks from one node to the other, and it starts up again.
Carlos: The nice thing is that it will take care of those transactions for us. If there was something that would, that it hit SQL Server that wasn’t yet committed for example, it will roll that back. It would be highly consistent database. If there’s something that is there, we’ll know that we’ll have it, so new connections won’t be available until it actually comes up…[crosstalk]
Paul: …the databases come back up and available.
Carlos: There will be a little bit of a downtime. We’re good. I think that was a nice little introduction, an overview of SQL clustering. We will put that in the show notes, links to some of the references that we’ve made today. You can check those at sqldatapartners.com/podcast and click on today’s episode.Before we let you go, Paul, we have a couple of things we want to ask you about. Here we’re always looking to add value to our listeners, we’d just like to get different opinions, different ideas about how you’re using tools to help you in the SQL Server space. What’s your favorite SQL tool? Why do you use it and what you like about it?
Paul: If you get your hand on any monitoring software, just buy it. Don’t try to roll your own. Be it Spotlight, or Idera or SQL Sentry. It makes life so much simpler. If you have to start rolling your own type of tools, go find, Brent Ozar’s tools. It’s sp_BlitzCache, sp_Blitz, ask Brent. Go find Adam Machanic’s Who is Active.For all the free stuff you can get online, the Ozar suite and Adam Machanic’s Who is Active will give you a pretty good overview or look into most things that are going on in your SQL Server and help you find some pain points pretty quick.
Carlos: Yes, and only because he was a prior guest on the podcast, Steve Stedman also has a monitoring tool that he’s been rolling for the last five years. It’s actually pretty good. I kicked the tires on that. Another free option.Very good. Paul, I know you’re here in Pittsburgh presenting today. You’ve made some friends in the SQL community, and you had some experiences in the hospital space. Ultimately, we’re looking for an experience that will help illustrate why it is you enjoy being a database professional or DBA.
Paul: The ability to take a query that runs for minutes, then to have it run for a second, two seconds, 10 seconds or whatever the reduction in query run time you can get by. Be it rewriting it, or finding the right index. That feeling is just awesome, like, “Oh, that report used to run for half an hour. Now it runs for two minutes.” That’s free. That’s 28 minutes of cycles you just opened up for whatever other work is going on. It’s just fun to me.
Carlos: Very cool. That ability to provide value, and people, they want to see those results.
Paul: Right. That’s completely quantifiable. You can’t just say, “What did you do all day?” “Well, this 34-minute query now runs for four seconds.”
Carlos: [laughs] Yes, there you go.
Paul: It’s the best feeling in the world.
Carlos: Very good. We do have one more question for you, Paul. But before we ask you that, let’s hear about one additional instance that folks have to learn about SQL Server.[music]
Carlos: Hello there, compañeros. I want to tell you about a unique SQL Server training opportunity that is unlike anything you’ve encountered. As a listener of this podcast, you’re entitled to a special offer.SQL Cruise as a premiered training experience set in the most unlikely of places — a cruise ship. Tim and Amy Ford have created a wonderful culture on SQL Cruise. With fiber 6 technical leads from various industry sectors, you and roughly 20 other students will talk shop in classes while at sea. While you are in port, you can enjoy the wonderful world we live in, either by yourself or with the group.
The beauty of SQL Cruise is the unique relationships you will form with other SQL Cruisers. Worried about leaving your spouse? Bring them along! I did and she had a great time with the group. In fact, I have been on SQL Cruise twice, and I’m looking to go a third time. You may be wondering if this is serious training, and I can assure you it is as good any conference you will attend on land.
It never crossed my mind that I can be talking about SQL Server with several Microsoft MVPs on the beaches of St. Johns. I know you must have other questions, so I suggest you head over to sqlcruise.com and check out the site.
Once you are satisfied and you want to sign up, when you get to the question, “Would you be so kind to share the name of who referred you and what source material lead you to us?” You enter SQL Data Partners, and Tim will ensure you get $100 off the training registration.
This is a great offer and I hope this $100 savings will entice you to check out the site. I did and went on the Caribbean Cruise and had a great experience. I know you will too. Maybe I’ll see you on board, so head on over to sqlcruise.com and use the code SQL Data Partners to save $100.[background music]
Carlos: Paul, we do appreciate the conversation today. Before we let you go, we want to know if you could have one superhero power, what would it be and why do you want it?
Paul: I’d want Batman’s power, and that’s being rich.
Paul: Then I could buy all the toys he has and the Batcave.
Carlos: There you go. You can overcome a lot of obstacles there, right?
Paul: Probably, you would think. I mean, you would think the Wayne Foundation would have fixed up Gotham by now with all the money he had. Why are all these super villains all around there? We got to look it from a different perspective, it really doesn’t make a lot sense.
Carlos: Money is not everything there.
Paul: There’s something missing there.
Carlos: [laughs] OK, very good. I do appreciate the conversation, Paul. Thanks for being here.
Paul: Yeah, thanks for having me, man. I appreciate it.
Carlos: Compañeros, again if you’re out in the summit… here at the end August or October, rather, 2015. We hope to see you there. I’ll see you on the SQL Trail.
Children: SQL Data Partners.