Episode 19: How Solid State Drives Can Help SQL Server

A hard drive is a hard drive, right?  Storage has come a long way, but it is probably still the biggest bottleneck in your system.  Solid state drives (SSDs) have changed the rules in terms of performance, and there is new technology on the horizon that will allow us to access our data even faster.

I talk with Justin Strong and Ananda Sankaran from Micron Technology about their experiences with SSDs and how organizations are using them in their environments.  We also chat about some of the new technologies Micron has on the horizon.

Transcription: SSD

Carlos L. Chacon: This is a SQL Data Partners podcast. My name is Carlos L. Chacon, your host, and this is episode 19. This week, the United States celebrates Thanksgiving Day. I know I have lots of things to be grateful for, and I hope that you do as well.

Today, we’re talking about SSDs, the role that they play in SQL Server, how they’ve changed some of the server infrastructure, and some of the things that we can take advantage of.

This is an example of why you’d want to talk to the vendors at some of these events, the SQLSaturday events and other conferences that you attend. Yes, they do have products they want to sell you, but they are people, and they have interesting stories as well.

I happened to meet the Micron guys, and they were at several different events. They mentioned they had a technology evangelist in Justin Strong. So today, along with Justin, I’ve got Ananda Sankaran. They talked to us a little bit about their experiences in SSDs.

Yes, they are a vendor, but this conversation isn't vendor-specific. It's more of an abstraction: we talk in very general terms, and they provide some information along the way, but again, it's not super specific to their products.

Ultimately, we are talking about a variety of different topics. We’d love to hear about what you want to talk about. You can hit me up on Twitter at @CarlosLChacon, and, as always, compañeros, it’s good to have you and welcome to the show.

Children: SQL Data Partners.

[music]

Carlos: So compañeros, today we are talking about SSDs, and as I mentioned, I have Justin and Ananda here from Micron Technologies. Gentlemen, welcome to the program.

Justin Strong: Thank you, Carlos.

Ananda Sankaran: Thank you, Carlos. It’s great to be here.

Carlos: Ultimately, today, we're talking about SSDs, so I thought first, just in case someone is not as familiar with them, let's define what they are and how they're being used. When we talk about SSDs, solid state drives or solid state disks, what role do they play, how are they changing the way that computers are built and used, and how are they different from traditional disks?

Justin: Solid state drives are basically made of flash memory. Flash memory is a storage technology that is quite a bit faster than disk drives. What we're seeing is a fundamental shift in the marketplace in storage technology. It's enabling storage to be significantly faster, to consume significantly less electricity, and to be significantly denser. I'll talk about it a little bit later as we go through the podcast, but that's having some significant impact on the way that storage is being implemented.

Carlos: Even from the physical perspective, the way that SSDs fit into the system is different from disk drives. A system has to have certain slots to be able to handle the SSDs.

Justin: In some ways, there are some form factors that are slightly different. Right now, most SSDs have basically the same form factor as a lot of the hard drives: there's a two and a half inch and a three and a half inch form factor. There are also some additional form factors; there are cards. Traditionally those have been PCIe cards, which are very high performance. There are also a couple of new form factors. One is called U.2. That is a form factor that lets you plug a PCIe, SAS, or SATA drive into a standard drive slot.

There’s also another form factor called M.2 and that’s the size of a gum stick. It’s quite a bit thinner than that, but that allows a significant amount of flash storage to exist in a server or a client within a very small amount of space.

Carlos: You have these different flavors, if you will. I’m sure based on size generally there could be a cost differentiation. From a speed, or from a usability perspective, are there any differences between these technologies?

Justin: The standard two and a half inch is typically what you would find for a SATA or a SAS drive. PCIe is also now available in the two and a half inch form factor; that's probably going to be the most popular one. Form-factor-wise, what you'll see is that the cards usually have the most speed available to them. That's typically going to be your PCIe. That's the only one available in the half-height, half-length or full-height, full-length card format.

You'll typically see the fastest speed from that. Part of that is because you have more space to lay out the NAND. You can actually get a higher speed because you can light up, or power, more NAND chips simultaneously to access a lot more data at once and, hence, get a lot of parallelism.

Carlos: That’s not a word that I’m super familiar with. You said the word NAND chips. Can you help me understand what that is?

Justin: NAND chips are basically the type of flash memory that SSDs are typically made out of.

Carlos: So each of these has these different segments or areas, these NAND chips. Then the ability to power them dictates how much we're able to use and, in turn, the speed.

Justin: Yeah, there are other things also. PCIe does have a faster interface, so that comes into it, and the PCIe architecture allows you to go faster. But there is some relationship to the speed. With M.2, you typically have a very small amount of real estate to work with. There are other reasons M.2 may have lower performance, but typically you'll have less NAND that you can actually power up at the same time. The two and a half inch form factor fits in between.

Carlos: Obviously, listeners are all over the space, if you will, in terms of what they are able to afford and the kinds of servers they have, but we tend to have a niche with some of the smaller players. Say I have a server and we're considering SSDs, but we're not exactly sure how those would integrate. You mentioned PCIe is the most popular. How would an organization go about deciding, one, whether they can fit SSDs into their current infrastructure? And two, how do they go about testing it, or getting their hands on some, to see if it's an option for them?

Justin: There are a lot of different options available in SSDs, everything from low-end SATA drives all the way to high-end PCIe drives. Typically, they fall into three interface camps: SATA, SAS, and PCIe. Each one of those has different form factors. I think one of the things somebody would have to look at is: what are they trying to accomplish by putting in an SSD? The most common reason that people use SSDs is to increase performance.

It's not necessarily just increasing performance; they're really looking for consistent performance. One of the things an SSD will provide is that you will always get that same level of performance. When you have, let's say, a SAN or NAS, sometimes you'll get pretty decent performance, but because it's shared, other applications on that shared resource can impact the performance you get.

You can obviously add SSDs into a SAN or a NAS, but because the performance level of an SSD is so much greater, you'll always get that consistent level of performance from it.

That's the first thing to evaluate: what are we trying to accomplish with our application? Once you do that, then you can decide how much performance you need. Do I need a PCIe card with the ultimate level of performance? Or can I live with the performance of, let's say, a SATA SSD, and that will be sufficient for what my application needs?

Then the other thing to look at is where to put the SSD. Do I want to put it directly in the server, which will give me the greatest bang for my buck? Or do I want to put it somewhere else in my hardware architecture?

Carlos: You mentioned the SAN and sharing that space. I'm not sure whether Micron makes these (I'm sure they probably do), but I have heard of flash arrays in SAN architecture. Can that flash not be shared or allocated, if you will? Let's say I have several flash drives making up, for example, a terabyte flash array. Can I not divide that up and share it among several machines?

Justin: Yeah, you can absolutely do that. The most common way we see people in the enterprise implementing flash is as a tier in a SAN or a NAS. The other thing you can do is buy an all-flash array.

Carlos: If I'm a small organization, obviously you have the SAN array. I think you mentioned the word enterprise, so that's for larger customers. Let's get more direct and go toward the SQL Server route. I'm a small organization that has a SQL Server, and we are not seeing the performance that I want. How would I make the determination that maybe flash arrays are the way to go versus individual SSDs?

Ananda: If you're a small organization with a SQL Server deployment, let's say you're using a JBOD attached to the server, or its local drives, for SQL data. The simplest thing you can do to add flash is add a couple of flash drives, maybe one or two PCIe or SATA, depending on your performance needs, like Justin said. Then move some of your database objects. For example, move your SQL Server logs to the flash drive and move your SQL Server database in [inaudible 11:56] . What we have found in our internal testing is that just by moving logs onto a flash drive, we were able to increase transaction throughput by almost 50 percent and reduce transaction latency by almost 90 percent. That's huge.

It's not only about reducing the latency but also the consistency of the latency. Justin mentioned the 99th percentile: the majority of your transactions will be at a constant, low latency, especially when you use a PCIe drive. That's the simplest move a small organization can make. Deploy a small number of flash drives and immediately see the benefit in terms of performance.

Performance extends to larger business problems. Now you're able to do more with [inaudible 12:49] . You're able to drive more transactions with your SQL environment. That translates to business productivity.
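For readers who want to try the log-file move Ananda describes, here is a minimal T-SQL sketch. The database name, logical file name, and drive path are placeholders, and the database has to be offline while you copy the physical file to the new volume.

```sql
USE master;
GO
-- Take the database offline so the log file can be moved.
ALTER DATABASE SalesDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
GO
-- Point the catalog at the log file's new location on the SSD volume (S: is a placeholder).
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_log, FILENAME = 'S:\SQLLogs\SalesDB_log.ldf');
GO
-- Copy the physical .ldf file to S:\SQLLogs\ at the OS level, then bring the database back online.
ALTER DATABASE SalesDB SET ONLINE;
GO
```

As discussed in the episode, test this in your own environment first; the throughput and latency gains Micron saw in the lab will vary with your workload.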

Carlos: Let’s continue with that. On the technical perspective, obviously, the logs become the bottleneck because everything has to go through the log first.

Ananda: Correct.

Carlos: You mentioned that transaction throughput can increase because now my logs are super, super fast. More transactions can happen because my logs are much faster. Now, a question. My understanding is that with SSDs it's almost six of one, half a dozen of the other at those speeds, but they do slightly better from a random read and write perspective, which is more common with our data files. You mentioned moving the log files first, obviously the easiest because it's the very first step. Talk to us about random reads and writes versus sequential reads and writes.

Ananda: That's a great point. There is this understanding that SSDs are good at random reads and writes, which they are. They are very good at random I/O patterns, but what's not well known is that they're even better with sequential I/O. For example, if you look at the NAND media, let me go a couple of levels deeper. The NAND media stores information in the form of electrons stored in a transistor. What it essentially maps down to is that the media is organized as blocks and pages. Let's take an example: you can have a 4K page sitting in a one megabyte block. You can write and read in terms of page sizes, which is 4K in this example.

If you want to rewrite a page, you have to erase the whole block, which is the one megabyte block. That's the fundamental characteristic of the NAND media. What happens with a lot of random reads and writes is that to rewrite an area, you end up writing the new data into some other block, and the still-valid pieces of data in the original block later have to be gathered up and moved somewhere else before the block is erased: the so-called garbage collection, which is a common term with SSD devices.

The reason I went down this path is: take sequential I/O. If you have no data written on an SSD device and sequential I/O comes in, you fill a block, then you go fill the next block, and therefore you avoid the problem of rewrites. Once you reach the end of the device, you come back and start writing at the first block.

Sequential I/O suits the way the NAND media operates, and it's even better. That's something not widely discussed in the industry. When you take a fresh SSD out of the box, we call it the default factory state, where it is secure erased, meaning all NAND blocks have been erased. If you then put it directly to use for log files, it's going to perform very well.

Carlos: We should probably step back just a moment. First of all, any time you're experimenting with your server configurations, your individual results will vary. Your server needs are your server needs. They're not my server needs, and they are not what was done in the lab. We should throw the flag out there and say that ultimately you need to test your environment first. We've seen some patterns here that we're discussing, but compañeros, please do your own testing before you [laughs] make changes to your server, obviously. Getting back to it: you mentioned 4K pages. In SQL Server the default is 8K. Does that change anything?

Ananda: Mine was just an example. Each NAND medium will have its own page and block sizes. Let's not assume everything is 4K. But it's a good question. You said SQL Server is 8K per page.

Carlos: Per page.

Ananda: That's correct, and 64K per extent; pages are grouped together into extents. When you write an 8K page to NAND that has 4K pages, you're going to write into two pages. It's extremely fast. Think of it this way: if you compare to rotating hard drive media, you're writing on a disk platter, magnetically encoding your information on sectors of 512 bytes in size.

Carlos: Yes. You can specify the size. There are some formatting options, but yeah.

Ananda: Correct. In NAND, it's all semiconductor. You have multiple flash NAND chips, and when you're writing 8K pages, which is what the data files for SQL Server would do, when you're sending multiple 8K writes, you can write them in parallel to all these NAND chips within the SSD. Think about a single hard drive: you have one head, so you have to go all over the platters to write that data, moving back and forth. That's why a traditional 15K rpm hard drive is rated at around 300 IOPS, whereas the parallelism of electronically storing data in semiconductor chips yields hundreds of thousands of IOPS on a single drive.

Going back to the original question: regardless of the sizes, you are still able to commit the writes to the media quicker than a traditional hard drive can commit that data to a magnetic platter.

Carlos: We talked about IOPS. Inputs and outputs, I think, is generally what that refers to: the number of times we have to either write to or read from the disk, which we group together and call IOPS?

Ananda: Correct. It's Input/Output Operations Per Second. Usually that's the term people use, but as always there are details behind it. What is the block size of the I/O? In SQL we talked about 8K. For an application, you need to be aware of the block size behind the IOPS, as well as the read/write percentage. Are you doing 70 percent reads, 30 percent writes? Or is it the inverse, or is it 100 percent writes? All those characteristics of the I/O matter. IOPS is directly related to that.
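If you want to see your own read/write mix and average I/O sizes before shopping for drives, a sketch like the one below against the DMV sys.dm_io_virtual_file_stats reports cumulative counters per database file since the instance last started. The column choices and math here are illustrative, not from the episode.

```sql
-- Read/write mix and average I/O sizes per database file since instance startup.
SELECT  DB_NAME(vfs.database_id)  AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_writes,
        CAST(100.0 * vfs.num_of_reads
             / NULLIF(vfs.num_of_reads + vfs.num_of_writes, 0) AS DECIMAL(5, 1)) AS read_pct,
        vfs.num_of_bytes_read    / NULLIF(vfs.num_of_reads, 0)  AS avg_read_bytes,
        vfs.num_of_bytes_written / NULLIF(vfs.num_of_writes, 0) AS avg_write_bytes
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
ORDER BY vfs.num_of_reads + vfs.num_of_writes DESC;
```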

Carlos: I asked about the read/write mix because, I'll admit, with most of the information we're getting, I'm not sure whether it's vendor-specific. Many of the sessions I have attended talk about the reads and writes and how fast each may be, and one of the recommendations we've heard is to move your random-read workload over. Ultimately, where I'm going with that is that we are going to have some additional information available to you. If you want to take a deep dive, of course compañeros, you can always check out sqldatapartners.com/podcast.

There we'll have links for today's show, with information about all of these things we're talking about, so that you can dig in a little deeper.

Ananda: We can share some of our findings. Justin can send out our published material that has some of this information.

Carlos: I know you're in the very, very beginning stages, and obviously SSDs are very, very cool. I don't have as much experience implementing them as I would like. You've just announced a joint venture, if you will, with Intel around something called NVMe drives and 3D XPoint. The announcement has just been made, but can we quickly talk about that and what the future of it will be?

Justin: I'll go ahead and take that, Carlos. NVMe and 3D XPoint are two different technologies. NVMe is an industry standard for SSDs. What happened is that the existing interface specifications, SATA and SCSI mainly, and SAS to some extent, were defined with hard drives in mind. If you actually look in some of these specifications, they talk about cylinders and things like that. The industry got together and said, "Hey, these SSDs are much faster, they're a different technology. They really deserve a different interface, a software interface, to really take advantage of what they have to offer."

The standard the industry came up with is NVMe, Non-Volatile Memory Express. NVMe was specifically designed for SSDs. It's a software stack for SSDs, and it allows similar interfaces and interchangeability, if you can use that term, between SSDs.

Out of the box, an operating system would have an NVMe driver, and you could put in a drive from one vendor or another, and each of the drives should work with that standard interface and driver.

Carlos: That's an interesting point. I didn't read the announcement very well, obviously. I envisioned these replacing SSDs, but it almost sounds like it's going to be the software that really opens up the floodgates for SSDs.

Justin: Yes. Right now, most of the NVMe drives are PCIe; NVMe is almost synonymous right now with the high-end PCIe drives. You're still going to have SATA drives and SAS drives. NVMe right now is mainly on the PCIe interface.

Carlos: SATA will be coming down the road. I’m sure we’ll get some more information in 2016 as to how that evolves.

Justin: Yes, absolutely. 3D XPoint is a completely different technology. As we talked about, NAND flash is the most common type of flash memory you have today. It's been around for 10 years, a very, very good technology, and we continue to see NAND flash having a long future. 3D XPoint is a new type of non-volatile memory. I don't want to call it flash memory; it's a new type of non-volatile memory that fills the gap between DRAM and flash. DRAM is very fast and has pretty well infinite endurance; you can read and write to it as many times as you want. But it's relatively expensive, and it's volatile: you turn off the power and you lose all the data.

Flash memory, on the other hand, is non-volatile and relatively inexpensive, but it's much slower than DRAM, and it's got some other attributes that make it not as suitable for use as memory, as a DRAM-type replacement. Those are the two ends of the spectrum.

What 3D XPoint does is fill the gap in between. I see two areas of use for it: one is storage semantics and the other is memory semantics. For storage, obviously, you're going to get a much faster SSD than you would using NAND flash.

There are two reasons for that. One, it's inherently a thousand times faster than NAND. The other is that it's byte- or even bit-addressable. Right now, the smallest area you can write to on NAND flash is a page, 4K in our earlier example. With 3D XPoint you'll be able to write to individual bytes or bits, so for small data that makes it inherently faster to use.

That's the storage semantic. The other is the memory semantic. While it's not a complete replacement for DRAM, there are instances where you may want quite a bit of memory in your server and DRAM might be a little too expensive. 3D XPoint, while not quite as fast as DRAM, is quite a bit cheaper, and it's also durable. It's non-volatile.

Take swap files, for example. I know most servers are engineered with enough DRAM that you don't have to swap, but you could put in a little less DRAM and then maybe put in some 3D XPoint. That could potentially be a use case: using 3D XPoint for a swap file, because you can get away with slower access there.

Carlos: Then in the SQL Server space, for example, SQL Server 2016 is going to offer the option of extending your buffer pool out to flash drives, or, in this case, it could be 3D XPoint when that becomes more available. A question there: since 3D XPoint is not yet available, have you done any testing or experimentation with NAND and SQL Server, putting the buffer pool out onto flash storage?

Ananda: Yes, I have done limited testing. This extension is only for read data, not write data. Write data will still go back to your original storage, whether that's your JBOD or SAN, or whatever it may be. Think of environments where you have contention for DRAM, for example a virtualized environment. You have SQL Server running on VMware, maybe four VMs running on a 128GB DRAM server, each of them getting some number of GBs. In those cases you will benefit by adding a flash drive and putting a buffer pool extension file on the flash drive.

Where I'm heading with my answer is that, from my testing, I believe this feature benefits environments where you have a small amount of DRAM allocated to the server and your active data set is larger than the DRAM allocation for SQL Server.

If you have 32GB of DRAM and your data set is, for example, 50GB or 100GB, and the active portion of your data that SQL Server accesses is larger than that DRAM, then you will probably benefit from using a flash drive and the BPE feature.
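As a concrete sketch of the Buffer Pool Extension (BPE) feature Ananda is describing, enabling it is a one-statement change; the file path and size below are placeholders, and the volume should of course be an SSD.

```sql
-- Enable Buffer Pool Extension with a cache file on a flash volume (F: is a placeholder).
ALTER SERVER CONFIGURATION
    SET BUFFER POOL EXTENSION ON
        (FILENAME = 'F:\SSDCACHE\BufferPoolExt.BPE', SIZE = 64 GB);

-- Confirm the extension is active and sized as expected.
SELECT path, state_description, current_size_in_kb
FROM sys.dm_os_buffer_pool_extension_configuration;
```

Because BPE only caches clean (read) pages, as noted above, it is a low-risk addition: losing the drive costs you cache warmth, not data.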

Carlos: I think this is a nice segue for organizations that maybe aren’t quite ready to invest in another server or to increase exponentially, but they need a little bit of wiggle room to help them get a boost in performance, because they’ve hit a threshold. I think this would be an option for them.

Ananda: To sum up the conversation we've already had: moving logs to an SSD drive, or using this buffer pool extension feature to extend your DRAM onto a flash drive, these are all good immediate options for accelerating your SQL Server deployment. We call that accelerating your existing deployment. On the other end of the spectrum is what you mentioned earlier: all-flash arrays, and adding flash storage to your external SAN.

Carlos: Justin and Ananda, thanks so much for the conversation. We always learn a little something, and today we've talked about SSDs, so I do appreciate it. We do have a couple of questions, switching gears just a little bit. One of the things everybody is always interested in is what tools other people are using. Really quickly, what's your favorite tool? This could be a free tool or a paid tool, but what are you using that helps you be more productive?

Justin: In our industry, one of the biggest things with SSDs is performance and measuring performance. Iometer, and a tool called FIO, would have to be my two favorites, because we use them pretty much on a daily basis to measure the performance of our SSDs.

Carlos: Ananda?

Ananda: Likewise, performance being my favorite topic over the years: the good old Windows Performance Monitor and the tools surrounding it. I live and breathe it every day. It's a great tool for diagnosing an environment, especially Windows environments.

Carlos: Last question. If you could have one superhero power, what would it be and why would you want it?

Justin: That's an easy one for me: flying. Since I was a kid I've always wanted to be able to fly, and I thought it would be just an awesome thing to fly whenever you wanted. Yeah, hands down, flying.

Carlos: Yes, I agree. That’s a very neat super power. Ananda, what about you?

Ananda: Are you all familiar with the European comic Asterix?

Justin: Yes.

Carlos: I must admit I’m not, but I’m very interested to hear.

Ananda: It's not set in modern France, but it's a village in what is geographically France, fighting against the Roman invasion. This village is powered by a magic potion brewed by a wizard. My superpower would be to be that wizard, making that magic potion, so I can add strength to everyone around me.

Carlos: That’s very noble of you there, Ananda, to be able to help other people. Justin, Ananda, thanks again for being here on the show, taking some time with me.

Justin: Thank you, Carlos. It was great to be able to talk to you.

Ananda: Thanks Carlos, it was great chatting with you.

Carlos: Yes. Compañeros, again, check out sqldatapartners.com/podcasts for this episode and the show notes. If we missed something, let us know. I'm on Twitter at @CarlosLChacon, or you can hit me up at [email protected]. We'll see you on the SQL Trail.

[music]

Children: SQL Data Partners.

Episode 18: BIML

I learn stuff every day.  In this episode I tackle a subject that is new for me and one I am hearing a lot about: BIML. BIML is a markup language you can use to build your SSIS packages.

I met up with Cathrine Wilhelmsen and Bill Fellows in Portland, and they shared with me some insights on BIML, what they use it for, and some tips on getting started.  If you find yourself building more SSIS packages and haven't yet tackled BIML, this episode is for you!

Transcription: Podcast BIML

Carlos L. Chacon:  This is the "SQL Data Partners Podcast." My name is Carlos L. Chacon, your host, and this is Episode 18. Today we're going off the beaten path a little bit, at least for my experience, and we're talking about BIML.

BIML is not something I’ve worked a ton with. Why would I lie? I really haven’t worked at all with it. I’ve heard a lot about it. Those of you in the SSIS environment will find it very interesting.

We talk today with Bill Fellows from BlueGranite and Cathrine Wilhelmsen from Storebrand. They share their experience of getting involved with BIML and, of course, some of the resources you can use to get started, if that's not something you're currently doing.

Ultimately, we want to talk about lots of different topics, introduce them in maybe a different way, and discuss some scenarios that you aren't currently using. If there are topics you'd like to hear about, you can hit me up at [email protected]; I'm always interested in getting your feedback. I'm also on Twitter @CarlosLChacon.

I am still basking in the enjoyable experience that was the Summit, in Seattle. I happened to go out to Portland, my mother‑in‑law’s in Portland, I was there the weekend before the event and that’s where I ran into Cathrine and Bill. They agreed to chat with me and I’m grateful that they did.

Not only did I pick up some BIML information but I also caught up on the great game of rugby. I met some guys from New Zealand and Australia, and the World Cup happened to be going on, so they indoctrinated me, or showed me some things about rugby. So Hamish and Warwick, if you're listening, thanks for letting me tag along and be part of that exciting event.

Let’s get to it. It’s good to have you and as always compañeros, welcome to the show.

Children:  SQL Data Partners.

[music]

Carlos:  Bill, Cathrine, welcome to the show.

Cathrine Wilhelmsen:  Thank you.

Bill Fellows:  Howdy, howdy.

Carlos:  Thanks for being here. First things first. Let’s define, “What is BIML.” BIML’s an acronym.

Bill:  BIML stands for the Business Intelligence Markup Language. The way I always hear it described is that it is basically the operating system for your business intelligence infrastructure. It can handle everything, soup to nuts: if it's a business-intelligence-related item, the concept behind BIML is that it can handle it.

Cathrine:  Yep.

Carlos:  Cathrine, you made the point that ultimately people who would be interested in BIML are people who do…

Cathrine:  SSIS.

Carlos:  Why is that so important?

Cathrine:  Because you have a free tool you can use right now to develop in BIML. It's called "BIDS Helper," an add-on for Visual Studio. That gives you the ability to generate SSIS packages from your BIMLScripts.

Instead of hand coding, hand creating all your SSIS packages, you can use BIML and BIMLScript to generate those packages automatically for you. Since that’s what’s in the free version of BIML and the tools, that’s where most people get started with BIML.

Carlos:  Let's back up even a little bit further. Here's the scenario that at least clicked for me. I don't do 100 percent of my work in SSIS, but I have done data transformations from time to time: taking data from A and getting it to B. A lot of that is coming from tables, so I have a transformation. Say I want to run a transformation on 10 tables.

I have to have 10 unique objects in my package to do that. I've got to go in and set up each individual table, then set the destination 10 times. The idea of that framework with BIML is that there is something that can help me do that more easily?

Cathrine:  Yes. What you do with BIML is you specify the pattern. You have that pattern where you pick up some data from a table, and you want to do some transformations and load it into a destination. That’s one pattern. Then you write a BIMLScript for that pattern and then you just go into your database, pick up the metadata with a list of all your tables and columns.

Then you can use that metadata in BIML to create one package for each table, or one data flow for each table, anything you'd like to do. Instead of actually creating all of those tasks or packages by hand, you just go into your metadata store, pull out the metadata, loop over it, and generate the packages from that.

Carlos:  Is data transformation the only thing I can do with BIML, or just the most common thing?

Bill:  Definitely the most common thing to begin with. BIML itself can describe more than just your SSIS. You can describe your entire database: you can specify your tables, your schemas, and the actual database itself. It has the ability to emit those objects, to actually go ahead and create the table if it doesn't already exist, and create your indexes. You can use it to fully model that.

You can also specify, I just learned this the other day, you can have an initializer set of data, so if you have a set of lookup values you could also model that within the context of BIML. I think it’s really awesome just how expressive it is.

I’ve been doing this for I think two years now and I keep discovering. Today in Cathrine’s session, I learned, “Oh I didn’t realize I could do that,” and I have a to‑do to get even more knowledge out of her head. It’s impressive, to me, how much you can do with it beyond just the SSIS. That’s the easiest one for most people to grab onto. But it goes so much deeper.

Carlos:  We've talked about how it helps with data transformation. I'm familiar with a package; there's a file there I can open up. Is my BIML creating my SSIS package? Will I have two files then, one SSIS package and one BIMLScript?

Cathrine:  Yes. You will have a BIML file. It’s just XML code describing your package. The whole point of BIML is that it’s readable, it’s writable. It’s really easy to see how it’s going to be translated into your package. You read the code from top to bottom and that’s what’s going to happen in your SSIS package.

They use the same names for their BIML tags as the SSIS objects and tasks. That makes it quite easy to read. You can have an execute SQL task, you can have a data flow, you can have a source or a destination, things like that. It's named like that so it's readable and it's writable.

What happens is that XML code is run through what's called the "BIML compiler," and that generates the SSIS packages for you. It basically translates that readable, writable XML into the SSIS XML, which is much more verbose.
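To make that concrete, a minimal flat BIML file might look like the sketch below. This is not from the episode; the connection strings, names, and query are placeholders, but the tag names (Package, Dataflow, OleDbSource, OleDbDestination) mirror their SSIS counterparts as Cathrine describes.

```xml
<!-- One package with one data flow copying dbo.Customer from source to target.
     Connection strings and object names are placeholders. -->
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11;Data Source=.;Initial Catalog=SourceDB;Integrated Security=SSPI;" />
    <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI11;Data Source=.;Initial Catalog=StageDB;Integrated Security=SSPI;" />
  </Connections>
  <Packages>
    <Package Name="LoadCustomer" ConstraintMode="Linear">
      <Tasks>
        <Dataflow Name="DFT Load Customer">
          <Transformations>
            <OleDbSource Name="SRC Customer" ConnectionName="Source">
              <DirectInput>SELECT CustomerId, CustomerName FROM dbo.Customer;</DirectInput>
            </OleDbSource>
            <OleDbDestination Name="DST Customer" ConnectionName="Target">
              <ExternalTableOutput Table="dbo.Customer" />
            </OleDbDestination>
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>
```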

Carlos:  In the case where I’m going to create my destination, does the destination get created in the BIMLScript before the package gets created? Or is the BIML going to create some kind of T‑SQL script in my package that will then create the destination? Does that make sense?

Bill:  People really love that, and the short answer is, "Only if you tell it to do so."

[laughter]

Bill:  Within SSIS, initially when you connect to your OLE DB destination and you pick your table name, you can also click that Create button. That's an enhancement that only exists in the Visual Studio tool. When you run your package, it won't automatically create that table for you.

Carlos:  It has to already exist then.

Bill:  Exactly. If you wanted to do that "properly," I say, you would have to turn on "delay validation," because when the package starts up it's going to say, "Hey, this table doesn't exist. We're going to stop right now."

But you could have a precursor step, an execute SQL task that says, “Hey, if the table doesn’t exist go ahead and create it,” and then your data flow task will work.

We have our static BIML running through the engine as Cathrine described. You could also say, "Hey, as part of this, go ahead and create that table now for us." You could have a separate task fire off, basically C#, that happens to run at the same time.

I don't know that I would do that because I think that would be very clunky. Unfortunately, you can probably do just about anything in BIML, so I think between the two of our minds, "Yeah, you could do that."

[laughter]

Bill:  But that's really not the workload you necessarily want to take on.

Cathrine:  You can do anything in BIML and BIMLScript as long as you tell it to. But at least for people who are just starting to learn BIML from the SSIS world, think of BIML as just another way to create your SSIS package.

Anything you can do in SSIS, you can do in BIML as well. If you want to create just a basic staging package where you load data from A to B, you would have to first create those tables in SSMS. That’s the workflow you would do while developing in SSIS, so it’s the same thing.

You would create those tables first, make sure all of that is good, and then write your BIML code to look up and get that data from A to B. You can write scripts and you can write really advanced C# code if you want to in BIMLScript to do all of that for you, but you wouldn’t really want to.

You want to have your database and source control in a controlled environment, and then just have SSIS read and load data. It’s the same thing with BIML.

Bill:  A really great thing, while you’re listening to my wonderful voice here…

[laughter]

Bill:  …a great reason why you might be interested in BIML is that if you've ever made a change to a package and tried to show it to someone else, it's a nightmare to figure out what has changed between versions.

A great story for why you might be interested in BIML is that it's just XML, and not the XML of an SSIS package that is thousands and thousands of lines long. It's going to be 30, 50 lines. You can use any text diffing tool to discern, "You made a change to this field."

It’s much, much easier. You can check that into your version control systems and discover, “Here’s what’s different between version zero and version one of this,” versus trying to do a diff between the SSIS packages and then crying tears for…

[laughter]

Carlos:  Interesting. Now in that scenario, would I have had to create my packages in BIML to have that difference evaluation? Let's say I've created a package. I'm not using source control.

Bill:  Bad, bad man.

[laughter]

Cathrine:  Use source control.

Carlos:  Use source control. That’s right. There’s been a change. I have two versions and I’m not sure what the difference is. I didn’t create them with BIML. Can BIML help me identify where the differences are?

Cathrine:  It will be a lot easier, because in BIML the code is simplified; it's readable and writable. You can compare line to line. If you have a destination with some properties and you change one, it will just show up as a single changed line.

Carlos:  But I didn’t create these with BIML. Can I do that with BIML or would I have needed to create the packages with BIML to get that functionality?

Cathrine:  Yes. You would need to create the BIMLScript first and then generate the packages off of that.

Bill:  Although…

[laughter]

Bill:  …there is a tool out there that you're welcome to sample. It is called "Mist." It will soon be rebranded as BIMLStudio. There is a 14-day trial, and as part of that trial you can reverse engineer up to five packages. In your situation, you could reverse engineer package one, then the second package, package two, and that would get you back to your original BIML.

That could be an option for you. The other plug I always have to give, the way all of this works, in the free version, is you need, as Cathrine said, BIDS Helper, bidshelper.codeplex.com. It’s a free thing that you install to work with your install of [inaudible 12:52] Visual Studio.

It has a feature called "Smart Diff." Smart Diff attempts to ignore the layout, to separate the presentation part of your SSIS package from the actual I'm-doing-work part of it.

That can help you see what's different, but heaven help you trying to merge those differences between package zero and package one.

Carlos:  You've said script, so SSIS. I'm not a developer. I came, actually, from the networking side. I was going to be a network guy, a CCNA guy. Then I started getting into databases. T-SQL, as far as languages go, that's my max.

I’ve done some SSIS. The GUI nature of it makes it very easy to get in there and start plugging along. What would a person need, from a scripting perspective, to understand, to start using BIML?

Cathrine:  To start using BIML, it’s actually really easy because you get the Intellisense in Visual Studio with BIDS Helper. Bill mentioned that in his session. Just play with it. Do an open bracket, and then see what comes up in the Intellisense. Press, “Tab,” see what happens and what comes up.

That makes it really easy to learn BIML, the flat BIML, because it’s just XML and it’s easy. Then you can take that one step further and start learning BIMLScript which is C# code nuggets that you use to manipulate your BIML code. That’s the next step where you need to look into C# development. C# code.

But by just knowing a couple of basic C# things, like doing foreach loops and if-else statements, simple control blocks, you can get some really powerful features in just a couple of hours of learning. There are some great tutorials, especially on bimlscript.com.

They have walk-throughs, and they have a whole learning path now on the new website, where you can start with the basics of "What is BIML?" and go all the way through C# code and automated development. Absolutely go there if you want to learn BIML. It will take you from the basics all the way to building a full staging environment.
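As a sketch of the C# "code nuggets" Cathrine mentions, the BIMLScript below loops over a list of table names and emits one package per table. The hard-coded list stands in for metadata you would normally read from your database, and it assumes a connection named Target is defined elsewhere in the project.

```xml
<#@ template language="C#" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <# var tables = new[] { "Customer", "Product", "Orders" }; // stand-in for metadata from your database #>
    <# foreach (var table in tables) { #>
    <Package Name="Load_<#=table#>" ConstraintMode="Linear">
      <Tasks>
        <!-- Assumes an OleDbConnection named "Target" is defined in the project. -->
        <ExecuteSQL Name="SQL Truncate <#=table#>" ConnectionName="Target">
          <DirectInput>TRUNCATE TABLE dbo.<#=table#>;</DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
    <# } #>
  </Packages>
</Biml>
```

Change the pattern once, regenerate, and every package picks up the change, which is the maintenance win Cathrine describes later in the conversation.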

Carlos:  Who owns BIML? Is BIML its own company? Is it a Microsoft thing? And what about BIMLScript? You have BIDS Helper on CodePlex, so that's free. But who's maintaining bimlscript.com?

Cathrine:  It's a company called Varigence, based in South Carolina. They created BIML and BIMLScript. They also created the IDE called Mist; that's the licensed tool you need for the more advanced features. Then they put their BIML compiler into the free BIDS Helper.

Varigence teamed up with the BIDS Helper guys and gave them their full engine, so you can start using the BIML compiler and learning it. They also run the bimlscript.com website, which is more of a community website.

They have varigence.com as well, which is their business website. Bimlscript.com, yes, it's for the community, but it's also by the community. We've posted blogs, external content, walk-throughs, snippets, things like that. Everyone can contribute to that website.

Bill:  Useless historical fact: BIML actually stems from Project Vulcan, which, as I understand it, came out of Microsoft Research. It was, "Here's this cool thing that we thought of," and then it went away. But the guy who started Varigence, Scott Currie, said, "I like that." He took it and ran with it, and now my life is much easier.

[laughter]

Carlos:  OK, very good. That is ultimately the reason you would want to consider using BIML: to get out of some of those manual processes and be able to automate…

Cathrine:  Yes!

Carlos:  …with those components.

Cathrine:  For SSIS when you start learning SSIS, the GUI is great. It’s really intuitive, it’s easy to learn but when you need to create hundreds of packages that follow the same pattern, you need to drag‑and‑drop and align and connect, and drag‑and‑drop and align and connect, and create the same thing over and over and over again, you don’t want to do that. You want to be lazy developers.

[laughter]

Cathrine:  You want to write it once, and then have everything automated and created for you. That's what BIML and BIMLScript allow you to do with SSIS development. The cool part is that even in plain SSIS, when you start creating your packages, you can create SSIS templates, which makes it really easy to create more packages based on that template without creating everything from scratch.

That is a nice feature in SSIS. The problem is when you need to update that template because you can’t just update the template and apply that to all your packages. You would have to go back and change every single one of the packages you created. While in BIML, you can just change one line of code and have that regenerate all your packages and you’re done.

Bill:  For me, my use case and the reason I really do love BIML, I answer a lot of questions on forums. People asking about, “How do I do X in SSIS?’ If you ask the question on Stack Overflow I’m probably going to answer your question, unless it touches Excel, because no one likes to deal with Excel in SSIS.

[laughter]

Bill:  But it used to be that if I were giving an answer, not only would I have to write up all the text, I'd also have to take lots of copious screenshots of every single step. Now I can put out maybe the relevant screenshots and then say, "Here's the BIML. Copy-paste this into your thing," assuming you've installed BIDS Helper, "and you can see exactly how I'm showing you to solve your problem."

You're trying to learn how to do X. Here, with this simple 15 or 20 lines of code, you've got a working example. Otherwise I'd have to upload an SSIS package someplace, and you'd have to download it and trust that this random code you've downloaded off the Internet is not going to toast your machine.

For me it's a wonderful mechanism for sharing. A very concise way of, "How do I solve this problem?" "Here." Then you can take that pattern and apply it wherever you need to use it.

Cathrine:  That’s a really cool use case, actually.

Carlos:  Very cool. Now, I once had someone say that if you didn’t learn PowerShell within five years, you weren’t going to be a DBA.

[laughter]

Carlos:  We're going to separate the haves from the have-nots. Is BIML in that same category? From an SSIS developer perspective, not that they won't exist, because the GUI is not going away, but from a being-good-at-what-you're-doing standpoint, will BIML be required knowledge?

Cathrine:  I’m not sure it will be required, but it will make your workday a lot simpler, a lot easier, and that’s the great part about this. You don’t even have to tell anyone you’re using BIML to create your SSIS packages.

You can’t tell the difference between a package you created by hand and the one you created in BIML. BIML is, for SSIS packages at least, it’s just a tool to create those packages for you.

There's this whole joke that when you start learning and using BIML, you finish your work for the week before lunch on Monday and take the rest of the week off. Just don't tell your boss you're doing that. You're on your "BIML break," as they call it. It's actually on my T-shirt as well.

That's the fun part about BIML. You don't have to tell anyone you're using it. It's not a very big install, nothing like that. If you're working heavily with SSIS, start learning BIML as well, because you can really save a lot of time and focus more on the business value instead of just the plumbing of creating the same package over and over and over again.

Bill:  Completely agree with all that. I can also add in, at least, my personal experience. I have used the base .NET libraries to create SSIS packages. If I were heavy in the C# realm, maybe that makes sense, but I think most people doing SSIS probably aren't that heavy into it.

I've used EzAPI to do it. It's a little bit better than the base libraries, but it left me wanting. I've been doing this for two years. If Microsoft comes out with the coolest new thing tomorrow that makes my ability to create packages so much simpler, I would totally drop BIML. But I haven't seen that.

I haven't seen anything coming that has made me say, "BIML is great, but…" I don't have any buts yet for my BIML. I don't know what your experience is, but I still love this thing, and I'm still learning cool things because they keep adding more features into it.

To me, it’s really awesome. I don’t have any things like, “I just hate it,” about this thing. It’s like, “Give me more,” I wish I was doing more things with it.

Cathrine:  Absolutely, and it's not just for SSIS. You can do all your relational database things: you can model your cubes, dimensions, and facts, and you can use it to generate things like T-SQL scripts. Instead of using dynamic SQL, you can just write BIMLScript to generate that for you.

There are so many use cases for BIML when you start digging into the more advanced features, and you can build up a whole toolbox, or tool belt, of BIMLScripts that you can run ad hoc. I've seen examples of things that are super useful for DBAs, for example, or in the BI world, where one BIMLScript can automate things, check things, and change things.

When you start digging into it, it's like, "I haven't found any buts yet." It's just really awesome.

Carlos:  Tell me about that example, because I was not aware there were scenarios where DBAs could be using this for automation purposes, for monitoring or maintenance-type tasks. Is that all on bimlscript.com as well? Is that where those examples would be?

Bill:  Think about what you do. You probably need to say, "I need to operate on this set of servers." You know what your servers are: "For all these servers, I need to perform task X." Maybe you're going to use a central management server to fire off queries across all of those, or, "I need to do [inaudible 23:17]."

If you don't want to do those things, you can use BIMLScript, since you have the ability to run a SQL script: "Cool, I'm going to define a connection and process my first server, change that, go to the next server, go to the next server." You can very easily define a package that's going to go do that task.

Carlos:  Interesting. Where would some of those examples be? Good scenarios to…? [laughs]

Bill:  Right here in my head.

Carlos:  In Bill’s head. Very good. We’ll hit up Twitter. If your Twitter feed blows up…

[laughter]

Carlos:  …with requests…

Bill:  The best place to start is bimlscript.com. Again, that's the community resource. If you don't see what you need on there, just ask a question: "Hey, does anyone have one of those?" People will come out of the woodwork, because the beautiful thing about the SQL community is that people share.

It's like, "Oh, you want to see how I do it?" "Oh, I don't do that. I do it a little bit differently." You're going to get flooded with people trying to share their knowledge of, "I've scratched that itch and here's how I went about solving that problem." If not, I guarantee Scott will say, "Oh yeah, I can do that."

It was great. One night, Jamie Thomson asked on Twitter…

Cathrine:  I loved that.

Bill:  …he was like, "How do I do this?" We used to do this tabular data file thing. I spent about an hour mucking around and I was like, "It's late. I'm tired." Scott went out and live-recorded, "Here's me solving the thing," and did it in about an hour. He was like, "Yeah, I probably could have done this faster except I changed my mind in the middle of it," or something like that.

[laughter]

Bill:  The owner is so invested in trying. You give him a curious problem, and he's a [snaps fingers] super-sharp guy. He comes up with things where I'm like, "Oh God, I wish I had a tenth of your brain power."

Carlos:  Oh, very cool.

Bill:  They are very engaged in promoting the community aspect, the community side of the organization. I have great respect for them.

Carlos:  You've almost persuaded me to… [laughs]

Bill:  Give up your SQL ways?

Carlos:  …take BIML a little more seriously. [laughs] Almost. I'll have to spend a little more time on bimlscript.com. Great. Thanks for being here; I do appreciate it. I think it's been a great little discussion. There will be, of course, some additional information beyond what has been presented.

We happen to be here in Portland today, where both of them presented at a SQLSaturday. We're going to provide links to their presentations in the show notes at sqldatapartners.com/podcast. That information will be available there, along with links to the things we've talked about today.

Before I let you go, and I know we've been hot and heavy into BIML today, one of the things we like to do, in that spirit of sharing so people become aware of other tools that are available, is talk about your favorite SQL tool. It's OK if that's BIML.

[laughter]

Carlos:  But we're looking for your favorite SQL tool, a free tool or a paid tool, why you like it, and how you use it?

Cathrine:  My favorite tool is Red Gate SQL Prompt in SSMS. It is a paid tool, but it saves me so much time. I don't have to write all my SQL scripts by hand. It can save snippets so you can start using those instead of writing out your entire code. It has a lot of great features: auto-formatting, auto-completion, things like that. That's the one tool I cannot live without. It's SQL Prompt.

Just to add to that one, because that's a paid tool: a free tool, as well, is SQL Search, Red Gate SQL Search, which allows you to search through all your databases across different servers. You can search tables, functions, whatever; anything in your database, just search for it and you will find it immediately. One paid tool, one free tool.

Carlos:  Very good. Bill?

Bill:  Definitely, I concur with Cathrine that Red Gate SQL Prompt is absolutely the best thing in the whole wide world.

[laughter]

Bill:  I taught a class yesterday and I had already installed it on the machine and they're like, "How are you doing that?" I'm like, "You paid for this class…

Carlos:  “Funny you should ask”

Bill:  “But go ahead and spend a little bit more, because this tool is the best thing ever.” The efficiency I get out of that thing is well beyond what I paid for it.

The free tool plug I will give will be, if I take off my developer hat and go more towards my trying‑to‑figure‑out‑what‑the‑heck‑is‑going‑wrong hat, I’ve got to give my love for SQL Sentry’s…

Carlos:  Plan Explorer.

Bill:  …Plan Explorer. Thank you. My brain has stopped working.

[laughter]

Bill:  Got to love that tool. I don't use it often enough because, blessedly or unfortunately enough, I don't do enough on that side of stuff, but I really enjoy what I can do with it.

Carlos:  Great tool?

Cathrine:  Yeah.

Carlos:  Great. Again, thanks so much for being here. We do have one last question for you. But before we do that let’s hear one additional way folks can learn about SQL Server.

[commercial break]

Carlos:  Cathrine, Bill, thanks again for being here. Before we let you go, if you could have one superhero power, what would it be and why would you want it?

Cathrine:  I want to fly.

Carlos:  Fly. That’s mine.

Cathrine:  Yes, that's my favorite superhero power. I could travel anywhere at any time, and there's just that freedom of not being stuck on the ground.

Carlos:  Yes, I’m with you. Bill?

Bill:  My favorite superhero was always Green Lantern. I want a magic ring that lets me materialize whatever I’m thinking about.

Carlos:  There you go. OK. Very nice. Great. Thanks again for the conversation, and for informing us a little bit more about BIML. Compañeros, we hope you enjoyed the conversation as well, and we’ll see you on the SQL trail.

[music]

Children:  SQL Data Partners.

Episode 17: SQL Server Partitioning

Do you find you have tables and indexes that have grown really large? Are you moving around a lot more data than you used to? Are your windows for jobs and maintenance getting tighter? If you answered yes to any of these questions, partitioning may be for you!
Partitioning allows us to break a table or index down into smaller more manageable chunks.  Partitioning can enable us to perform maintenance on just part of a table or index. We can even move data in and out of tables with quick and easy metadata only operations.
In this episode, we’ll go over basic partitioning concepts such as horizontal vs. vertical partitioning, how to identify if a table is ready for partitioning, and what you need to know about your applications before you implement partitioning.

Show Notes

Brandon on Twitter
Brandon’s Partitioning Presentation (Coming Soon)
SQL Sentry’s Performance Advisor

Transcription: Partitioning

Carlos L. Chacon: This is the "SQL Data Partners" Podcast. My name is Carlos L. Chacon, your host, and this is Episode 17. Today we are talking about partitioning. It could be a bit of a niche topic, since you have to have SQL Server Enterprise Edition. However, if you are planning for growth, or maybe you are already using partitioning and not seeing the results you'd expect, then of course this episode is for you.

I do want to say, we have a great community, and Brandon is a great example of that. Lots of people are willing to devote their time and energy to help others along the SQL trail. Sometimes these podcast recordings don't go quite as planned, so I'm grateful to Brandon for giving me a little extra time to make sure we had a good recording.

If you were at the Summit last month in October, in Seattle, I hope that I had the chance to meet you. I know I’ve made many new compañeros, always interesting to see the different environment that people come from. I hope that we had a chance to talk.

While at Summit I attended a session where we talked about branding words. I thought it would be appropriate just to review some of the thoughts that I had about why I created the podcast and what I hope that everyone gets out of it.

I think if I were to use three words to describe what I was hoping to do with the podcast it would be community, fun, and learning. We’re trying to build a sense of community. We want to have a little fun on the way, and of course provide some learning opportunities or some things that at least you think about that you can further your learning along.

I hope that you’re getting some value out of the podcast. I know I enjoy putting them together. Talking with many different people, they provide such valuable insights and things that I wasn’t necessarily thinking about and I hope that you feel the same way.

We do always, however, welcome your feedback. We want to hear from you. You can hit me up at [email protected] or you can hit me up on Twitter @CarlosLChacon. Of course, you can leave your feedback on iTunes or on Stitcher. With that compañeros, welcome to the show.

Recording: SQL Data Partners.[intro music]

Carlos: Today we’re talking about partitioning. Of course I have with me Brandon Leach. Brandon, welcome to the show.

Brandon Leach: Thank you very much for having me. Thanks for taking a little bit of time out. We find ourselves in beautiful Portland and it’s always good to catch up with you.

Carlos: So we’re talking about partitioning. A specialized topic, because it’s only available in SQL Server Enterprise Edition. It’s fairly straightforward to set up but can quickly go sideways if we’re not careful with what we do. Now, when we’re talking about partitioning, there are two options available to us, right? There is vertical partitioning and then horizontal partitioning.

Brandon: Yeah, vertical partitioning being when you are essentially splitting out columns into separate tables, versus horizontal partitioning, which is what SQL Server’s table partitioning feature deals with.

Carlos: Right. So with vertical partitioning, I have a table that’s 10 columns wide. I want to take five of those columns and make table one; columns 6 through 10 make table two. With horizontal partitioning, I’ve got 10 billion records, right? I want to group those. I want to put a billion records into each of 10 different tables, but the columns are all still the same.

Brandon: Well, 10 different parts of one table. In table partitioning, we tend to think of the table as mostly a collection of rows. But every table already has at least one partition, so a collection of partitions actually forms our table and our indexes.

Carlos: Fair enough, fair enough. So you’re talking about basically a subset of that table?

Brandon: There’s essentially a hidden, additional layer for tables that aren’t officially partitioned. They have partition one, and that’s it.

Carlos: And that’s it. Right. So when talking about SQL Server partitioning, we’re talking about this horizontal partitioning?

Brandon: Correct.

Carlos: So why do we want to partition?

Brandon: Well, oftentimes it’s for maintenance, and there’s also a performance implication there. But let’s talk about the maintenance part first. As tables grow larger, it can take us longer to, say, defragment those indexes.

Carlos: Archive data might be another example?

Brandon: Yeah, archiving of data is definitely another one. Being able to move data around from one place to another, even within the same data file. If you’ve got staging tables, you can actually switch partitions out and put them into either their own table or into another table. That’s very common as a key component of the sliding window technique, where you have a staging table that is simply a heap.

It’s a heap table, no indexes or anything. You dump your incoming data into it as fast as possible. You then apply your indexes on that staging table, and you can use the ALTER TABLE SWITCH command and say this table is now a partition of that one.

Carlos: You prepped it, it’s on stage, it’s in the wings. You pat it on the back and you are like, “OK.” You toss it on stage. Now you are ready to go.

Brandon: There are some caveats with that though. Everything, the schema, the design has to be the same. Same constraints, same foreign keys, same primary keys, everything has to be the same.

Carlos: Right, so from the object perspective the database is going to look at those and it’s going to say, hm, these aren’t quite the same and it will reject it.

Brandon: You can essentially move a billion rows from your staging table into your production table in a millisecond, because it’s a metadata operation. It’s just checking to make sure they match and that nothing is going to go crazy, and then it updates the metadata.

Carlos: There you go. That’s the beauty of partitioning.

Brandon: You can use that for archiving data out too.

Carlos: The opposite would be true, right? Now I have all this stuff. OK, you guys are ready to come out.

Brandon: Pull out the partition you don’t want into a new table. Run your ETL process against that new table. Now you can move that data elsewhere without blocking the original table.
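
A minimal sketch of the switch pattern described here, with hypothetical object names; the staging table must match the target’s schema, indexes, and constraints, and sit on the same filegroup as the partition it is switching into:

-- Stage in: the loaded, indexed staging table becomes partition 5 of the big table.
ALTER TABLE dbo.SalesStaging
    SWITCH TO dbo.Sales PARTITION 5;

-- Archive out: partition 1 of the big table becomes its own table for ETL.
ALTER TABLE dbo.Sales
    SWITCH PARTITION 1 TO dbo.SalesArchive;

Both statements are metadata-only, which is why they complete almost instantly regardless of row count.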

Carlos: Very nice. You have all these different pieces that you can put together.

Brandon: That kind of gives you both the performance piece too.

Carlos: You talked about index maintenance. If we can separate the table like that, we can decrease the number of reads going against it, because we are only going against part of it, like one drawer of a filing cabinet.

Brandon: Correct. Say I’ve got an index on a large partitioned table. I can analyze the fragmentation levels of different partitions. So now I can go through and I can say, well this partition, it doesn’t need to be touched. That second one needs to be re-orged. That third one needs to be rebuilt. So I’m able to speed up my index maintenance. I’m able to be more proactive.
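
As a rough sketch of that per-partition approach (object names are hypothetical):

-- Fragmentation per partition of a clustered index (index_id = 1).
SELECT partition_number, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Sales'), 1, NULL, 'LIMITED');

-- Then act only on the partitions that need it.
ALTER INDEX PK_Sales ON dbo.Sales REORGANIZE PARTITION = 2;
ALTER INDEX PK_Sales ON dbo.Sales REBUILD PARTITION = 3;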

Carlos: Customize.

Brandon: I don’t know if you ever tried to rebuild or even re-org a terabyte-plus sized table.[laughter]

Brandon: If you have you know it’s painful.

Carlos: A funny story. We were on two different projects; there were two DBAs, and we were each doing something different. The other DBA was actually in England at the time. We were talking and we decided, let’s just rebuild it. We rebuilt the index. Of course, it builds the new index while the old one is still in play, and we ran out of disk space. [laughs] As a result of the rebuild, we had to put in a request to get the space back. Then we had to roll the whole thing back, and it took even longer. People were waiting; not good. [laughs]

Brandon: I’ve seen tables where it would take you days to rebuild the whole thing. Even as an online operation, it’s not just the time. It’s the amount of log generated. Any sort of change like this has to be tracked.

Carlos: Shannon Fawler told me once he was down in Charlotte working with a large warehouse environment. They had a table that needed to be defragmented. Now, luckily this wasn’t exactly “production.” Outside users weren’t using it; it was internal to the business. So he kicked off the re-org, the rebuild, on Friday at 5:00 pm before he left. Monday he came back in and it was still chugging away. [laughs] When this comes up, everybody’s like, “Hey, sign me up. I want partitioning.” But it may or may not be for you, and not only because you don’t have Enterprise Edition.

Brandon: There is good and bad with partitioning. The real question is, why do you want to partition?

Carlos: I have a quote from Michelle Ufford. She says, “I only partition tables that I expect to grow quickly.” She says her rule of thumb is greater than 10 million records a week, or tables that have regular archiving needs. That’s a lot of data, 10 million records.

Brandon: Well, it could be and it might not be. Here’s the thing.

Carlos: It is in a week.

Brandon: Sure. With 10 million records, what’s the row size? How much space does each row take? Sure, I’ll look at row counts, but it’s not my primary decision-maker when partitioning.

Carlos: It sounds like row size is?

Brandon: Well, it’s not row size, it’s table size.

Carlos: It’s table size.

Brandon: There are certain points where I know it’s going to be more painful for me to start that management later. I will partition, and I want to partition, when I know that table is going to become a problem in terms of maintenance, or when there’s a clear, defined performance reason. We didn’t talk about performance before, so let me bring it up. There are certain benefits to partitioning: there’s partition elimination, first and foremost, where SQL Server is smart enough to look at the table and its schema and say, “Well, based on this query…” Say I’m searching for a date range; that’s the most common partitioning column you’ll find, or partitioning key, I should say. It’s going to be a date column.

It looks at this and says, “Well, I’m looking for data within these range of dates. I know these partitions.”

Say we have five partitions. These two partitions can contain that data. I know the others can’t. Instead, I’m going to focus on those two partitions and forget about the rest of them. This is a wonderful thing if you know you’re going to scan. [laughs]

Carlos: Sure. Then it goes back to the reads. We are bringing in less data, and it’s eventually going to end up in memory, because SQL wants to read from there, so we have fewer resources that we’re…[crosstalk]

Brandon: And your index can be partitioned. When you partition a table, you rebuild the table on the clustered index; you have to have a clustered index there. You have these objects underneath, the partition scheme and your partition function, and you rebuild on that partition scheme. The scheme has the partition function linked to it, so it knows how to create the buckets, how many to create, and where to put those partitions. Are we sending them somewhere else, to different filegroups, or are we putting them on the same one? You do both scenarios for different reasons. But you get another benefit: there are partition locks.
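
To make those pieces concrete, here is a minimal, hypothetical date-based setup: the function defines the boundary values, the scheme maps the resulting buckets to filegroups, and the table is built on the scheme.

CREATE PARTITION FUNCTION pfSalesDate (date)
    AS RANGE RIGHT FOR VALUES ('2015-01-01', '2015-04-01', '2015-07-01');

CREATE PARTITION SCHEME psSalesDate
    AS PARTITION pfSalesDate ALL TO ([PRIMARY]);  -- or map each partition to its own filegroup

CREATE TABLE dbo.Sales (
    SaleID   bigint IDENTITY NOT NULL,
    SaleDate date   NOT NULL,
    Amount   money  NOT NULL,
    CONSTRAINT PK_Sales PRIMARY KEY CLUSTERED (SaleDate, SaleID)
) ON psSalesDate (SaleDate);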

I’m sure every DBA out there has seen an issue at some point where they’ve suddenly escalated from a row lock to a table lock. That’s painful.

Carlos: Now my system is hung up, right?[laughter]

Brandon: Now that table is unusable.

Carlos: We’re going to wait for the transactions to finish.

Brandon: But when you have partitioning, we essentially have another lock level in place that we can turn on and use.

Carlos: Another option. An escalation path, from row to partition and then…

Brandon: From row to partition and then to table.
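
Partition-level lock escalation is opt-in per table; a one-line sketch against the hypothetical table above:

ALTER TABLE dbo.Sales SET (LOCK_ESCALATION = AUTO);
-- AUTO permits escalation to the partition level on partitioned tables;
-- the default, TABLE, always escalates straight to a table lock.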

Carlos: If we’re using our partitions well, hopefully we don’t have to go there. Hopefully we’re using our keys in our queries and things like that, which we’ll refer to again, but yes, that’s how it’s set up.

Brandon: One of the things that you have to consider, back to the original question, is that, yeah, you want to rebuild that table before it’s a problem.

Carlos: This goes to the planning, right? Partitioning should be part of the planning process…

Brandon: No.

Carlos: …not a reactionary process.

Brandon: Yeah. If you’ve got a terabyte-size table, that’s going to take a lot of time to rebuild. Can you take the hit of partitioning that table?

Carlos: Take it offline, make it unavailable. You may not be able to. We’ve talked about situations where that’s not possible.

Brandon: Maybe you can get by that. Maybe you can do some hybrid table partitioning, a partitioned view, to create a new table.

Brandon: Exactly.

Carlos: So more complexity. Poor planning just leads to more complexity.

Brandon: Ideally, you want to identify these tables before they become a problem, these tables that have the potential. I would say, generally, when a table size is about 40 gigs, that’s when I start to look and I say, “Is this a good candidate for partitioning?” Sometimes it is, sometimes it’s not.
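
One hedged way to put tables on that radar (the 40 GB threshold and the math are illustrative):

-- Tables with roughly 40 GB or more reserved, largest first.
SELECT OBJECT_NAME(object_id) AS table_name,
       SUM(reserved_page_count) * 8 / 1024 / 1024 AS reserved_gb
FROM sys.dm_db_partition_stats
GROUP BY object_id
HAVING SUM(reserved_page_count) * 8 / 1024 / 1024 >= 40
ORDER BY reserved_gb DESC;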

Carlos: Let’s delve into it a little bit further, identifying. You’ve given us a size metric there, so now it’s on the radar, the partitioning radar. What would be the next step in deciding whether partitioning is a match? Now, I think there’s something we should probably step back and talk about first, and that is the partitioning key. This is basically the column that we’re going to slice that information by.

Brandon: Yes, and it can only be one column.

Carlos: It can only be one column. This column then becomes very important.

Brandon: It does. Ideally, it will be one in which it’s going to be part of your queries in there, one of your search arguments, essentially.

Carlos: Does that mean the WHERE clause?

Brandon: Yes.

Carlos: Has to be in the WHERE clause.

Brandon: Ideally, yes.

Carlos: Ideally.

Brandon: Because if it’s not in the WHERE clause, SQL Server is not going to be able to take advantage of partition elimination.

Carlos: To interpret that, the optimizer will be able to say, “Oh, hey, WHERE this? You know what? Before you even finish, I’ve got a partition that matches that.”

Brandon: Yeah. If it’s not there, SQL Server can’t use it. It’s the same reason you try to build your indexes around how you are going to search through the data, how you’re going to SELECT it.
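
For example, against the hypothetical dbo.Sales table sketched earlier, a predicate on the partitioning key is what allows elimination:

SELECT SUM(Amount)
FROM dbo.Sales
WHERE SaleDate >= '2015-04-01' AND SaleDate < '2015-07-01';
-- In the actual execution plan, the operator's "Actual Partition Count"
-- shows how many partitions were really touched.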

Carlos: To continue on there, is it just going back to using the DMVs and finding what the most common queries are? Or are you going back and talking to your developers and saying, “Let’s look at our big reports”?

Brandon: It depends. I would say that you should understand and see the code. I really want to see the SELECTs.

Carlos: The queries that are coming in.

Brandon: Yes, that are coming in against that table. I’ll pull the numbers off the DMVs, but I also want to talk to [inaudible 18:40], because I want to know if that’s likely to change. If I partition one way and then six months down the road they’re already planning on changing something, I want to know about that, because maybe there’s an opportunity. Maybe they’re going to change things and it might be better to use a different column for your partition key.

Carlos: I cannot tell a lie here. I haven’t had a ton of experience with partitioning, but it seems like a lot of the problems people run into come because they haven’t had those discussions. The DBAs, the data stewards if you will, say, “Oh, we have a performance problem, let’s partition,” and they do it in a vacuum. Then they’re like, “Well, wait a second, how come things didn’t get better?” or they see only minimal improvement.

Brandon: There is a stereotype of your typical DBA as more reclusive, the bearer of no. Always saying, “No.” Doesn’t really get along with anyone. The problem is, in order to be a really good DBA, you have to have those conversations. You have to understand your environment, your developers, how they work, their code. You have to be part of their meetings. Some companies, like mine, break it down so you have database operations and database engineering. Operations is more infrastructure. That’s where I am. But everyone in operations still works directly with developers too. We are getting that information either in person or from the database engineers, who are SQL developers.

Carlos: How do you get that integration there?

Brandon: You have those discussions, that communication, and you will end up with better systems and fewer problems.

Carlos: Ultimately that’s what we want. As database administrators we are responsible for the data, and if we can create fewer problems, then that’s a win for us.

Brandon: Yeah. Also, one thing partitioning does is show you how constraints help your query plans. The old way of doing it was partitioned views, where you had separate tables and a view that would union them all together, and then you would get one view. As long as you had constraints defining what data could be within each “partition,” you could actually get partition elimination. SQL Server looks at the constraints you have.

Carlos: It takes all that stuff, all those little pieces, tries to put together a picture that says, “OK here’s what I think you want.”

Brandon: When I present on this stuff in person, like I’m doing today, I always spend time on it. I actually show partition elimination using partitioned views, so that even the people who don’t have Enterprise have a tool and everyone can see partition elimination.
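
A bare-bones version of that demo might look like this (hypothetical tables; works on any edition):

CREATE TABLE dbo.Sales2014 (
    SaleDate date  NOT NULL CHECK (SaleDate >= '2014-01-01' AND SaleDate < '2015-01-01'),
    Amount   money NOT NULL);
CREATE TABLE dbo.Sales2015 (
    SaleDate date  NOT NULL CHECK (SaleDate >= '2015-01-01' AND SaleDate < '2016-01-01'),
    Amount   money NOT NULL);
GO
CREATE VIEW dbo.SalesAll AS
    SELECT SaleDate, Amount FROM dbo.Sales2014
    UNION ALL
    SELECT SaleDate, Amount FROM dbo.Sales2015;
GO
-- A query on dbo.SalesAll that filters on SaleDate reads only the tables
-- whose CHECK constraints could contain that range: the same elimination idea.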

Carlos: Very good.

Brandon: It’s much easier to see.

Carlos: We will put a link up in the show notes at sqldatapartners.com/podcast, and you can go there to look up today’s episode on partitioning. We’ll make sure we have a link to Brandon’s slide deck if you want to review that.

Brandon: It will have queries in it, so you can actually go and see partition elimination. A sample database, everything you need.

Carlos: Very cool, very cool. Yes, that will be a big help for those who want to play around with it a little bit. Again, the most important thing is knowing your data. Know where it’s going, plan for the future, and because you know that, you’ll be able to choose a key that your queries use in the WHERE clause. You create that partition and then you are the hero.

Brandon: Yeah, you’re done. You don’t have to modify your queries. Table partitioning is transparent to anyone running a query.

Carlos: Very cool. Well, Brandon we do appreciate it. Thanks for being here. Before we let you go we do want to talk about something.You don’t have it on now, but I know earlier you had your SQL Cruise jacket on. So you are a former SQL Cruiser, right, on the show? I’d like to talk about SQL Cruise, so tell us about your experience on SQL Cruise.

Brandon: I love SQL Cruise. SQL Cruise was a turning point in my career. It really was. I went on SQL Cruise and it’s not just the sessions. The sessions are great, but the people you meet, they change your life.

Carlos: Yep, I couldn’t have said it better. For those who want to check out more on SQL Cruise, you can go to SQLCruise.com. Of course, Tim has extended us a hundred-dollar discount; if you go to sqldatapartners.com/sqlcruise, we’ll give you some information on how to get that. So, we do have a couple of standard questions we like to ask here at the end. We want to provide some additional value. Of course, the partitioning conversation was very, very good, but we want to know: what’s your favorite SQL tool? Paid tool, free tool, but why do you like it and how do you use it?

Brandon: There are so many wonderful tools out there it’s really hard to pick. I was thinking about that question a lot [laughs] throughout the whole conversation. Let me give you some of my favorites. I’ll try to keep it short.

Carlos: Can you give us one and we’ll talk about the others? Is there a top one?

Brandon: Yeah, from a DBA perspective it’s definitely SQL Sentry.

Carlos: The Plan Explorer or the Monitor?

Brandon: Either.[laughter][crosstalk]

Brandon: It’s wonderful. Plan Explorer is amazing. In terms of free tools, yeah, Plan Explorer is a go-to, easy-to-use tool that you can get completely free. SQL Sentry’s Performance Advisor is awesome. It has pulled my butt out of the fire so many times. Just being able to go down and see all the information you need right in front of you, and you go, oh, that correlates to that, and this is why. I mean, I had a CTO actually look at that product at one point when we had an actual production issue.

We were pointing it at production; it was right after we implemented it. He literally drilled down and he found the problem. He’s not a DBA; he’s not even close to a DBA. He found the problem in five minutes.

Carlos: Wow.

Brandon: [laughs] Made a phone call, they stopped running that report, which wasn’t supposed to run then anyways.

Carlos: Life went on.

Brandon: After that, he said, “I now no longer need to know anything about this tool.”

Carlos: [laughs]

Brandon: He says this was a good buy.

Carlos: This is a good buy. Very good.

Brandon: SQL Prompt from Redgate is another necessity if you are doing any sort of T-SQL development or design. It’s a huge plus.

Carlos: Very good. So, our last question. If you could have one superhero power, what would it be and why would you want it?

Brandon: One superhero power. A lot of people would say something like flying.

Carlos: Hey, that’s mine![laughter]

Carlos: Don’t go too harsh on flying.

Brandon: No, no. You know, I don’t need to fly. That’s what planes are for. What I would love is to be able to just know. For example, when I’m troubleshooting an issue, it would be wonderful to just know about the things I don’t have access to. Say an admin has reports I can’t see; I’d want to know what’s in those. Just knowing stuff like that. I’d also probably win the lottery by day two.

[laughter]

Carlos: [inaudible 27:58] that’s interesting. Would you know after things happened or do you have some insight into the future?

Brandon: To be honest, it would be to be able to just see and know at that moment, which is why tools are wonderful things: they help, assuming you have access to them.

Carlos: There you go.

Brandon: Even just to understand and feel what other people are kind of feeling at that point in time, because that can help you in communications so much. I don’t need flying. I want knowledge.[laughter]

Carlos: You are much nobler than I am Brandon.

Brandon: It’s not so noble, trust me.[laughter]

Carlos: Well we do appreciate you being on the show today. It was great to have you.

Brandon: I’m honored.[music]

Carlos: Thanks again and compañeros, be sure to check out the show notes sqldatapartners.com/podcast for all updates and what we talked about today and we’ll see you on the SQL trail.

All Aboard the Azure Train

I spent a week out in Seattle at the PASS Summit event the last week of October, and my goal was simple: absorb as much as I could about Azure.  What made that goal super simple was the number of Microsoft folks present in the SQL Server Clinic.  I met the program managers from the Stretch Database group, the Azure SQL Database group, and the Query Store group, and even spoke with a few customers about their growing pains.

When I looked at Azure about 3 years ago, it was a shell of its current state.  I am not saying there aren’t still issues; however, it is really coming into shape nicely.  When you consider that Microsoft has more infrastructure than Google and Amazon COMBINED, it should get you thinking.  My own state of Virginia will have 2 MILES of nothing but servers when the current plan is complete.

It is currently a 6 BILLION dollar business, and I see more and more companies wanting to take advantage of it.

I spoke with James Phillips, the VP of Business Applications in the BI space, and what is interesting is how IT jobs are going to change with the adoption of the cloud.  He said “the move to the cloud is inevitable” and we can either be a part of it or be left behind.  Some of the traditional roles we now play will no longer be needed; however, when we talk about connecting systems, those roles will be more important than ever.

I know I am making an effort to shift my business toward Azure, and I am not alone.  Join the march, and I will see you on the Azure SQL trail.  🙂

https://azure.microsoft.com/en-us/

 

Episode 16: SQL Server Agent


I learn things every day.  One of the great things about the SQL community is the willingness to share learning experiences, and today’s episode is an example of this.  I chat with Mark Wilkinson about the SQL Server Agent: some best practices and MSX, a feature I have not used in the past.  MSX is a feature that allows you to keep your agent jobs in sync, but as always, there is a catch.

If you are interested in expanding your SQL Server agent horizon, this episode is for you.

Show Notes

Mark on Twitter
MSX/TSX Multi-Server Administration
Security Alerts you should have set
Fail Safe Operator
How to tell jobs are out of sync

About Mark Wilkinson

SQL Server Agent

Mark is a father of 4 and lives in Raleigh, North Carolina. In his spare time he likes to read, speak, and learn new things. Mark currently works as a Database Administrator at ChannelAdvisor, where he solves fairly interesting problems on fairly large databases.

Transcription: SQL Server Agent

Carlos Chacon: This is the SQL Data Partners podcast. My name is Carlos L. Chacon, your host. This is episode 16. Compañeros, welcome to another great edition. I’m super excited to have Mark Wilkinson on the show today. As this goes to show you, you can always learn something new. One of the great things about the SQL Server community is its willingness to share.

Our topic today is the SQL Server Agent. That may seem a little bit boring, a little bit dry, but I want you to hang in here with me. Now, if you already know what MSX is, then maybe you can turn it off, though I’m sure Mark has something that’s going to be very interesting to you. I personally didn’t know what MSX was. It’s very interesting to me, and you’re going to be able to learn something new about this technology.

As always, if you like the show, you like what you’re hearing, we want you to subscribe in iTunes, or on Stitcher, or some of the other channels, as well. You can hit us up on Twitter. I’m @CarlosLChacon, or you can hit me up on email, at [email protected].

The notes from today’s session will be available at sqldatapartners.com/podcast. You can get the information for today’s session there. There will be lots of things we’ll talk about that you’ll want to deep dive into a little bit later, and so we’ll make sure that that information is available to you there.

Another exciting episode. We’re definitely looking forward to it. I hope you’ll stick around until the end. As always, Compañeros, welcome to the show.

Recorded Voice 1: SQL Data Partners.[music]

Carlos: Today, I have Mark Wilkinson with me. He’s a database administrator at ChannelAdvisor. We’ve seen each other on the SQL trail a couple of times, but we happen to be here at a SQL Saturday in Raleigh, North Carolina, where I sat in on his session about the SQL Server Agent. When he was talking about MSX and TSX, I thought, “Hmm. That’s something interesting.” I thought it would be interesting to you compañeros, and so I wanted to have him on. He is on Twitter, @m82labs, and he currently hails from Raleigh, Virginia. Mark, welcome to the show.

Mark Wilkinson: Thank you very much. Raleigh, North Carolina.

Carlos: What did I say?

Mark: Raleigh, Virginia.[laughter]

Carlos: No. Now you know, all roads lead to Virginia. I guess that is the way I think about it, so I apologize. Raleigh, North Carolina. My brother-in-law will have my hide for that. Ultimately, today we want to talk a little about the SQL Server Agent. Why do we want to talk about the SQL Server Agent?

Mark: I think it’s really one of those features that people use every day but don’t spend a lot of time digging into. It runs your jobs. Why would you do anything else, right?

Carlos: There you go. There are some things that you can do to make that a little bit better, particularly, I think, as we look at larger-scale environments. Those are some of the ideas that you have.

Mark: Exactly. In my particular environment, we manage hundreds of instances. Some of them have 150-plus jobs on them, so scripting all that stuff out and keeping it all in sync can be a big task.

Carlos: That’s right. Being able to wrap your head around that…if there are some tools or some tricks we can use to help automate that, or make it a little bit more convenient, then those are some of the things we’re going to share today. Let’s get into that. Best practices when it comes to SQL Server Agent jobs.

Mark: There’s a lot of opinion on our best practices. I guess I’ll share my opinions.

Carlos: There you go.

Mark: Service accounts are the first thing you’ve got to deal with when you’re setting up the SQL agent. You should not be using a built-in account. Don’t use a domain admin account. The best idea is just to use a domain account created specifically for the agent; that gives you a few things. If you have a lot of instances, you can have all of your agents using the same account. It makes permissions a lot easier if they need to access a file share, things like that.

Carlos: I think that per Microsoft’s standards, the best practice is that the SQL Server service account and the agent account should be different. Agree? Disagree? Do you follow that practice?

Mark: We used a different account.

Carlos: Maybe I’m a lone wolf in that regard, but I guess I’ve been a holdout and we’re still using the same thing.

Mark: I don’t think there’s any problem with it. We just happen to use different things.

Carlos: I think it’s just the security rights, differentiating what those accounts can do and providing better auditing.

Mark: Oh, definitely, for auditing purposes it’s nice, especially with features like SQL Audit, because you can see the user account that is doing whatever you’re auditing.

Carlos: I think I could probably do better than that. It’s not that hard, right?

Mark: No, it’s not that hard.[crosstalk]

Carlos: So, service accounts: we want to create a domain account that doesn’t have too many permissions but has enough to get the work done that we need.

Mark: Exactly.

Carlos: I think you bring up a good point. You want to be consistent in your environment; that way, everything you are doing is all the same.

Mark: That makes things a lot easier especially when you got a couple hundred instances that you’re trying to keep your hands around.

Carlos: That’s right. So you create your job, and it’s going to run under that service account.

Mark: It depends.

Carlos: It always depends.

Mark: Depends on the job owner and the type of step that it is. If the job is executing a T-SQL step, it will run in the security context of the job owner. Pretty much anything else will run in the context of the service account.

Carlos: I see, and maybe that’s why I’m thinking that when it has to go outside of SQL Server, it’s going to use that account.

Mark: Which actually brings up another…I don’t know if it’s a best practice, but it might be something to look into: proxy accounts. If you’ve got a CmdExec step that’s executing a batch file or something else outside the SQL Server, you can set up a proxy account specifically for that step that has different permissions than the SQL agent has. If you don’t want all the jobs to have access to a file share, you can create a proxy account specifically for one job so that this one job can access the file share.
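
A hedged sketch of that setup; the account, names, and password are placeholders:

-- Server-level credential holding the Windows identity the proxy runs under.
CREATE CREDENTIAL ETLFileCredential
    WITH IDENTITY = N'DOMAIN\etl_files', SECRET = N'placeholder-password';

EXEC msdb.dbo.sp_add_proxy
    @proxy_name = N'ETLFileProxy',
    @credential_name = N'ETLFileCredential',
    @enabled = 1;

-- Let CmdExec job steps use the proxy; an individual step is then pointed
-- at it through sp_add_jobstep's @proxy_name parameter.
EXEC msdb.dbo.sp_grant_proxy_to_subsystem
    @proxy_name = N'ETLFileProxy',
    @subsystem_name = N'CmdExec';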

Carlos: And it’s actually on that step; it’s not even the job itself. It’s another step in that job, so HR systems maybe come into play, personal data, things like that.

Mark: Yeah, exactly.

Carlos: Now, you have this job. We put it out there. Those are the permissions it’s going to execute under, but then there’s the ownership of that job. This is actually something I see sometimes in our SQL Server audits, the owners of the jobs.

Mark: This is where I kind of veer. For example, [inaudible 7:45] and his team recommend sa as your job owner. We use a local account created specifically to be the job owner, just so we can keep security under control.

Carlos: There you go.

Mark: Because the job owner doesn’t need to be a sysadmin.

Carlos: There is no function or role of the job owner other than just the owner of the job.

Mark: The owner of the job, but that is the security context a T-SQL step will run under: the job owner. Every other step runs under the service account with the service account credentials, but the T-SQL steps will run as the job owner.

Carlos: Hence the other account, a local SQL Server account, to do that. The downside of using a domain account for your job owners is that when that person leaves and they disable the account, your jobs stop working.

Mark: Which is exactly why a lot of people recommend using sa, because that’s an account that’s always going to exist.

Carlos: It’s never going to go away.

Mark: But if you want an account that’s always going to exist, and you want to tighten security a little bit, you can just create your own specialized account for that.

Carlos: That’s something we flag on our SQL Server audits: jobs owned by a domain account, one that isn’t generic. We like them to change that, because we have had issues where people get walked out.

Mark: Domain accounts always change eventually. [laughs]

Carlos: What are the nice things about the jobs? Is it they will alert us, notify us when things go bumping the night?

Mark: Yes. A lot of people think that when they get a monitoring system in place, maybe they don’t need to set up Database Mail, maybe they don’t need to set up alerts in the agent, anything like that. But as we know, everything fails, including your monitoring system. This is actually another tip from Grant, from a recent Twitter conversation. It’s free to set up Database Mail. It’s free to set up performance alerts and such in the agent, so why not do it? You can just filter it off into a folder in your inbox and maybe check it once a week just to make sure there’s nothing weird.

I know that it’s not the same as monitoring, but it’s kind of an insurance policy. If your monitoring system isn’t picking something up for some reason, you glance in that folder or in your inbox and say, “Hey. What’s going on here?”

Carlos: It is there; I just need to look a little bit deeper. There are some specific alerts…I didn’t look up the numbers, so I want to say it’s like 16 through 21 or something like that.

Mark: Yeah, there are also some specific ones, which I am completely blanking on right now.

Carlos: We’ll put those in the show notes and make sure you get those individual alerts or the individual error numbers that you should be alerting on.

Mark: There are a few blog posts out there on some good ones that are worth setting up just by default.
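
As one hedged example of the kind of alert being described (the exact severity list your shop needs may differ; see the show notes):

EXEC msdb.dbo.sp_add_alert
    @name = N'Severity 19 errors',
    @severity = 19,
    @delay_between_responses = 60;

EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Severity 19 errors',
    @operator_name = N'DBA Team',  -- hypothetical operator
    @notification_method = 1;      -- 1 = email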

Carlos: We set up our mail. We have some alerts. Now, there’s something called the failsafe operator, which again, I’m a little bit ashamed to admit that I’m not super familiar with, so tell us about the failsafe operator.

Mark: First, the whole concept of the operator. The operator is what the SQL agent will contact, who the SQL agent will contact. You can set up multiple operators. You can assign an operator per job if you want to. When you’re configuring an operator, you configure an email address, a pager address, because everybody carries a pager…[laughter]

Mark: …and some other things in there. But you can also specify what time of day those operators are available, and if you screw that part up and you make it so your operator’s never available, you’re never going to get email alerts. The failsafe operator is there for those cases. SQL knows when there’s not an operator available, so it will send those messages to your failsafe operator. Another thing about the failsafe operator is that if your msdb database goes down, the failsafe operator will still work. It’s extra protection in case something terrible happens.

Carlos: Again, we’re not in front of a computer screen to look at it, but I can’t picture off the top of my head where I would set the failsafe operator.

Mark: That may have to be in the show notes as well.[laughter]

Mark: I can’t remember up the top of my head, either.

Carlos: Because it’s not in an obvious place. For a job, you click on the Notifications tab and there’s the operator option.

Mark: It’s in that same area. I can’t remember exactly where it’s at.
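
For reference: creating an operator is the documented part, and the failsafe operator itself is set in the SQL Server Agent properties, on the Alert System page in SSMS. The names below are hypothetical.

EXEC msdb.dbo.sp_add_operator
    @name = N'DBA Team',
    @enabled = 1,
    @email_address = N'dba-team@example.com';

-- The undocumented procedure below also sets the failsafe operator;
-- treat it with the usual caution around undocumented procedures.
EXEC master.dbo.sp_MSsetalertinfo @failsafeoperator = N'DBA Team';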

Carlos: That’s something we’re going to blog about and put in the show notes. OK, very good. So we have all that together and we’re getting notified of our alerts, right? But when we have lots of instances to monitor, it can be a little bit overwhelming, right? When you have a lot of repeat jobs that run on individual instances, how do we go about managing the sheer number of instances?

Mark: That’s where what’s called Multi-server Administration comes in. A lot of people know it as MSX/TSX…by a lot of people, I mean like the five people that actually use it.[laughter]

Mark: It’s a really old technology. It’s been around since SQL 7, but it just doesn’t get a lot of use. There’s actually a…

Carlos: Why do you think that is?

Mark: I don’t know. Part of it could be because Microsoft doesn’t really push it. There’s a story about the product manager for MSX not knowing that it was still in 2012, thinking that the product had been deprecated, so it [chuckles] doesn’t get a lot of attention.

Carlos: Should we be fearful about it going away? You think it’s going to stick around?

Mark: It’s not on any of the lists to be deprecated, so I don’t think so, but you never know.

Carlos: What are the advantages of using MSX?

Mark: MSX is really nice because…for those of you who don’t know what it is, you have a master server and then any number of target servers. You configure your jobs once on the master server, including schedules, and then it will push those jobs out to whatever target servers you specify, and the jobs will run there.

Carlos: That sounds familiar to my content management server, like a CMS.

Mark: Central Management Server?

Carlos: Central Management Server. [foreign] Central Management Server, right? Is it using that functionality to do that?

Mark: It’s not. You can certainly use CMS to push jobs out, but this has a few advantages over that. When you use the MSX/TSX system, when you create a job on the master and it’s pushed out to the targets, you can’t modify the jobs on the targets. If you’re looking for consistency in your jobs, this is the system to use. You configure it once on the MSX and all the TSXs get it, and you can’t change it.

Carlos: Now, can my MSX, my master server…can that be a production server that’s still usable with other agents, or am I basically nailing that down and saying…

Mark: No, you can certainly use it for other purposes. There are no restrictions around that. The MSX can have local jobs as well as its multi-server jobs. One nice thing about MSX, though, is that it’s available on Standard Edition, so if you wanted to spin up an instance just for managing this, it wouldn’t be crazy expensive.

Carlos: One of the things that we’ve had to sort out…for example, [inaudible 15:34] on the show, we’ve talked a little bit about automation. I tend to think about those jobs as part of a SQL Server setup routine: I install SQL Server the way that I want my instance…my servers the way that I want them, and then I push my jobs out. How do I use MSX in an environment where…let’s say I [inaudible 15:54] a new server? Is it going to pick that up automatically? How do I go about…?

Mark: It’s not. There are some built-in system procedures to enlist in an MSX server. From the target server, you run this procedure and it will join the master server. Then, on the master server, there’s another procedure you can run to add that target server to any jobs that are already there. One way you might do that is to use job categories on the master to categorize your instances by instance type. We have 107 production instances, but only maybe four or five types of instance, so if you categorize each of those into a category on the master, you could easily write code that would loop through that category and add your new instance to those jobs. There’s a lot of room for automation there.
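
A hedged sketch of that last step: after a target has enlisted (msdb.dbo.sp_msx_enlist, run on the target; see Books Online for its arguments), you attach it to an existing multi-server job from the master. Job and server names here are hypothetical.

EXEC msdb.dbo.sp_add_jobserver
    @job_name = N'Nightly Index Maintenance',
    @server_name = N'TARGETSERVER1';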

Carlos: All right. I can see an advantage over keeping the jobs, or applying those jobs, as a script. If they change, you can easily push that out to multiple servers at once, versus…OK, now I have this new server, it is up to date, but now you’ve got to go and…

Mark: Yeah, you just change it once and then it pushes out to all of them.

Carlos: OK, so that concept of multi-server, the TSX, is there anything to joining that group? You set up the multi-server, and it goes out and says, “Hey, there’s a target,” and you just become a target or…?

Mark: No. You have to configure it as a master. I can’t remember the exact steps; it’s pretty simple. I think you right-click on your SQL agent on the master, and there’s a Multi Server Administration option right in there; most people have probably never seen it. [laughs] It’s right there. You can say make this a master…probably not the exact option. But when you go through it, it’s a wizard that you step through, and this is one of the few wizards in SQL that you actually want to use.

[laughter]

Mark: During that process, you specify what the target servers are that you want. It will connect to those, it will enlist them in your master. For future instances that you want to add, you can do it through SSMS or there’s a procedure that you can run on the target to enlist it.

Carlos: I’m curious, which one are you using currently?

Mark: We have so much stuff to manage that we do everything through scripts, and if any time we can write a script and automate it, we’ll do that.

Carlos: Good practice. Very good. That sounds super nice, I can keep consistent with my jobs, almost sounds like group policy for agent jobs.

Mark: That’s a good way to think about it.

Carlos: In a sense. As with anything, there’s no free lunch. There are a couple of issues…

Mark: There’s really no free lunch with MSX.[laughter]

Mark: One thing to mention too, really quick: not only can you create a central repository for your jobs, but from the MSX server, when you start a job, it will start the job on all of your instances, all of the target servers that are specified. You can actually start, stop, and disable jobs across all of your instances at the same time if you wanted to.

Carlos: What if I wanted to piecemeal that? I guess one example that I’m thinking about, for example, is [inaudible 19:26] on the backup jobs. I run my backups all at the same time, [inaudible 19:30].

Mark: Yes. It’s one of the caveats of MSX, and it’s definitely something to look out for. The way that we manage that is we have, like, a poor man’s hashing algorithm that we run. A random number is generated for each execution on each instance. I don’t want to get too complicated, I’m sure you’re into it, but we’ve written it in a way where, when each backup fires off, it waits for a random amount of time based on an algorithm that we wrote.
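
Their exact algorithm isn’t shown here, but the core of the idea can be as small as a random WAITFOR at the top of the job step:

-- Wait a random 0-600 seconds so targets don't all hit storage at once.
DECLARE @seconds int = ABS(CHECKSUM(NEWID())) % 600;
DECLARE @delay char(8) =
    CONVERT(char(8), DATEADD(SECOND, @seconds, 0), 108);  -- 'hh:mm:ss'
WAITFOR DELAY @delay;
-- ...then run the backup.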

Carlos: So all the jobs start at the same time, but nothing actually kicks off together…it’s staggered when the actual work starts. That’s good, that’s interesting. I have seen that…actually [inaudible 20:19], from a local user group, from CarMax, did something very similar to it. But you don’t have to code all of that yourself; I think going this route would be much better.

Mark: Back to the no free lunch. Even though you’re using this to keep all your jobs in sync, jobs can be out of sync. [laughs] There are a few tables that you can use to see how out of sync things are. There’s one called sysdownloadlist, I believe, on the master server. When you query it, it will list all of your target servers, the last time an instruction was posted to the server, and the last time it downloaded that instruction. If you check that table and the date downloaded column has no value, that means the instruction never got to the target. But there is a procedure that you can run to re-synchronize. It will try to push all those instructions out to the target…
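
The check and the fix look roughly like this on the master (target name hypothetical):

-- Instructions that never reached a target still have a NULL date_downloaded.
SELECT target_server, operation_code, date_posted, date_downloaded
FROM msdb.dbo.sysdownloadlist
WHERE date_downloaded IS NULL;

-- Re-push pending instructions to a lagging target.
EXEC msdb.dbo.sp_resync_targetserver @server_name = N'TARGETSERVER1';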

Carlos: Give me some ballpark numbers. When you have a couple hundred instances, how often are you seeing that happen?

Mark: I don’t know. Do you know why I don’t know? Because I’ve automated the re-syncing.[laughter]

Mark: It’s fairly trivial to write something, a small script that runs every once in a while.

Carlos: It’s another job…[crosstalk][laughter]

Mark: But the job…

Carlos: Down the rabbit hole we go.

Mark: Yes. It checks to see if there are any instances that have instructions that have not downloaded and have been queued up for download for more than 10 minutes or so. If it sees that, it will re-synchronize the target, and if it continues to see that, then we’ll get paged. But I have not gotten paged about that in a long time. It seems to work pretty well for us.

Carlos: Very good. Out of curiosity, how often are you changing your jobs? Let me caveat that and say administration or maintenance-type jobs. Obviously there are jobs, reporting jobs or whatever, that would change a little more often. Those core jobs, are they changing often?

Mark: No, not really. It’s more of a jobs going away and new jobs replacing them. Maybe there was a job to do some backfill because of a new table that got created that needs to be filled up with data, but there’s so much data that we can’t just run a procedure. [laughs]

Carlos: OK, so maybe I misunderstood that. In that case, if I wanted to change my job, are you not updating the job? Are you actually disabling job number one, and then introducing job number two?

Mark: No. You can certainly update the job, and that will push out the updates. We just tend to do more…jobs get deprecated and new jobs get added.

Carlos: OK. These things change…policies or whatever. Functionality gets added to the system.

Mark: One thing this is really good for, and this actually came up in my session, is your maintenance scripts. Your index defrags, your index rebuilds. Things like that, that you would want to run on every instance.

Carlos: [inaudible 23:24] administration-type jobs. That’s the bread and butter of these environments. OK, very good. I think it’s always good to get some different thoughts or opinions on what other people are doing, and we’ve definitely been able to do that. There are a couple of things from the show we’ll put in the show notes. Again, those are going to be available at sqldatapartners.com/podcast. This is episode 16, and you’ll be able to get some of those things there. Before we switch gears, are there any last thoughts that you want to put in about managing your…?

Mark: I would just say that if you do manage a lot of instances and you’ve got a lot of common jobs, I would definitely look into MSX. There’s a lot of weird caveats. It takes a little getting used to but it really is worth it. It makes managing things a lot easier.

Carlos: Very cool. Compañeros, you can use some of that. If you implement it, I’d love to hear about your experience. You can hit me up on Twitter, @CarlosLChacon, or by email, [email protected]. Mark, thanks again for being here. Before we let you go, we want to ask a couple of questions. We want to provide some additional value. Now, sure, maybe MSX is that tool, and that’s OK; we’ve talked about it earlier. But what’s your favorite SQL tool? [inaudible 24:49] Paid tool, free tool, but what’s your favorite tool, why do you like it, and how do you use it?

Mark: I don’t know if this really fits the definition of a tool, [laughs] but I find myself using PowerShell every day. If I want to do something quick and I want it to hit a lot of instances, or I want to automate something, PowerShell is my go-to. I even use PowerShell to prototype stuff. If I’m going to write a C# app, I’ll use PowerShell first, just to get the logic out of my head, just to test and see how it’s running. If it runs fast enough in PowerShell, I’ll leave it there. If not, I’ll develop something else. Yeah, I use PowerShell a lot.

Carlos: One of our original chapter leaders, Ron [inaudible 25:30], in Richmond, Virginia, said this about a year ago. His thinking was that if you’re a DBA and you’re not using PowerShell, within five years you won’t be a DBA anymore. He thought, from an administration perspective, the divide is coming. PowerShell [inaudible 25:50].

Mark: Yeah, it’s a major tool. I couldn’t imagine doing my job without it.

Carlos: Very good. OK, so now we’re looking for your favorite story, right? It’s an illustration or example that tells us why you enjoy being a data professional and doing what you need to do.

Mark: I’m a problem solver, so this really fits my personality well. I like to develop, but I don’t like working from customer requirements, so I’m not really going to be a developer. But I like being able to dive into something, a big problem that’s happening, and fix it or make it run better. I get the opportunity to do that at my job: when there’s a piece of code that’s dragging a server down, SQL code, I can dive in and optimize it, and make everything run better. That’s the stuff I really like to do.

Carlos: Very cool. Being able to provide, to show, that value, right? Something that was ailing, and now you’ve improved it.

Mark: And there’s the education piece of that, too, to show people what they might do differently next time they write some code.

Carlos: Very good. Mark, we had a great time with you here. Before we ask you our last question, let’s hear about one additional way our listeners can learn more about SQL Server.

Recorded Voice 2: Hello there, compañeros. I want to tell you about a unique SQL Server training opportunity unlike anything you’ve encountered. As a listener of this podcast, you’re entitled to a special offer. SQL Cruise is a premier training experience set in the most unlikely of places: a cruise ship. [inaudible 27:24] Ford have created a wonderful culture on SQL Cruise. With five or six technical leads from various industry sectors, you and roughly 20 other students will talk shop in classes while at sea, and while you are in port, you can enjoy the wonderful world we live in, either by yourself or with the group. The beauty of SQL Cruise is the unique relationships you will form with other SQL Cruisers.

Worried about leaving your spouse? Bring them along. I did, and she had a great time with the group. In fact, I have been on SQL Cruise twice, and I’m looking to go at third time. You may be wondering if this is serious training, and I can assure you it is as good as any conference you will attend on land.

It never crossed my mind that I could be talking about SQL Server with several Microsoft MVPs on the beaches of St. John. I know you must have other questions, so I suggest you head over to sqlcruise.com. Check out the site. Once you are satisfied and you want to sign up, you’ll get to the question, “Would you be so kind as to share the name of who referred you and what source material led you to us?”

You enter SQL Data Partners, and Tim will ensure you get $100 off the training registration. This is a great offer, and I hope this $100 saving will entice you to check out the site. I did; I went on a Caribbean cruise and had a great experience, and I know you will, too. Maybe I’ll see you on board. Head on over to sqlcruise.com and use the code SQL Data Partners to save $100.

Carlos: OK, Mark. If you could have one superhero power, what would it be and why would you want it?

Mark: [laughs] It’s probably kind of lame, but I think I would just like the ability to fly.

Carlos: That’s mine.[laughter]

Mark: Go check out the Grand Canyon. You wouldn’t get a better view of that. It could come in handy.

Carlos: Yeah, so one of my thoughts around folks who say they want to be able to teleport: I envision the Superman scene, like, will you take Lois Lane? He takes Lois Lane up there, dancing by the moon, type of thing. I guess you could maybe teleport to the middle of the Grand Canyon, and then while you’re falling, teleport out, but I’d want to stay and sit there for a while.

Mark: I’m a fan of a journey, so I think flying would be what I want. [laughs]

Carlos: Very good. Mark, thanks again for being here, for telling us about MSX. It’s something you definitely have to go out and try. We appreciate you being here.

Mark: Thanks for having me.

Carlos: OK, compañeros. We’ll see you on the SQL trail.[closing music]

Recorded Voice 1: SQL Data Partners.