Episode 68: SSIS Administration

For all of us working with data, we know that we work no only with SQL Server but the suite of tools around the SQL Server engine.  In this episode, we share some thoughts around SSIS administration, which is the most popular ETL tool in the Microsoft stack.  Just like the SQL Server engine, many times the defaults are taken and in this episode we give some thoughts around logging, package execution, and notification with our guest Ravi Kumar.  With the every expanding functionality of SSIS, Steve and I found this discussion insightful as we think about supporting an SSIS environment and ways we can educate package developers.

Episode Quote

That’s one of the things I preach to everybody that, keep your data types as narrow as possible. It’ll make SSIS Package run a lot faster.

Listen to Learn

  • Why you want to consider SSIS Buffer sizing
  • The differences between package deployment vs project deployment
  • The benefits of package deployment
  • The various logging types for SSIS packages
  • Two parameters to tweak for SSISDB cleanup
  • The validations steps a package goes through for execution.

About Ravi Kumar

SSIS AdministrationBesides being a former SQL Cruiser, Ravi is the BI architect for LMO advertising in Baltimore Maryland.  He is the co-leader of the BI PASS chapter and frequent contributor to SSIS related discussion on the east coast.


Ravi on Twitter
Buffer Sizing
SSISDB database maintenance
Project Deployments


Transcription: SSIS Administration

*Untranscribed introductory portion*

Carlos:       Okay. Ravi welcome to the program.

Ravi Kumar:       Thank you sir.

Carlos:       It’s great to have you. One of the things that you’ve been talking and are doing I guess professionally is working quite a bit with SSIS. I guess shame on us. We’ve never actually had a topic on SSIS in the history of the SQL Data Partners podcast. I don’t know why its been that long but I thought that you would be a great person to bring on and talk a little bit about … From the DBA perspective what we need to know about SSIS. Some of those moving parts and maybe some of the catches that people can kind of get to get stuck with. Why don’t you start us off and talk a little bit about, can you give us the overview of SSIS and what people are mostly using it for.

Ravi Kumar:       SSIS is an enterprise-level tool that has to do with moving data in an out. It’s an ETL to extract transform and load the data. People have been using it for a while. It’s been around since 2005. Before 2005 back into SQL server 2000 it was called DTS. It became a lot more interactive and enterprise-level 2005 and have been improving it ever since then. Most of the organizations use it for moving there in and out and to load the data warehouses and to do migrations from maybe Oracle to SQL server or some [inaudible 00:02:14] to server or SQL server to some other data base.

Currently how we use it, we have at our organization data coming from probably 15 different sources and it goes out to about 5 different other companies. We ingest all the data, we do manipulations for that. We apply business logic to it and it gets sent out. We have certain processes that bring in data every 5 minutes, 24/7. It could be SSIS, it could be very minor where you have 1 or 2 processes running a month and it could be really highly skilled process, where it’s running 24/7 constantly.

Steve:         Okay. Ravi could you share with us a little bit about your background and how you got started and then how you’ve ended up working with SSIS today?

Ravi Kumar:       Sure. I started as a SQL database developer back in 2004 I think. Then it slowly moved in to BI world. Business intelligence usually is 3 parts. As reporting, which is SQL server reporting services, our data warehousing which is SQL server and [inaudible 00:03:31] services. Then moving the data ETL, which is a SQL server integration services. After I started with BI, I started with ETL, moved to data warehousing, moved to reporting, moved back to data warehousing and reporting, now I’m back to reporting and integration services again. It’s the ETL part. Currently I’m a BI manager and I manage a team of 3 other developers who exclusive work on SSIS.

Carlos:       1 additional area in thinking about SSIS and some of the ways you can use it there, is it gets baked in. It’s actually kind of baked into SQL server management studio in a sense as well. Even something as simple as I want to export an Excel spreadsheet into a table or move data from one database to another, I can … They have that, you can right-click import export data. There was 1 for import 1 for export. You actually go to the process and that will actually create a little SSIS package that doesn’t get stored anywhere or saved unless you take that option. Folks can actually use that to move data in and out

Ravi Kumar:       Again, SSIS comes with a SQL server right? You’ve got a SQL server licensing, standard or enterprise it comes with it. Once you go into that option it actually creates an SSIS package in the background and executes it and then gets rid of the package. You can save that and execute it again and again and again if you wanted to. Actually I use it for the same thing. Let’s say if somebody says “hey, I just want the data from this table to go to this table and I want to do it every week”.

If I try to manually create something like that, it’s going to take me maybe 20 minutes where it says if I go through SSIS and go through the visceral, it could take me 3 minutes and after that I automate it and I’m done. It has it’s place but it’s a really really small part of SSIS.

Carlos:       Sure. Fair enough.

Ravi Kumar:       Yes.

Carlos:       As we start using those safe packages and we want to put them into our processing or into our systems, there now are 2 modes or 2 ways in which we can deploy or save those packages.

Ravi Kumar:       Right. Now we’re starting to talk about package deployment vs. project deployment so there was only 1 way to deploy package up until single Server 2008, R2. In 2012, Microsoft introduced Integration Services Catalog. When you’re developing a package and visual studio, usually you have, starting 2012 you can have package or project. Package that is basically 1 unit of subset. It’s basically in the background as an XML file. You can just deploy it by itself and you can use SQL agent to automate just 1 package itself.

There are lots of benefits that you don’t get using package deployment model because it’s just one package. If you are looking at SSIS from a developers point of view, your looking at … Let’s say you need to download one file, put it in your database, do some transformation on it and do an export out of it. If you’re good developer, how you would develop it is you would have 1 package, just download the file. 1 package just to load the data, 1 package just to transform it somehow, 1 package just to export the data into a file.

If you are using a package deployment model you would have … You can still do it but since there’s different different sub sets of it, it just makes us to just to just deploy everything. Let me go over some of the advantages of project deployment. Benefits. First of all, if you’re using package deployment, you’re going to have store passwords somewhere because you going to move from dev to stage to production right? You got to have some sort of account that needs to access different resources, maybe some file systems. You’re going to have to put that password somewhere. What a lot of people did, they use a configuration file up until 2008 and they put that password in plain text within the file which is a big security hole.

Carlos:       Sure.

Ravi Kumar:       Once you deploy to catalog you can have those passwords or those sensitive information into a parameter and then you can change it directly on the server and they can deploy from server to server. Let’s say you deploy from staging to production. The first time you change it on production, it will remember that next time so you never have to change it again. It uses Alias 256 encryption to encrypt everything on the server so it’s a lot less security holes.

Steve:          Then in that example for a developer working on the project, they could deploy to a development server or staging server, then whoever you have doing your configuration management or your deploys could then push from that staging server directly to your production server?

Ravi Kumar:       Yes.

Steve:         Then you know you’re getting exactly what’s on staging and not some files that might have got mixed up along the way.

Ravi Kumar:       Exactly.

Steve:         Thanks.

Ravi Kumar:       You’re going to need to make sure that developers, anything that can change from environment to environment, they put that into the parameters. What happens is that you have … Your connection managers, they’re already have parameterized but anything, let’s say the name of a file or location of a file. If that’s going to change we just put that in parameter and then it’ll get changed once you move to the dev to stage your production.

Carlos:       Is that exactly a one for one. I almost think of it in very simple terms, almost like an ad hock query versus a store procedure right? Where I can encapsulate some logic and I can do some different things with the parameters that I pass in right? That I can’t just do it, that I have to manually change if I was doing that with an ad hock query right?

Ravi Kumar:       Yes. That’s a good analogy. You don’t need to change the code you just change the parameter within that store procedure.

Carlos:       Right. Now there is some additional overhead, I’ll use that word overhead or additional items that we need to have for the project deployment. I think one of the things it will notice is that new database in the SSIS DB. I think this is where again in there you’ll see the option to set those parameters that you talked about in changing things from environment to environment.

Ravi Kumar:       Right. SSIS DB is a whole other beast and we’ll get to it. Yes, that does need to be managed from same level. That’s one of the benefits of deployment. The other benefit is obviously encryption. That your whole package and everything is encrypted right away. Also one of the huge things that I noticed with project deployment is, when you deploy to the catalog, it automatically creates login for you. Up until 2008, what developers had to do, they had to go in and say “okay this is …” You have to log when the packet’s starting, you have to log when it’s ending.

They kind of know when everything executing. It automatically does that through SSIS DB. When you create the Integration Services Catalog it doesn’t get created automatically on SQL server. Once you create that, it creates the SSIS DB for you. Everything gets logged to the SSIS DB and there are views that it can clear you off of. We actually have a Power VI dashboard for our exercise environment so very clearly exercise DB and we know exactly how everything is running in the background.

Carlos:       Interesting. I think that a full out-of-the-box behavior is that you then have to go into the reporting and also I’ve forgotten the name of what that’s called. I think it’s …

Ravi Kumar:       That catalog?

Carlos:       Is it the catalogs on any SQL server management studio? I go into to the catalog and then I can see my package is there, then I can right-click and view those reports. Like view the execution times or whatever that is.

Ravi Kumar:       Yes. You can go on the catalog and you can see the reports. It’ll tell you what failed and it’s really …

Carlos:       You’re actually using something on top, you’re not using those default reports?

Ravi Kumar:       No. okay I do use those reports when something fails right?

Carlos:       Okay.

Ravi Kumar:       I go in and I say okay what failed then it’ll give me details message but if I want to know from a [inaudible 00:13:26] because you have 20, 30 different packages running, some of them running every 5 minutes, I want to know exactly what’s happening, how many times it’s executed, if certain process is turned off or not, then I have a dashboard as a high level view and I query SSIS DB directly for that. That’s one of the huge benefits of having that login built into the project deployment model.

Carlos:       Okay.

Steve:         Okay. When you talk about that Power VI dashboard query in the database there. Is that something that is specific to your environment? Or is it something that would work with pretty much any SSIS DB?

Ravi Kumar:       That will work pretty much with any SSIS DB but currently I have … It’s just the way you have a setup. I have it setup where I have certain ware clause in it and that ware clause directly relates to the projects I have. You could build it in a way that it will work with any Power VI, I mean any SSIS DB because all the views are same on SSIS DB.

Steve:         Right. Okay.

Carlos:       Other than the jobs failing because sometimes the login can get fairly … Depending on the number of items you have in your package, you can get a quite large … I guess other than package failures, what are you use login for?

Ravi Kumar:       Package failure is just execution that it executed. what time is executed so you know what’s actually happening with the package. There are different levels of login within the project deployment model. We’ll cover that in a later section, which is login in itself.

Carlos:       Okay. I thought we were there. Okay.

Ravi Kumar:       Yes. Since we’re already into it, let’s talk about login since we’re pretty much done with all the benefits. Login automatically comes with it and there 3 levels. There is performance, basic and [inaudible 00:15:43]. In my shop since we have so many packages running. I’ve run it on basic which is normal, which is default. I’m sorry I said that backwards. Okay, I’m going to redo that. There are 3 different login levels, performance, basic and [inaudible 00:16:02].

Performance is the leanest. That’s what you would prefer your production server to be on. Performance only logs your warnings and failures. Basic, it logs more than performance. It logs when a package is starting and then a task within a package is starting and when it’s failing and what happened with it.[inaudible 00:16:26] logs everything. They even logs the role counts of what’s coming through. In 2016, the latest version, they introduced 2 more login levels which are Run time Lineage and Custom. Not going to do much into it but Custom I am excited about because you can customize what exactly log out of the box.

My preferences, keep your production on performance and your dev on verbos and your staging on basic. SSIS DB, if you’re running packages constantly, it creates a lot of login and your SSIS DB usage goes high. It does create a lot of data so even with me being on performance and only 15 days of data, it does create … My SSIS DB is constantly around 45 gigs.

Carlos:       Okay.

Ravi Kumar:       15 gigs days. Go ahead.

Steve:         Then is that 45 gigs for looking out there, is that something that you have to manually clean Up over time or does SSIS clean it up automatically for you?

Ravi Kumar:       That’s another part of this. When you create SSIS Integration Service Catalog, it creates 2 jobs for you within SQL agent and it automatically schedule it to run at 12 am. What that job does it cleans SSIS Db before for you. If you go to integration catalog, right-click on it and go to properties. Within the properties you are going to be able to see different different things. There are 2 properties that you should change. One is the login history, how long you want to keep the login history for. The default is 30 days. In production environment I don’t recommend 30 days because do you really need to look back 30 days?

In production what I use it for is make sure something ran, if it failed. If it failed I need to go look at it so I only kept it for 15 days. If you have only 1 or 2 packages running everyday, it’s not going to create a lot of data so you might be okay with 30 days or even a couple of months.

Steve:         Really depends on the volume it sounds like.

Ravi Kumar:       Yes. It really depends on the volume and how much data or how much space you have.

Steve:         Okay.

Ravi Kumar:       The second property I like to change on there, it’s called how many versions of the package you want to keep. Once you deploy to catalog, it actually saves versions for you. Say it’s up to 10 versions, I have it set to 20 versions just in case. Once you change and what you can do, you can automatically go to that project. Let’s say you just deployed something and it starts breaking. You can easily go on let’s say on versions, click on the version and you can revert it back, go to your production and you’re done and you tell developers “hey, something broke fix it”.

Steve:         With that, the version of the projects does that roll back the entire project or just specific packages?

Ravi Kumar:       It rolls back the entire project.

Steve:         So it’s in 1 step you just undo that last deploy effectively or you could even go back.

Ravi Kumar:       Exactly.

Steve:         1 or 2 or further if you needed to?

Ravi Kumar:       Yes. Up until 2014, you could only deploy the whole project. In 2016, Microsoft gave 1 more option of package deployment where even your project deployment, you can just deploy 1 package to that catalog. Even if you only change 1 package. Up until 2014, you were required to deploy the whole package.

Carlos:       Okay. Got you. I can have a nice new feature then just have to only have to deal with the one if you have lots of packages.

Ravi Kumar:       Yes. Let’s say if you have a project and you have 60 packages within the project, people are saying they don’t want to deploy the whole thing. It just takes too long.

Carlos:       Got you.

Ravi Kumar:       Yes. If you use versioning it goes back to that previous version.

Carlos:       One question I had on the login, I ask because I was met with kind of blank stares. We could talk a little about execution and performance because it ties in. One of the things that it will log. If you’ve used SSIS let’s just say you have a source, table, let’s just say it has 10 columns. Then your destination only has 5. You’ve selected all 10 of the source that SSIS will put in the login, I’ve seen that little warning. It says “hey dummy this column is not being used”. You may want to exclude that from the package so that you’re not using as many resources to do this.

Ravi Kumar:       Yes.

Carlos:       One time when we were … Just like everything else SSIS packages sometimes experience performance issues. They want us to take a look to say “hey, will you take a peak at this and see if you can fix it?” When I pointed that out, nothing seemed to change. I guess my question is, should we be taking that warning with a little bit more severity or is that truly just like ehh, forget about it, just another record?

Ravi Kumar:       You don’t need to but it’s not good that development practice because what happens, let’s say developer’s developing that’s. It’s using their system to bring in all the 10 columns into the memory and only write 5 but once you deploy to SSIS, SSIS is smart enough saying “okay I’m not using these columns. Why do I need to bring the data into the memory” and it just ignores it.

Carlos:       The error warning really is then a red herring?

Ravi Kumar:       Yes.

Carlos:       I see that in the error messages says “hey, you selected it”. I’m actually not impacting my SSIS packages by it?

Ravi Kumar:       Yes. It’s more of what development annoyance and developers shouldn’t be doing this anyways. 1 other parts I do is SSIS internals. This is one of the things I stress and exercise internals is that, don’t bring in these columns because it’s just going to hinder your development. Let’s say one of the columns says a blob, it’s going to slow down your packets considerably.

Steve:         Just to confirm with that. If that column is a blob, some type, you’re going to bring that back to the SSIS server. It’s going to see the data and ignore it?

Ravi Kumar:       No. What SSIS is going to do, when you’re developing it, it’s going to bring it but it’s going to bring it to your machine in development. Once you deploy to SSIS server, SSIS server is going to say wait I don’t need to use this data. It’s just going to ignore it and not bring that data into the memory.

Carlos:       I think we should be more specific even. When I have to run it, when I run the package locally. I’m in SQL server manager studio or visual studio and I hit the run button I’m going to have the effect of having this extra columns hit my memory buffer.

Ravi Kumar:       Right.

Carlos:       When I save the package out and have it execute through an automated process, the process of saving the package out to the server if you will, that’s when the columns get ripped out of the package?

Ravi Kumar:       Correct. Steve let’s say if you open visual Studio on a server, your production server if you want to, that blob is there and you execute on visual studio, that data will be brought into the memory.

Carlos:       A double whammy, you’re on the production server and then you running packages that you shouldn’t be. Yikes!

Ravi Kumar:       Exactly.

Steve:         When it’s running that mode then it sounds like it’s very similar to the age old debate over select star versus selecting explicit columns. SSIS is smart enough when it’s running in production mode to rip that out and strip it down to what it needs it sounds like. I think that answers my question. Thank you.

Ravi Kumar:       Okay.

Carlos:       I guess let’s go in and talk a little bit about that, about packages execution. We’ve talked a little bit about setting up parameters, we can change the context under who it executes as, what connections it makes, things like that. Other considerations when we think about executing the packages?

Ravi Kumar:       As far as executions from the point of view I usually suggest, we’re going to tackle it 2 ways. First is that what do you need in order to execute it and what does when it says go time. It’s a little bit about internals. We use SSIS … You deploy to SSIS catalog then you have to execute the packages using SQL agent. What I suggested that you create a credential with it, so you have an active directory account and have an active directory account for each business area.

Let’s say if you are executing package for sales or if you’re executing a package support finance, they all need to have different resources. I would suggest, for part business area, have execution account for SSIS and create a credential for it and give it proper access to any file shares or any servers that you need. Then create a proxy with SQL agent for that credential and then It’ll be available under run as account when you go do a schedule job.

That way SSIS package would run a lot smoothly. Many times, everybody runs into this problem. They deploy a package and they’re saying “okay wait. It can’t execute” because well your executing it as SQL agent service account.

Carlos:       Sure. They made it elevated privileges to get out, attach the files or other systems things like that. If your SQL agent account won’t do that, that’s when the proxy account will come into play.

Ravi Kumar:       Right. You shouldn’t be giving SQL agent account access to everything.

Steve:         I’ve seen that done where because it’s running as a SQL agent they just give that agent account access to everything. Then you’ve got a huge security hole. The proxy option seems like a much better way to do it.

Ravi Kumar:       That’s how we have a setup at our … We have a part business area. We have 1 active directory account and it has access to those particular resources. You create the credential and now it’s time to execute. kind of what SSIS does in the background, as soon as you hit execute, it loads everything into the memory. With SSIS, everything is about memory. It reads XML, it decrypts the XML. Here’s 1 very interesting thing, when it’s loading it, it checks. Let’s say if you’re running all the like SSIS 2008 package on SQL server 2012. It’ll check the version and automatically upgrade it and load it into the memory without telling you anything. Go ahead.

Carlos:       I remember this. The version of the execution of the container might play a role in changes to your package.

Ravi Kumar:       Yes. It doesn’t tell you that there’s a problem. It just upgrades it and drags it. That’s it.

Steve:         That upgrade step, does it take a while? Does that slow things down or is it a pretty quick thing compared to the rest of it?

Ravi Kumar:       It’s pretty quickly for the most part but if you have a package that’s like 22,000 line and somebody made one of those into one package to rule them all. Then it could take a while but I haven’t seen many problems with that and then it …

Steve:         To get around that then you’re probably best off using the same version to edit the packages with as you are when you deploy it, where you’re deploying them to.

Ravi Kumar:       Correct.

Steve:         Okay.

Ravi Kumar:       Then you apply any configuration on any expression. This is where your configurations comes in play, if you have any. Remember we talked about parameters, that if you change from server to server you have different parameters? The next step is, it loads a parameter and then it says “okay just go validate yourself”. The packet goes through validation so you must have seen it in the past that a package fills that says it couldn’t validate and then everybody’s wondering “what data got executed? I don’t know”. Well, nothing got executed because it was just at a validation step. It never read any data at all.

Steve:         Then it’s validated itself and is that where it starts to process the data next?

Ravi Kumar:       Yes. Once it’s done validation, that’s when it says execute action time. That’s when it goes into execution. That’s kind of how the package gets started. Once it’s running, that’s when we start getting into buffers of how the data is brought in and how much memory it’s taking, things like that.

Carlos:       What happens to the common issues that you’re seeing there. [inaudible 00:31:54] at this point we’re talking about performance. How much work can I do in as little time as possible.

Ravi Kumar:       Sometimes validation does take long but everything needs to get validated. One of the issues I’ve seen there is that sometimes the package needs to create a file half way through before another component can access it. The file is not there, it’s going to fail validation and it’s going to say “the file is not there” because package hasn’t created it. There’s a property that you always need to say it’s delayed validation. That’s one of the got you’s I’ve seen many times, where they wouldn’t set that property and the dev they would never be able to figure out why it’s failing.

Carlos:       I see.

Ravi Kumar:       It can kind of go and load. Steve, any questions?

Steve:         No. I think that makes a lot from the experiences I’ve had, understanding how those steps go through now.

Ravi Kumar:       Then SSIS goes and does it’s thing and then it moves to … One other big thing is data flow. Most data flow and said okay now I’m going to bring in some data. This is where we kind of get a little bit of performance tuning and we start talking about buffers. There different kind of buffers, physical buffers, virtual private flap but what it basically comes down to it is, it’s taking up memory on to your server. We touched on this a little bit when you were talking about if you have 10 columns and you’re only bringing in 5 columns.

This is one of the examples for that. Also, if you want package to run really really quickly, use proper data types. This is when you can go in and tell developers that “you’re bringing in data as [inaudible 00:34:06] 255 when you’re only loading 2 digit estate code. It should be charged to. What SSIS does, if you’re reading it as [inaudible 00:34:21] 255, your data type is that then it’s going to take up 255 bytes onto your buffer. It creates a lot of overhead and it reserves that memory for itself.

Steve:         It sounds like it will take up the maximum size of those variable columns?

Ravi Kumar:       Exactly.

Carlos:       Then for each record that comes in, it will take up that size?

Ravi Kumar:       Exactly. It can inflate your data a lot.

Carlos:       I think it’s so easy to overlook that because it happens a lot even in stored procedures. You create a temp table of [inaudible 00:35:05] or an [inaudible 00:35:06] and then you’re comparing or updating the opposite data type. You get a data type mis-match on the engine site then of course the engine has to do that conversion for you. It doesn’t do that very efficiently. The SSIS site we have that same kind of problem except now we’re actually going to get hit a little bit harder potentially with memory allocations.

Ravi Kumar:       Right. Just imagine if you only have 10 columns and you have [inaudible 00:35:38] 255, how much data it will take even though you don’t have that much data.

Carlos:       Sure.

Ravi Kumar:       That’s one of the things I preach to everybody that, keep your data types as narrow as possible. It’ll make SSIS Package run a lot faster. This problem gets inflated a lot more later on when I start talking about buffer sizing. Within 2016, there is a new property call Auto Size Buffers. Before that you could tune the buffers if you wanted to and there’s a whole thing on … On internet you can read about tuning the buffer sizes but it comes down to usually each buffer size to about 10 max.

You can increase it too if you want but it’s a balancing act. What it does, let’s say you can only put 10 megabytes in 1 buffer at a time, while the pack is going to run for a longer time because it can only fit so many rows in 10 megabytes. It’s going to read the row and it’s going to apply transformations and it’s going to write the rows. It’s got to do that again. If you can fit more rows into the memory at one point, at one time then it’s going to go, your package is going to run a lot faster.

That property could be changed but there’s another caveat to it. If you increase it way too much and it’s going to say “well, I don’t have that much memory” it’s going to start driving to the disk. That’s when things slow down considerably. There is a performance counter that you can use. It’s called catalog.DM underscore execution underscore performance underscore counters. DM execution performance counter and they’ll tell you if the data is spilling to the disk. That’s one of the things you don’t want SSIS to be spilling to the disk, especially if you have a SQL server and you have 64 gigs on it, 45 gigs allocated to SQL server, the rest is for the OS.

Your SSIS doesn’t really have a lot of memory to execute and it’s going to start writing to the disk.

Carlos:       I was going to bring that up as well in the sense that then you need to start taking a peek at what your SQL server wants to read for memory as well

Ravi Kumar:       Exactly.

Carlos:       As well as SSIS, I think a lot of times at least in the beginning stages they’re the same box because there’s no licensing implication to make that happen. Then all of a sudden, you could have a lot of contention at the same time and kind of have to have a marriage there, decide who’s going to go get the main resources while those processes are happening? They’ll step on each others toes.

Ravi Kumar:       Exactly. SSIS can take up a lot of memory one time because we know about execution parts but SSIS also has, it’s what’s called execution tree. Within one execution tree you can have up to 5 buffers and within one data flow you can have multiple execution tree, executing at the same time. It could be 5 buffers x 5 or 10. That could easily start hogging up your server. There is a property that I usually suggest my DBA’s to have the developers change it. It’s called Blob Temp Storage Path and Buffer Temp Storage Path. SSIS doesn’t handle blob data really well and most time it ends up writing it to the disk.

What I suggest is you DBA’s have some sort of fast disk preferably a SSD on SQL server. Have this path, Blob Temp Storage path and Buffer Temp Storage Path mapped to that drive because needs to done be done with the data flow or it could be written into a configuration property as well if you want it to. Then you can have that path be directed to the fast disk. It’ll start writing there. Sometimes you want to be able to figure out that okay SSIS is taking up too much memory, it’s writing to the desk and you will see drivers starting to fill. It’s a nightmare for a lot of DVA’s

Carlos:       It should clean up after itself. You’re not going to see the implications. You might see that like the warning come through but then after the package is finished those files will get thrown away.

Ravi Kumar:       Right. There’s 2 caveats to it. What if your C drive gets full and you don’t have that much memory on it. The second thing is that it’s going to be slow. Unless you’re writing it to SSD. That’s one of the things that such as developers to change those 2 particular pack.

Steve:         It sounds like at a minimum you want to make sure that you get those off of the C drive, so that you’re not going to take the system down if it runs out of space and while you’re doing that you should get them on fast storage like an SSD as well to get the best performance out of it?

Ravi Kumar:       Correct.

Steve:         Very similar to the temp DB type recommendations that we have with SQL server as well.

Ravi Kumar:       Yes. Very similar.

Steve:         Okay. Great.

Carlos:       As we’re kind of wrapping up here Ravi. You mentioned a couple different flags. We’ve talked about different scenarios and ways people can take a peek at their environment. I guess last thoughts about best practices or things that … I’ll start, I’m going to say that again. As we wrap up here we’ve talked about different parameters, ways that administrators can take a look at their environments. Other best practices or things that the DBA’s can review on their SSIS environments to make sure they’re not shooting themselves in the foot.

Ravi Kumar:       Carlos that is a great question and I do have an answer for it. I’ve put together a small list of things I’ve learned over time for SSIS. Number 1. App developers make multiple packages. Don’t let them do 1 ring to rule them all, that practice. Not 1 package do everything. You have 1 package do 1 single thing so they can have about 10, 20, 30 different packages within a project.

Carlos:       Now I’m going to have to stop you because then this idea of chaining. I’ve even seen when that doesn’t happen, I feel like the natural instinct is that they break that up into jobs. I’ve actually seen environments where like well these set of packages, we expected to take an hour. We’re going to start that at 1 am. Then these next set of packages will just schedule at 2 am because we have to with for the first ones to start.

Then there’s that day, things are running slow, something happens you have [inaudible 00:43:26- 00:43:27]. Run a little bit long until my 1 am job goes until 2:15. Then that second wave ends because of the “oh hey was it ready?” Then I’m going to get yelled at, jobs fail, the warehouse isn’t loaded and now all of a sudden everybody is like “Ravi!”

Ravi Kumar:       I’m going to finish what I was saying Carlos. Thanks for interrupting me. Once they have multiple packages, then you have them 1 package that executes the rest of the packages.

Carlos:       That sounds like the one package to rule them all.

Ravi Kumar:       No. One package to rule them all is when you have 1 package to do everything. Here you have multiple packages do 1 same thing but you have 1 package to execute rest of the packages. There a couple of benefits to it. One of the benefits is you only need to schedule 1 package. Let’s say you deploy to your catalog and you just execute 1 packet and it will execute rest of the packages. The second huge thing about this is that you can reorganize the packages that how they are executed.

You can take advantage of parallelism like crazy. There are certain packages that are dependent on each other, there certain packages who are not depending on each other. You can have packages that say “these packages I’m going to execute parallelry”. They’ll execute at the same time. Then you can have what’s called presidence constraints and then you can have all those packages connecting to one package that is finished or all those package needs to be finished before moving on. You can paralyze it.

Let’s say you deploy it and you say “well it’s taking up too to much resources and we cannot have parallel. All we need to do is you go in 1 package and you rearrange how everything executed. Now it’s not taking up so many resources any more.

Steve:         Interesting. That 1 main package, it just knows what set of parallel tasks can run. Then you can use that to really throttle what’s going on to get the best use or not overload your system at that point?

Ravi Kumar:       Right. I’m going to use an example of one of my friends. I know him through our local user groups, he works for Pandora jewelry in Baltimore. They have a huge data warehouse that runs around 2 terabytes. They had one package just like this for execution and it was just bringing the server to the knees and just stuff would just stop. It was taking so much resources. All they did, they just put it in 1 package, re arranged it so things we’re not so much paralyzed, redeployed it, took care of the problem. It makes things very flexible.

Steve:         I like that and I think I will certainly use that next time I’m working with SSIS.

Ravi Kumar:       Awesome. Also sometimes DB’s are not sure what SSIS package is doing. They’re like okay, “I’m not sure how it’s running into my system”. Well if you don’t know how it’s running into the background and what it’s doing on the database, run SQL profiler. I’ve had to do that many times before but somebody deployed something and I’m like “okay I don’t know what it’s doing”. Instead of going into the package it was like okay “let me look at it, what it’s doing on the server side” which makes it a lot quicker. Obviously you can open a package and figure out what it’s doing but that’s going to take a lot longer.

Steve:         I can definitely see how the profiler would really simplify figuring out what the package is doing because the package can certainly be confusing if it’s your first time looking at it.

Ravi Kumar:       Yes. Also implement …

Carlos:       I would think, that seems [inaudible 00:48:02] to me. I like the visual of the package. It at least will point me in the right direction. The concern I would have with the profiler is being able to window that down. Is there like an event class or application name I guess that we would just filter on to make sure that it’s only SSIS packages that get in there?

Ravi Kumar:       I think it’s the user that executes it. If you have one of the user executing it, then it’ll login that user and you can filter by the user.

Carlos:       Then it should be our proxy account?

Ravi Kumar:       It should be a proxy account but if you’re just, let’s say you are, you’re not sure what you doing in your development. Somebody wants to know what it’s doing in development. Usually I don’t try to run profiler on production but this is one of the tips and tricks for development, is that if you want to know what it’s doing, you user who’s executing or maybe you’re executing it at 1 point, you can see what its doing in the background.

There certain things that sometimes happen in SSIS that are hidden in expressions. Different different expressions that it wouldn’t be apparent to you why exactly it’s doing that. Sometimes you need to go in the background to see what it’s doing. SSIS by nature is that you can program it away so that everything will be hidden and you want to know what’s going on. It really depends on the developer, how they developed it. This has come in handy before.

I’m like okay, I have looked through everything and I don’t know why it’s happening this way. Let me go look at the profiler. It’s more of a development tool than anything or troubleshooting. Moving onto the next HR is bad you can pray everything was going okay I have everything and I don’t know why it’s happening this way let me look at the profile development troubleshooting. Moving onto the next point, 1 of the things I have found huge is implementing standards within SSIS. Your naming standards or how you do things. One of the things earlier I talk about is having multiple packages for a project. That could be in a standard having naming conventions for standards.

That helps a lot. Also SSIS doesn’t do everything out of the box. Something as simple as zipping up a file, it doesn’t do that. It doesn’t have a task for it. I came across many times to developers “oh I’m a developer. I can just build that task”. They’ll go spend days on trying to build a task then US DBA might have to run like EXC on your server, and have to be able to deploy in on every server just do that one thing. It takes development costs really high and it’s a pain in the butt trying to deploy it.

It’s better to buy like a standard SSIS extensions so they’re provided by … One is task factory from Progmatic works. Other is COZYROC. Those 2 connectors, install it every server and you’re done. It covers almost everything that they can think of and you know it’s been tested by the company.

Carlos:       Full disclosure COZYROC has been a previous sponsor of this podcast. We’re familiar with those guys.

Ravi Kumar:       COZYROC tend to be a bit cheaper than Task Factory but at our work we use both of them because I like the Salesforce connector for COZYROC but there are other connectors that are TASK factory that I really like. I have both of them and it pays for itself over time.

Carlos:       Ravi’s just rolling in the dough. Over there in the advertising.

Ravi Kumar:       The last point I have is for DVA’s is how you annoy the BI team. What you do is you create email group for BI team. For us we have [email protected] It could be [email protected] whatever. Have an email group for them then have that email group as database mail also. Any time a package fails, it sends everybody an email like “hey this package failed”. Have that email go to the BI manager also so everybody freaks out that something failed and production.

Carlos:       What would you have us do instead Ravi?

Steve:         I think what you’re saying is do that so that there’s the peer pressure on that team to go fix it.

Carlos:       I thought don’t send that email. I see.

Ravi Kumar:       Make sure you have a database group for the BI team. Everybody is responsible for their own task. If something failed, they need to know that something failed. It’s better for them also because if some of their reports are relying on that process to run, they don’t want to send that email automatically to the business when it’s the wrong email. They need to be notified that something has failed or not.

Steve:         Basically expose all the success or failure of the jobs the entire team so that everyone knows what’s happening and what needs to be fixed?

Ravi Kumar:       Exactly. If it’s a success don’t worry about it. If it’s a failure, everybody needs to know.

Carlos:       I feel like you have a little bit of culture potentially in there. Who fixes that might vary but at least having everybody on board I can say that would be helpful. Ultimately you do want, I think particularly in the data warehouse environments, if those jobs do fail, you’re going to want to engage those guys because they’re going to be the ones. Hopefully, they’ll be fixing the packages for you. Very good. Should we do SQL family?

Steve:         Yes. Let’s do it. A new question I think we’ve added this week is on SQL saturdays. Where was the very first SQL Saturday that you’ve attended?

Ravi Kumar:       It was actually Richmond, was the first SQL Saturday I ever attended.

Carlos:       [inaudible 00:54:35]Ravi Kumar:       It was funny. I have story about that. I went on SQL cruise not knowing that there’s a huge SQL server community there. Why I went on SQL cruise is because I heard Chris Bell, I met him one time at a SQL user group and he mentioned SQL cruise training, that stuck. I talked to him, he convinced me to go to SQL cruise. Then I learned all about how big deal SQL Saturdays were. We went on cruise in January and SQL Saturday was coming up in March in Richmond. That was the first one I attended.

Carlos:       Very nice. Technology is constantly changing. We’ve talked a little bit about you and some of the 2016 features that have changed in SSIS. How do you normally keep up with technology?

Ravi Kumar:       SQL saturdays, Path summit and if there’s a problem, I try to go find the answer online and Twitter. I usually follow all the people who are leaders and great in technology who tweet about these things. Even the SQL family then I kind of keep watch on it, what’s coming in, what’s not coming.

Steve:         Okay. Great. If you could change 1 thing about SQL server, what would it be?

Ravi Kumar:       If I can change 1 thing about SQL server what would it be? I wish they would increase the SQL Express database size from 10 gigs to at least 50 gigs. That includes SQL agent.

Steve:         They have that. It’s called standard edition right?

Carlos:       There you go. We’ll see what happens with the SQL server on Linux. Maybe you can get your gigs that way.

Ravi Kumar:       We’ll see what happens.

Carlos:       Bets piece of career advice you’ve received?

Ravi Kumar:       Best piece of career advice I’ve received? Keep your resume’ updated and go interview at least once a year even if you don’t have intentions of taking a job. Just to know how the job market is out there.

Steve:         Interesting. That’s the first time I’ve heard that one. I like that. If you could have 1 super hero power, what would it be and why would you want it?

Ravi Kumar:       That’s a tough one. What’s the 1 super hero power I can have and why would I want it?

Steve:         Just 1.

Ravi Kumar:       I think, does Dr. Manhattan counts as superhero?

Steve:         I don’t know.

Carlos:       Yes. We’ll count him.

Ravi Kumar:       Dr. Manhattan is this blue guy from the watchman. He’s basically a God. He can do anything by … He can close his eyes and in a blink he’ll be on Mars. I would like to have his super powers so I can visit Mars and any other planet. Be next to the sun and not get burned.

Carlos:       There you go. You want to explore space?

Ravi Kumar:       Explore space.

Steve:         Nice.

Carlos:       Well, Ravi thanks so much for being with us on the program.

Ravi Kumar:       Thanks for inviting me guys.

Episode 59: Mirroring, Availability Groups, Replication, and Log shipping


Welcome to episode 59, Compañeros! Have you ever wondered what the best option is for making your data available in your environment? Listener Chris Hendon asked us to talk about differences between Availability groups, mirroring, log shipping, and replication. Today on the SQL Trail Steve and I chat with Andy Mallon and Mariano Kovo about these topics. We cover each topic in detail and explain the positives and negatives of each one.  Join us, for another great episode.

Episode 59 Quote:
“I would say that some of the best advice I ever got is that the more you screw up, the more you’re going to learn. But, when you screw up, admit it, fix it, ask for help if you need it.” – Andy Mallon on his best piece of career advice

Listen to Learn…

  • How to use Mirroring to save money for your shop
  • The difference between Mirroring and Availability Groups
  • Why would I consider using Mirroring
  • The unique benefit of Replication
  • Why Log Shipping and Disaster Recovery work well together

Show Notes

About Mariano Kovo
Mariano Kovo is an Argentinian SQL Server professional with Precision IT in Argentina. Read his blog (in Spanish) on SQLPass.org. Follow him on Twitter @Mariano_Kovo.

About Andy Mallon
Andy Mallon is a SQL Server DBA. He’s also the co-organizer of SQL Saturday Boston and co-leader for the New England SQL Server User Group. Follow him on Twitter @AMtwo and LinkedIn. His blog The Impatient DBA covers DBA careers, SQL Server, and T-SQL topics.

MSDN: SQL Server Replication
MSDN: Always-On Availability Groups
MSDN: Log Shipping
MSDN: Database Mirroring
Log Shipping FAQ
The Impatient DBA
Mariano Kovo: @Mariano_Kovo
Andy Mallon: @AMtwo
Steve Stedman: @SQLEMT
Carlos L Chacon: @CarlosLChacon

Transcription: Mirroring, Availability Groups, Replication, and Log shipping

*Untranscribed introductory portion*

Carlos Chacon: Gentlemen, thanks for coming on the show today with us. Why don’t we take just a minute and introduce ourselves? Andy, why don’t you go ahead and go first?

Andy Mallon: Sure. I’m Andy Mallon. I’m a database architect at a large management consulting firm. I’m been using SQL Server now for about 13 or 14 years for tech support, and eventually becoming a DBA.

Carlos Chacon: Mariano?

Mariano Kovo:
[00:01:00] My name is Mariano Kovo. I work as a database consultant of SQL Server since [inaudible 00:00:48] five, I think, or oldest. I’ve been working in Microsoft three years, here in Argentina subsidiary, and basically work with replication since the first [date 00:01:04].

Carlos Chacon: Very nice. The way we’re approaching this, is we’re each going to talk about one of the different options of Mirroring, Availability Groups, replication, and log shipping. Those are where our focus will be. Then, each one of us will introduce that topic. Then, we’ll get into the pros, cons, differences between each of them. Let’s go ahead and start with Mirroring. Andy, you want to take that one?

Andy Mallon:[00:02:00] Sure. Mirroring has been around for a while. It’s deprecated, starting in SQL Server 2016, but it’s still fully functional, up to and including 2016. It’s not going away any time soon. What it is, is it’s a high-availability option between two servers. It’s database-scoped, so you get everything inside the database, and nothing more, kept in sync between two servers. It’s synchronous, or asynchronous. Asynchronous is an Enterprise-only feature. Even without Enterprise edition, you can use it in synchronous mode. You can have it as automatic failover, or manual failover. It’s a really flexible option for your DR and HA needs. The best part is, from a dollars-and-cents perspective, is if you have software assurance, you don’t have to pay for that second server, if all it’s doing is being a secondary mirroring server.

Carlos Chacon: There you go. A way to save a few pennies there.

Andy Mallon: Absolutely. Everyone’s looking to save few pennies.

Carlos Chacon: [00:03:00] That’s right. I’ll take Availability Groups. The idea of Availability Groups is like mirroring. You have multiple replicas. These replicas have their own data stores. They have their own copy of the database, which means if you have 100 gigabyte database, now you have 200 gigabytes of space. You have to allocate one for each copy of the database. We’re going to add a Window clustering layer on top. You have the database, you have the Windows clustering component. There’s some integration with that, that you’d need. That cluster component has to be there, before you can create the Availability Groups. Then, like the name implies, you have this group so you can actually put databases together, in a group, so they’ll failover, or all be on the same server at the same time. A good example is SharePoint. SharePoint has multiple databases. If, for whatever reason, you’re using multiple drives, and one of those drives were to come available, you’d want all of those databases to move over. You can create a group, or multiple groups, on that server to move them back and forth. Then, you also get an enhancement. You have some enhanced options for those secondary replicas. Now, you can take backups from those secondaries. You can also do some reporting, or querying on them, so it almost makes this second copy. You can offload some of that CPU, or memory, if you have intensive queries for that application. This was introduced in 2012, as an Enterprise-only feature. In 2012, you could have four replicas, so four total nodes. In 2014, they increased that. You can go to eight replicas, although all of a sudden, that sounds like a lot of extra data. They also introduced the wizard that will enable you to replica to the cloud, to Azure, the IAS, the Infrastructure As a Service offering. You have a VM in Azure. You could then run a replica, in asynchronous mode, obviously, because you wouldn’t want that delay, over the internet. You can have that available.

[00:06:00] Then, in 2016, they introduced, you can have nine replicas now. I’m not sure why they only added just the one. They also introduced a feature for standard edition. As Andy mentioned, mirroring has been deprecated. Availability Groups is going to be taking their place. You can have now, in the Standard edition, a replica, but it has some of the same limitations as you do with mirroring. You can’t take backups from that replica. You can’t do reporting, you can’t run DBCC commands, things like that. That’s Availability Groups. Replication, let’s talk about replication, Mariano.


Mariano Kovo: Okay. I think it’s one of the oldest technologies that we have in SQL Server, for copying and distributing data, and various objects. Basically, there are three types of replications. There is the very initial, and the first one you have to do. There’s natural type, when you copy all the object and the data. When you move on, you have to do transactional replication that each transaction made, and the publisher. It’s replicated to the subscribers. It’s like when you buy a magazine, the architecture. You have the publisher that makes the magazine, and all the subscriber that holds the data to itself. I think the replication is not a mechanism for HA or DR. I think it’s a great mechanism to distribute [DHM 00:07:20], but not being considered as a high-availability mechanism.

Mariano Kovo:[00:08:00] A highlight point of replication, that is you can replicate data to SQL servers and database servers that are not from Microsoft, like Oracle. [inaudible 00:07:52]. I think that the negative side of replication is that you have to maintain and keep an eye on it every time it is working. It usually happens that the applications do modifications of data, and there, perhaps, is not, how can I say it? It’s not supported by replication. You’ll have to maintain those changes.

Carlos Chacon: Ultimately, I think replication has adopted that, they call it “the newspaper mentality”. You have a publisher, a distributor, and a subscriber.

Mariano Kovo:[00:09:00] And a subscriber. You can have multiple subscribers, and you can handle with different [inaudible 00:08:46] to push the data to those subscribers, so you can manage the delays between those subscribers. The most complex scenario is the merge replication that allows you, that subscriber can make changes to data, and that changes go back to the publisher. Every server in the architecture becomes a publisher, basically. You have to maintain all possible changes with the data.


Carlos Chacon: That’s probably one of the unique facts of replication, from some of the other items that we’ve talked about, is that you have two sources where you can make changes. Two, or more, where you can make changes. That doesn’t apply to any of the other areas.

Mariano Kovo:[00:10:00] Exactly. The other great thing that you have, was that all the servers participating in this architecture, publishers and subscribers can be different versions of SQL Server on different editions. You don’t have to maintain all the servers like [inaudible 00:10:02].

Mariano Kovo: You can have a SQL Server 2016 and a publisher, and a 2008 as a subscriber.

Log Shipping

Steve Stedman:[00:11:00] Log shipping is one of probably the least glamorous ways of moving data, in a redundant scenario. It’s also one of those that’s fairly robust, and it’s been around for a very long time. It was first introduced, I think, in SQL Server 7, unofficially, through some undocumented calls. Then really, in SQL 2000 and beyond, it was a regular feature. The way log shipping works, is you go through a process where you start with a full backup. Then, you have transaction logs that are being run at a specific time interval. It just continues to copy those transaction logs, and then replay them on the destination server, or multiple destination servers. With this, you get an entire database. There’s no option to pick just a subset of tables, like you could with replication. There’s a couple different ways to do it. Really, an automated way, or a manual way. The choice there really depends on your specific environment.


[00:12:00] The automated way, you just go through the SQL server dialogues, and setup through SQL Server Management Studio, you can turn on log shipping, and it does the full backup for you. Then, it creates the jobs to do the copy of all the transaction log files. The scheduling is all done through that configuration. There’s also an option where, if you just have full backups, and regular transaction logs that are running, you can do a manual process, where you could just start by restoring a full backup on another server, and then have a job that copies and restores the transaction logs on a regular schedule. Effectively, that’s the manual version of log shipping. With this [crosstalk 00:12:02], most people start off with the automated one. There’s some special cases, which I’ll get to in a second, where the manual process may make sense. With this, there’s little-to-no overhead on the source database, with log shipping, once the shipping has been established.


[00:13:00] If you’re already taking full backups and transaction log backups, there’s no more work going on here, other than a tiny bit to keep track of what needs to be copied, and a little bit of bandwidth. Those would be copied off the server already, if they were backups. It’s very light on the primary server. Then, because of that process of just copying those backup files, and getting them restored on other servers, you can effectively do that to any number of SQL servers. Oftentimes, it’s just done from one source to one destination. It is possible you could do that from one source to 25 destinations, if you wanted to. Part of the process with log shipping, since it uses your backups, is that you can use your existing backup location, if it’s somewhere on the network that’s shared between both the source and the destination servers, as the place to hand off, or move, the data between the servers.


Once you get the log shipping database onto the destination server, there’s two options for how that database will look to the people looking at that secondary server. One, is that it’s going to be in a standby option, where you have to actually do a restore with recovery to get that database to come online on the second server. The other, is you can have it copy and go with a read-only option. One of the negatives on that read-only option, though, is that while the next restore is going, that database goes away, and it’s bot read-only. You can’t see it at all.

Carlos Chacon: Yeah. You’d get blocked, basically, when you try to connect to it, because it says, “Hey, I’m in the middle of a restore.”

Steve Stedman:
[00:14:00] Yep. Now, with log shipping, it’s not a great high-availability option, or even medium high-availability, but it is a great DR scenario, or DR option, so that if something catastrophic happens to a primary data center, and you need to get the SQL server running on an secondary server, it’s a great way to bring it back online, which a little bit a downtime.

Carlos Chacon: Sure.

Steve Stedman: One of the scenarios that, I know it’s been around for a long time, and it’s not that glamorous, but one of the more interesting scenarios, that I was part of recently, was a project where we used log shipping to move from SQL 2008 servers, up to Azure VMs that were running SQL 2014. That’s one of the nice things with log shipping. You can go from the version of SQL Server, to the same version of SQL Server, or to any newer version of SQL Server, with the log shipping. You could go form 2008, up to 2016, if you wanted to.

Carlos Chacon: Right. It will take care of any updates. It’s basically just adding data, at that point, with the transaction.

Steve Stedman:
Yeah. Then, part of that move to Azure, where we used log shipping to get the databases there, was it took us more than a week to get all the data copied over on those initial backups. There were lots of databases that we were doing. We were able to use log shipping to keep it caught up, over time, while all those other databases were being copied. It was really, really, incredibly useful to get into an Azure VM, and do a limited-down-time switch over.


Carlos Chacon:

[00:16:00] Right. As we’re talking about these different options, I see this trend. Mirroring and Availability Groups is at the transaction level. That’s the smallest delta, if you will, that it has. Log shipping can, if there are transactions there as well, but then you get to specify a certain time period. I want 15 minutes’ worth of transactions, an hour, a day. There’s some wiggle room, if you will, as to how many transactions are going to be [separate 00:16:04] your source and destination. Then, replication, that same idea-


Mariano Kovo: You have to pray to the gods that the data is replicated.

Steve Stedman: You know? I’ve had that experience.

Carlos Chacon: Yeah. In fact, do that first.

Mariano Kovo: I know that, perhaps in the United States, it’s not a problem. Here, in Argentina, the links between the subscribers and the publishers, perhaps, the internet connection is not quite good. Sometimes, we’ll have days without connecting. The data becomes not updated, and you have to restart all the process.

Carlos Chacon:
[00:17:00] Yeah. It can be not very forgiving there, either. We talk about things go “oops”, and you have to get back to where you were. I have found the replication to be difficult, sometimes, to make that happen. Is that a fair assessment? Don’t everybody jump in at once.

Steve Stedman:

[00:18:00] I would say that replication is probably the hardest to dig yourself out of a hole, when something goes wrong, out of all of the options. Additionally, replication is probably the most expensive, time-wise, on an ongoing basis, to keep it stable. Whereas, with the log shipping option, and I’m not an evangelist for log shipping here, I’m just filling in the details, log shipping is pretty robust, when it comes to a flaky internet connection. If it’s copying a log file, and the connection goes down for a few minutes, to a few hours, or even a day, when it comes back up, it’ll just get caught up. May take a while to get everything caught up, but it will eventually get caught up, and bring all that data up-to-date, as long as you have all those transaction log files still around at the time the connection comes back.


Carlos Chacon: Yeah. That would be the big caveat there, is making sure you have all the transaction logs still available. If your clean up is sometimes, that window is too tight, then if you deleted them, they basically, you have to start again.

Steve Stedman: Yep, absolutely.

Carlos Chacon:

[00:19:00] One of the things, I guess let’s continue with this replication conversation. Mariano made the comment that he didn’t think replication was not a high-availability, disaster-recovery type scenario. I’m going to mix it up here a little bit. My thinking, and maybe this has changed with Availability Groups, but at least in the older versions, my thinking was that replication was really the only way to provide those four nines, five nines if you will, if you wanted your data to truly be available, because you had so many different options with it. Thoughts?


Andy Mallon:

[00:20:00] I would say that it’s not necessarily the only one that you can achieve five nines with. When I worked an e-commerce company, we use mirroring for our high-availability. Ultimately, we had … Our teams’ goal was to hit five nines, but it wasn’t necessarily the goal of the company. We came pretty close. Five nines gives you about five and a quarter minutes of downtime per year. We did a bi-yearly patching. Or, I should say semi-yearly patching. Twice a year, we took production down to patch the production databases. We kept the site up and running within that time frame, using database mirroring.


Carlos Chacon: Now, I’m interested. In that mirroring option, how long were your normally seeing the failover process take?

Andy Mallon:

[00:21:00] That’s the catch process. With mirroring, it will basically constantly be rolling forward and rolling back the transactions on the secondary, to try to keep the secondary up and going for failover. What we found was that each individual database would failover in a few seconds. It depends on how busy it is at the time, how many transactions there to roll forward, to roll back. We found that how recently the database has been checkpointed affected how quickly it would failover. We added a stat for plant failovers, where we would checkpoint the database. Then, do the failover. Even at, say, five or 10 seconds per database to failover, if you’ve got a dozen databases, you very quickly turn that 10 seconds into a couple of minutes.


Carlos Chacon:

[00:22:00] Sure. That seems pretty fast. Of course, I am going from memory here. For mirroring, it seemed like I was in the 15 to 25-second range. Some of those points that you made, as far as checkpoint, I think it would be worth discussing again what that means. Basically, you’re telling SQL Server that if I haven’t written this out to disk, now is the time to go ahead and do so.


Andy Mallon: Exactly. We never quite figured out why that would affect the failover, since in synchronous replication, regardless of whether it’s written to disk, whether it’s been checkpointed, if it’s committed, then that transaction exists on both servers. It’s synchronous. We couldn’t quite figure out how being in synchronous mode depended on it being checkpointed, but we did find that our overall failover time was just a little bit quicker. That could be one of those side effect-type of things, where checkpointing also make X, Y, Z happen, and X, Y, Z was the reason why it was faster. That was our shortcut for figuring it out, just as anecdotal evidence.

Steve Stedman: Back on the five nines comment there, Carlos, one of those with log shipping, is that it’s very unlikely that you would be able to keep a five nines environment with log shipping.

Carlos Chacon: That’s necessarily the purpose for log shipping. If you’re looking for that, that’s not really a candidate.

Steve Stedman: Right. The reason for that is that normally, when a failure occurs with log shipping, it’s a manual process to do that activation on the secondary server. Or, you could have it scripted, but it usually requires human intervention to step in and trigger the fact that we’re going to turn on that second server. Most of the time, when a server goes down, somebody has to call you. Or, you get a notification. You have to connect, and make all that happen. Even in the best scenarios, you’re going to be three to four minutes before you can even start the switch over. Sometimes-

Steve Stedman: Sometimes, it’s more like 15 to 20 minutes. Then, that doesn’t even cover the case for being able to have maintenance windows, and things like that, where you have to switch to the other server. Log shipping is certainly not something you want to consider in a five nines environment.

Carlos Chacon: I know we really haven’t brought the application into play either, because the log shipping doesn’t provide, necessarily, any logic on top for an application to say, “Oh, I’m A. Oh, now I’m on B. I’m over here.” Then, there’s that logic that you’d have to bake in, as well, if you were going to have an application failover, or switch over, switch databases in that event.

Steve Stedman: The way I’ve made that work is through DNS entries, where you have a single DNS entry that you can change to point to server A, or server B when you do the switch over. Again, that takes time, when you do the DNS change, for that switch over to occur, as well.

Carlos Chacon: Right. You have to wait for the timeout, or whatever, for it to flush and pick up the new address.

Steve Stedman:
Yep. Definitely not a five nines options with log shipping.


Mariano Kovo: I’m sorry. [inaudible 00:25:10] was talking about [siv 00:25:12]. In replication, the unit is a table. You have to think that the developers, or the application, creates new objects. You have to add them to the replication scheme. In the case of a failover, or the server go down, you won’t have those new objects, or tables, synchronized.

Mariano Kovo: You have to not forget the security logins, users, permissions. All that, you don’t have it on replication.

Andy Mallon:

[00:26:00] Yeah. I think that brings up your point about just the maintenance of replication. That’s one of the differences from replication, from the others, is that the others are at the database level. Once you make a change in the database, any change, it’s going to get propagated out. With replication, you’re looking at the object. You get to pick and choose what goes and what doesn’t go.

Mariano Kovo: Exactly.

Why Consider Mirroring?

Carlos Chacon: We’ve brought up mirroring. Actually, Andy, one of the reasons I invited you on was a tweet you had put a long time ago. There was some discussion on Twitter about high-availability options. You [inaudible 00:26:34] for mirroring. What would we consider, again, I’m assuming obviously, it’s going to be deprecated in 2016. Beyond 2016, I’ll be deprecated. Under what circumstances, or why would why still be considering using mirroring, if I’m using an older version of SQL Server?

Andy Mallon:


One of the reasons that … There are few scenarios that Availability Groups just don’t fit the bill, or don’t quite fit the bill yet, unless you’re on 2016. Availability Groups require Enterprise edition, or “Expensive edition”, as many people like to all it. If you don’t have deep pockets, then Availability Groups are out. Availability Groups also require a Windows Server failover cluster, which are also a bit more complex. If you’re in a smaller shop, if you don’t have a team of sys admins to help you with getting that Windows cluster setup. In those scenarios, I think having the Windows cluster, whether you’re doing a failover cluster instance, or an Availability Group. This cases, if you don’t have the expertise to support a Windows cluster, you’re actually going to get more downtime than you’d benefit from. There’s lots of switches to flip, lots of knobs to turn. I certainly am no expert on that part of things. From those regards, you might just not have the manpower to support it.


[00:29:00] Another spot where mirroring really shines is cross-domain, or domainless. You can mirror between two servers that are not on a domain, or two servers that are on separate domains. Dependent, again, not having a domain is one of those smaller shop type of scenarios, where you’ve got less infrastructure. Getting mirroring setup is really straightforward in those scenarios, and offers pretty good failover time. Again, I mentioned that, at the eCommerce company, we were doing failovers at about 10 or 15 seconds per database. A little bit longer on busier databases, a little shorter on less-busy databases. If you’re talking failover cluster instances, or log shipping, some of these other options people might look to, the failover time is always going to be much higher. That was actually the primary driver at the eCommerce company, was being able to get completely failed-over in seconds, rather than minutes. If they had used a failover cluster instance, then you’re restarting an entire instance. If you’ve got a lot of memory, starting up the instance can take a while.


Carlos Chacon:
Another little caveat there, is that if a sys admin is connecting and running some query, it will wait for that to complete, before it starts to failover. You could increase your time, potentially, there.


Andy Mallon: Right. You can have scenarios where your looking to a problem can actually make an outage worse.

Steve Stedman: Yes. The watcher effect.

Andy Mallon: Yes.

Steve Stedman: That’s right.

Carlos Chacon:

[00:31:00] I think you make a good point about the Windows cluster component. I won’t claim to have vast Availability Group experience, but I’ve setup several of them now. I agree that the majority of my problems from the Availability Group has come from the Windows side. Or, if you have something going wrong with the cluster, and it freaking out. Then, my database is affected. Versus, my database having something wrong, and it the reason for causing the outage.


Andy Mallon: Right. I know, particularly in, I think it was in 2012, that first version with Availability Groups, they were particularly fussing about if you lose quorum on the Windows cluster, for any reason, then all of the databases go offline. Even your primary database that was up and running that server, still up and running, but your databases go into a non-writable state, because you lost quorum on your cluster, and everything panicked and shut itself down, just to be safe.

Carlos Chacon:

[00:32:00] It thinks there’s a problem, and it wants to eject. Eject, and then [the other 00:31:53] cluster comes down. Now, you’re stuck. That’s where you have to have good relationships with the Windows folks, because that’s not necessarily a database, per se, problem, if they come together. Traditionally, I have seen lots of shops where those two groups haven’t always seen eye-to-eye.


Andy Mallon:

[00:33:00] Yeah. With Availability Groups, in particular, you have such tight integration. In mirroring, if you want to have automatic failover, you just need a third instance that acts as your witness. That instance is basically the same thing as the quorum configuration in a Windows cluster. In Availability Groups, instead of it being a SQL instance, it’s within the DBA’s control. Now, it’s the cluster quorum. It’s a little bit further out of our touch. In order to really understand how that works, you really have to be a system admin, or a clustering expert. Not a lot of DBAs are, so we have to really lean on sys admins. We have to learn more ourselves, but we also have to lean on a sys admin that understands how clusters work, and how Windows work to make that work for everyone.


Carlos Chacon: Yeah. Or, even at a minimum, making sure that we have the right permissions to be able to create the objects, active directory, and all that good stuff, if they’re not going to help us do that. There can be some definite roadblocks there.

Andy Mallon: Exactly.

Carlos Chacon: I want to circle back to replication, just for a minute. We hit it a little bit, and picked on it. Are there good scenarios why we would consider replication now, particularly, [now 00:33:56] that we have got these enhanced options?

Mariano Kovo:


That’s a nice question. I try to avoid it in [every client 00:34:07], when I [heard 00:34:08] replication. There are still a bunch of client here that are really using replication. I was working last week, for [inaudible 00:34:21]. They are a telecommunication company that have multiple different units, like mobile phones, like land lines, and different applications working. Every department owns their servers. They do not share data. When they have to ask for some information between them, they [may 00:34:48] replicate those SQL servers between them. They are different versions, different editions of data. Perhaps, the models of the databases are different, too. When you have to distribute those information between different models of tables, or applications, I think there is a little chance to use replication still.


Carlos Chacon: Sure.

Steve Stedman: You know, one area-

Carlos Chacon: Go ahead.

Steve Stedman: One area with replication, that I’ve seen it used recently, successfully, is with SQL Server, standard edition. You don’t have all the other options available that you would have with Enterprise edition. You need to offload some of your reporting work to a secondary server. Replication can be setup to move just a subset of the tables, off to a secondary reporting-type server, to remove that load from your primary production server.

Carlos Chacon:

[00:36:00] That was my thought, as well. If there was, for whatever reason, you could pinpoint … You have a large database, let’s say 1,000 tables, but your main reporting was only 10 tables, then replication could be pretty attractive, potentially, to just handle that data.


Steve Stedman: Yep. That’s one that, as long as that reporting system can somehow deal with the time it takes to re-initialize one of the publishers, if it needs to be re-initialized. If you can’t live with that, then replication is not a good option.

Andy Mallon:

[00:37:00] The other spot where replication stands out to me as being a great option, is the scenario where maybe, again, in a reporting environment, or for some sort of additional read activity, you want to maintain a longer history of data. With replication, you can replicate just inserts and updates, and not replicate deletes across to it. In a scenario like that, you can have maybe a reporting database, or a data warehouse, that has an indefinite history of data, but on production, you keep just 90 days, or a year. That’s a scenario where you can’t really do that with any of the other technologies that we’re talking about.


Carlos Chacon: That’s an interesting point. I think, however, I would still potentially attack that as moving over the data. I’m assuming, let’s just say it’s at the transactional level. I’m moving it at least every day, and I only need to keep five days’ worth on the source, or something, I’m thinking that I would just do those deletes … I guess not transfer them over. Log shipping, the deletes would come over. Availability Groups, it would come over. Mirroring, it would come over as well. Yeah. You’d have to have two separate … If you’re moving that data over, to your point.

Steve Stedman:
Another way to accommodate that would be with change data capture, or something like that. That’s always a mess to use.


Carlos Chacon: Right. Introduce another functionality there.

Steve Stedman: Yep.

Mariano Kovo: I have some client that replace replication with some [ETLs 00:38:20] to make this offload of the data, and don’t have to use the replication anymore. That’s an [inaudible 00:38:31] that we have, I think, in the standard edition, and enterprise as well. [crosstalk 00:38:36] to work with ETL and [inaudible 00:38:42]. Of course, you have to maintain it, too, and develop the ETL packages.

Carlos Chacon:

[00:39:00] Right. Going back to Andy’s point, if you have the infrastructure, and could support the ETL, then yeah, it makes a lot of sense. If you’re looking for something out-of-the-box, if you will. That’s true.


Steve Stedman: One of the things this really comes down to, is none of these options are really particularly good, or really particularly bad, generally speaking. Some of them may be great, and some of them may be horrible, based off your specific scenario, your environment, or what it is you’re trying to accomplish. It really just-

Carlos Chacon: [crosstalk 00:39:20]Steve Stedman: Go ahead.

Carlos Chacon: No, finish your thought. I’m sorry.

Steve Stedman: They’re really just tools in your tool belt that you can choose from. Depending on what you need, Availability Groups might be the best. Replication might be the best. Log shipping might be the best, and mirroring. Any of them might be the best, depending on that specific environment.

Carlos Chacon: Having said that, I think it was Mariano, or somebody said, when they hear replication in the client environment, “Ugh!” They internally groan. “Ugh!” True or false: that is the way you feel about replication. Steve?

Steve Stedman:
I’ll cringe a little bit, and get some goosebumps thinking about replication sometimes, and not want to deal with it. In the end, I’ll jump in and deal with it, no matter how much I like it or not.


Carlos Chacon: Of course. Andy?

Andy Mallon: True. Replication is definitely not high on my list of things that I want to look at, personally. When it breaks, it’s always a learning experience.

Carlos Chacon: Also true. Mariano, you agree?

Mariano Kovo: Yeah, totally agree with that.

Carlos Chacon: There you go. That might be worth something. We have seen situations where it has worked well. I guess I’d just say that my experience, mostly it’s been yeah, when those bumps in the nights do come, it’s a big bump, and it’s going to take a lot of my time.

Male: That’s probably the biggest issue.

Carlos Chacon:
Awesome. Last thoughts? Things that anybody wants to come back around to, before we jump into SQL family?


Steve Stedman:

[00:42:00] The other thing that I’ll say, advocating from my corner, and talking about mirroring, I do have to say that mirroring is my favorite way to do a migration to new hardware, or to a new version, because you can get the server setup, initialize mirroring to get your data moving across. You can mirror to a higher version. Once you failover, there’s no failing back, but you can mirror to that new version. Then, when it comes time to do that cut over, the amount of work that I have to do, as the DBA, is really just to do that failover, and make sure that the connection strings get updated, or DNS entry gets updated to that new server. The amount of work that I have to do, in that migration window, is really minimal. I can have all of my logins created, all of the data there, all of the jobs staged. I just have to turn them on. It’s just a matter of hitting that failover button. That’s the other spot that I think all of the HA technologies are able to [help 00:42:18], is with those one-time server failovers, when you want to move to another server. If downtime is a consideration, I always pick mirroring as my first choice.


Carlos Chacon:

[00:43:00] That’s a good point. Even in the Availability Group side, the OS has to be the same. SQL Server, of course, has to be the same. In 2016, they did introduce features where you could use a different version. Again, you have to have Windows 2016 to be able to do that. It really hasn’t even come out yet. I think it’s supposed to come out here, at the Ignite Conference. That’s really in the future, so to speak.


Andy Mallon: Right. If you’re on a current version, and using Availability Groups, there’s really no good way to get to a new cluster, and a new version. There’s a little bit of pain that’s going to be involved there, until that new version of Windows comes out, and you can get to that version. Then, you’ll have the tools to migrate to new hardware after that.

Carlos Chacon: That’s right.

Steve Stedman: Similar to what Andy said, on migrating to new hardware, if I’m migrating to new hardware a distant location from the primary, with possibly flaky internet connection in between, that’s where I would lean on log shipping to get there, where it may take a week to get all of the data copied over, before you can activate it on the other end, and have it catch up along the way.

Carlos Chacon: Sure. You could potentially have pauses in the middle there, as well.

Steve Stedman:
Yep. Just my preference, and it’s worked well in the past.


Carlos Chacon: Okay, very good. Should we do SQL family?

Andy Mallon: Let’s do it.

Carlos Chacon: As technology changes so fast, and even part of the discussion, as things continue to evolve and migrate, how is it that you guys keep up with technology, stay engaged, and understand what’s going on? Mariano, let’s start with you.

Mariano Kovo:

[00:45:00] Okay. I usually find myself constantly watching [inaudible 00:44:40] podcast on trying [inaudible 00:44:42]. Unfortunately, I don’t have the time to do as many as I want. I think the online resources are a very good point to start. With Azure, or platforms like that, or [inaudible 00:44:57] from Microsoft, or something like this, you can try new technologies. Of course, I am a user from a local chapter of [PASS 00:45:10]. It’s a great service of information, when you can share these technology changes, and discuss them.


Carlos Chacon: Very good. Andy?

Andy Mallon:

[00:46:00] For me, I really keep up probably primarily through Twitter. I follow lots of people that I think are much smarter than me. I see what they’re talking about, and I see what they’re posting about. Even if I don’t learn necessarily how to do things from their blog posts, I can always learn who is an expert in what area. Then, when something does come up, I have that bare minimum of know that if I want to find something on Azure, I know Grant [inaudible 00:46:00] has an article where he lists out other experts in subareas of Azure. Things like that, where I can learn a little bit about everything in little snippets. Slowly, all those little snippets that I get from Twitter, from linking to blog posts, and just seeing what other people are talking about, over time, those start to add up. Before you know it, you’ve actually learned something.


Carlos Chacon: There you go. There is a downside there, compañeros. If you’re too active in Twitter, then I’ll reach out to you and invite you on the podcast.

Steve Stedman: The next SQL family question would be if you could change one thing about SQL Server, what would it be? Let’s start with Andy on this one.

Andy Mallon: Oh, man. There are so many things that I would change.

Carlos Chacon: Keep mirroring around?

Andy Mallon:
You know? I would keep mirroring around for a little bit longer, definitely. If I had to pick one thing, I would change the default isolation level to be recommitted snapshot, what some people refer to as “recommitted with real versioning”. I think that that’s really the best way to get consistent reads. With a little bit of luck, if you’re using recommitted snapshot, you can convince fewer people to use No-Lock, which is one of my many pet peeves.


Steve Stedman: Nice. I like that one. That is one my peeves as well, with No-Lock. If you listen to, I think, last week’s episode, we ranted about that a little bit. Mariano, as far as changing one thing about SQL Server.

Mariano Kovo:

[00:48:00] Okay, there are many things, too. I was working in the [BI 00:47:51] side in the last months, too. I think that there are too many products in the same product. I don’t know if you mean what I am saying. In SQL Server 16, there are so many choices. You have to install one database server that you get crazy about, “What I have to tell? The R server, [inaudible 00:48:14] server, the integration server, the reporting server.” We have to limit that [things 00:48:23].


Steve Stedman: Maybe focus a little bit more on the core offering, and then spin some of those off into a separate product?

Mariano Kovo: Exactly.

Carlos Chacon: Or, even a more dynamic install. Almost like the [web 00:48:39]. “I want to install the database, I only see database stuff.” It’s not going to ask me about SSIS.

Mariano Kovo: Exactly.

Carlos Chacon: What’s the best piece of career advice you’ve ever received?

Mariano Kovo:
Difficult one. Any thoughts?


Andy Mallon:

[00:50:00] I have one. I would say that some of the best advice I ever got is that the more you screw up, the more you’re going to learn. But, when you screw up, admit it, fix it, ask for help if you need it. If you’re always timid about making a mistake, or always timid about not being the best, you never really get anywhere. Sometimes, you just have to get out there, whether you’re trying something new, or trying something different, get out there. Whether it’s a new type of job, if you want to be a DBA, and you’re not yet. If you want to use a new technology, try it out. If it breaks, it breaks. Admit it. Say, “I messed up. I’m working on it. This is the plan to fix it.” I think you get a lot further, both of this things. Making mistakes, and then owning up to it.


Carlos Chacon: Yeah. There’s no question.

Mariano Kovo: [inaudible 00:50:14]Carlos Chacon: Our last question today, gentlemen. If you could have one superhero power, what would it be, and why do you want it? Andy?

Andy Mallon: I would definitely be able to fly. I am incredibly impatient. I can’t stand waiting in traffic. I don’t like crowded subway platforms. If I could fly, I could get where I want to be quick, over the tops of the crowds, and never had to wait in traffic again.

Carlos Chacon: Very nice. Mariano?

Mariano Kovo:
I have my favorite superhero, that is Flash. Speed, it would be my best super power.


Carlos Chacon: There you go. For those who are afraid of heights, right? Speed is the alternative to flying. Gentlemen, thanks so much for being with us today. I appreciate the discussion. I think we had some good insights. We appreciate you taking some time to chat with us.

Mariano Kovo: Okay. Thank you very much.

Andy Mallon: Thanks for having me.

Steve Stedman: Have a good day, guys.

Carlos Chacon: We’ll put an ending on that, and we’ll wrap that bow up very nicely.

Episode 18: BIML

1400I learn stuff every day.  In this episode I tackle a new subject for me and one I am hearing a lot about–BIML.  BIML is a markup language you can use to build your SSIS packages.

I met up with Cathrine Wilhelmsen and Bill Fellows in Portland and they shared with me some insights on BIML, what they use if for, and some tips on getting started.  If you find your self doing more SSIS packages and haven’t yet tackled BIML, this episode is for you!

Transcription: BIML

Carlos L. Chacon:  This is the, “SQL Data Partners Podcast” My name is Carlos L. Chacon, your host, and this is Episode 18. Today we’re going off the beaten path a little bit, at least for my experience, and we’re talking about BIML.

BIML is not something I’ve worked a ton with. Why would I lie? I really haven’t worked at all with it. I’ve heard a lot about it. Those of you in the SSIS environment will find it very interesting.

We talk today with Bill Fellows from BlueGranite and Cathrine Wilhelmsen from Storebrand. They share their experience about getting involved with BIML. Of course, some of the resources that you can use to get started, if that’s not something that you’re currently doing.

Ultimately, we want to talk about lots of different topics, introduce them, in maybe a different way, and discuss some scenarios that you aren’t currently using. If there’s topics you’d like to hear about, you can hit me up at [email protected], always interested in getting your feedback. I’m also on Twitter @CarlosLChacon.

I am still basking in the enjoyable experience that was the Summit, in Seattle. I happened to go out to Portland, my mother‑in‑law’s in Portland, I was there the weekend before the event and that’s where I ran into Cathrine and Bill. They agreed to chat with me and I’m grateful that they did.

Not only did I pick up some BIML information but I also caught up on the great game of Rugby. I met some guys from New Zealand and Australia, and the World Cup happened to be going on, so they indoctrinated me, or showed me some things about Rugby, so Hamish and Warwick , if you’re listening, thanks for letting me tag along and be part of that exciting event.

Let’s get to it. It’s good to have you and as always compañeros, welcome to the show.

Children:  SQL Data Partners.

[music]Carlos:  Bill, Cathrine, welcome to the show.

Cathrine Wilhelmsen:  Thank you.

Bill Fellows:  Howdy, howdy.

Carlos:  Thanks for being here. First things first. Let’s define, “What is BIML.” BIML’s an acronym.

Bill:  BIML stands for the Business Intelligence Markup Language. I think the thing I always hear it described as is it is basically setting your operating system essentially for your business intelligence infrastructure, so it can handle everything from soup‑to‑nuts types of things, if it’s a business intelligence related item, the concept behind BIML is it can handle that.

Cathrine:  Yep.

Carlos:  Cathrine, you made the point that ultimately people who would be interested in BIML are people who do…

Cathrine:  SSIS.

Carlos:  Why is that so important?

Cathrine:  Because you have a free tool you can use right now to develop in BIML. It’s called, “BIDS Helper,” it’s an add‑on for our Visual Studio. That gives you the ability to generate SSIS packages from your BIMLScripts.

Instead of hand coding, hand creating all your SSIS packages, you can use BIML and BIMLScript to generate those packages automatically for you. Since that’s what’s in the free version of BIML and the tools, that’s where most people get started with BIML.

Carlos:  Let’s back up even a little bit further. I think that the scenario that at least clicked a little bit for me, so, I don’t do 100 percent of my work in SSIS, but I have done data transformations, from time to time, taking data from A, I want it to get to B. A lot of that is coming from tables, so I have a transformation. I want to do a transformation on, let’s just even say, 10 tables.

I have to have 10 unique objects in my package to do that. I’ve got to go in, set each individual table to do that. Then set the destination 10 times. The idea of that framework with BIML, is that there is something that can help me do that easier?

Cathrine:  Yes. What you do with BIML is you specify the pattern. You have that pattern where you pick up some data from a table, and you want to do some transformations and load it into a destination. That’s one pattern. Then you write a BIMLScript for that pattern and then you just go into your database, pick up the metadata with a list of all your tables and columns.

Then you can use that metadata in BIML to create one package for each table or one data flow for each table, anything you’d like to do. Instead of actually creating all of those tasks or packages by hand, you just went into our metadata store, pull out the data, loop over that and generate the packages from that.

Carlos:  Data transformation is the only thing I can do with BIML, or the most common thing?

Bill:  Definitely the most common thing to begin with. BIML itself can describe more than just your SSIS. You can describe your entire database and you can specify your tables, your schemas and actual database itself. It has the ability to emit, actually go ahead and create the table if it doesn’t already exist, create your indexes. You can use it to fully model that.

You can also specify, I just learned this the other day, you can have an initializer set of data, so if you have a set of lookup values you could also model that within the context of BIML. I think it’s really awesome just how expressive it is.

I’ve been doing this for I think two years now and I keep discovering. Today in Cathrine’s session, I learned, “Oh I didn’t realize I could do that,” and I have a to‑do to get even more knowledge out of her head. It’s impressive, to me, how much you can do with it beyond just the SSIS. That’s the easiest one for most people to grab onto. But it goes so much deeper.

Carlos:  We talk about that that it’s helping us with that data transformation. I’m familiar with a package, that package. There’s a file there, I can open it up. Is my BIML creating my SSIS package? Will I have two files then? Do I have one SSIS package, one BIMLScript?

Cathrine:  Yes. You will have a BIML file. It’s just XML code describing your package. The whole point of BIML is that it’s readable, it’s writable. It’s really easy to see how it’s going to be translated into your package. You read the code from top to bottom and that’s what’s going to happen in your SSIS package.

They use the same names for their BIML tags as the SSIS objects and tasks are named as well. That makes it quite easy to see. You can have an execute SQL task, you can have a data flow, you can have a source or an account destination, things like that. It’s named like that so it’s readable and it’s writable.

What happens is that XML code is run through what’s called the, “BIML compiler,” and that generates the SSIS packages for you. It basically just translates that readable and writable XML to the SSIS XML which is more verbose.

Carlos:  In the case where I’m going to create my destination, does the destination get created in the BIMLScript before the package gets created? Or is the BIML going to create some kind of T‑SQL script in my package that will then create the destination? Does that make sense?

Bill:  People really love that and the short answer is, “Only if you tell it to do so”

[laughter]Bill:  Within SSIS, initially when you connect to your OADB destination and you pick your table name, you can also click that Create button. That’s an enhancement that only exists in the Visual Studio tool. When you were in your package it won’t automatically create that table for you.

Carlos:  It has to already exist then.

Bill:  Exactly. If you wanted to properly, “Properly,” I say, if you wanted to do that, you would have to turn on, “Delay validation,” because when the package starts up it’s going to say, “Hey, this table doesn’t exist. We’re going to stop right now”

But you could have a precursor step, an execute SQL task that says, “Hey, if the table doesn’t exist go ahead and create it,” and then your data flow task will work.

We have our static BIML running through the engine as Cathrine described. You could also have, “Hey, as part of this, go ahead and create that table now for us” You could have a separate task fire‑off that’s C# basically that happens to run at same time.

I don’t know that I would do that because I think that would be very clunky. Unfortunately, you can probably do just about anything in BIML, so I think between the two of our minds, “Yeah, you could do that”

[laughter]Bill:  But that’s really not the workload that you necessarily want to take.

Cathrine:  You can do anything in BIML and BIMLScript as long as you tell it to. But at least for people who are just starting to learn BIML from the SSIS world, think of BIML as just another way to create your SSIS package.

Anything you can do in SSIS, you can do in BIML as well. If you want to create just a basic staging package where you load data from A to B, you would have to first create those tables in SSMS. That’s the workflow you would do while developing in SSIS, so it’s the same thing.

You would create those tables first, make sure all of that is good, and then write your BIML code to look up and get that data from A to B. You can write scripts and you can write really advanced C# code if you want to in BIMLScript to do all of that for you, but you wouldn’t really want to.

You want to have your database and source control in a controlled environment, and then just have SSIS read and load data. It’s the same thing with BIML.

Bill:  A really great thing, while you’re listening to my wonderful voice here…

[laughter]Bill:  …a great reason why you might be interested in BIML is if you’ve ever made a change to a package and you’ve tried to show it to someone else, it’s a nightmare to try and figure out what has changed between it.

A great story for why you might be interested in BIML is that it’s just XML. It’s not the just XML that is an SSIS package that is thousands and thousands of lines long. It’s going to be 30, 50 lines. You can use any text diffing tool to try and discern, “You made a change to this field”

It’s much, much easier. You can check that into your version control systems and discover, “Here’s what’s different between version zero and version one of this,” versus trying to do a diff between the SSIS packages and then crying tears for…

[laughter]Carlos:  Interesting. Now that scenario, would I have had to create my packages in BIML to have that difference evaluation? Let’s say I’ve created a package. I’m not using source control.

Bill:  Bad, bad man.

[laughter]Cathrine:  Use source control.

Carlos:  Use source control. That’s right. There’s been a change. I have two versions and I’m not sure what the difference is. I didn’t create them with BIML. Can BIML help me identify where the differences are?

Cathrine:  It will be a lot easier because in BIML the code is simplified, it’s readable and writable. You can compare line to line. If you have a destination, with some properties and you change that, it will just add another string to do that.

Carlos:  But I didn’t create these with BIML. Can I do that with BIML or would I have needed to create the packages with BIML to get that functionality?

Cathrine:  Yes. You would need to create the BIMLScript first and then generate the packages off of that.

Bill:  Although…

[laughter]Bill:  …there is a tool out there that you’re welcome to sample. It is called, “Mist” It will soon be re‑branded as BIMLStudio. There is a 14‑day trial. As part of that 14‑day trial you can reverse engineer up to five packages. In your situation, you could take reverse engineer package one, and then here’s the second package, package two and that would get you back to your original BIML.

That could be an option for you. The other plug I always have to give, the way all of this works, in the free version, is you need, as Cathrine said, BIDS Helper, bidshelper.codeplex.com. It’s a free thing that you install to work with your install of [inaudible 12:52] Visual Studio.

It has a feature that’s called, “Smart Diff” Smart Diff attempts to ignore the layout, separate the presentation part of your SSIS package from your actual physical I’m‑doing‑work part of it.

That can help you. You can only see what’s different but heaven help you trying to merge those differences between package zero versus package one.

Carlos:  You’ve said script, so SSIS. I’m not a developer. I came, actually, from the network. I was going to be in network, a CCNA guy. I started getting into databases. T‑SQL, as far as languages go, that’s my max.

I’ve done some SSIS. The GUI nature of it makes it very easy to get in there and start plugging along. What would a person need, from a scripting perspective, to understand, to start using BIML?

Cathrine:  To start using BIML, it’s actually really easy because you get the Intellisense in Visual Studio with BIDS Helper. Bill mentioned that in his session. Just play with it. Do an open bracket, and then see what comes up in the Intellisense. Press, “Tab,” see what happens and what comes up.

That makes it really easy to learn BIML, the flat BIML, because it’s just XML and it’s easy. Then you can take that one step further and start learning BIMLScript which is C# code nuggets that you use to manipulate your BIML code. That’s the next step where you need to look into C# development. C# code.

But by just knowing a couple of basic C# things like doing foreach loops, doing if‑else statements, simple, simple control blocks, you can get some really powerful features in just a couple of hours by learning that. There are some great tutorials on especially bimlscript.com.

They have walk‑throughs, they have the whole learning path now on the new website where you can start with the basic, “What is BIML?’ All the way through C# code and automated development. Absolutely go there if you want to learn BIML. That will take you from where you need to start learning the basics to a full staging environment and creating that.

Carlos:  Who owns BIML? BIML is its own company? Is it a Microsoft thing? Is it like BIMLScript? You have BIDS Helper on Codeplex, so that’s free. But who’s maintaining bimlscript.com?

Cathrine:  It’s a company called Varigence based in South Carolina. They created BIML and BIMLScript. They also created the IDE called Mist. That’s the licensed tool you need to use for the more advanced features. Then they put their BIML compiler into BIDS Helper and free BIDS Helper.

Varigence teamed up with BIDS Helper guys, gave them their full engine, so you can start using the BIML compiler and start learning that. They also run the bimlscript.com website which is more of a community website.

They have varigence.com as well, which is their business website. Bimlscript.com, yes it’s for the community, but it’s also by the community. I know we’ve posted blogs, external content, things like that, walk‑through snippets. Everyone can contribute to that website.

Bill:  Useless historical fact. BIML actually stems from Project Vulcan, which is something that as I understand it came out of Microsoft research. It’s, “Here’s this cool thing that we thought of,” and then it went away. But the guy, Scott Currie, who started Varigence, said, “I like that,” and he took it and ran with it and now my life is much easier.

[laughter]Carlos:  OK, very good. That is ultimately the reason why you would want to consider using BIML is to get out of some of those manual processes and being able to automate…

Cathrine:  Yes!

Carlos:  …with those components.

Cathrine:  For SSIS when you start learning SSIS, the GUI is great. It’s really intuitive, it’s easy to learn but when you need to create hundreds of packages that follow the same pattern, you need to drag‑and‑drop and align and connect, and drag‑and‑drop and align and connect, and create the same thing over and over and over again, you don’t want to do that. You want to be lazy developers.

[laughter]Cathrine:  You want to write it once, and then have everything automated and created for you. That’s what BIML and BIMLScript allows you to do with SSIS development. The cool part about this is, even for SSIS when you start creating your packages, you can create SSIS templates, which makes it really easy to create more packages based on that template and you don’t need to create everything from scratch.

That is a nice feature in SSIS. The problem is when you need to update that template because you can’t just update the template and apply that to all your packages. You would have to go back and change every single one of the packages you created. While in BIML, you can just change one line of code and have that regenerate all your packages and you’re done.

Bill:  For me, my use case and the reason I really do love BIML, I answer a lot of questions on forums. People asking about, “How do I do X in SSIS?’ If you ask the question on Stack Overflow I’m probably going to answer your question, unless it touches Excel, because no one likes to deal with Excel in SSIS.

[laughter]Bill:  But, it used to be if I were giving an answer, not only would I have to write up all the text, then have to take lots of copious screenshots of getting every single thing in there, now I can maybe get the relevant screenshots put out and then say, “Here’s the BIML. Copy‑paste this into your thing,” assuming you’ve installed BIDS Helper, “You can see exactly how I’m showing you to solve your problem”

You’re trying to learn how to do X. Here this simple 15, 20 lines of code, you’ve got a working example. Otherwise I’d have to upload an SSIS package to someplace and you have to download it and trust that this random code I’ve downloaded off the Internet is not going to toast my machine.

For me it’s a wonderful mechanism for sharing. A very concise way of, “How do I solve this problem?’ “Here” Then you can take that pattern and apply it throughout whenever you need to use it.

Cathrine:  That’s a really cool use case, actually.

Carlos:  Very cool. Now, I once had someone say that if you didn’t learn PowerShell within five years, you weren’t going to be a DBA.

[laughter]Carlos:  We’re going to separate the haves from the have‑nots. Is BIML of that same category? From an SSIS developer perspective, not that they won’t exist, because the GUI is not going to go away, but from a being‑good‑at‑what‑you’re‑doing, will BIML be required knowledge?

Cathrine:  I’m not sure it will be required, but it will make your workday a lot simpler, a lot easier, and that’s the great part about this. You don’t even have to tell anyone you’re using BIML to create your SSIS packages.

You can’t tell the difference between a package you created by hand and the one you created in BIML. BIML is, for SSIS packages at least, it’s just a tool to create those packages for you.

There’s this whole thing that when you start learning and using BIML you finish your work for the week before lunch on Monday, and you take a rest of the week off. Just don’t tell your boss you’re doing that. You’re on your, “BIML break,” is what they call it. It’s actually on my T‑shirt as well.

That’s the fun part about BIML. You don’t have to tell anyone you’re using it. It’s not a very big install, nothing like that. If you’re working heavily with SSIS, start learning BIML as well because you can really save up a lot of time to get more focus on the business value instead of just the plumbing of creating the same package over and over and over again.

Bill:  Completely agree with all that. I can also add in, at least, my personal experience. I use the Base.net libraries to create SSIS packages. If I was heavy in the C# realm, maybe that makes sense, but I think most people that are doing SSIS probably aren’t that heavy into it.

I’ve used Easy API to do it. It’s a little bit better than the Base libraries, but it left me wanting. I’ve been doing this for two years. If Microsoft comes out with the coolest new thing tomorrow to help me make my ability to create packages so much simpler, I would totally drop BIML. But I haven’t seen that.

I haven’t seen anything coming that has made me say, “BIML is great, but..” I don’t have any buts yet for my BIML. I don’t know what your experience is, but I still love this thing, and I’m still learning cool things because they keep adding more features into it.

To me, it’s really awesome. I don’t have any things like, “I just hate it,” about this thing. It’s like, “Give me more,” I wish I was doing more things with it.

Cathrine:  Absolutely, and it’s not just for SSIS, as well. You can do all your relational database things. You can get that model. You can get your cubes and dimensions and facts model and you can use it to generate things like T‑SQL scripts, instead of using dynamic SQL, you can just write BIML scripts to generate that for you.

There are so many use cases where you can use BIML when you start digging into the more advanced features, and you can build up a whole toolbox or a tool belt of BIML scripts that you can run ad hoc. I’ve seen examples of things that are super useful for DBAs for example or into the BI world, that they can use one BIML Script to automate things and check things, and change things by using BIML Script.

When you start digging into it it’s like, “I haven’t found any buts yet” It’s just really awesome.

Carlos:  That example, because I was not aware that there were scenarios where DBAs could be using that for automation purposes, for monitoring or maintenance type. Is that all on the bimlscript.com as well? Is that where those examples would be?

Bill:  Think about what you do. You probably need to, “I need to operate on this set of servers” You know what your servers are, “For all these servers, I need to perform task X” Maybe you’re going to use central management server to fire off queries across all those or, “I need to do [inaudible 23:17] ”

If you don’t want to do those things, you can use BIML Script, so you have the ability to run a SQL script, “Cool, I’m going to define a connection and I’m going to process my first server, change that, go to the next server, go to the next server” You can very easily define a package that’s going to go do that task.

Carlos:  Interesting. Where would some of those examples be? Good scenarios to…? [laughs]Bill:  Right here in my head.

Carlos:  In Bill’s head. Very good. We’ll hit up Twitter. If your Twitter feed blows up…

[laughter]Carlos:  …with requests…

Bill:  The best place to start is bimlscript.com. Again, that’s the community resource. If you don’t see what you need on there say, “Hey,” just ask a question, “Does anyone have one of those?’ Then people will come out of the woodwork, because that’s the beautiful thing about the SQL community is people share.

It’s like, “Oh, you want to see how I do it?” “Oh, I don’t do that. I do it a little bit differently.” You’re going to get flooded with people trying to share their knowledge of, “I’ve scratched that itch and here’s how I went about solving that problem” If not, I guarantee Scott will say, “Oh yeah, I can do that.”

It was great. One night, Jamie Thomson asked on Twitter…

Cathrine:  I loved that.

Bill:  …he was like, “How do I do this?” We used to do this tabular data file thing. I spent about an hour mucking around and I was like, “It’s late. I’m tired” Scott went out and live recorded, “Here’s me solving the thing,” did it in like an hour. He’s like, “Yeah, I probably could have done this faster except I changed my mind in the middle of it,” or something like that.

[laughter]Bill:  The owner is so invested in trying. You give him curious problem, he’s a [snaps fingers] super sharp guy. He comes up with things that I’m like, “Oh God, I wish I had a tenth of your brain power.”

Carlos:  Oh, very cool.

Bill:  They are very engaged in promoting the community aspect, the community side of the organization. I have great respect for them.

Carlos:  You’ve almost persuaded me to… [laughs]Bill:  Give up your SQL ways?

Carlos:  …take BIML a little more seriously [laughs] almost. I’ll have to spend a little more time on bimlscript.com. Great. Thanks for being here. I do appreciate it. I think it’s been a great little discussion. There will be, of course, potentially some additional information to what has been presented.

We happen to be here in Portland today, where both of them presented at a, “SQL Saturday” We’re going to provide links to their presentations. It will be in the show notes at SQLdatapartners.com/podcast. That information will be available there, as well, of course with all the links to the things we’ve talk about today.

Before I let you go, and I know we’ve been hot and heavy into BIML today, but one of the things we like to do is that idea of sharing so that people become aware of other tools that are available is to talk about your favorite SQL tool. It’s OK if that’s BIML.

[laughter]Carlos:  But we’re looking for your favorite SQL tool, free tool, paid tool but why you like it and how do you use it?

Cathrine:  My favorite tool is Red Gate SQL Prompt in SSMS. It is a paid tool, but it saves me so much time. I don’t have to write all my SQL scripts by hand. They can save snippets so you can start using them instead of writing your entire code. It has a lot of great features, auto‑formatting, auto‑completion, things like that. That’s the one tool I cannot live without. It’s SQL Prompt.

Just to add to that one, because that’s a paid tool. A free tool, as well, is SQL Search, Red Gate SQL Search that allows you to search through all your databases across different servers. You can search in tables, use functions, whatever, and anything in your database just search for it and you will find it immediately. One paid tool, one free tool.

Carlos:  Very good. Bill?

Bill:  Definitely, I concur with Cathrine that Red Gate SQL Prompt is absolutely the best thing in the whole wide world.

[laughter]Bill:  I taught a class yesterday and I had already installed it on the machine and they’re like, “How are you doing that?’ I’m like, “You paid for this class…

Carlos:  “Funny you should ask”

Bill:  “But go ahead and spend a little bit more, because this tool is the best thing ever.” The efficiency I get out of that thing is well beyond what I paid for it.

The free tool plug I will give will be, if I take off my developer hat and go more towards my trying‑to‑figure‑out‑what‑the‑heck‑is‑going‑wrong hat, I’ve got to give my love for SQL Sentry’s…

Carlos:  Plan Explorer.

Bill:  …Plan Explorer. Thank you. My brain has stopped working.

[laughter]Bill:  Got to love that tool. I don’t use it often enough because, blessedly or unfortunately enough I don’t do enough on that side of stuff, but I really enjoy what I can do with it.

Carlos:  Great tool?

Cathrine:  Yeah.

Carlos:  Great. Again, thanks so much for being here. We do have one last question for you. But before we do that let’s hear one additional way folks can learn about SQL Server.

[commercial break]Carlos:  Cathrine, Bill thanks again for being here. Before we let you go, if you could have one superhero power, what would it be and why would you want it?

Cathrine:  I want to fly.

Carlos:  Fly. That’s mine.

Cathrine:  Yes, that’s my favorite superhero power. I can travel anywhere at any time, and just that freedom of not being stuck on ground.

Carlos:  Yes, I’m with you. Bill?

Bill:  My favorite superhero was always Green Lantern. I want a magic ring that lets me materialize whatever I’m thinking about.

Carlos:  There you go. OK. Very nice. Great. Thanks again for the conversation, and for informing us a little bit more about BIML. Compañeros, we hope you enjoyed the conversation as well, and we’ll see you on the SQL trail.

[music]Children:  SQL Data Partners.

Episode 09: Database Automation

1400Joey D’Antoni has been doing quite a bit with database automation and I caught with him to discuss his experiences about how database automation has helped him be more efficient and have fewer support problems.  We discuss strategies for starting your automation techniques and he gives a few pie in the sky examples if you are running a large environment.  Compañeros, I hope you enjoy the program.


Show Notes:

Central Management Server
Building Perfect SQL Servers, Every Time
Build scriptsT-SQL Tuesday: Managing the Enterprise
September 2015 T-SQL Tuesday
Joey’s Personal Blog

Transcription: Database Automation

Carlos L. Chacon: Welcome to the SQL Data Partners podcast, my name is Carlos L. Chacon, your host. This is episode 9, and today we’re talking about automation.Interestingly enough, this September 2015 T-SQL Tuesday theme was, “Strategies for Managing an Enterprise.” I ended up writing a little bit about automation, and SQL server setup in particular, and we’re going to continue with that topic or that theme today.

If you’ve never heard about the T-SQL Tuesday theme, this was brainchild of Adam Machanic, where several years ago, he thought of the idea of asking bloggers to host a topic or a theme, and bloggers could submit to this page, and the community could increase their knowledge about that specific theme.

The September theme was hosted by MidnightDBA, and again, they chose strategies for managing an enterprise. Today, we have Joey Dantoni with us. I’m super excited to have Joey. He is well-versed in performance tuning and blogs, over at Joeydantoni.com.

He talks about several items, or several scripts that he makes available, several sessions that are available from some of his different conferences that he attends. All of those will be available on the show notes, at sqldatapartners.com/podcast. You can go there and get information about today’s show.

As always, thanks for tuning in. If you like what you hear on today’s episode, we invite you to give us feedback via iTunes or Stitcher. You can also leave some comments on the website. We’d love to hear from you.

If there’s some potential topics you want us to cover, please let us know about those as well. You can hit me up on twitter at @carloslchacon. Lots of information in today’s session, and as always, compañeros, welcome to the show.

Children: SQL Data Partners.[music]

Carlos: Joey, welcome to the show!

Joey: Thanks, Carlos.

Carlos: One of the things we wanted to talk a little bit about today was automation. I know you’ve shared some stories, and I’ve heard some things about one of your previous employers, working at Comcast. Kind of a large environment.

Joey: Yeah, we had roughly close to 1,000 servers total. I was the architect, but I had 4 DBAs working on a team under me. We didn’t have a whole lot of people to maintain that kind of environment. Realistically, when you get into larger environments, you never do, because people are in scale, and your budget is in scale.We had to automate a lot, just so we didn’t have guys working 24-7. We had our build process, and our trade deployment process, fully automated. Trade deployment was interesting. We used SSIS for that. From a build perspective, it was pretty straightforward.

As part of our personalization infrastructure, we built a service management layer, where business users or developers could come in and request a new virtual machine. We had that process build that from end to end, from the Windows perspective.

Using a lot of power shell, particularly in and around Windows, to do things like do an offline domain join, to get the server joined to the domain, grab an IP address. The only issue we had with that was, we couldn’t get…The DS is fully automated. It’s a long story.

Carlos: [laughs]

Joey: It was a long political story. SQL-wise, what we did was, we automated the installation. It’s a two-faced approach that I took to doing SQL automation. You do the install. SQL has a ton of add defaults. We all know that.The install process itself is pretty vanilla. The only thing that matters before you start using SQL, is that you get the right drives configured. You want the right volumes configured for [inaudible 04:33] sets, and where AF files are going to be set.

After, what we did was…and I still use this with my clients, have the script that does the best practices after the fact, and goes in and sets max memory, sets max [inaudible 04:51] parallelism. All plays on what exists in the server, and what is [inaudible 04:54] the DMVs.

Carlos: The ability to automate, because there is lots of things that we can automate. The idea of why automate, you hit on a little bit of that. Simply your scale that require that you do that. Another reason I can think of to help us automate, or why you would automate, is to have a consistent experience.As we get into, you mentioned 1,000 servers, I don’t want to have to have my data files, for example, even in three different places over 1,000 servers.

Joey: Totally. I don’t even like it with 10.

Carlos: [laughs]

Joey: That was really the other goal of our whole project, client project. We had our build process, for physical servers, pretty automated. There were only a couple of manual steps. Even then, still you had weird things happen, on the Windows side, some hash cache doesn’t get applied, or server has the wrong DNS address. We went through a DNS migration at some point.The other part of automation is, you need to build in some intelligence in your health check process, to make sure everything…I read an article about this the other day, and it really wasn’t related directly to what we’re doing in terms of SQL service in Windows, but was more talking about automation, using [inaudible 06:26] and things like that.

One of the messages that the article said, was you still need to have a process to go back, and make sure nothing’s changed, everything is good, everything is happy in your environment.

Carlos: As much time as we spend, trying to set up our automation process, we still need to have someone to check in, and make sure that everything has done what we think it has.

Joey: We were using a system center configuration manager when I was at Comcast. It was really good, except for one problem. It does not know about the existence of SQL server clustered, several clustered instances, which were 80 percent of my production environment.It can only connect to a default instance, which is a shame, because it worked great for Windows. The tracks were very easy to build, for any kind of Windows task, anything you need to look in the registry, anything you get from [inaudible 07:20] line, about the [inaudible 07:21] .

Since we couldn’t connect to SQL clusters, it was virtually useless to do that. What we ended up doing was, and I like this as the core of any real automation strategy. We had an infrastructure server that the DVA team used, and that was the hub for all our scripts. They were all mounted off of there.

There was also our metadata repository for our SQL servers. Since everything went back to that, that’s where my health check ran. I had a process that ran every morning, that would grab a list of all of our instances, and then dynamically create a batch file that would run SQLCMD.

Which if you’re working on automation, you need to get familiar with SQLCMD, and then run that batch file, and I put the results. As part of that process, it would dynamically create a link server back to our infrastructure server.

This was done that way. I didn’t log, doing that, but we had some network issues. We have different networks in various things, and that infrastructure server was…Any SQL server can always talk back to it. It was clear. I didn’t have to worry about it.

It ran through all of our instances, run our health check script that we put together, and then returned it. The other [inaudible 08:42] was we have a nice create a tabular report that the managers and VP’s to look at and say, “Hey, we’re in compliance.”

We created a scorecard. It gave something that was a measurable goal, the kind of crap I used to have to do when I worked in big companies.


Carlos: That’s an interesting process. Even add to that, that automation process is then providing a way to report on that, so that you can ensure that, “Yes we’re in compliance.”

Joey: Yeah.

Carlos: Interesting. You described what I’m going to consider a fairly complex, or at least robust environment. Some of our listeners are thinking, “Wow! That could be a little bit tough.”Before we get to some of the specific things that we could do to help start our automation processes, let’s talk a little bit first about, what it is that we want to automate.

Joey: There’s a few things, to be honest, to go a little bit on the side, we were stuck down that path, strictly because of our scale. There’s a lot of great tools that are built in, like central management server and policy-based management, that worked really well.None of these will scale beyond 200 servers. Don’t be scared off if you have to build your own thing. Microsoft has some stuff that’s built in. The number one thing I always tell people to automate is…I do a lot of health checks with a consultant, and I do a lot of the same write ups every time.

Automate your build process, and automate the “best practices” script. It’s not that hard. You can download it off my blog, or watch my presentation from Summit, last year, it’s on YouTube. You don’t even have to pay for it.

It’s pretty easy. SQL server generates an INI file for you. If you go through the install process, and set everything up the way you’d want it and click “Cancel” at the end, there’s an INI there. All you have to do at the point is run setup.exe from the command line, and it passes the config file.

If people did that, they would eliminate many percent findings I find in a lot of health checks, the common ones. The temp [inaudible 11:06] is going away in 2016, but one temp.dat file maxes off at zero, cross ratio [inaudible 11:12] at the five.

Data files and temp.dat on the same drive that one is…If you want, this really depends on your SAN configuration, but a lot of the common ones. You hear the guys say to make sure they’re friends with their Windows team.

Because a lot of these automation things derive from making sure you’re a Windows guys. Even if it’s inconsistent, it’s harder to automate. You want to make sure you’re drives are always configured the same way on every server.

You want, typically like some of the other things that are associated with build, that are Windows related, or configuring an instant file on initialization, and a lot of pages and memory if you’re wanting to do that.

If you can get those things configured AD level, that’s one last thing you have to worry about when you’re configuring your servers.

Carlos: Compañeros, listen to that. Go grab your Windows guys, grab them, sing “Kumbaya” and make it happen. [laughs] That’s mostly on the build side, then you mentioned putting in some process from SQL Server Standards, or the default settings.When I tend to think about this as, “These are all my jobs and things as well that I’m going to put out there.” For example my backup routines and things like that. After my build process, I have another set of scripts that run, that will go and do all of that index maintenance, things of that nature.

Joey: I throw all that in my build script, because the one I did at Summit last year, I messed around with it. I use a lot of online scripts. I heard there are a few other ones out there, many, and a couple other things. I’ve been really happy with all the stuff.What I do with that is, I have a part in my build script. It’s a batch file. The batch file does a few things, a call setup that EXE with the config file. Then I ran a PowerShell script, that’s an always website, grabs the latest version of these scripts, which he promises me is a consistent URL.

Writes these to a SQL file, and then after I ran the best practices script, that it sets max memory and a lot. I do all the script. The one thing that’s a little challenging with that, I haven’t taken the time to do. I do it for one point, but I haven’t done is, automating the schedules.

It’s all a dozen schedule which are…he doesn’t schedule the jobs by default. Building a script for that. That’s one of those things that you have to customize for your environment, because not everybody is going to be doing their index rebuild on Sunday morning at 3:00 AM. Most people aren’t.

Carlos: That’s an important process, is then building in your own, if you will, customizations into that environment. Even in something like that, like a schedule, your larger environments might be different from the single instance. The ones it uses.

Joey: Definitely. Then you have things like that that may or may not be consistent across the environment. You may have to get a different places, or different servers, and things like that.

Carlos: My thought there is that I generally have…it is a couple of buckets for those schedules, and then I can then deploy those schedules depending on, “This is a single install. It’s going to be for some small department,” and you’re looking at deploying schedule one.If this is the cost for my production environment, where the whole company uses it, LR, HR, then I am on bucket three, and I can use those processes there.

Joey: That’s a really good approach. You can easily deploy or lay as many schedules as you like. There’s no real negative to that, other than making the table in a CDL a bit bigger. I really have bought it in.I like that approach because you can have…and really from an automation perspective, if you had something in your naming convention, and tie back to the environment. I hate naming convention debates. They’re my least favorite thing in IT, because everybody is going to get point.

This is one of those places where it matters, because if you’re using some sort of regular expression in your naming convention, even in the case you’re talking about. I like the number idea. That you want that number to be like save a third character, or whatever the server was [inaudible 15:50] .

You can parse on that, and then as part of your automation you can apply those schedules likewise, because you string out the server name, and a case statement, and then ruin the following [inaudible 16:03] .

Carlos: You’ve talked a little bit about some of the things that you’ve used. You mentioned the SQLCMD, and of course, PowerShell has come up. The thoughts on getting started with that process, I’m not the PowerShell guy, and you’ve probably forgotten more about PowerShell than I know currently.If I was starting out, start with the automation process, and you’ve mentioned even the install process of getting an INI. What would be some of the steps, or what was your experiences going through that process of, “Here’s how I’m going to start automating some things?”

Joey: Good question. First of all, PowerShell, especially with SQL Server, is like this weird religion, and I’m definitely not a part of. The guys that really like to do PowerShell, really like to do PowerShell. I really like to do it from a Windows perspective, I find to me it’s a painful process, the SQL Server.I do it when I have to. Microsoft does a really good job with instrumenting SQL Server, with T-SQL to do about everything. It tends to be, especially for DBAs, you can go to T-SQL and live in T-SQL every day. It tends to be a lot easier for them to live in that world.

What I typically tell people, and Michael and Nelson is if you do something easily in T-SQL, do it in T-SQL. It’s pretty easier to automate there. However, for tasks that you’re doing with the operating system, or you’re making calls…

One of the things that I worked on last year, was this script for a client, where we would dynamically extend a volume. Call that using a PowerCLI, which is VMware’s implementation. VMware set up command lines for PowerShell.

Basically if we detected that a database volume was getting too full, making OS call, extend the volume, which would in turn make a [inaudible 18:14] call. These kinds of things, are the things I focused on. It’s figure out a task that you…It’ll tell you the exact way on PowerShell.

I forget what it was, but I had to do some pass on about 150 servers when I was at Comcast. It was…


Carlos: [laughs] I don’t want to do this 150 times.

Joey: Yeah. It was like changed DNS, or something stupid like that. Yeah, I went and figured out how to do it. The language for the basics is pretty simple. Then the other thing I’d say, especially with cloud becoming so prevalent, and this is slightly changing. The new Azure portal doesn’t really work for half the stuff they say it does.A lot of the stuff in Azure isn’t exposed through the portal, and you can only do it in PowerShell. I will say Azure’s PowerShell is probably a little bit simpler than regular Windows PowerShell. It’s the same language, but a list of the things I had to do with Azure or PowerShell, are pretty basic.

The basic syntax of PowerShell is, get dash something. Get process will get you a list of process for your desk services, will get you a list of services. There’s a crazy language involved there. It’s really finding a couple of tasks that you want to do.

One of the things I do that will be a nice entry to PowerShell is, I wrote a script to automate patching. Applying service to [inaudible 19:47] is the SQL server. There’s two conditions that Microsoft tells you that you need to meet for a service pack to be applied successfully, or sealed.

It says the build number of SQL server matches the expected build, and that the service is online. I wrote a quick script that sets the expected build number as a variable. You have to pass that [inaudible 20:16] , because it had time to grab it. I’m sure I can get it from setup somehow.

Take that as a build number, run the patch, I did it really inelegantly. I had it sleep for roughly how long the patch takes to apply, and then do a check again. SQL Server, make sure the service is running, if the service isn’t running, email me.

Skip that check if the service is running, and then go on to do a build check, and make sure it all matches, and then email me the builds, and the expected builds. Find a task you want to do. That’s the best way to learn a language I find. Is find something you need to do, and then do it.

I forget a lot. It’s funny, I did a lot of 1.x and Oracle, and UNIX. That whole scripting is a little bit more intuitive to me. They’re [inaudible 21:11] and stuff. I was working on Nobel UNIX client the other day, and I had hadn’t been on a UNIX for like eight months, and I was pretty quick to give out the scripting. PowerShell, I had to do crazy time.


Joey: I knew the concepts, but there’s still a little bit of subtleties.

Carlos: Anytime you have a problem to solve right, it makes it a little bit easier, because you know that there is something that you’re solving. There’s a goal there.Something that you said was important, is that you want to do that process once, figure out how the process can be done in a language like T-SQL, which a lot of our listeners are going to be using, or wanting to be using.

Even inside something as simple as SQL Server Management Studio, you can set that up, script out that process, take a look at it, and start to pull it a part to say, “OK. What are these pieces doing here?” Now that you have that script, you can save it and use it again.

Joey: Definitely. That’s the whole thing with scripting. It’s the repeatability that you don’t have to deal with that going forward. One of my favorite scripting techniques, if you want to do a quick little automation task, is building three SQLs.Using [inaudible 22:33] Nation to build SQL statements. The example I showed in the presentation, I do have [inaudible 22:37] , a real basic back up automation script. I say, “Select ‘Back up data base.'” Plus sign is the [inaudible 22:47] automation operator in SQL Server.

The only thing that gets a little dicey there, is you have to convert any [inaudible 22:54] values to…


Joey: …because SQL Server will try to add the [inaudible 22:59] to the…[crosstalk]

Joey: You create a script, or you create a file that’s a bunch of back up data base statements that are successful. With SQLCMD having an output flag, you can create a quick back strip to say, “SQLCMD run generate backup. SQL, output the results to run backup.SQL.”Then in your next script you say, “Run backup.SQL,” and output those results to a log file. If you want to take that one step further, you can use PowerShell to read the log file, and look for any errors.

Carlos: [laughs]

Joey: That’s a real pretty quick example, but…

Carlos: A step through that. I’m taking the script to create the backup statements. I have another script that executes those statements, and then puts them into a log.

Joey: Right. No. See, you write the scripts that would generate the statements, you output that result to a SQL file, you then run that SQL file, and then you output that result. It’s two lines of SQLCMD. SQLCMD line one, gen.backup.SQL, and that outputs to run backup.SQL. Then the second line is run backup.SQL, and that backup that log.

Carlos: To start automating, we talked a little bit about SQLCMD. Assistance from the community, you mentioned you had some scripts. Are those scripts available to share? You mentioned one was on your website.

Joey: Yeah. I’ll find the URL for that. Definitely my bill process is on my website. Like I said, the presentation, I did a Pass last year on automating your bill process. It’s on Pass TV, which I forget if you have to register for. I know you don’t have to pay for it.

Carlos: We can put those in the show notes. We’ll make sure that those get available.

Joey: Allen Light’s got some pretty good sessions on doing some automation with PowerShell. I’m not sure if any of those are available online for free, but he’s helped me a lot with some of the PowerShell stuff. I still docile or still winded. [laughs]

Carlos: There you go.

Joey: I still use batch files. Anytime you’re combining three or four things, like SQLCMD. I don’t run these through PowerShell. One of the things that SQLCMD does, that say for example we are talking about, one of the challenges to it is headers, and counts, and things like that.Because if you’re automating the building of a file, and you don’t want to have headers in your results, when you are going to run that, SQL is going to show up, it’s like it’s got nothing to do. In SQLCMD, you can pass a -h -1 command to strip the headers out, and there’s also a no count option.

You add that no count in your SQL strip. You’re getting the results you’re looking for. In those, SQLCMD doesn’t have that remove headers option.

Carlos: Ah.

Joey: Some people have rewritten it. There up in like SQLCMD four that people have written themselves, but I generally don’t like to recommend people third-party stuff, because you never know what you can do in client [inaudible 26:28] . I try to make stuff work with the core products stuff.

Carlos: It’s another thing you have to keep up with in your environment.

Joey: Yeah. If Microsoft’s auto updating it, that’s fine, but if it’s something I have to go download from somebody, it’s not as good.

Carlos: Great. Joey, I appreciate the conversation. It’s been some interesting information. We’ll make sure we can share some of those ideas in the scripts that you mentioned there in the show notes.We have a couple more questions for you, one of the things we want to try and do is provide a little bit more value for folks who are listening. We’ve mentioned some great ideas now, but one of the things we’d like to talk a little bit about is your favorite SQL tool. This could be a paid tool, a free tool, but why do you like it, and why are you using it?

Joey: This is dicey, because I’ve done work for every vendor this year.

Carlos: [laughs]

Joey: I really like SQLCMD, for the automation value.

Carlos: There you go.

Joey: I’ll name two other ones, Red Gate…I’m going to fall down at this, they’re steam and compare tool. I don’t think it’s called steam and compare. The tool that I use more than any other tool would be SQL Sentry Plan Explorer. It’s really nice, especially the complex execution plans.

Carlos: Much easier to digest.

Joey: Yeah.

Carlos: Very good. Thanks for sharing that. One other thing, you’re now working with Denny Cherry and Associates…

Joey: Yeah.

Carlos: …or with them. This doesn’t all have to be with them, but I we’re looking for an experience that you’ve had that helps illustrates why you enjoy being a data base administrator, or a data professional.

Joey: It’s helping deliver business value to my client’s. The feeling you get when you take a query down from…Actually specifically around this warehouse we architecture, the client’s seeing their run time speed a 100 to 1,000 times better for some things.They got under our hardware, but some of it’s…we’re using 2016 with column index, and it’s really fast. Seeing those kinds of benefits. This a data company, they sell data as their business. Them being able to deliver results to their customers faster, is adding value.

Carlos: Very good. Before we ask you the last question, let’s take a look at one more opportunity folks have to learn about SQL Server.

Announcer: Hello there compañeros. I want to tell you about a unique SQL Server training opportunity that’s unlike anything you’ve encountered. As a listener of this podcast, you’re entitled to a special offer. SQL Cruise is a premier training experience set in the most unlikely of places, a cruise ship.Tim and Amy Ford have created a wonderful culture on SQL Cruise, with five or six technical leads from various industry sectors, you and roughly 20 other students will talk shop in classes while at sea, and while you’re in port you can enjoy the wonderful world we live in, either by yourself or with a group.

The beauty of SQL Cruise is, the unique relationships you will form with other SQL Cruisers. Worried about leaving your spouse? Bring them along! I did, and she had a great time with the group. In fact, I have been on SQL Cruise twice, and I’m looking to go a third time.

You may be wondering if this is serious training, and I can assure you it is as good as any conference you would attend on land. It never crossed my mind that I could be talking about SQL Server, with several Microsoft MVPs on the beaches of Saint John’s.

I know you must have other questions, so I suggest you head over to sqlcruise.com, and check out the site. Once you are satisfied and you want to sign up, when you get to the question, “would you be so kind to share the name of who referred you, and what source material led you to us?”

You enter Sequel Data Partners, and Tim will ensure you get $100 off the training registration. This is a great offer, and I hope this $100 savings will entice you to check out the site. I did, and went on the Caribbean cruise and had a great experience, and I know you will to. Maybe I’ll see you on board.

Head on over to sqlcruise.com, and use the code “SQL data partners” to save $100.

Carlos: Joey, our final question. If you could have one super hero power what would it be, and why would you want it?

Joey: To bend the laws of physics so data goes faster.[laughter]

Carlos: That’s very interesting. You’re the first one…

Joey: Actually, my other super hero power that I’d want would be a private jet. That’s not really a super hero power, I know, but to greatly simplify aviation.

Carlos: [laughs] There you go. You’ve been the first one who’s indicated that their power would go to helping the client, with the getting data moved faster. That shows your commitment there. Hey, thanks for another great episode. Joey, it’s been great having you on the show, we’ve really appreciated it.

Joey: Cool.

Carlos: Compañeros, thanks again, and we’ll see you on the SQL trail.[music]

Children: SQL Data Partners…

T-SQL Tuesday #70 – Managing an Enterprise

Tsql TuesdayThis month’s T-SQL Tuesday is hosted by Jen McCown (b|t), and is on strategies for managing an enterprise.  I had the chance to meet Jen and her husband on SQLCruise and they really have a down to earth style about the idea of managing an enterprise and they helped me think quite a bit differently about how to tackle the enterprise monster.  This post has made me realize I have a few more blog posts to write about this topic.

Where to begin?

There are number of concepts that could be discussed and I think the links to Jen’s site will run the gambit on ideas.  My concept is a simple one, but one I think worth pursing as I continue to see a variety of environments–inconsistent SQL Server setups.  There are two facets of consistent setups.  The first being the settings and options you choose when you install SQL Server.  This could be the way tempdbs are setup, your service accounts, naming conventions, and what features you choose for an installation.  Some of this can be changed after you install SQL Server–like default parameters, but some of it can’t be changed–or at least not changed very easily.  Each of these are specific to SQL Server and important; however, I wish to address a second facet of SQL Server setups–the pieces that are specific to your organization and the tools/scripts you use to manage your environment.

Consistent Setup

After SQL Server is installed, this is a great time to run through a few other setup items before letting in the environment loose.  Backup jobs and other maintenance tasks are a given, but here are few others to consider.

  • Database Mail setup
  • Extended Events traces
  • A job to recycle the SQL Logs each day.
  • The number of SQL Server logs to keep
  • Creating operators for notifications
  • Adding Alerts Error Number 823, 824, and 825

Screen Shot 09-08-15 at 11.08 PMNeed help starting your script collection?  Don’t forget to script out your options as you create them through the GUI.  This is a great way to start building your configuration nest egg.

Junk Drawer

While there may still be some debate about whether you should store items in the master database, you will need to have a place to stick your stuff.  Remember, we are talking about creating some consistency here and you will need to have a bit of order to make that work.  Growing up, my mom reserved a little pull out drawer in the kitchen for my dad.  Only his stuff was in it and if something was left on the counter, it would wind up in the ‘junk drawer’, but then my dad would always know where to go look for something when he couldn’t find it.  I suggest you do the same with your database environments–create a database dedicated to ‘admin’ stuff.  This stuff isn’t for use in other database; however, I have caught myself using my admin database for a staging table, but that should not be the norm.

This database will be for your scripts you run in your enterprise.

What’s in there?

Examples of scripts I put in my admin database include:

  •  backup scripts
  • certain common Dynamic Management View I use for performance monitoring
  • sp_whoisactive
  • System Log files–If I am auditing something or getting backup history, those tables go in the admin database.

Having all the admin utilities in a single database just makes it easier for when other folks come into the environment and you need to collaborate.  Good luck to you and I hope to see you on the SQL trail.

Abbreviations for SQL Server Components when Installing with PowerShell

I have (finally) been introduced to installing SQL Server with PowerShell and it has made those installations much easier.  Each of my installs; however, may require different SQL Server components so one of the items in my configuration file I change is the Features= section of the INI file.  To help keep all this information in one place, here is a list of items you can choose to install.  Remember, most of these will require other parameters to be set–this is only for the Features= section.  Each feature should be separated by a comma.


SQL Server Component Install Options

This list specifies the items you can choose to install.

Database engine = SQLENGINE
Replication = REPLICATION
Full-text and semantic extractions for search = FULLTEXT
Data quality services = DQ
Analysis services = AS
Reporting services – native = RS
Reporting services – sharepoint = RS_SHP
Reporting services add-in for sharepoint products = RS_SHPWFE
Data quality client = DQC
SQL Server data tools = BIDS
Client tools connectivity = CONN
Integration services = IS
Client tools backwards compatibility = BC
Client tools SDK = SDK
Documentation components = BOL
Management tools – basic = SSMS
Management tools – advanced = ADV_SSMS
Distributed replay controller = DREPLAY_CTLR
Distributed replay client = DREPLAY_CLT
SQL client connectivity SDK = SNAC_SDK
Master data services = MDS

Happy installing. 🙂