Episode 68: SSIS Administration

For all of us working with data, we know that we work not only with SQL Server but with the suite of tools around the SQL Server engine.  In this episode, we share some thoughts on SSIS administration; SSIS is the most popular ETL tool in the Microsoft stack.  Just like with the SQL Server engine, the defaults are often taken, so we give some thoughts on logging, package execution, and notification with our guest Ravi Kumar.  With the ever-expanding functionality of SSIS, Steve and I found this discussion insightful as we think about supporting an SSIS environment and ways we can educate package developers.

Episode Quote

That’s one of the things I preach to everybody: keep your data types as narrow as possible. It’ll make your SSIS packages run a lot faster.

Listen to Learn

  • Why you want to consider SSIS buffer sizing
  • The differences between package deployment and project deployment
  • The benefits of project deployment
  • The various logging levels for SSIS packages
  • Two properties to tweak for SSISDB cleanup
  • The validation steps a package goes through before execution

About Ravi Kumar

Besides being a former SQL Cruiser, Ravi is the BI architect for LMO Advertising in Baltimore, Maryland.  He is the co-leader of the BI PASS chapter and a frequent contributor to SSIS-related discussions on the east coast.


Ravi on Twitter
Buffer Sizing
SSISDB database maintenance
Project Deployments


Transcription: SSIS Administration

*Untranscribed introductory portion*

Carlos:       Okay. Ravi welcome to the program.

Ravi Kumar:       Thank you sir.

Carlos:       It’s great to have you. One of the things that you’ve been talking about and doing professionally is working quite a bit with SSIS. I guess shame on us, we’ve never actually had an episode on SSIS in the history of the SQL Data Partners podcast. I don’t know why it’s been that long, but I thought that you would be a great person to bring on to talk a little bit about, from the DBA perspective, what we need to know about SSIS: some of those moving parts and maybe some of the catches that people can get stuck with. Why don’t you start us off: can you give us an overview of SSIS and what people are mostly using it for?

Ravi Kumar:       SSIS is an enterprise-level tool that has to do with moving data in and out. It’s an ETL tool, to extract, transform, and load the data. People have been using it for a while; it’s been around since 2005. Before 2005, back in SQL Server 2000, it was called DTS. It became a lot more interactive and enterprise-level in 2005, and Microsoft has been improving it ever since. Most organizations use it for moving their data in and out, to load data warehouses, and to do migrations, maybe from Oracle to SQL Server, some [inaudible 00:02:14] to SQL Server, or SQL Server to some other database.

Currently, how we use it: at our organization we have data coming from probably 15 different sources, and it goes out to about 5 different other companies. We ingest all the data, we do manipulations on it, we apply business logic to it, and it gets sent out. We have certain processes that bring in data every 5 minutes, 24/7. SSIS usage could be very minor, where you have one or two processes running a month, or it could be a really high-scale process that’s running 24/7 constantly.

Steve:         Okay. Ravi could you share with us a little bit about your background and how you got started and then how you’ve ended up working with SSIS today?

Ravi Kumar:       Sure. I started as a SQL database developer back in 2004, I think. Then I slowly moved into the BI world. Business intelligence usually has three parts: there’s reporting, which is SQL Server Reporting Services; data warehousing, which is SQL Server and Analysis Services; and then moving the data, ETL, which is SQL Server Integration Services. After I started with BI, I started with ETL, moved to data warehousing, moved to reporting, moved back to data warehousing and reporting, and now I’m back to reporting and Integration Services again, the ETL part. Currently I’m a BI manager and I manage a team of three other developers who work exclusively on SSIS.

Carlos:       One additional area, in thinking about SSIS and some of the ways you can use it, is that it gets baked in. It’s actually kind of baked into SQL Server Management Studio in a sense as well. Even for something as simple as, I want to import an Excel spreadsheet into a table or move data from one database to another, you can right-click and import or export data; there’s one option for import and one for export. You go through the process and that will actually create a little SSIS package that doesn’t get stored anywhere or saved unless you take that option. Folks can actually use that to move data in and out.

Ravi Kumar:       Again, SSIS comes with SQL Server, right? If you’ve got a SQL Server license, Standard or Enterprise, it comes with it. Once you go through that option it actually creates an SSIS package in the background, executes it, and then gets rid of the package. You can save it and execute it again and again if you want to. Actually, I use it for the same thing. Let’s say somebody says, “Hey, I just want the data from this table to go to this table and I want to do it every week.”

If I try to manually create something like that, it’s going to take me maybe 20 minutes, whereas if I go through SSIS and use the wizard, it could take me 3 minutes, and after that I automate it and I’m done. It has its place, but it’s a really, really small part of SSIS.

Carlos:       Sure. Fair enough.

Ravi Kumar:       Yes.

Carlos:       As we start using those saved packages and we want to put them into our processing or into our systems, there are now two modes or two ways in which we can deploy or save those packages.

Ravi Kumar:       Right. Now we’re starting to talk about package deployment versus project deployment. There was only one way to deploy a package up until SQL Server 2008 R2. In 2012, Microsoft introduced the Integration Services catalog. When you’re developing in Visual Studio, starting with 2012 you can deploy a package or a project. A package is basically one unit of work; in the background it’s an XML file. You can deploy it by itself and you can use SQL Agent to automate just that one package.

There are lots of benefits that you don’t get using the package deployment model because it’s just one package. If you’re looking at SSIS from a developer’s point of view, let’s say you need to download one file, put it in your database, do some transformation on it, and do an export out of it. If you’re a good developer, how you would develop it is: one package just to download the file, one package just to load the data, one package just to transform it somehow, one package just to export the data into a file.

If you’re using the package deployment model, you can still do it that way, but since there are different subsets of it, it just makes sense to deploy everything together. Let me go over some of the advantages of project deployment. First of all, if you’re using package deployment, you’re going to have to store passwords somewhere, because you’re going to move from dev to stage to production, right? You’ve got to have some sort of account that needs to access different resources, maybe some file systems. You’re going to have to put that password somewhere. What a lot of people did, up until 2008, was use a configuration file, and they put that password in plain text within the file, which is a big security hole.

Carlos:       Sure.

Ravi Kumar:       Once you deploy to the catalog, you can put those passwords, that sensitive information, into a parameter and then change it directly on the server as you deploy from server to server. Let’s say you deploy from staging to production. The first time, you change it on production; it will remember that the next time, so you never have to change it again. It uses AES-256 encryption to encrypt everything on the server, so there are a lot fewer security holes.
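Changing a parameter value directly on the server, as Ravi describes, can be done through the catalog stored procedures in SSISDB. A minimal sketch (the folder, project, and parameter names here are made-up examples):

```sql
-- Run in SSISDB on the target server (all names below are examples only).
-- Overwrite the value a project parameter will use at execution time:
EXEC catalog.set_object_parameter_value
    @object_type     = 20,                      -- 20 = project-level parameter
    @folder_name     = N'Finance',
    @project_name    = N'NightlyLoad',
    @parameter_name  = N'SourceFilePath',
    @parameter_value = N'\\prodshare\inbound\',
    @value_type      = N'V';                    -- 'V' = literal value

-- Sensitive values (passwords) can be stored the same way; the catalog
-- encrypts them at rest, so nothing sits in a plain-text config file.
```

Parameters marked sensitive in the project are masked in the catalog views as well, which is the security improvement over the old configuration-file approach.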

Steve:          Then in that example for a developer working on the project, they could deploy to a development server or staging server, then whoever you have doing your configuration management or your deploys could then push from that staging server directly to your production server?

Ravi Kumar:       Yes.

Steve:         Then you know you’re getting exactly what’s on staging and not some files that might have got mixed up along the way.

Ravi Kumar:       Exactly.

Steve:         Thanks.

Ravi Kumar:       You’re going to need to make sure that for anything that can change from environment to environment, developers put it into parameters. Your connection managers are already parameterized, but anything else, let’s say the name of a file or the location of a file, if that’s going to change, we just put it in a parameter and then it gets changed as you move from dev to stage to production.

Carlos:       Is that exactly one-for-one? I almost think of it in very simple terms, almost like an ad hoc query versus a stored procedure, right? Where I can encapsulate some logic and do different things with the parameters that I pass in, things I couldn’t do, that I’d have to change manually, if I was doing it with an ad hoc query?

Ravi Kumar:       Yes. That’s a good analogy. You don’t need to change the code, you just change the parameter within that stored procedure.

Carlos:       Right. Now there is some additional overhead, I’ll use that word, overhead, or additional items that we need to have for the project deployment. One of the things you’ll notice is a new database, the SSISDB. I think this is where you’ll see the option to set those parameters that you talked about when changing things from environment to environment.

Ravi Kumar:       Right. SSISDB is a whole other beast and we’ll get to it. Yes, that does need to be managed at some level. That’s one of the benefits of project deployment. The other benefit is obviously encryption: your whole package and everything is encrypted right away. Also, one of the huge things that I noticed with project deployment is that when you deploy to the catalog, it automatically creates logging for you. Up until 2008, developers had to go in and say, okay, you have to log when the package is starting, you have to log when it’s ending.

That way they know when everything is executing. Project deployment does that automatically through SSISDB. The Integration Services catalog doesn’t get created automatically on SQL Server; once you create it, it creates the SSISDB for you. Everything gets logged to SSISDB and there are views that you can query off of. We actually have a Power BI dashboard for our SSIS environment that queries SSISDB, so we know exactly how everything is running in the background.

Carlos:       Interesting. I think the full out-of-the-box behavior is that you then have to go into the reporting and … I’ve forgotten the name of what that’s called. I think it’s …

Ravi Kumar:       The catalog?

Carlos:       Is it the catalog in SQL Server Management Studio? I go into the catalog and then I can see my packages there, then I can right-click and view those reports, like the execution times or whatever that is.

Ravi Kumar:       Yes. You can go to the catalog and you can see the reports. It’ll tell you what failed and it’s really …

Carlos:       You’re actually using something on top, you’re not using those default reports?

Ravi Kumar:       No. Okay, I do use those reports when something fails, right?

Carlos:       Okay.

Ravi Kumar:       I go in and I say, okay, what failed, and then it’ll give me a detailed message. But if I want to know from a [inaudible 00:13:26], because you have 20, 30 different packages running, some of them running every 5 minutes, I want to know exactly what’s happening, how many times something executed, whether a certain process is turned off or not, then I have a dashboard as a high-level view and I query SSISDB directly for that. That’s one of the huge benefits of having that logging built into the project deployment model.

Carlos:       Okay.

Steve:         Okay. When you talk about that Power BI dashboard querying the database there, is that something that is specific to your environment? Or is it something that would work with pretty much any SSISDB?

Ravi Kumar:       That will work with pretty much any SSISDB, but currently it’s just the way I have it set up. I have certain WHERE clauses in it, and those WHERE clauses directly relate to the projects I have. You could build it in a way that it would work with any SSISDB, because all the views are the same in SSISDB.
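A dashboard query of that shape against the catalog views might look like the sketch below; `catalog.executions` and its status codes are documented, while the `TOP` clause and ordering are just one reasonable choice:

```sql
-- Recent package executions from the SSISDB catalog,
-- decoding the documented status codes into readable text.
SELECT TOP (50)
    e.folder_name,
    e.project_name,
    e.package_name,
    e.start_time,
    e.end_time,
    CASE e.status
        WHEN 1 THEN 'Created'   WHEN 2 THEN 'Running'
        WHEN 3 THEN 'Canceled'  WHEN 4 THEN 'Failed'
        WHEN 5 THEN 'Pending'   WHEN 6 THEN 'Ended unexpectedly'
        WHEN 7 THEN 'Succeeded' WHEN 8 THEN 'Stopping'
        WHEN 9 THEN 'Completed'
    END AS status_desc
FROM catalog.executions AS e
ORDER BY e.start_time DESC;
```

Adding a WHERE clause on `folder_name` or `project_name` is how a dashboard like Ravi’s gets scoped to particular projects.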

Steve:         Right. Okay.

Carlos:       Other than the jobs failing, because sometimes the logging can get fairly large depending on the number of items you have in your package … I guess other than package failures, what are you using the logging for?

Ravi Kumar:       Beyond package failures, it’s just execution: that a package executed, what time it executed, so you know what’s actually happening with the package. There are different levels of logging within the project deployment model. We’ll cover that in a later section, which is logging in itself.

Carlos:       Okay. I thought we were there. Okay.

Ravi Kumar:       Yes. Since we’re already into it, let’s talk about logging, since we’re pretty much done with all the benefits. Logging automatically comes with it, and there are three different logging levels: performance, basic, and verbose.

Performance is the leanest. That’s what you would prefer your production server to be on; performance only logs your warnings and failures. Basic logs more than performance: it logs when a package is starting, when a task within a package is starting, when it’s failing, and what happened with it. Verbose logs everything; it even logs the row counts of what’s coming through. In 2016, the latest version, they introduced two more logging levels, runtime lineage and custom. I’m not going to go much into them, but custom I am excited about, because you can customize exactly what gets logged out of the box.

My preference is: keep your production on performance, your dev on verbose, and your staging on basic. With SSISDB, if you’re running packages constantly, it creates a lot of logging and your SSISDB usage goes high. It does create a lot of data, so even with me being on performance and keeping only 15 days of data, my SSISDB is constantly around 45 gigs.
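The logging level can also be set per execution, rather than relying on the server default, using the catalog procedures. A sketch (folder, project, and package names are placeholders):

```sql
DECLARE @execution_id  BIGINT;
DECLARE @logging_level SMALLINT = 2;   -- 0=None, 1=Basic, 2=Performance, 3=Verbose

-- Stage an execution of a deployed package (example names):
EXEC catalog.create_execution
    @folder_name     = N'Finance',
    @project_name    = N'NightlyLoad',
    @package_name    = N'LoadSales.dtsx',
    @use32bitruntime = 0,
    @reference_id    = NULL,
    @execution_id    = @execution_id OUTPUT;

-- Override the logging level for just this run:
EXEC catalog.set_execution_parameter_value
    @execution_id,
    @object_type     = 50,              -- 50 = execution parameter
    @parameter_name  = N'LOGGING_LEVEL',
    @parameter_value = @logging_level;

EXEC catalog.start_execution @execution_id;
```

This is handy when production normally runs on performance but you want a one-off verbose run to chase a problem.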

Carlos:       Okay.

Ravi Kumar:       15 gigs days. Go ahead.

Steve:         Then that 45 gigs sitting out there, is that something that you have to manually clean up over time, or does SSIS clean it up automatically for you?

Ravi Kumar:       That’s another part of this. When you create the SSIS Integration Services catalog, it creates two jobs for you within SQL Agent and automatically schedules them to run at 12 AM. What that job does is clean up SSISDB for you. If you go to the Integration Services catalog, right-click on it, and go to Properties, within the properties you are going to see different things. There are two properties that you should change. One is the log retention: how long you want to keep the log history for. The default is 30 days. In a production environment I don’t recommend 30 days, because do you really need to look back 30 days?

In production, what I use it for is to make sure something ran, or see if it failed. If it failed I need to go look at it, so I only keep it for 15 days. If you have only one or two packages running every day, it’s not going to create a lot of data, so you might be okay with 30 days or even a couple of months.
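Those cleanup settings live in the catalog properties and can be changed either from the Properties dialog Ravi mentions or directly with `catalog.configure_catalog`; for example:

```sql
-- See the current catalog settings:
SELECT property_name, property_value
FROM catalog.catalog_properties;

-- Shorten the operation-log retention from the 30-day default to 15 days;
-- the nightly SQL Agent cleanup job honors this value.
EXEC catalog.configure_catalog
    @property_name  = N'RETENTION_WINDOW',
    @property_value = 15;
```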

Steve:         It really depends on the volume, it sounds like.

Ravi Kumar:       Yes. It really depends on the volume and how much data or how much space you have.

Steve:         Okay.

Ravi Kumar:       The second property I like to change on there is how many versions of the project you want to keep. Once you deploy to the catalog, it actually saves versions for you. The default is up to 10 versions; I have it set to 20 versions just in case. What you can do is go to that project. Let’s say you just deployed something and it starts breaking. You can easily go to versions, click on the version, and revert it back; go to your production and you’re done, and you tell the developers, “Hey, something broke, fix it.”

Steve:         With that, the version of the projects does that roll back the entire project or just specific packages?

Ravi Kumar:       It rolls back the entire project.

Steve:         So in one step you just undo that last deploy, effectively, or you could even go back …

Ravi Kumar:       Exactly.

Steve:         … one or two or further if you needed to?

Ravi Kumar:       Yes. Up until 2014, you could only deploy the whole project. In 2016, Microsoft gave one more option of package deployment where, even with project deployment, you can deploy just one package to the catalog if you only changed one package. Up until 2014, you were required to deploy the whole project.

Carlos:       Okay. Got you. A nice new feature then, to only have to deal with the one package when you have lots of packages.

Ravi Kumar:       Yes. Let’s say you have a project and you have 60 packages within the project; people were saying they don’t want to deploy the whole thing. It just takes too long.

Carlos:       Got you.

Ravi Kumar:       Yes. If you use versioning it goes back to that previous version.

Carlos:       One question I had on the logging; I ask because I was met with kind of blank stares. We could talk a little about execution and performance, because it ties in. One of the things that it will log: if you’ve used SSIS, let’s say you have a source table with 10 columns, and your destination only has 5, but you’ve selected all 10 at the source. SSIS will put a warning in the log, I’ve seen that little warning. It says, “Hey dummy, this column is not being used. You may want to exclude it from the package so that you’re not using as many resources.”

Ravi Kumar:       Yes.

Carlos:       One time, just like everything else, some SSIS packages were experiencing performance issues. They wanted us to take a look: “Hey, will you take a peek at this and see if you can fix it?” When I pointed that warning out, nothing seemed to change. I guess my question is, should we be taking that warning with a little bit more severity, or is it truly just, ehh, forget about it, just another record?

Ravi Kumar:       You don’t need to, but it’s not a good development practice. What happens is, while the developer is developing, their system brings all 10 columns into memory and only writes 5. But once you deploy to SSIS, SSIS is smart enough to say, “Okay, I’m not using these columns. Why do I need to bring the data into memory?” and it just ignores them.

Carlos:       The warning really is then a red herring?

Ravi Kumar:       Yes.

Carlos:       I see the warning message says, “Hey, you selected it,” but I’m actually not impacting my SSIS packages by it?

Ravi Kumar:       Yes. It’s more of a development annoyance, and developers shouldn’t be doing this anyway. One of the talks I do is on SSIS internals, and this is one of the things I stress there: don’t bring in these columns, because it’s just going to hinder your development. Let’s say one of the columns is a blob; it’s going to slow down your package considerably.

Steve:         Just to confirm: if that column is a blob of some type, you’re going to bring that back to the SSIS server, and it’s going to see the data and ignore it?

Ravi Kumar:       No. What SSIS is going to do, when you’re developing it, is bring it to your machine in development. Once you deploy to the SSIS server, the SSIS server is going to say, “Wait, I don’t need to use this data.” It’s just going to ignore it and not bring that data into memory.

Carlos:       I think we should be even more specific. When I run the package locally, when I’m in SQL Server Management Studio or Visual Studio and I hit the run button, I’m going to have the effect of these extra columns hitting my memory buffer.

Ravi Kumar:       Right.

Carlos:       When I save the package out and have it execute through an automated process, the process of saving the package out to the server, if you will, that’s when the columns get ripped out of the package?

Ravi Kumar:       Correct. Steve, let’s say you open Visual Studio on a server, your production server if you want to; that blob is there, and if you execute in Visual Studio, that data will be brought into memory.

Carlos:       A double whammy: you’re on the production server and you’re running packages that you shouldn’t be. Yikes!

Ravi Kumar:       Exactly.

Steve:         When it’s running in that mode, then, it sounds like it’s very similar to the age-old debate over select star versus selecting explicit columns. SSIS is smart enough, when it’s running in production mode, to rip that out and strip it down to what it needs, it sounds like. I think that answers my question. Thank you.

Ravi Kumar:       Okay.

Carlos:       I guess let’s go on and talk a little bit about package execution. We’ve talked a little bit about setting up parameters; we can change the context of who it executes as, what connections it makes, things like that. Any other considerations when we think about executing the packages?

Ravi Kumar:       As far as execution goes, we’re going to tackle it two ways: first, what you need in order to execute a package, and then what happens when it’s go time, which is a little bit about internals. Once you deploy to the SSIS catalog, you have to execute the packages using SQL Agent. What I suggest is that you create a credential for it, so you have an Active Directory account, and have an Active Directory account for each business area.

Let’s say you are executing a package for sales, or you’re executing a package for finance; they all need to access different resources. I would suggest, per business area, have an execution account for SSIS, create a credential for it, and give it proper access to any file shares or any servers that it needs. Then create a proxy in SQL Agent for that credential, and it’ll be available under the “Run as” account when you go to schedule the job.

That way the SSIS package will run a lot more smoothly. Many times everybody runs into this problem: they deploy a package and then say, “Okay, wait, it can’t execute,” because, well, you’re executing it as the SQL Agent service account.
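The credential-and-proxy setup Ravi describes can be scripted; a sketch where the domain, account, and proxy names are all hypothetical:

```sql
-- 1) A credential wrapping the per-business-area AD account (example names):
USE master;
CREATE CREDENTIAL FinanceSSISCredential
    WITH IDENTITY = N'CONTOSO\svc_ssis_finance',
         SECRET   = N'<strong password here>';

-- 2) A SQL Agent proxy bound to that credential:
USE msdb;
EXEC dbo.sp_add_proxy
    @proxy_name      = N'FinanceSSISProxy',
    @credential_name = N'FinanceSSISCredential',
    @enabled         = 1;

-- 3) Allow the proxy to run SSIS package job steps:
EXEC dbo.sp_grant_proxy_to_subsystem
    @proxy_name     = N'FinanceSSISProxy',
    @subsystem_name = N'SSIS';
```

After this, the proxy shows up in the job step’s “Run as” dropdown, so the package runs under the business-area account rather than the SQL Agent service account.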

Carlos:       Sure. They may need elevated privileges to get out and access the files or other systems, things like that. If your SQL Agent account won’t do that, that’s when the proxy account comes into play.

Ravi Kumar:       Right. You shouldn’t be giving SQL agent account access to everything.

Steve:         I’ve seen that done, where because it’s running as SQL Agent, they just give that agent account access to everything. Then you’ve got a huge security hole. The proxy option seems like a much better way to do it.

Ravi Kumar:       That’s how we have it set up: per business area, we have one Active Directory account and it has access to those particular resources. You create the credential, and now it’s time to execute. Here’s kind of what SSIS does in the background. As soon as you hit execute, it loads everything into memory; with SSIS, everything is about memory. It reads the XML, it decrypts the XML. Here’s one very interesting thing: when it’s loading the package, it checks the version. Let’s say you’re running an SSIS 2008 package on SQL Server 2012. It’ll check the version and automatically upgrade it and load it into memory without telling you anything. Go ahead.

Carlos:       I remember this. The version of the execution container might play a role in changes to your package.

Ravi Kumar:       Yes. It doesn’t tell you that there’s a problem. It just upgrades it and runs it. That’s it.

Steve:         That upgrade step, does it take a while? Does that slow things down or is it a pretty quick thing compared to the rest of it?

Ravi Kumar:       It’s pretty quick for the most part, but if you have a package that’s like 22,000 lines, where somebody made one of those one-package-to-rule-them-all packages, then it could take a while. But I haven’t seen many problems with that.

Steve:         To get around that, then, you’re probably best off editing the packages with the same version as the server you’re deploying them to.

Ravi Kumar:       Correct.

Steve:         Okay.

Ravi Kumar:       Then it applies any configurations and any expressions. This is where your configurations come into play, if you have any. Remember we talked about parameters that change from server to server? The next step is, it loads the parameters and then it says, “Okay, just go validate yourself.” The package goes through validation, so you must have seen in the past that a package fails and says it couldn’t validate, and then everybody’s wondering, “What data got executed? I don’t know.” Well, nothing got executed, because it was just at the validation step. It never read any data at all.

Steve:         Then once it’s validated itself, is that where it starts to process the data next?

Ravi Kumar:       Yes. Once it’s done validating, that’s when it’s actual execution time. That’s when it goes into execution. That’s kind of how the package gets started. Once it’s running, that’s when we start getting into buffers: how the data is brought in, how much memory it’s taking, things like that.
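That validation pass can also be invoked on its own from the catalog, which is one way to catch validation failures before go time. A sketch with example names; the `@environment_scope` value shown assumes no environment references are in play:

```sql
DECLARE @validation_id BIGINT;

-- Validate every package in a deployed project without executing anything:
EXEC catalog.validate_project
    @folder_name       = N'Finance',      -- example names
    @project_name      = N'NightlyLoad',
    @validate_type     = N'F',            -- 'F' = full validation
    @validation_id     = @validation_id OUTPUT,
    @use32bitruntime   = 0,
    @environment_scope = N'D',            -- 'D' = no environment references
    @reference_id      = NULL;

-- Check the outcome (status 7 = succeeded, 4 = failed):
SELECT status, start_time, end_time
FROM catalog.validations
WHERE validation_id = @validation_id;
```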

Carlos:       What are some of the common issues that you see there? [inaudible 00:31:54] at this point we’re talking about performance: how much work can I do in as little time as possible?

Ravi Kumar:       Sometimes validation does take long, but everything needs to get validated. One of the issues I’ve seen is that sometimes the package needs to create a file halfway through before another component can access it. If the file is not there, it’s going to fail validation and say the file is not there, because the package hasn’t created it yet. There’s a property you need to set called DelayValidation. That’s one of the gotchas I’ve seen many times, where they wouldn’t set that property and the devs would never be able to figure out why it was failing.

Carlos:       I see.

Ravi Kumar:       Then it can go ahead and load. Steve, any questions?

Steve:         No. I think that makes a lot of sense from the experiences I’ve had; I understand how those steps go through now.

Ravi Kumar:       Then SSIS goes and does its thing and moves on to the other big thing: the data flow. It gets to the data flow and says, okay, now I’m going to bring in some data. This is where we get into a little bit of performance tuning and we start talking about buffers. There are different kinds of buffers, physical buffers, private buffers, flat buffers, but what it basically comes down to is that it’s taking up memory on your server. We touched on this a little bit when we were talking about having 10 columns and only bringing in 5 columns.

That is one of the examples of it. Also, if you want a package to run really, really quickly, use proper data types. This is where you can go in and tell developers, “You’re bringing in data as varchar(255) when you’re only loading a 2-digit state code. It should be char(2).” What SSIS does, if you’re reading it as varchar(255), if your data type is that, is take up 255 bytes in your buffer. It creates a lot of overhead, and it reserves that memory for itself.

Steve:         It sounds like it will take up the maximum size of those variable columns?

Ravi Kumar:       Exactly.

Carlos:       Then for each record that comes in, it will take up that size?

Ravi Kumar:       Exactly. It can inflate your data a lot.

Carlos:       I think it’s so easy to overlook that, because it happens a lot even in stored procedures. You create a temp table with a [inaudible 00:35:05] or an [inaudible 00:35:06] and then you’re comparing or updating against the opposite data type. You get a data type mismatch on the engine side, and then of course the engine has to do that conversion for you, and it doesn’t do it very efficiently. On the SSIS side we have that same kind of problem, except now we’re actually going to get hit a little bit harder, potentially, with memory allocations.

Ravi Kumar:       Right. Just imagine if you have 10 columns and they’re all varchar(255); think how much memory it will take, even though you don’t have that much data.

Carlos:       Sure.

Ravi Kumar:       That’s one of the things I preach to everybody: keep your data types as narrow as possible. It’ll make your SSIS packages run a lot faster. This problem gets inflated a lot more later on when we start talking about buffer sizing. In 2016, there is a new property called AutoAdjustBufferSize. Before that, you could tune the buffers if you wanted to; there’s a whole thing on the internet you can read about tuning the buffer sizes, but it usually comes down to each buffer being about 10 megabytes max.

You can increase it if you want, but it’s a balancing act. Let’s say you can only put 10 megabytes in one buffer at a time; the package is going to run for a longer time, because it can only fit so many rows in 10 megabytes. It’s going to read the rows, apply transformations, and write the rows, and then it’s got to do that again. If you can fit more rows into memory at one time, then your package is going to run a lot faster.

That property can be changed, but there’s another caveat to it. If you increase it way too much, it’s going to say, “Well, I don’t have that much memory,” and it’s going to start spilling to the disk. That’s when things slow down considerably. There is a catalog function you can use, called catalog.dm_execution_performance_counters, and it’ll tell you if the data is spilling to the disk. You don’t want SSIS to be spilling to the disk, especially if you have a SQL Server with 64 gigs on it, 45 gigs allocated to SQL Server, and the rest for the OS.

Then your SSIS doesn’t really have a lot of memory to execute in, and it’s going to start writing to the disk.
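Checking for that spill with the function Ravi names might look like this sketch; the execution ID is a placeholder you would pull from `catalog.executions`:

```sql
-- An execution_id of a running package, taken from catalog.executions:
DECLARE @execution_id BIGINT = 12345;   -- placeholder value

SELECT counter_name, counter_value
FROM catalog.dm_execution_performance_counters(@execution_id)
WHERE counter_name IN (N'Buffers spooled', N'Buffer memory', N'Buffers in use');
-- A non-zero "Buffers spooled" means SSIS ran out of memory
-- and is writing buffers out to disk.
```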

Carlos:       I was going to bring that up as well, in the sense that you then need to start taking a peek at what your SQL Server wants to reserve for memory as well.

Ravi Kumar:       Exactly.

Carlos:       As well as SSIS. I think a lot of times, at least in the beginning stages, they’re on the same box, because there’s no licensing implication to make that happen. Then all of a sudden you could have a lot of contention at the same time, and you kind of have to have a marriage there: decide who’s going to get the main resources while those processes are happening, or they’ll step on each other’s toes.

Ravi Kumar:       Exactly. SSIS can take up a lot of memory at one time. We know about execution paths, but SSIS also has what’s called an execution tree. Within one execution tree you can have up to 5 buffers, and within one data flow you can have multiple execution trees executing at the same time. It could be 5 buffers x 5 or 10. That can easily start hogging up your server. There are two properties that I usually suggest DBAs have the developers change. They’re called BLOBTempStoragePath and BufferTempStoragePath. SSIS doesn’t handle blob data really well and most of the time it ends up writing it to the disk.

What I suggest is you DBAs have some sort of fast disk, preferably an SSD, on the SQL Server. Have BLOBTempStoragePath and BufferTempStoragePath mapped to that drive. It needs to be done on the data flow, or it could be set through a configuration property if you want. Then you can have that path be directed to the fast disk and it’ll start writing there. Otherwise, when SSIS takes up too much memory and starts writing to the disk, you will see drives starting to fill. It’s a nightmare for a lot of DBAs.
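To make those two properties point at fast storage, they can be set on the Data Flow Task directly or bound through a property expression to a parameter, so the path can differ per server. A sketch, where the drive letter and parameter name are made-up examples:

```
Data Flow Task property expressions (illustrative values):
  BLOBTempStoragePath   = @[$Project::TempBufferPath]   -- e.g. "E:\SSISTemp"
  BufferTempStoragePath = @[$Project::TempBufferPath]
```

Both default to the TEMP/TMP location of the executing account, which is typically on the C drive — hence the risk Ravi describes next.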

Carlos:       It should clean up after itself, though. You’re not going to see the implications. You might see a warning come through, but then after the package is finished those files will get thrown away.

Ravi Kumar:       Right. There are two caveats to it. One, what if your C drive gets full and you don’t have that much space on it? The second thing is that it’s going to be slow, unless you’re writing to an SSD. That’s why I suggest developers change those two particular paths.

Steve:         It sounds like at a minimum you want to make sure that you get those off of the C drive, so that you’re not going to take the system down if it runs out of space and while you’re doing that you should get them on fast storage like an SSD as well to get the best performance out of it?

Ravi Kumar:       Correct.

Steve:         Very similar to the temp DB type recommendations that we have with SQL server as well.

Ravi Kumar:       Yes. Very similar.

Steve:         Okay. Great.

Carlos:       As we wrap up here, Ravi, we’ve talked about different parameters and ways that administrators can take a look at their environments. Any other best practices or things that DBAs can review on their SSIS environments to make sure they’re not shooting themselves in the foot?

Ravi Kumar:       Carlos, that is a great question and I do have an answer for it. I’ve put together a small list of things I’ve learned over time for SSIS. Number one: have developers make multiple packages. Don’t let them do the “one ring to rule them all” practice, where one package does everything. You have one package do one single thing, so they can have 10, 20, 30 different packages within a project.

Carlos:       Now I’m going to have to stop you, because then there’s this idea of chaining. Even when that doesn’t happen, I feel like the natural instinct is to break that up into jobs. I’ve actually seen environments where it’s, well, this set of packages we expect to take an hour, so we’re going to start it at 1 am. Then this next set of packages we’ll just schedule at 2 am, because we have to wait for the first ones to finish.

Then there’s that day things are running slow, something happens, you have [inaudible 00:43:26- 00:43:27]. My 1 am job runs a little long, until 2:15. Then that second wave kicks off anyway, without anyone asking, “oh hey, was it ready?” Then I’m going to get yelled at, jobs fail, the warehouse isn’t loaded, and now all of a sudden everybody is like “Ravi!”

Ravi Kumar:       I’m going to finish what I was saying Carlos. Thanks for interrupting me. Once they have multiple packages, then you have them 1 package that executes the rest of the packages.

Carlos:       That sounds like the one package to rule them all.

Ravi Kumar:       No. One package to rule them all is when you have one package do everything. Here you have multiple packages, each doing one thing, but you have one package execute the rest of the packages. There are a couple of benefits to it. One benefit is you only need to schedule one package. Let’s say you deploy to your catalog; you just execute one package and it will execute the rest of the packages. The second huge thing about this is that you can reorganize how the packages are executed.

You can take advantage of parallelism like crazy. There are certain packages that are dependent on each other, and certain packages that are not dependent on each other. You can say, “these packages I’m going to execute in parallel,” and they’ll execute at the same time. Then you can use what are called precedence constraints, connecting all those packages to one package, so that all of them need to be finished before moving on. You can parallelize it.

Let’s say you deploy it and you say, “well, it’s taking up too many resources and we cannot run in parallel.” All you need to do is go into that one package and rearrange how everything is executed. Now it’s not taking up so many resources anymore.
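The orchestration pattern Ravi describes might look like this: one master package of Execute Package Tasks, with precedence constraints deciding what runs in parallel and what waits. The package names here are made up for illustration:

```
Master.dtsx
├── Sequence Container "Stage loads"  (children run in parallel)
│   ├── Execute Package Task → LoadCustomers.dtsx
│   ├── Execute Package Task → LoadOrders.dtsx
│   └── Execute Package Task → LoadProducts.dtsx
└── precedence constraint (all succeeded)
    └── Execute Package Task → BuildFactTables.dtsx
```

Throttling then becomes a matter of rewiring constraints in this one package — turning parallel branches into a serial chain — without touching any of the child packages.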

Steve:         Interesting. That one main package just knows what set of parallel tasks can run. Then you can use that to really throttle what’s going on, to get the best use or not overload your system at that point?

Ravi Kumar:       Right. I’m going to use an example from one of my friends. I know him through our local user groups; he works for Pandora jewelry in Baltimore. They have a huge data warehouse that runs around 2 terabytes. They had one package just like this for execution and it was bringing the server to its knees; stuff would just stop, it was taking so many resources. All they did was go into that one package, rearrange it so things were not so parallelized, and redeploy it. That took care of the problem. It makes things very flexible.

Steve:         I like that and I think I will certainly use that next time I’m working with SSIS.

Ravi Kumar:       Awesome. Also, sometimes DBAs are not sure what an SSIS package is doing. They’re like, “okay, I’m not sure what it’s running on my system.” Well, if you don’t know what it’s running in the background and what it’s doing on the database, run SQL Profiler. I’ve had to do that many times before, when somebody deployed something and I’m like, “okay, I don’t know what it’s doing.” Instead of going into the package, I look at what it’s doing on the server side, which makes it a lot quicker. Obviously you can open a package and figure out what it’s doing, but that’s going to take a lot longer.

Steve:         I can definitely see how the profiler would really simplify figuring out what the package is doing because the package can certainly be confusing if it’s your first time looking at it.

Ravi Kumar:       Yes. Also implement …

Carlos:       I would think, that seems [inaudible 00:48:02] to me. I like the visual of the package. It at least will point me in the right direction. The concern I would have with Profiler is being able to narrow that down. Is there an event class or application name that we could filter on to make sure that it’s only SSIS packages that get in there?

Ravi Kumar:       I think it’s the user that executes it. If you have one user executing it, then it’ll log in as that user and you can filter by the user.

Carlos:       Then it should be our proxy account?

Ravi Kumar:       It should be a proxy account, but let’s say you’re not sure what a package is doing in your development environment. Somebody wants to know what it’s doing in development. Usually I don’t try to run Profiler on production, but this is one of the tips and tricks for development: if you want to know what a package is doing, filter on the user who’s executing it — or maybe you’re executing it at one point — and you can see what it’s doing in the background.

There are certain things that sometimes happen in SSIS that are hidden in expressions, different expressions where it wouldn’t be apparent to you why exactly it’s doing that. Sometimes you need to go into the background to see what it’s doing. SSIS by nature is such that you can program it in a way that everything is hidden, and you still want to know what’s going on. It really depends on the developer and how they developed it. This has come in handy before.

I’m like, okay, I have looked through everything and I don’t know why it’s happening this way. Let me go look at Profiler. It’s more of a development or troubleshooting tool than anything. Moving on to the next point, one of the things I have found huge is implementing standards within SSIS: your naming standards, how you do things. One of the things I talked about earlier is having multiple packages for a project. That could be a standard, along with having naming conventions.

That helps a lot. Also, SSIS doesn’t do everything out of the box. Something as simple as zipping up a file, it doesn’t do; it doesn’t have a task for it. I’ve come across developers many times who say, “oh, I’m a developer, I can just build that task.” They’ll go spend days trying to build a task, and then us DBAs might have to run an EXE on the server and have to deploy it on every server just to do that one thing. It drives development costs really high and it’s a pain in the butt trying to deploy it.

It’s better to buy a standard set of SSIS extensions. One is Task Factory from Pragmatic Works; the other is COZYROC. Install those two sets of connectors on every server and you’re done. They cover almost everything you can think of, and you know they’ve been tested by the company.

Carlos:       Full disclosure COZYROC has been a previous sponsor of this podcast. We’re familiar with those guys.

Ravi Kumar:       COZYROC tends to be a bit cheaper than Task Factory, but at our work we use both of them, because I like the Salesforce connector from COZYROC but there are other connectors in Task Factory that I really like. I have both of them and it pays for itself over time.

Carlos:       Ravi’s just rolling in the dough. Over there in the advertising.

Ravi Kumar:       The last point I have for DBAs is how you annoy the BI team. What you do is you create an email group for the BI team. For us we have [email protected] It could be [email protected] whatever. Have an email group for them, then have that email group set up in Database Mail also. Any time a package fails, it sends everybody an email like, “hey, this package failed.” Have that email go to the BI manager also, so everybody freaks out that something failed in production.

Carlos:       What would you have us do instead Ravi?

Steve:         I think what you’re saying is do that so that there’s the peer pressure on that team to go fix it.

Carlos:       I thought don’t send that email. I see.

Ravi Kumar:       Make sure you have a Database Mail group for the BI team. Everybody is responsible for their own tasks. If something failed, they need to know that something failed. It’s better for them also, because if some of their reports rely on that process running, they don’t want to automatically send the business a report built on bad data. They need to be notified whether something has failed or not.
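A notification job like the one described usually keys off the SSIS catalog. As a hedged sketch (assuming a 2012-or-later SSISDB catalog, where an execution status of 4 means failed), here is the kind of query a scheduled job could run before emailing the BI group:

```sql
-- Failed package executions from the last day, per the SSIS catalog
SELECT execution_id, folder_name, project_name, package_name, start_time
FROM   SSISDB.[catalog].[executions]
WHERE  status = 4                                    -- 4 = failed
  AND  start_time > DATEADD(DAY, -1, SYSDATETIME());
```

Any rows returned could then be forwarded to the team’s Database Mail group.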

Steve:         Basically, expose the success or failure of the jobs to the entire team, so that everyone knows what’s happening and what needs to be fixed?

Ravi Kumar:       Exactly. If it’s a success don’t worry about it. If it’s a failure, everybody needs to know.

Carlos:       I feel like you have a little bit of culture potentially in there. Who fixes it might vary, but at least having everybody on board, I can see that would be helpful. Ultimately, particularly in data warehouse environments, if those jobs do fail you’re going to want to engage those guys, because hopefully they’ll be fixing the packages for you. Very good. Should we do SQL Family?

Steve:         Yes. Let’s do it. A new question I think we’ve added this week is on SQL Saturdays. Where was the very first SQL Saturday that you attended?

Ravi Kumar:       It was actually Richmond, was the first SQL Saturday I ever attended.

Carlos:       [inaudible 00:54:35]

Ravi Kumar:       It was funny. I have a story about that. I went on SQL Cruise not knowing that there’s a huge SQL Server community there. Why I went on SQL Cruise is because of Chris Bell; I met him one time at a SQL user group, he mentioned SQL Cruise training, and that stuck. I talked to him and he convinced me to go on SQL Cruise. Then I learned all about how big a deal SQL Saturdays were. We went on the cruise in January and SQL Saturday was coming up in March in Richmond. That was the first one I attended.

Carlos:       Very nice. Technology is constantly changing. We’ve talked a little bit about you and some of the 2016 features that have changed in SSIS. How do you normally keep up with technology?

Ravi Kumar:       SQL Saturdays, PASS Summit, and if there’s a problem I try to go find the answer online, and Twitter. I usually follow all the people who are leaders, who are great in technology and tweet about these things, even the SQL family, and I kind of keep watch on what’s coming and what’s not.

Steve:         Okay. Great. If you could change 1 thing about SQL server, what would it be?

Ravi Kumar:       If I could change one thing about SQL Server, what would it be? I wish they would increase the SQL Express database size limit from 10 gigs to at least 50 gigs. And include SQL Agent.

Steve:         They have that. It’s called standard edition right?

Carlos:       There you go. We’ll see what happens with SQL Server on Linux. Maybe you can get your gigs that way.

Ravi Kumar:       We’ll see what happens.

Carlos:       Best piece of career advice you’ve received?

Ravi Kumar:       Best piece of career advice I’ve received? Keep your résumé updated and go interview at least once a year, even if you have no intention of taking a job, just to know how the job market is out there.

Steve:         Interesting. That’s the first time I’ve heard that one. I like that. If you could have 1 super hero power, what would it be and why would you want it?

Ravi Kumar:       That’s a tough one. What’s the 1 super hero power I can have and why would I want it?

Steve:         Just 1.

Ravi Kumar:       I think, does Dr. Manhattan count as a superhero?

Steve:         I don’t know.

Carlos:       Yes. We’ll count him.

Ravi Kumar:       Dr. Manhattan is the blue guy from Watchmen. He’s basically a god. He can close his eyes and in a blink he’ll be on Mars. I would like to have his super powers so I can visit Mars and any other planet, be next to the sun and not get burned.

Carlos:       There you go. You want to explore space?

Ravi Kumar:       Explore space.

Steve:         Nice.

Carlos:       Well, Ravi thanks so much for being with us on the program.

Ravi Kumar:       Thanks for inviting me guys.

Episode 18: BIML

I learn stuff every day.  In this episode I tackle a new subject for me and one I am hearing a lot about–BIML.  BIML is a markup language you can use to build your SSIS packages.

I met up with Cathrine Wilhelmsen and Bill Fellows in Portland and they shared with me some insights on BIML, what they use it for, and some tips on getting started.  If you find yourself doing more SSIS packages and haven’t yet tackled BIML, this episode is for you!

Transcription: BIML

Carlos L. Chacon:  This is the “SQL Data Partners Podcast.” My name is Carlos L. Chacon, your host, and this is Episode 18. Today we’re going off the beaten path a little bit, at least for my experience, and we’re talking about BIML.

BIML is not something I’ve worked a ton with. Why would I lie? I really haven’t worked at all with it. I’ve heard a lot about it. Those of you in the SSIS environment will find it very interesting.

We talk today with Bill Fellows from BlueGranite and Cathrine Wilhelmsen from Storebrand. They share their experience about getting involved with BIML. Of course, some of the resources that you can use to get started, if that’s not something that you’re currently doing.

Ultimately, we want to talk about lots of different topics, introduce them, in maybe a different way, and discuss some scenarios that you aren’t currently using. If there’s topics you’d like to hear about, you can hit me up at [email protected], always interested in getting your feedback. I’m also on Twitter @CarlosLChacon.

I am still basking in the enjoyable experience that was the Summit, in Seattle. I happened to go out to Portland, my mother‑in‑law’s in Portland, I was there the weekend before the event and that’s where I ran into Cathrine and Bill. They agreed to chat with me and I’m grateful that they did.

Not only did I pick up some BIML information but I also caught up on the great game of Rugby. I met some guys from New Zealand and Australia, and the World Cup happened to be going on, so they indoctrinated me, or showed me some things about Rugby. So Hamish and Warwick, if you’re listening, thanks for letting me tag along and be part of that exciting event.

Let’s get to it. It’s good to have you and as always compañeros, welcome to the show.

Children:  SQL Data Partners.

[music]

Carlos:  Bill, Cathrine, welcome to the show.

Cathrine Wilhelmsen:  Thank you.

Bill Fellows:  Howdy, howdy.

Carlos:  Thanks for being here. First things first. Let’s define, “What is BIML.” BIML’s an acronym.

Bill:  BIML stands for the Business Intelligence Markup Language. The way I always hear it described is that it is basically the operating system, essentially, for your business intelligence infrastructure, so it can handle everything from soup‑to‑nuts types of things. If it’s a business intelligence related item, the concept behind BIML is that it can handle it.

Cathrine:  Yep.

Carlos:  Cathrine, you made the point that ultimately people who would be interested in BIML are people who do…

Cathrine:  SSIS.

Carlos:  Why is that so important?

Cathrine:  Because you have a free tool you can use right now to develop in BIML. It’s called “BIDS Helper,” and it’s an add‑on for Visual Studio. That gives you the ability to generate SSIS packages from your BIMLScripts.

Instead of hand coding, hand creating all your SSIS packages, you can use BIML and BIMLScript to generate those packages automatically for you. Since that’s what’s in the free version of BIML and the tools, that’s where most people get started with BIML.

Carlos:  Let’s back up even a little bit further. I think that the scenario that at least clicked a little bit for me, so, I don’t do 100 percent of my work in SSIS, but I have done data transformations, from time to time, taking data from A, I want it to get to B. A lot of that is coming from tables, so I have a transformation. I want to do a transformation on, let’s just even say, 10 tables.

I have to have 10 unique objects in my package to do that. I’ve got to go in, set each individual table to do that. Then set the destination 10 times. The idea of that framework with BIML, is that there is something that can help me do that easier?

Cathrine:  Yes. What you do with BIML is you specify the pattern. You have that pattern where you pick up some data from a table, and you want to do some transformations and load it into a destination. That’s one pattern. Then you write a BIMLScript for that pattern and then you just go into your database, pick up the metadata with a list of all your tables and columns.

Then you can use that metadata in BIML to create one package for each table, or one data flow for each table, anything you’d like to do. Instead of actually creating all of those tasks or packages by hand, you just go into your metadata store, pull out the metadata, loop over it, and generate the packages from that.

Carlos:  Data transformation is the only thing I can do with BIML, or the most common thing?

Bill:  Definitely the most common thing to begin with. BIML itself can describe more than just your SSIS. You can describe your entire database and you can specify your tables, your schemas and actual database itself. It has the ability to emit, actually go ahead and create the table if it doesn’t already exist, create your indexes. You can use it to fully model that.

You can also specify, I just learned this the other day, you can have an initializer set of data, so if you have a set of lookup values you could also model that within the context of BIML. I think it’s really awesome just how expressive it is.

I’ve been doing this for I think two years now and I keep discovering. Today in Cathrine’s session, I learned, “Oh I didn’t realize I could do that,” and I have a to‑do to get even more knowledge out of her head. It’s impressive, to me, how much you can do with it beyond just the SSIS. That’s the easiest one for most people to grab onto. But it goes so much deeper.

Carlos:  We talk about that that it’s helping us with that data transformation. I’m familiar with a package, that package. There’s a file there, I can open it up. Is my BIML creating my SSIS package? Will I have two files then? Do I have one SSIS package, one BIMLScript?

Cathrine:  Yes. You will have a BIML file. It’s just XML code describing your package. The whole point of BIML is that it’s readable, it’s writable. It’s really easy to see how it’s going to be translated into your package. You read the code from top to bottom and that’s what’s going to happen in your SSIS package.

They use the same names for their BIML tags as the SSIS objects and tasks are named as well. That makes it quite easy to see. You can have an Execute SQL task, you can have a data flow, you can have a source or a destination, things like that. It’s named like that so it’s readable and it’s writable.

What happens is that XML code is run through what’s called the, “BIML compiler,” and that generates the SSIS packages for you. It basically just translates that readable and writable XML to the SSIS XML which is more verbose.
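As a small illustration of that one-to-one naming, here is roughly what a flat BIML file looks like. The package name, connection name, and SQL statement are made up for the example; the xmlns is the standard BIML namespace:

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package Name="LoadCustomers" ConstraintMode="Linear">
      <Tasks>
        <!-- Reads just like the SSIS toolbox: an Execute SQL Task -->
        <ExecuteSQL Name="Truncate Staging" ConnectionName="Staging">
          <DirectInput>TRUNCATE TABLE stg.Customers;</DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
  </Packages>
</Biml>
```

Run through the compiler, this dozen lines becomes the far more verbose .dtsx XML of a real SSIS package.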

Carlos:  In the case where I’m going to create my destination, does the destination get created in the BIMLScript before the package gets created? Or is the BIML going to create some kind of T‑SQL script in my package that will then create the destination? Does that make sense?

Bill:  People really love that and the short answer is, “Only if you tell it to do so”

[laughter]

Bill:  Within SSIS, initially when you connect to your OLE DB destination and you pick your table name, you can also click that Create button. That’s an enhancement that only exists in the Visual Studio tool. When you run your package it won’t automatically create that table for you.

Carlos:  It has to already exist then.

Bill:  Exactly. If you wanted to properly, “Properly,” I say, if you wanted to do that, you would have to turn on, “Delay validation,” because when the package starts up it’s going to say, “Hey, this table doesn’t exist. We’re going to stop right now”

But you could have a precursor step, an execute SQL task that says, “Hey, if the table doesn’t exist go ahead and create it,” and then your data flow task will work.

We have our static BIML running through the engine as Cathrine described. You could also have, “Hey, as part of this, go ahead and create that table now for us.” You could have a separate task fire off that’s basically C# and happens to run at the same time.

I don’t know that I would do that because I think that would be very clunky. Unfortunately, you can probably do just about anything in BIML, so I think between the two of our minds, “Yeah, you could do that”

[laughter]

Bill:  But that’s really not the workload that you necessarily want to take on.

Cathrine:  You can do anything in BIML and BIMLScript as long as you tell it to. But at least for people who are just starting to learn BIML from the SSIS world, think of BIML as just another way to create your SSIS package.

Anything you can do in SSIS, you can do in BIML as well. If you want to create just a basic staging package where you load data from A to B, you would have to first create those tables in SSMS. That’s the workflow you would do while developing in SSIS, so it’s the same thing.

You would create those tables first, make sure all of that is good, and then write your BIML code to look up and get that data from A to B. You can write scripts and you can write really advanced C# code if you want to in BIMLScript to do all of that for you, but you wouldn’t really want to.

You want to have your database and source control in a controlled environment, and then just have SSIS read and load data. It’s the same thing with BIML.

Bill:  A really great thing, while you’re listening to my wonderful voice here…

[laughter]

Bill:  …a great reason why you might be interested in BIML is if you’ve ever made a change to a package and you’ve tried to show it to someone else. It’s a nightmare to try and figure out what has changed between versions.

A great story for why you might be interested in BIML is that it’s just XML. It’s not the just XML that is an SSIS package that is thousands and thousands of lines long. It’s going to be 30, 50 lines. You can use any text diffing tool to try and discern, “You made a change to this field”

It’s much, much easier. You can check that into your version control systems and discover, “Here’s what’s different between version zero and version one of this,” versus trying to do a diff between the SSIS packages and then crying tears for…

[laughter]

Carlos:  Interesting. Now in that scenario, would I have had to create my packages in BIML to have that difference evaluation? Let’s say I’ve created a package. I’m not using source control.

Bill:  Bad, bad man.

[laughter]

Cathrine:  Use source control.

Carlos:  Use source control. That’s right. There’s been a change. I have two versions and I’m not sure what the difference is. I didn’t create them with BIML. Can BIML help me identify where the differences are?

Cathrine:  It will be a lot easier because in BIML the code is simplified, it’s readable and writable. You can compare line to line. If you have a destination, with some properties and you change that, it will just add another string to do that.

Carlos:  But I didn’t create these with BIML. Can I do that with BIML or would I have needed to create the packages with BIML to get that functionality?

Cathrine:  Yes. You would need to create the BIMLScript first and then generate the packages off of that.

Bill:  Although…

[laughter]

Bill:  …there is a tool out there that you’re welcome to sample. It is called “Mist.” It will soon be re‑branded as BIMLStudio. There is a 14‑day trial, and as part of that trial you can reverse engineer up to five packages. In your situation, you could reverse engineer package one, and then the second package, package two, and that would get you back to your original BIML.

That could be an option for you. The other plug I always have to give, the way all of this works, in the free version, is you need, as Cathrine said, BIDS Helper, bidshelper.codeplex.com. It’s a free thing that you install to work with your install of [inaudible 12:52] Visual Studio.

It has a feature that’s called, “Smart Diff” Smart Diff attempts to ignore the layout, separate the presentation part of your SSIS package from your actual physical I’m‑doing‑work part of it.

That can help you. You can at least see what’s different, but heaven help you trying to merge those differences between package zero and package one.

Carlos:  You’ve said script, so SSIS. I’m not a developer. I came, actually, from networking. I was going to be a network guy, a CCNA guy. I started getting into databases. T‑SQL, as far as languages go, that’s my max.

I’ve done some SSIS. The GUI nature of it makes it very easy to get in there and start plugging along. What would a person need, from a scripting perspective, to understand, to start using BIML?

Cathrine:  To start using BIML, it’s actually really easy because you get the Intellisense in Visual Studio with BIDS Helper. Bill mentioned that in his session. Just play with it. Do an open bracket, and then see what comes up in the Intellisense. Press, “Tab,” see what happens and what comes up.

That makes it really easy to learn BIML, the flat BIML, because it’s just XML and it’s easy. Then you can take that one step further and start learning BIMLScript which is C# code nuggets that you use to manipulate your BIML code. That’s the next step where you need to look into C# development. C# code.

But by just knowing a couple of basic C# things like doing foreach loops, doing if‑else statements, simple, simple control blocks, you can get some really powerful features in just a couple of hours by learning that. There are some great tutorials, especially on bimlscript.com.
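Those C# code nuggets are embedded in the XML between <# … #> markers. A minimal sketch of the foreach idea — in practice the table list would come from database metadata rather than a hard-coded array, and the names here are invented:

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <# foreach (var table in new[] { "Customers", "Orders", "Products" }) { #>
    <!-- One package is emitted per loop iteration -->
    <Package Name="Load_<#=table#>" ConstraintMode="Linear" />
    <# } #>
  </Packages>
</Biml>
```

The compiler runs the nuggets first, so three Package elements come out the other side; <#= … #> is the expression form that splices a value into the XML.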

They have walk‑throughs, they have the whole learning path now on the new website where you can start with the basic, “What is BIML?’ All the way through C# code and automated development. Absolutely go there if you want to learn BIML. That will take you from where you need to start learning the basics to a full staging environment and creating that.

Carlos:  Who owns BIML? BIML is its own company? Is it a Microsoft thing? Is it like BIMLScript? You have BIDS Helper on Codeplex, so that’s free. But who’s maintaining bimlscript.com?

Cathrine:  It’s a company called Varigence, based in South Carolina. They created BIML and BIMLScript. They also created the IDE called Mist. That’s the licensed tool you need to use for the more advanced features. Then they put their BIML compiler into the free BIDS Helper.

Varigence teamed up with BIDS Helper guys, gave them their full engine, so you can start using the BIML compiler and start learning that. They also run the bimlscript.com website which is more of a community website.

They have varigence.com as well, which is their business website. Bimlscript.com, yes it’s for the community, but it’s also by the community. I know we’ve posted blogs, external content, things like that, walk‑through snippets. Everyone can contribute to that website.

Bill:  Useless historical fact. BIML actually stems from Project Vulcan, which is something that as I understand it came out of Microsoft research. It’s, “Here’s this cool thing that we thought of,” and then it went away. But the guy, Scott Currie, who started Varigence, said, “I like that,” and he took it and ran with it and now my life is much easier.

[laughter]

Carlos:  OK, very good. That is ultimately the reason why you would want to consider using BIML: to get out of some of those manual processes and be able to automate…

Cathrine:  Yes!

Carlos:  …with those components.

Cathrine:  For SSIS when you start learning SSIS, the GUI is great. It’s really intuitive, it’s easy to learn but when you need to create hundreds of packages that follow the same pattern, you need to drag‑and‑drop and align and connect, and drag‑and‑drop and align and connect, and create the same thing over and over and over again, you don’t want to do that. You want to be lazy developers.

[laughter]

Cathrine:  You want to write it once, and then have everything automated and created for you. That’s what BIML and BIMLScript allow you to do with SSIS development. The cool part is, even for SSIS, when you start creating your packages you can create SSIS templates, which makes it really easy to create more packages based on that template, and you don’t need to create everything from scratch.

That is a nice feature in SSIS. The problem is when you need to update that template because you can’t just update the template and apply that to all your packages. You would have to go back and change every single one of the packages you created. While in BIML, you can just change one line of code and have that regenerate all your packages and you’re done.
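
To illustrate the regeneration idea Cathrine describes, here is a hypothetical sketch, written in Python rather than real BimlScript (which embeds C# code "nuggets" inside BIML XML), of generating many near-identical package definitions from one template:

```python
# Illustrative sketch of the pattern Cathrine describes: one template,
# many generated package definitions. This is NOT real BimlScript; it
# only shows the concept of regenerating packages from a single source.
TEMPLATE = """  <Package Name="Load_{table}" ConstraintMode="Linear">
    <!-- the same dataflow pattern repeated for every table -->
  </Package>"""

def generate_biml(tables):
    """Emit one BIML-style document containing a package per table."""
    packages = "\n".join(TEMPLATE.format(table=t) for t in tables)
    return (
        '<Biml xmlns="http://schemas.varigence.com/biml.xsd">\n'
        "<Packages>\n" + packages + "\n</Packages>\n</Biml>"
    )

print(generate_biml(["Customer", "Order", "Product"]))
```

Change TEMPLATE once and regenerate; every package picks up the change, which is the point about updating one line of code instead of editing every package by hand.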

Bill:  For me, my use case and the reason I really do love BIML: I answer a lot of questions on forums, people asking, “How do I do X in SSIS?” If you ask the question on Stack Overflow I’m probably going to answer your question, unless it touches Excel, because no one likes to deal with Excel in SSIS.

[laughter]

Bill:  But it used to be that if I were giving an answer, not only would I have to write up all the text, I’d have to take lots of copious screenshots of every single step. Now I can put out maybe the relevant screenshots and then say, “Here’s the BIML. Copy‑paste this into your thing,” assuming you’ve installed BIDS Helper, “and you can see exactly how I’m showing you to solve your problem.”

You’re trying to learn how to do X? Here, in this simple 15 or 20 lines of code, you’ve got a working example. Otherwise I’d have to upload an SSIS package someplace, and you’d have to download it and trust that this random code you downloaded off the Internet is not going to toast your machine.

For me it’s a wonderful mechanism for sharing. A very concise way of, “How do I solve this problem?” “Here.” Then you can take that pattern and apply it wherever you need to use it.

Cathrine:  That’s a really cool use case, actually.

Carlos:  Very cool. Now, I once had someone say that if you didn’t learn PowerShell within five years, you weren’t going to be a DBA.

[laughter]

Carlos:  We’re going to separate the haves from the have‑nots. Is BIML of that same category? From an SSIS developer perspective, not that they won’t exist, because the GUI is not going to go away, but from a being‑good‑at‑what‑you’re‑doing standpoint, will BIML be required knowledge?

Cathrine:  I’m not sure it will be required, but it will make your workday a lot simpler, a lot easier, and that’s the great part about this. You don’t even have to tell anyone you’re using BIML to create your SSIS packages.

You can’t tell the difference between a package you created by hand and the one you created in BIML. BIML is, for SSIS packages at least, it’s just a tool to create those packages for you.

There’s this whole thing that when you start learning and using BIML, you finish your work for the week before lunch on Monday and take the rest of the week off. Just don’t tell your boss you’re doing that. You’re on your “BIML break,” as they call it. It’s actually on my T‑shirt as well.

That’s the fun part about BIML. You don’t have to tell anyone you’re using it. It’s not a very big install, nothing like that. If you’re working heavily with SSIS, start learning BIML as well, because you can really save a lot of time and put more focus on the business value instead of just the plumbing of creating the same package over and over and over again.

Bill:  Completely agree with all that. I can also add in, at least, my personal experience. I’ve used the base .NET libraries to create SSIS packages. If I were heavy in the C# realm, maybe that makes sense, but I think most people doing SSIS probably aren’t that heavy into it.

I’ve used EzAPI to do it. It’s a little bit better than the base libraries, but it left me wanting. I’ve been doing this for two years. If Microsoft comes out with the coolest new thing tomorrow that makes my ability to create packages so much simpler, I would totally drop BIML. But I haven’t seen that.

I haven’t seen anything coming that has made me say, “BIML is great, but…” I don’t have any buts yet for my BIML. I don’t know what your experience is, but I still love this thing, and I’m still learning cool things because they keep adding more features into it.

To me, it’s really awesome. I don’t have any things like, “I just hate it,” about this thing. It’s like, “Give me more,” I wish I was doing more things with it.

Cathrine:  Absolutely, and it’s not just for SSIS, either. You can do all your relational database things and get that model. You can get your cubes and dimensions and facts model, and you can use it to generate things like T‑SQL scripts; instead of using dynamic SQL, you can just write BIML scripts to generate that for you.

There are so many use cases for BIML when you start digging into the more advanced features, and you can build up a whole toolbox of BIML scripts that you can run ad hoc. I’ve seen examples that are super useful for DBAs, for example, or in the BI world, where one BIML Script can automate things, check things, and change things.

When you start digging into it, it’s like, “I haven’t found any buts yet.” It’s just really awesome.

Carlos:  That’s a great example, because I was not aware that there were scenarios where DBAs could be using it for automation purposes, for monitoring or maintenance-type tasks. Is that all on bimlscript.com as well? Is that where those examples would be?

Bill:  Think about what you do. You probably need to say, “I need to operate on this set of servers.” You know what your servers are. “For all these servers, I need to perform task X.” Maybe you’re going to use a Central Management Server to fire off queries across all of those, or, “I need to do [inaudible 23:17].”

If you don’t want to do it those ways, you can use BIML Script, so you have the ability to run a SQL script: “Cool, I’m going to define a connection and process my first server; change that, go to the next server, go to the next server.” You can very easily define a package that’s going to go do that task.

Carlos:  Interesting. Where would some of those examples be? Good scenarios to…? [laughs]

Bill:  Right here in my head.

Carlos:  In Bill’s head. Very good. We’ll hit up Twitter. If your Twitter feed blows up…

[laughter]

Carlos:  …with requests…

Bill:  The best place to start is bimlscript.com. Again, that’s the community resource. If you don’t see what you need on there, just ask a question: “Hey, does anyone have one of those?” Then people will come out of the woodwork, because that’s the beautiful thing about the SQL community: people share.

It’s like, “Oh, you want to see how I do it?” “Oh, I don’t do that. I do it a little bit differently.” You’re going to get flooded with people trying to share their knowledge of, “I’ve scratched that itch and here’s how I went about solving that problem.” If not, I guarantee Scott will say, “Oh yeah, I can do that.”

It was great. One night, Jamie Thomson asked on Twitter…

Cathrine:  I loved that.

Bill:  …he was like, “How do I do this?” We used to do this tabular data file thing. I spent about an hour mucking around and I was like, “It’s late. I’m tired.” Scott went out and live recorded, “Here’s me solving the thing,” did it in like an hour. He’s like, “Yeah, I probably could have done this faster except I changed my mind in the middle of it,” or something like that.

[laughter]

Bill:  The owner is so invested in trying. You give him a curious problem and he’s a [snaps fingers] super sharp guy. He comes up with things that I’m like, “Oh God, I wish I had a tenth of your brain power.”

Carlos:  Oh, very cool.

Bill:  They are very engaged in promoting the community aspect, the community side of the organization. I have great respect for them.

Carlos:  You’ve almost persuaded me to… [laughs]

Bill:  Give up your SQL ways?

Carlos:  …take BIML a little more seriously [laughs] almost. I’ll have to spend a little more time on bimlscript.com. Great. Thanks for being here. I do appreciate it. I think it’s been a great little discussion. There will be, of course, some additional information to go along with what has been presented.

We happen to be here in Portland today, where both of them presented at a SQL Saturday. We’re going to provide links to their presentations in the show notes at sqldatapartners.com/podcast. That information will be available there, along with links to the things we’ve talked about today.

Before I let you go, and I know we’ve been hot and heavy into BIML today, one of the things we like to do, in that same spirit of sharing and helping people become aware of other tools that are available, is to talk about your favorite SQL tool. It’s OK if that’s BIML.

[laughter]

Carlos:  But we’re looking for your favorite SQL tool, free or paid. Why do you like it, and how do you use it?

Cathrine:  My favorite tool is Red Gate SQL Prompt in SSMS. It is a paid tool, but it saves me so much time. I don’t have to write all my SQL scripts by hand; you can save snippets and use them instead of writing out your entire code. It has a lot of great features: auto‑formatting, auto‑completion, things like that. That’s the one tool I cannot live without. It’s SQL Prompt.

Just to add to that one, because that’s a paid tool: a free tool as well is Red Gate SQL Search, which allows you to search through all your databases across different servers. You can search in tables, views, functions, whatever; anything in your database, just search for it and you will find it immediately. One paid tool, one free tool.

Carlos:  Very good. Bill?

Bill:  Definitely, I concur with Cathrine that Red Gate SQL Prompt is absolutely the best thing in the whole wide world.

[laughter]

Bill:  I taught a class yesterday and I had already installed it on the machine, and they’re like, “How are you doing that?” I’m like, “You paid for this class…

Carlos:  “Funny you should ask”

Bill:  “But go ahead and spend a little bit more, because this tool is the best thing ever.” The efficiency I get out of that thing is well beyond what I paid for it.

The free tool plug I will give will be, if I take off my developer hat and go more towards my trying‑to‑figure‑out‑what‑the‑heck‑is‑going‑wrong hat, I’ve got to give my love for SQL Sentry’s…

Carlos:  Plan Explorer.

Bill:  …Plan Explorer. Thank you. My brain has stopped working.

[laughter]

Bill:  Got to love that tool. I don’t use it often enough because, blessedly or unfortunately, I don’t do enough on that side of stuff, but I really enjoy what I can do with it.

Carlos:  Great tool?

Cathrine:  Yeah.

Carlos:  Great. Again, thanks so much for being here. We do have one last question for you. But before we do that let’s hear one additional way folks can learn about SQL Server.

[commercial break]

Carlos:  Cathrine, Bill, thanks again for being here. Before we let you go, if you could have one superhero power, what would it be and why would you want it?

Cathrine:  I want to fly.

Carlos:  Fly. That’s mine.

Cathrine:  Yes, that’s my favorite superhero power. I can travel anywhere at any time, and just that freedom of not being stuck on ground.

Carlos:  Yes, I’m with you. Bill?

Bill:  My favorite superhero was always Green Lantern. I want a magic ring that lets me materialize whatever I’m thinking about.

Carlos:  There you go. OK. Very nice. Great. Thanks again for the conversation, and for informing us a little bit more about BIML. Compañeros, we hope you enjoyed the conversation as well, and we’ll see you on the SQL trail.

[music]

Children:  SQL Data Partners.

Granting SSIS Package Administration Access to Developers

In many instances there is a development team, or even a warehouse team, that needs to run SSIS packages, and you want to store those packages in SQL Server and run them through the SQL Server Agent service. You may not, however, want to grant those users/groups sysadmin rights on the machine. The following describes my experience in setting this up.

First, you must add the user/group to the DCOM Configuration for “Microsoft SQL Server Integration Services 11.0” (SQL Server 2012), which can be found in the Component Services MMC.  Right-click it and choose Properties.



You can now add the groups to two security sections: “Launch and Activation Permissions” and “Access Permissions”.  In my experience, the users had to be in both sections, so I granted full rights in each.

The final component is to add each user/group to the local group “Distributed COM Users”.  It seems the permissions you set earlier won’t even authenticate a user unless they are also in this local group.  This will allow users to connect to the Integration Services instance.

At this point you should restart the Integration Services service.

If you haven’t already, you can go ahead and create your Catalog Database.  We won’t go over those steps here–perhaps in another post.

The last step is to allow them to deploy and manage the packages on the server.  I add the group to the SSISDB database and grant it the ssis_admin role.  Now your teams should be able to function completely within the SSIS parameters without having permissions to drop database objects or modify security settings.
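
That last grant can be scripted.  A minimal T-SQL sketch, assuming a Windows group named DOMAIN\SSISDevelopers (a placeholder name):

```sql
-- Sketch: map the developers' Windows group into SSISDB and add it to
-- the ssis_admin role. DOMAIN\SSISDevelopers is a placeholder.
USE SSISDB;
GO
CREATE USER [DOMAIN\SSISDevelopers] FOR LOGIN [DOMAIN\SSISDevelopers];
GO
ALTER ROLE ssis_admin ADD MEMBER [DOMAIN\SSISDevelopers];
GO
```

On SQL Server 2012 you could also use sp_addrolemember for the role assignment.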


Importing or Exporting to Access

I had a client request to export some data to Access 2010 for their downstream use.  I have done this in the past and thought it a no-brainer; however, there is always a little trick–hence the post.  🙂

Because I had already installed Office 2010, I assumed the Microsoft Access Database Engine 2010 redistributable would give me what I needed.  Since I was using a new machine, however, I had to install the 2007 Office System Driver: Data Connectivity Components.  After I did that, I could see the Microsoft Office 12.0 Access Database Engine OLE DB Provider.  I put in the location of the Access database, and I was off and running.
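
For reference, a typical connection string for that provider looks like the following; the file path is a placeholder:

```
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Export.accdb;
```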

Upgrading SharePoint List Adapter in SSIS

A server upgrade prompted me to migrate some SSIS packages.  These packages were going from SQL 2008R2 to SQL 2008R2, so there was no change there; however, the new server had a different version of the SharePoint List Adapters, which allow you to easily write to and get data from SharePoint Lists.  When I opened the package on the new server, the list adapter was not recognized and I got the following error.

“Error 1 Validation error. %Job Name% The component metadata for “component “SharePoint List Destination” (529)” could not be upgraded to the newer version of the component. The PerformUpgrade method failed. %Job Name% 0 0”

The ‘version’ of the adapter was the same; however, I was using an updated release, as shown by the release date for the adapter.  Upgrading was straightforward once I found this post on CodePlex.  My approach was to locate the package in Windows Explorer and open the file in TextPad.  Now I guess I should lament here that I didn’t do it in PowerShell, but one thing at a time!  I then replaced the PublicKeyToken as mentioned in the post.  I saved the text file and had to re-open the SSIS project.  There is a newer version of the file itself, and one would need to update the Version and the PublicKeyToken if upgrading to the latest version.
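
Since the fix is just a find-and-replace inside the .dtsx XML, it scripts easily.  Here is a hedged sketch of the automation I skipped, written in Python; the token values below are placeholders, not the adapter’s real tokens, so substitute the actual values from the CodePlex post:

```python
from pathlib import Path

# Sketch of scripting the PublicKeyToken swap inside a .dtsx file.
# OLD_TOKEN and NEW_TOKEN are placeholder values for illustration only.
OLD_TOKEN = "PublicKeyToken=abcdef0123456789"
NEW_TOKEN = "PublicKeyToken=9876543210fedcba"

def upgrade_package(path):
    """Replace the old adapter token in a package file; return new text."""
    pkg = Path(path)
    text = pkg.read_text(encoding="utf-8")
    updated = text.replace(OLD_TOKEN, NEW_TOKEN)
    pkg.write_text(updated, encoding="utf-8")
    return updated
```

Point it at each .dtsx file (or loop over a folder of packages) before re-opening the SSIS project.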

The last step was to add a SharePoint credential to the list adapter, as this is a new feature.  I right-clicked in my Connection Managers area and chose New Connection.  I was then able to choose SPCRED and enter the appropriate information.  My package was converted to the new server and worked well.