For all of us working with data, we know that we work no only with SQL Server but the suite of tools around the SQL Server engine. In this episode, we share some thoughts around SSIS administration, which is the most popular ETL tool in the Microsoft stack. Just like the SQL Server engine, many times the defaults are taken and in this episode we give some thoughts around logging, package execution, and notification with our guest Ravi Kumar. With the every expanding functionality of SSIS, Steve and I found this discussion insightful as we think about supporting an SSIS environment and ways we can educate package developers.
“That’s one of the things I preach to everybody that, keep your data types as narrow as possible. It’ll make SSIS Package run a lot faster. ”
Listen to Learn
- Why you want to consider SSIS Buffer sizing
- The differences between package deployment vs project deployment
- The benefits of package deployment
- The various logging types for SSIS packages
- Two parameters to tweak for SSISDB cleanup
- The validations steps a package goes through for execution.
About Ravi Kumar
Besides being a former SQL Cruiser, Ravi is the BI architect for LMO advertising in Baltimore Maryland. He is the co-leader of the BI PASS chapter and frequent contributor to SSIS related discussion on the east coast.
SSIS Administration Transcript
Carlos: Okay. Ravi welcome to the program.
Ravi Kumar: Thank you sir.
Carlos: It’s great to have you. One of the things that you’ve been talking and are doing I guess professionally is working quite a bit with SSIS. I guess shame on us. We’ve never actually had a topic on SSIS in the history of the SQL Data Partners podcast. I don’t know why its been that long but I thought that you would be a great person to bring on and talk a little bit about … From the DBA perspective what we need to know about SSIS. Some of those moving parts and maybe some of the catches that people can kind of get to get stuck with. Why don’t you start us off and talk a little bit about, can you give us the overview of SSIS and what people are mostly using it for.
Ravi Kumar: SSIS is an enterprise-level tool that has to do with moving data in an out. It’s an ETL to extract transform and load the data. People have been using it for a while. It’s been around since 2005. Before 2005 back into SQL server 2000 it was called DTS. It became a lot more interactive and enterprise-level 2005 and have been improving it ever since then. Most of the organizations use it for moving there in and out and to load the data warehouses and to do migrations from maybe Oracle to SQL server or some [inaudible 00:02:14] to server or SQL server to some other data base.
Currently how we use it, we have at our organization data coming from probably 15 different sources and it goes out to about 5 different other companies. We ingest all the data, we do manipulations for that. We apply business logic to it and it gets sent out. We have certain processes that bring in data every 5 minutes, 24/7. It could be SSIS, it could be very minor where you have 1 or 2 processes running a month and it could be really highly skilled process, where it’s running 24/7 constantly.
Steve: Okay. Ravi could you share with us a little bit about your background and how you got started and then how you’ve ended up working with SSIS today?
Ravi Kumar: Sure. I started as a SQL database developer back in 2004 I think. Then it slowly moved in to BI world. Business intelligence usually is 3 parts. As reporting, which is SQL server reporting services, our data warehousing which is SQL server and [inaudible 00:03:31] services. Then moving the data ETL, which is a SQL server integration services. After I started with BI, I started with ETL, moved to data warehousing, moved to reporting, moved back to data warehousing and reporting, now I’m back to reporting and integration services again. It’s the ETL part. Currently I’m a BI manager and I manage a team of 3 other developers who exclusive work on SSIS.
Carlos: 1 additional area in thinking about SSIS and some of the ways you can use it there, is it gets baked in. It’s actually kind of baked into SQL server management studio in a sense as well. Even something as simple as I want to export an Excel spreadsheet into a table or move data from one database to another, I can … They have that, you can right-click import export data. There was 1 for import 1 for export. You actually go to the process and that will actually create a little SSIS package that doesn’t get stored anywhere or saved unless you take that option. Folks can actually use that to move data in and out
Ravi Kumar: Again, SSIS comes with a SQL server right? You’ve got a SQL server licensing, standard or enterprise it comes with it. Once you go into that option it actually creates an SSIS package in the background and executes it and then gets rid of the package. You can save that and execute it again and again and again if you wanted to. Actually I use it for the same thing. Let’s say if somebody says “hey, I just want the data from this table to go to this table and I want to do it every week”.
If I try to manually create something like that, it’s going to take me maybe 20 minutes where it says if I go through SSIS and go through the visceral, it could take me 3 minutes and after that I automate it and I’m done. It has it’s place but it’s a really really small part of SSIS.
Carlos: Sure. Fair enough.
Ravi Kumar: Yes.
Carlos: As we start using those safe packages and we want to put them into our processing or into our systems, there now are 2 modes or 2 ways in which we can deploy or save those packages.
Ravi Kumar: Right. Now we’re starting to talk about package deployment vs. project deployment so there was only 1 way to deploy package up until single Server 2008, R2. In 2012, Microsoft introduced Integration Services Catalog. When you’re developing a package and visual studio, usually you have, starting 2012 you can have package or project. Package that is basically 1 unit of subset. It’s basically in the background as an XML file. You can just deploy it by itself and you can use SQL agent to automate just 1 package itself.
There are lots of benefits that you don’t get using package deployment model because it’s just one package. If you are looking at SSIS from a developers point of view, your looking at … Let’s say you need to download one file, put it in your database, do some transformation on it and do an export out of it. If you’re good developer, how you would develop it is you would have 1 package, just download the file. 1 package just to load the data, 1 package just to transform it somehow, 1 package just to export the data into a file.
If you are using a package deployment model you would have … You can still do it but since there’s different different sub sets of it, it just makes us to just to just deploy everything. Let me go over some of the advantages of project deployment. Benefits. First of all, if you’re using package deployment, you’re going to have store passwords somewhere because you going to move from dev to stage to production right? You got to have some sort of account that needs to access different resources, maybe some file systems. You’re going to have to put that password somewhere. What a lot of people did, they use a configuration file up until 2008 and they put that password in plain text within the file which is a big security hole.
Ravi Kumar: Once you deploy to catalog you can have those passwords or those sensitive information into a parameter and then you can change it directly on the server and they can deploy from server to server. Let’s say you deploy from staging to production. The first time you change it on production, it will remember that next time so you never have to change it again. It uses Alias 256 encryption to encrypt everything on the server so it’s a lot less security holes.
Steve: Then in that example for a developer working on the project, they could deploy to a development server or staging server, then whoever you have doing your configuration management or your deploys could then push from that staging server directly to your production server?
Ravi Kumar: Yes.
Steve: Then you know you’re getting exactly what’s on staging and not some files that might have got mixed up along the way.
Ravi Kumar: Exactly.
Ravi Kumar: You’re going to need to make sure that developers, anything that can change from environment to environment, they put that into the parameters. What happens is that you have … Your connection managers, they’re already have parameterized but anything, let’s say the name of a file or location of a file. If that’s going to change we just put that in parameter and then it’ll get changed once you move to the dev to stage your production.
Carlos: Is that exactly a one for one. I almost think of it in very simple terms, almost like an ad hock query versus a store procedure right? Where I can encapsulate some logic and I can do some different things with the parameters that I pass in right? That I can’t just do it, that I have to manually change if I was doing that with an ad hock query right?
Ravi Kumar: Yes. That’s a good analogy. You don’t need to change the code you just change the parameter within that store procedure.
Carlos: Right. Now there is some additional overhead, I’ll use that word overhead or additional items that we need to have for the project deployment. I think one of the things it will notice is that new database in the SSIS DB. I think this is where again in there you’ll see the option to set those parameters that you talked about in changing things from environment to environment.
Ravi Kumar: Right. SSIS DB is a whole other beast and we’ll get to it. Yes, that does need to be managed from same level. That’s one of the benefits of deployment. The other benefit is obviously encryption. That your whole package and everything is encrypted right away. Also one of the huge things that I noticed with project deployment is, when you deploy to the catalog, it automatically creates login for you. Up until 2008, what developers had to do, they had to go in and say “okay this is …” You have to log when the packet’s starting, you have to log when it’s ending.
They kind of know when everything executing. It automatically does that through SSIS DB. When you create the Integration Services Catalog it doesn’t get created automatically on SQL server. Once you create that, it creates the SSIS DB for you. Everything gets logged to the SSIS DB and there are views that it can clear you off of. We actually have a Power VI dashboard for our exercise environment so very clearly exercise DB and we know exactly how everything is running in the background.
Carlos: Interesting. I think that a full out-of-the-box behavior is that you then have to go into the reporting and also I’ve forgotten the name of what that’s called. I think it’s …
Ravi Kumar: That catalog?
Carlos: Is it the catalogs on any SQL server management studio? I go into to the catalog and then I can see my package is there, then I can right-click and view those reports. Like view the execution times or whatever that is.
Ravi Kumar: Yes. You can go on the catalog and you can see the reports. It’ll tell you what failed and it’s really …
Carlos: You’re actually using something on top, you’re not using those default reports?
Ravi Kumar: No. okay I do use those reports when something fails right?
Ravi Kumar: I go in and I say okay what failed then it’ll give me details message but if I want to know from a [inaudible 00:13:26] because you have 20, 30 different packages running, some of them running every 5 minutes, I want to know exactly what’s happening, how many times it’s executed, if certain process is turned off or not, then I have a dashboard as a high level view and I query SSIS DB directly for that. That’s one of the huge benefits of having that login built into the project deployment model.
Steve: Okay. When you talk about that Power VI dashboard query in the database there. Is that something that is specific to your environment? Or is it something that would work with pretty much any SSIS DB?
Ravi Kumar: That will work pretty much with any SSIS DB but currently I have … It’s just the way you have a setup. I have it setup where I have certain ware clause in it and that ware clause directly relates to the projects I have. You could build it in a way that it will work with any Power VI, I mean any SSIS DB because all the views are same on SSIS DB.
Steve: Right. Okay.
Carlos: Other than the jobs failing because sometimes the login can get fairly … Depending on the number of items you have in your package, you can get a quite large … I guess other than package failures, what are you use login for?
Ravi Kumar: Package failure is just execution that it executed. what time is executed so you know what’s actually happening with the package. There are different levels of login within the project deployment model. We’ll cover that in a later section, which is login in itself.
Carlos: Okay. I thought we were there. Okay.
Ravi Kumar: Yes. Since we’re already into it, let’s talk about login since we’re pretty much done with all the benefits. Login automatically comes with it and there 3 levels. There is performance, basic and [inaudible 00:15:43]. In my shop since we have so many packages running. I’ve run it on basic which is normal, which is default. I’m sorry I said that backwards. Okay, I’m going to redo that. There are 3 different login levels, performance, basic and [inaudible 00:16:02].
Performance is the leanest. That’s what you would prefer your production server to be on. Performance only logs your warnings and failures. Basic, it logs more than performance. It logs when a package is starting and then a task within a package is starting and when it’s failing and what happened with it.[inaudible 00:16:26] logs everything. They even logs the role counts of what’s coming through. In 2016, the latest version, they introduced 2 more login levels which are Run time Lineage and Custom. Not going to do much into it but Custom I am excited about because you can customize what exactly log out of the box.
My preferences, keep your production on performance and your dev on verbos and your staging on basic. SSIS DB, if you’re running packages constantly, it creates a lot of login and your SSIS DB usage goes high. It does create a lot of data so even with me being on performance and only 15 days of data, it does create … My SSIS DB is constantly around 45 gigs.
Ravi Kumar: 15 gigs days. Go ahead.
Steve: Then is that 45 gigs for looking out there, is that something that you have to manually clean Up over time or does SSIS clean it up automatically for you?
Ravi Kumar: That’s another part of this. When you create SSIS Integration Service Catalog, it creates 2 jobs for you within SQL agent and it automatically schedule it to run at 12 am. What that job does it cleans SSIS Db before for you. If you go to integration catalog, right-click on it and go to properties. Within the properties you are going to be able to see different different things. There are 2 properties that you should change. One is the login history, how long you want to keep the login history for. The default is 30 days. In production environment I don’t recommend 30 days because do you really need to look back 30 days?
In production what I use it for is make sure something ran, if it failed. If it failed I need to go look at it so I only kept it for 15 days. If you have only 1 or 2 packages running everyday, it’s not going to create a lot of data so you might be okay with 30 days or even a couple of months.
Steve: Really depends on the volume it sounds like.
Ravi Kumar: Yes. It really depends on the volume and how much data or how much space you have.
Ravi Kumar: The second property I like to change on there, it’s called how many versions of the package you want to keep. Once you deploy to catalog, it actually saves versions for you. Say it’s up to 10 versions, I have it set to 20 versions just in case. Once you change and what you can do, you can automatically go to that project. Let’s say you just deployed something and it starts breaking. You can easily go on let’s say on versions, click on the version and you can revert it back, go to your production and you’re done and you tell developers “hey, something broke fix it”.
Steve: With that, the version of the projects does that roll back the entire project or just specific packages?
Ravi Kumar: It rolls back the entire project.
Steve: So it’s in 1 step you just undo that last deploy effectively or you could even go back.
Ravi Kumar: Exactly.
Steve: 1 or 2 or further if you needed to?
Ravi Kumar: Yes. Up until 2014, you could only deploy the whole project. In 2016, Microsoft gave 1 more option of package deployment where even your project deployment, you can just deploy 1 package to that catalog. Even if you only change 1 package. Up until 2014, you were required to deploy the whole package.
Carlos: Okay. Got you. I can have a nice new feature then just have to only have to deal with the one if you have lots of packages.
Ravi Kumar: Yes. Let’s say if you have a project and you have 60 packages within the project, people are saying they don’t want to deploy the whole thing. It just takes too long.
Carlos: Got you.
Ravi Kumar: Yes. If you use versioning it goes back to that previous version.
Carlos: One question I had on the login, I ask because I was met with kind of blank stares. We could talk a little about execution and performance because it ties in. One of the things that it will log. If you’ve used SSIS let’s just say you have a source, table, let’s just say it has 10 columns. Then your destination only has 5. You’ve selected all 10 of the source that SSIS will put in the login, I’ve seen that little warning. It says “hey dummy this column is not being used”. You may want to exclude that from the package so that you’re not using as many resources to do this.
Ravi Kumar: Yes.
Carlos: One time when we were … Just like everything else SSIS packages sometimes experience performance issues. They want us to take a look to say “hey, will you take a peak at this and see if you can fix it?” When I pointed that out, nothing seemed to change. I guess my question is, should we be taking that warning with a little bit more severity or is that truly just like ehh, forget about it, just another record?
Ravi Kumar: You don’t need to but it’s not good that development practice because what happens, let’s say developer’s developing that’s. It’s using their system to bring in all the 10 columns into the memory and only write 5 but once you deploy to SSIS, SSIS is smart enough saying “okay I’m not using these columns. Why do I need to bring the data into the memory” and it just ignores it.
Carlos: The error warning really is then a red herring?
Ravi Kumar: Yes.
Carlos: I see that in the error messages says “hey, you selected it”. I’m actually not impacting my SSIS packages by it?
Ravi Kumar: Yes. It’s more of what development annoyance and developers shouldn’t be doing this anyways. 1 other parts I do is SSIS internals. This is one of the things I stress and exercise internals is that, don’t bring in these columns because it’s just going to hinder your development. Let’s say one of the columns says a blob, it’s going to slow down your packets considerably.
Steve: Just to confirm with that. If that column is a blob, some type, you’re going to bring that back to the SSIS server. It’s going to see the data and ignore it?
Ravi Kumar: No. What SSIS is going to do, when you’re developing it, it’s going to bring it but it’s going to bring it to your machine in development. Once you deploy to SSIS server, SSIS server is going to say wait I don’t need to use this data. It’s just going to ignore it and not bring that data into the memory.
Carlos: I think we should be more specific even. When I have to run it, when I run the package locally. I’m in SQL server manager studio or visual studio and I hit the run button I’m going to have the effect of having this extra columns hit my memory buffer.
Ravi Kumar: Right.
Carlos: When I save the package out and have it execute through an automated process, the process of saving the package out to the server if you will, that’s when the columns get ripped out of the package?
Ravi Kumar: Correct. Steve let’s say if you open visual Studio on a server, your production server if you want to, that blob is there and you execute on visual studio, that data will be brought into the memory.
Carlos: A double whammy, you’re on the production server and then you running packages that you shouldn’t be. Yikes!
Ravi Kumar: Exactly.
Steve: When it’s running that mode then it sounds like it’s very similar to the age old debate over select star versus selecting explicit columns. SSIS is smart enough when it’s running in production mode to rip that out and strip it down to what it needs it sounds like. I think that answers my question. Thank you.
Ravi Kumar: Okay.
Carlos: I guess let’s go in and talk a little bit about that, about packages execution. We’ve talked a little bit about setting up parameters, we can change the context under who it executes as, what connections it makes, things like that. Other considerations when we think about executing the packages?
Ravi Kumar: As far as executions from the point of view I usually suggest, we’re going to tackle it 2 ways. First is that what do you need in order to execute it and what does when it says go time. It’s a little bit about internals. We use SSIS … You deploy to SSIS catalog then you have to execute the packages using SQL agent. What I suggested that you create a credential with it, so you have an active directory account and have an active directory account for each business area.
Let’s say if you are executing package for sales or if you’re executing a package support finance, they all need to have different resources. I would suggest, for part business area, have execution account for SSIS and create a credential for it and give it proper access to any file shares or any servers that you need. Then create a proxy with SQL agent for that credential and then It’ll be available under run as account when you go do a schedule job.
That way SSIS package would run a lot smoothly. Many times, everybody runs into this problem. They deploy a package and they’re saying “okay wait. It can’t execute” because well your executing it as SQL agent service account.
Carlos: Sure. They made it elevated privileges to get out, attach the files or other systems things like that. If your SQL agent account won’t do that, that’s when the proxy account will come into play.
Ravi Kumar: Right. You shouldn’t be giving SQL agent account access to everything.
Steve: I’ve seen that done where because it’s running as a SQL agent they just give that agent account access to everything. Then you’ve got a huge security hole. The proxy option seems like a much better way to do it.
Ravi Kumar: That’s how we have a setup at our … We have a part business area. We have 1 active directory account and it has access to those particular resources. You create the credential and now it’s time to execute. kind of what SSIS does in the background, as soon as you hit execute, it loads everything into the memory. With SSIS, everything is about memory. It reads XML, it decrypts the XML. Here’s 1 very interesting thing, when it’s loading it, it checks. Let’s say if you’re running all the like SSIS 2008 package on SQL server 2012. It’ll check the version and automatically upgrade it and load it into the memory without telling you anything. Go ahead.
Carlos: I remember this. The version of the execution of the container might play a role in changes to your package.
Ravi Kumar: Yes. It doesn’t tell you that there’s a problem. It just upgrades it and drags it. That’s it.
Steve: That upgrade step, does it take a while? Does that slow things down or is it a pretty quick thing compared to the rest of it?
Ravi Kumar: It’s pretty quickly for the most part but if you have a package that’s like 22,000 line and somebody made one of those into one package to rule them all. Then it could take a while but I haven’t seen many problems with that and then it …
Steve: To get around that then you’re probably best off using the same version to edit the packages with as you are when you deploy it, where you’re deploying them to.
Ravi Kumar: Correct.
Ravi Kumar: Then you apply any configuration on any expression. This is where your configurations comes in play, if you have any. Remember we talked about parameters, that if you change from server to server you have different parameters? The next step is, it loads a parameter and then it says “okay just go validate yourself”. The packet goes through validation so you must have seen it in the past that a package fills that says it couldn’t validate and then everybody’s wondering “what data got executed? I don’t know”. Well, nothing got executed because it was just at a validation step. It never read any data at all.
Steve: Then it’s validated itself and is that where it starts to process the data next?
Ravi Kumar: Yes. Once it’s done validation, that’s when it says execute action time. That’s when it goes into execution. That’s kind of how the package gets started. Once it’s running, that’s when we start getting into buffers of how the data is brought in and how much memory it’s taking, things like that.
Carlos: What happens to the common issues that you’re seeing there. [inaudible 00:31:54] at this point we’re talking about performance. How much work can I do in as little time as possible.
Ravi Kumar: Sometimes validation does take long but everything needs to get validated. One of the issues I’ve seen there is that sometimes the package needs to create a file half way through before another component can access it. The file is not there, it’s going to fail validation and it’s going to say “the file is not there” because package hasn’t created it. There’s a property that you always need to say it’s delayed validation. That’s one of the got you’s I’ve seen many times, where they wouldn’t set that property and the dev they would never be able to figure out why it’s failing.
Carlos: I see.
Ravi Kumar: It can kind of go and load. Steve, any questions?
Steve: No. I think that makes a lot from the experiences I’ve had, understanding how those steps go through now.
Ravi Kumar: Then SSIS goes and does it’s thing and then it moves to … One other big thing is data flow. Most data flow and said okay now I’m going to bring in some data. This is where we kind of get a little bit of performance tuning and we start talking about buffers. There different kind of buffers, physical buffers, virtual private flap but what it basically comes down to it is, it’s taking up memory on to your server. We touched on this a little bit when you were talking about if you have 10 columns and you’re only bringing in 5 columns.
This is one of the examples for that. Also, if you want package to run really really quickly, use proper data types. This is when you can go in and tell developers that “you’re bringing in data as [inaudible 00:34:06] 255 when you’re only loading 2 digit estate code. It should be charged to. What SSIS does, if you’re reading it as [inaudible 00:34:21] 255, your data type is that then it’s going to take up 255 bytes onto your buffer. It creates a lot of overhead and it reserves that memory for itself.
Steve: It sounds like it will take up the maximum size of those variable columns?
Ravi Kumar: Exactly.
Carlos: Then for each record that comes in, it will take up that size?
Ravi Kumar: Exactly. It can inflate your data a lot.
Carlos: I think it’s so easy to overlook that because it happens a lot even in stored procedures. You create a temp table of [inaudible 00:35:05] or an [inaudible 00:35:06] and then you’re comparing or updating the opposite data type. You get a data type mis-match on the engine site then of course the engine has to do that conversion for you. It doesn’t do that very efficiently. The SSIS site we have that same kind of problem except now we’re actually going to get hit a little bit harder potentially with memory allocations.
Ravi Kumar: Right. Just imagine if you only have 10 columns and you have [inaudible 00:35:38] 255, how much data it will take even though you don’t have that much data.
Ravi Kumar: That’s one of the things I preach to everybody that, keep your data types as narrow as possible. It’ll make SSIS Package run a lot faster. This problem gets inflated a lot more later on when I start talking about buffer sizing. Within 2016, there is a new property call Auto Size Buffers. Before that you could tune the buffers if you wanted to and there’s a whole thing on … On internet you can read about tuning the buffer sizes but it comes down to usually each buffer size to about 10 max.
You can increase it too if you want but it’s a balancing act. What it does, let’s say you can only put 10 megabytes in 1 buffer at a time, while the pack is going to run for a longer time because it can only fit so many rows in 10 megabytes. It’s going to read the row and it’s going to apply transformations and it’s going to write the rows. It’s got to do that again. If you can fit more rows into the memory at one point, at one time then it’s going to go, your package is going to run a lot faster.
That property could be changed but there’s another caveat to it. If you increase it way too much and it’s going to say “well, I don’t have that much memory” it’s going to start driving to the disk. That’s when things slow down considerably. There is a performance counter that you can use. It’s called catalog.DM underscore execution underscore performance underscore counters. DM execution performance counter and they’ll tell you if the data is spilling to the disk. That’s one of the things you don’t want SSIS to be spilling to the disk, especially if you have a SQL server and you have 64 gigs on it, 45 gigs allocated to SQL server, the rest is for the OS.
Your SSIS doesn’t really have a lot of memory to execute and it’s going to start writing to the disk.
Carlos: I was going to bring that up as well in the sense that then you need to start taking a peek at what your SQL server wants to read for memory as well
Ravi Kumar: Exactly.
Carlos: As well as SSIS, I think a lot of times at least in the beginning stages they’re the same box because there’s no licensing implication to make that happen. Then all of a sudden, you could have a lot of contention at the same time and kind of have to have a marriage there, decide who’s going to go get the main resources while those processes are happening? They’ll step on each others toes.
Ravi Kumar: Exactly. SSIS can take up a lot of memory one time because we know about execution parts but SSIS also has, it’s what’s called execution tree. Within one execution tree you can have up to 5 buffers and within one data flow you can have multiple execution tree, executing at the same time. It could be 5 buffers x 5 or 10. That could easily start hogging up your server. There is a property that I usually suggest my DBA’s to have the developers change it. It’s called Blob Temp Storage Path and Buffer Temp Storage Path. SSIS doesn’t handle blob data really well and most time it ends up writing it to the disk.
What I suggest is you DBA’s have some sort of fast disk preferably a SSD on SQL server. Have this path, Blob Temp Storage path and Buffer Temp Storage Path mapped to that drive because needs to done be done with the data flow or it could be written into a configuration property as well if you want it to. Then you can have that path be directed to the fast disk. It’ll start writing there. Sometimes you want to be able to figure out that okay SSIS is taking up too much memory, it’s writing to the desk and you will see drivers starting to fill. It’s a nightmare for a lot of DVA’s
Carlos: It should clean up after itself. You’re not going to see the implications. You might see that like the warning come through but then after the package is finished those files will get thrown away.
Ravi Kumar: Right. There’s 2 caveats to it. What if your C drive gets full and you don’t have that much memory on it. The second thing is that it’s going to be slow. Unless you’re writing it to SSD. That’s one of the things that such as developers to change those 2 particular pack.
Steve: It sounds like at a minimum you want to make sure that you get those off of the C drive, so that you’re not going to take the system down if it runs out of space and while you’re doing that you should get them on fast storage like an SSD as well to get the best performance out of it?
Ravi Kumar: Correct.
Steve: Very similar to the temp DB type recommendations that we have with SQL server as well.
Ravi Kumar: Yes. Very similar.
Steve: Okay. Great.
Carlos: As we’re kind of wrapping up here Ravi. You mentioned a couple different flags. We’ve talked about different scenarios and ways people can take a peek at their environment. I guess last thoughts about best practices or things that … I’ll start, I’m going to say that again. As we wrap up here we’ve talked about different parameters, ways that administrators can take a look at their environments. Other best practices or things that the DBA’s can review on their SSIS environments to make sure they’re not shooting themselves in the foot.
Ravi Kumar: Carlos that is a great question and I do have an answer for it. I’ve put together a small list of things I’ve learned over time for SSIS. Number 1. App developers make multiple packages. Don’t let them do 1 ring to rule them all, that practice. Not 1 package do everything. You have 1 package do 1 single thing so they can have about 10, 20, 30 different packages within a project.
Carlos: Now I’m going to have to stop you because then this idea of chaining. I’ve even seen when that doesn’t happen, I feel like the natural instinct is that they break that up into jobs. I’ve actually seen environments where like well these set of packages, we expected to take an hour. We’re going to start that at 1 am. Then these next set of packages will just schedule at 2 am because we have to with for the first ones to start.
Then there’s that day, things are running slow, something happens you have [inaudible 00:43:26- 00:43:27]. Run a little bit long until my 1 am job goes until 2:15. Then that second wave ends because of the “oh hey was it ready?” Then I’m going to get yelled at, jobs fail, the warehouse isn’t loaded and now all of a sudden everybody is like “Ravi!”
Ravi Kumar: I’m going to finish what I was saying Carlos. Thanks for interrupting me. Once they have multiple packages, then you have them 1 package that executes the rest of the packages.
Carlos: That sounds like the one package to rule them all.
Ravi Kumar: No. One package to rule them all is when you have 1 package to do everything. Here you have multiple packages do 1 same thing but you have 1 package to execute rest of the packages. There a couple of benefits to it. One of the benefits is you only need to schedule 1 package. Let’s say you deploy to your catalog and you just execute 1 packet and it will execute rest of the packages. The second huge thing about this is that you can reorganize the packages that how they are executed.
You can take advantage of parallelism like crazy. There are certain packages that are dependent on each other, there certain packages who are not depending on each other. You can have packages that say “these packages I’m going to execute parallelry”. They’ll execute at the same time. Then you can have what’s called presidence constraints and then you can have all those packages connecting to one package that is finished or all those package needs to be finished before moving on. You can paralyze it.
Let’s say you deploy it and you say “well it’s taking up too to much resources and we cannot have parallel. All we need to do is you go in 1 package and you rearrange how everything executed. Now it’s not taking up so many resources any more.
Steve: Interesting. That 1 main package, it just knows what set of parallel tasks can run. Then you can use that to really throttle what’s going on to get the best use or not overload your system at that point?
Ravi Kumar: Right. I’m going to use an example of one of my friends. I know him through our local user groups, he works for Pandora jewelry in Baltimore. They have a huge data warehouse that runs around 2 terabytes. They had one package just like this for execution and it was just bringing the server to the knees and just stuff would just stop. It was taking so much resources. All they did, they just put it in 1 package, re arranged it so things we’re not so much paralyzed, redeployed it, took care of the problem. It makes things very flexible.
Steve: I like that and I think I will certainly use that next time I’m working with SSIS.
Ravi Kumar: Awesome. Also sometimes DB’s are not sure what SSIS package is doing. They’re like okay, “I’m not sure how it’s running into my system”. Well if you don’t know how it’s running into the background and what it’s doing on the database, run SQL profiler. I’ve had to do that many times before but somebody deployed something and I’m like “okay I don’t know what it’s doing”. Instead of going into the package it was like okay “let me look at it, what it’s doing on the server side” which makes it a lot quicker. Obviously you can open a package and figure out what it’s doing but that’s going to take a lot longer.
Steve: I can definitely see how the profiler would really simplify figuring out what the package is doing because the package can certainly be confusing if it’s your first time looking at it.
Ravi Kumar: Yes. Also implement …
Carlos: I would think, that seems [inaudible 00:48:02] to me. I like the visual of the package. It at least will point me in the right direction. The concern I would have with the profiler is being able to window that down. Is there like an event class or application name I guess that we would just filter on to make sure that it’s only SSIS packages that get in there?
Ravi Kumar: I think it’s the user that executes it. If you have one of the user executing it, then it’ll login that user and you can filter by the user.
Carlos: Then it should be our proxy account?
Ravi Kumar: It should be a proxy account but if you’re just, let’s say you are, you’re not sure what you doing in your development. Somebody wants to know what it’s doing in development. Usually I don’t try to run profiler on production but this is one of the tips and tricks for development, is that if you want to know what it’s doing, you user who’s executing or maybe you’re executing it at 1 point, you can see what its doing in the background.
There certain things that sometimes happen in SSIS that are hidden in expressions. Different different expressions that it wouldn’t be apparent to you why exactly it’s doing that. Sometimes you need to go in the background to see what it’s doing. SSIS by nature is that you can program it away so that everything will be hidden and you want to know what’s going on. It really depends on the developer, how they developed it. This has come in handy before.
I’m like okay, I have looked through everything and I don’t know why it’s happening this way. Let me go look at the profiler. It’s more of a development tool than anything or troubleshooting. Moving onto the next HR is bad you can pray everything was going okay I have everything and I don’t know why it’s happening this way let me look at the profile development troubleshooting. Moving onto the next point, 1 of the things I have found huge is implementing standards within SSIS. Your naming standards or how you do things. One of the things earlier I talk about is having multiple packages for a project. That could be in a standard having naming conventions for standards.
That helps a lot. Also SSIS doesn’t do everything out of the box. Something as simple as zipping up a file, it doesn’t do that. It doesn’t have a task for it. I came across many times to developers “oh I’m a developer. I can just build that task”. They’ll go spend days on trying to build a task then US DBA might have to run like EXC on your server, and have to be able to deploy in on every server just do that one thing. It takes development costs really high and it’s a pain in the butt trying to deploy it.
It’s better to buy like a standard SSIS extensions so they’re provided by … One is task factory from Progmatic works. Other is COZYROC. Those 2 connectors, install it every server and you’re done. It covers almost everything that they can think of and you know it’s been tested by the company.
Carlos: Full disclosure COZYROC has been a previous sponsor of this podcast. We’re familiar with those guys.
Ravi Kumar: COZYROC tend to be a bit cheaper than Task Factory but at our work we use both of them because I like the Salesforce connector for COZYROC but there are other connectors that are TASK factory that I really like. I have both of them and it pays for itself over time.
Carlos: Ravi’s just rolling in the dough. Over there in the advertising.
Ravi Kumar: The last point I have is for DVA’s is how you annoy the BI team. What you do is you create email group for BI team. For us we have [email protected] It could be [email protected] whatever. Have an email group for them then have that email group as database mail also. Any time a package fails, it sends everybody an email like “hey this package failed”. Have that email go to the BI manager also so everybody freaks out that something failed and production.
Carlos: What would you have us do instead Ravi?
Steve: I think what you’re saying is do that so that there’s the peer pressure on that team to go fix it.
Carlos: I thought don’t send that email. I see.
Ravi Kumar: Make sure you have a database group for the BI team. Everybody is responsible for their own task. If something failed, they need to know that something failed. It’s better for them also because if some of their reports are relying on that process to run, they don’t want to send that email automatically to the business when it’s the wrong email. They need to be notified that something has failed or not.
Steve: Basically expose all the success or failure of the jobs the entire team so that everyone knows what’s happening and what needs to be fixed?
Ravi Kumar: Exactly. If it’s a success don’t worry about it. If it’s a failure, everybody needs to know.
Carlos: I feel like you have a little bit of culture potentially in there. Who fixes that might vary but at least having everybody on board I can say that would be helpful. Ultimately you do want, I think particularly in the data warehouse environments, if those jobs do fail, you’re going to want to engage those guys because they’re going to be the ones. Hopefully, they’ll be fixing the packages for you. Very good. Should we do SQL family?
Steve: Yes. Let’s do it. A new question I think we’ve added this week is on SQL saturdays. Where was the very first SQL Saturday that you’ve attended?
Ravi Kumar: It was actually Richmond, was the first SQL Saturday I ever attended.
Carlos: [inaudible 00:54:35]Ravi Kumar: It was funny. I have story about that. I went on SQL cruise not knowing that there’s a huge SQL server community there. Why I went on SQL cruise is because I heard Chris Bell, I met him one time at a SQL user group and he mentioned SQL cruise training, that stuck. I talked to him, he convinced me to go to SQL cruise. Then I learned all about how big deal SQL Saturdays were. We went on cruise in January and SQL Saturday was coming up in March in Richmond. That was the first one I attended.
Carlos: Very nice. Technology is constantly changing. We’ve talked a little bit about you and some of the 2016 features that have changed in SSIS. How do you normally keep up with technology?
Ravi Kumar: SQL saturdays, Path summit and if there’s a problem, I try to go find the answer online and Twitter. I usually follow all the people who are leaders and great in technology who tweet about these things. Even the SQL family then I kind of keep watch on it, what’s coming in, what’s not coming.
Steve: Okay. Great. If you could change 1 thing about SQL server, what would it be?
Ravi Kumar: If I can change 1 thing about SQL server what would it be? I wish they would increase the SQL Express database size from 10 gigs to at least 50 gigs. That includes SQL agent.
Steve: They have that. It’s called standard edition right?
Carlos: There you go. We’ll see what happens with the SQL server on Linux. Maybe you can get your gigs that way.
Ravi Kumar: We’ll see what happens.
Carlos: Bets piece of career advice you’ve received?
Ravi Kumar: Best piece of career advice I’ve received? Keep your resume’ updated and go interview at least once a year even if you don’t have intentions of taking a job. Just to know how the job market is out there.
Steve: Interesting. That’s the first time I’ve heard that one. I like that. If you could have 1 super hero power, what would it be and why would you want it?
Ravi Kumar: That’s a tough one. What’s the 1 super hero power I can have and why would I want it?
Steve: Just 1.
Ravi Kumar: I think, does Dr. Manhattan counts as superhero?
Steve: I don’t know.
Carlos: Yes. We’ll count him.
Ravi Kumar: Dr. Manhattan is this blue guy from the watchman. He’s basically a God. He can do anything by … He can close his eyes and in a blink he’ll be on Mars. I would like to have his super powers so I can visit Mars and any other planet. Be next to the sun and not get burned.
Carlos: There you go. You want to explore space?
Ravi Kumar: Explore space.
Carlos: Well, Ravi thanks so much for being with us on the program.
Ravi Kumar: Thanks for inviting me guys.