Episode 66: The Microsoft Professional Program

Compañeros! Welcome back to the SQL Trail. In episode 66 I talk with Microsoft Training Author Graeme Malcolm. After talking with Patrick Thomas in episode 64, I was introduced to Microsoft's newest learning curriculum, the Microsoft Academy, and their first offering in data science. Microsoft consulted data scientists along with several companies to identify the core skills they need to be successful. A curriculum was developed to teach these functional and technical skills, combining highly rated online courses with hands-on labs and culminating in a final capstone project. This episode talks about the new program and what is next for Microsoft learning.

Episode 66 Quote

"But there is a generation of people who we find learn differently. They prefer to use massive online open courses or MOOCs as we call them. They like to do real deep, long term study in their own time. Rather than study for just one exam at a time, they kind of want to take on a whole program and do that online. So we're trying to reach that millennial audience." – Graeme Malcolm

Listen to Learn…

  • Why Graeme believes education and certifications are changing
  • How the Microsoft edX program works (is it free? Is it paid? What’s included?)
  • What you’ll learn in the year-long education program
  • What MIT and Microsoft have to do with MOOCs and learning data science
  • What’s next for Microsoft and edX

About Graeme Malcolm

Graeme Malcolm is a SQL Server trainer, specialist, blogger, and author. He’s worked with SQL Server since version 4.2. Graeme was one of the first Microsoft Certified Solutions Experts (MCSE) in SQL Server 2012 Business Intelligence in the world, and was the lead author for Microsoft Learning’s entire BI curriculum for SQL Server 2012. Follow him on Twitter @Graeme_Malcom.

Resources

Connect with Graeme on LinkedIn
@Graeme_Malcom
Microsoft Academy Data Science Program
[Video] Updating your Database Management Skills to Microsoft SQL Server 2014 : (01) Making the Most of Memory
EdX Courses taught by Graeme (EdX Page)
[Book] Programming Microsoft® SQL Server™ 2000 with XML (Developer Reference) 2nd ed. Edition
Graeme’s YouTube Channel

Transcript

Carlos Chacon: Well Graeme welcome to the program.

Graeme Malcolm: Thanks very much, it's great to be here.

Carlos Chacon: All the way from the UK, although recently relocated to Seattle.

Graeme Malcolm: That’s right. Yeah, you’ll detect the Scottish brogue in my accent but I have traded one rainy city in Edinburgh for another one in Seattle.

Carlos Chacon: Hopefully it's a good comparison.

Graeme Malcolm: Oh yeah definitely. Loving it. The coffee’s much better.

Carlos Chacon: Very nice. So ultimately you described yourself as a content developer for Microsoft. Part of the reason we're talking today, earlier we were talking with Patrick Thomas. We are here at the IGNITE conference. And one of the new programs Microsoft is rolling out is the Professional Program. So I guess tell us a little bit about that and what a content developer does.

Graeme Malcolm: Okay. So that was two questions in one. So we will get your [inaudible 00:04:20] for money. So first of all, what a content developer does. My original background was in training. I was an MCT for many years. I then worked with a vendor company for Microsoft creating training materials. It was an obvious leap to go from delivering training to creating training and then creating any other sort of learning content that helps people learn the technology. So these days my job is primarily to come up with curriculum design for a set of courses that we are going to build and then manage the creation of those courses and the labs and all the things that go into them. So that's me.

The Microsoft Professional Program, which is probably more interesting to people. The thinking behind this was we recognized that there's a kind of new generation of learners coming through. We've had the established MCP program for many years. I'm sure lots of people will be familiar with that and [inaudible 00:05:14] and the certifications there. It's great. Employers love it. Employees love it. It's a great program for people who want to learn product skills and take them into their role. But there is a generation of people who we find learn differently. They prefer to use massive online open courses, or MOOCs as we call them. They like to do real deep, long term study in their own time. Rather than study for just one exam at a time, they kind of want to take on a whole program and do that online. So we're trying to reach that millennial audience. We're just trying something new to see what we can do that can cater to those needs and help people learn skills that will get them into a role, rather than maintaining the product skills for a role they are already in.

Carlos Chacon: Sure. So one of the programs that they've developed is on data science. So I guess tell us about the decision to start there and what the status of the current course is.

Graeme Malcolm: Sure. So the data science curriculum for the professional project… professional program, rather than professional project, was… we considered a whole number of different areas. Obviously we create curriculum in all sorts of technical areas. Data science is just one of these areas; we know that it's a very hot topic among people in the industry. There is a huge gap in the skills for that. We have all sorts of figures. Companies looking for data scientists and people with analytical skills and just not being able to find them. And when we started to speak to academic institutions who are out there. We have been partnering with Harvard on their CS curriculum, for example, through edX. They were finding, yes, they were getting students through and that's great, but there's still a huge gap. A huge opportunity to get lots of people through. So we felt as an organization, Microsoft is data led. We have huge in-house expertise in data science. So it kind of made sense that it would be a good starting point to build out this new curriculum.

Carlos Chacon: So I think we'll get into the nuts and bolts of the program a little bit, but it's interesting that you bring up working with a university. The program is designed to be six months long. That's a long time. What's the difference between this course and, like, a college course?

Graeme Malcolm: So in some respects it's quite similar, and obviously the nature of college courses is changing anyway as they embrace these new approaches, and a lot of universities are now offering online courses through MOOCs. We're not the only people doing that. And there is a certain amount of overlap in the approach we've taken. We've tried to be more conceptual in the skills that we are teaching. So we are more open to talking about technologies other than Microsoft technologies, for example. We obviously do teach Microsoft technologies, but we also look at things like R and Python and Spark and all sorts of stuff that is there in the open source area, and we talk about the underlying concepts. So, you know, you never would have thought a couple of years ago, "Hey, I want to go to Microsoft to learn statistics." But now that's part of the program. So we are trying to embrace that more kind of holistic approach to what are the skills you need. Not just the specific Microsoft product skills but what are the more general skills that you need to succeed in that particular area.

Carlos Chacon: I think it is an interesting idea. Kind of the new Microsoft, if you will, embracing that idea that in the course you will review statistics. There is no underlying Microsoft technology for that but ultimately as an overall need, you have to have that base in order to make the science part of it work.

Graeme Malcolm: Yeah. Absolutely, and I think as we go forward with this program we are learning all sorts of lessons about what we can do well in terms of producing content in areas where we have subject matter expertise, and areas where perhaps we want to partner with other people. So for example, in the data science curriculum we've partnered with Columbia University, who [inaudible]. And we are using that as part of our curriculum. So we are open to these kinds of partnerships in areas where perhaps someone has a great story for that particular subject, you know, we can bring that into the fold.

Carlos Chacon: One of the other differences between this program and, for example, the MCP, which is really, you pass the exam and you become an MCP, is that there you pay for the exam. With these courses, there is a charge for the course and that gives you access to all the modules. And also there is some additional, the way it was described was some group discussions, or even like a mentoring type attribute. Do you want to talk a little bit about that?

Graeme Malcolm: Sure. Yeah. Well, first of all, on the pricing, the courses themselves are actually free. You can audit all the content for free. The payment is required for a verified certificate. So if you want, you can complete all the training, get all the knowledge out of them that you want and that's fine, there is no charge for that. To get the program certificate, for each course you have to get a verified certificate, because we have to know you are who you say you are, and there is a fee for getting that certificate. So that's how that works. Then the fees that are charged, if you compare that to any of the other open universities out there, it's generally pretty low.

The other part of your question was in terms of the mentoring and the forums and what's actually in the courses. We've designed them to try and have an element of community. So every course has a discussion forum and we encourage students to share answers. Well, not share answers, but kind of thoughts about how they are approaching the problems in there. We have teaching assistants that kind of monitor those forums and will look for students who are perhaps finding one particular thing difficult and try to help them through that. So there is an element of… although it's not direct live teaching, there is at least somewhere you can go to get help, either from your peers or from teaching assistants.

Carlos Chacon: So on those community discussions, one of the, I'll say issues, that you might have with an open forum like that is that you'll have varying levels. So let's just say there's a group. They've been in there for three of the six months. I start in as a newbie, I may feel left behind because people are talking about content three months down the road. Is there a closed, or like is there a set open and close date, or is it continual, you can just go and sign up when you want?

Graeme Malcolm: Right. So each course. We start with the concept of kind of like an academic year. So from July. We start the first of July to the 30th of June. That's our academic year. So at the moment we are in the 2016-2017 year. Each of the courses runs four times throughout that year. And it runs for three months at a time. So basically it runs for a quarter. You sign up. You have to complete… whichever courses you are taking in that quarter you have to complete within that quarter. If you don't, or you don't get a passing grade, you are welcome to retake it the next quarter, but you retake it from the start, you don't carry credit across. If you see what I mean. So that's kind of how it works.

So the longest you will ever be in the one instance of a course, one run of a course, will be three months. Now even within that we get the varied levels from day one. You will get varied levels of knowledge. What we find, and what I think is one of the strengths of the MOOC approach, particularly with a millennial audience who are used to doing this, is people tend to help one another. It's kind of reaffirming. We see the students that come in there, the ones that do have the knowledge will try to help the ones who don't. Even without us interceding. It's a pretty nice, friendly, safe environment for people to learn.

Carlos Chacon: Yeah, I think that's key, getting a community that is willing to share makes that experience so much better. So I guess I will talk about the nuts and bolts of the training. It's kind of broken down into four sections. So fundamentals, at least in the data science course, which is the only course that is currently available. You've got the fundamentals, the core data science, applied data science, and then a project on the end. The first three make sense; you've got to divide different content and topics and kind of spread those out. I guess any other additional thoughts there, and then maybe let's jump into the project and what that entails.

Graeme Malcolm: Sure. So the units that we've broken it into are really just kind of organizational units. You know, it's not like you complete unit one and then something happens. It's just a good way of splitting up the content. What we kind of found when we researched, and we did a lot of research. We spoke to a lot of data scientists who were in the role and really quizzed them on, "Hey, what are the skills that you need to do your job? What is it you do day to day? And what skills would someone need to do that?" And what we found was that there are some fundamental skills, as you would expect. There's the basics of I know how to explore data. I know how to work in something like Excel and filter and create charts and that type of thing. I have a basic understanding of statistics.

One of the other interesting ones that came up, given that this podcast is primarily for a SQL audience, we had assumed that, well, Transact-SQL is something that really database developers do, and the data scientists that we spoke to said no, we spend a lot of time writing SQL code. So that then became a fundamental kind of building block: if you want to do data science you have to know some SQL. So those kind of became the fundamentals.

We then partnered with a professor from MIT called Cynthia Rudin and a local Seattle-based consultant who specializes in big data problems, called Steve Austin. And those guys built a series of three courses for us, really just covering the essentials of data science, in the sense of what are the processes you follow to do data science. What are the techniques you can use to visualize the data, explore or find outliers, remove outliers, or clean them, all that data manipulation that you have to do to get it ready. And then what are the kinds of principles for machine learning models that you might build from that data, or the different types of machine learning models. How do you build them? How do you optimize them? So really deep, scientifically based approaches. It's not about using the tools. It's about understanding the underlying principles of how these things work.

So that became the core data science area. The third unit is where we wanted to say, well, you've learned these skills, you understand the principles of what machine learning is or how to manipulate data. What do you do with that? And there are some specific types of scenarios that you might want to work in. Maybe you want to work with big data. So maybe you want to do all of this on Spark, on a distributed cluster. How do you do that? Or maybe you want to focus on a very specific niche area, like I want to do image analysis or I want to do text analysis. What are the principles I need for that? So we kind of took those core principles of general data science and gave you specific applications of that. Or maybe you are a developer, you want to build a client that consumes that stuff. And we've tried throughout our entire curriculum to say, well, there are principles you need to understand. There are concepts.

The tools that you would use primarily for a lot of that stuff aren't necessarily Microsoft tools. Generally you write a lot of scripts in R or Python. You might use Azure Machine Learning as a kind of way of productionizing this. A very easy way to build your machine learning model and publish it as a web service, but within that you may still be writing your own Python or R code. So there are a lot of kind of technology-based skills or programming-based skills you have to learn that are common to data science, that go with those principles, before you can start applying them. And then back to the competition, or the project rather.

Having gone through that, having learned the fundamentals, learned the core skills and looked at different applications of those. We kind of felt we would be doing people a disservice if we said, "Hey, you can just do that and then you can go out and get a job." The employers want to know that you can actually do this stuff. You can put it into practice. And you know, a student graduating with us wants to feel, "Hey, I've built a portfolio, in fact here are my skills." So we partnered with the Cortana analytics team. They have a competition platform. Very similar to Kaggle, if you are familiar with Kaggle.

And what we do with that is we basically give the students a huge data set. There are two million observations in the data set. And we say to them, "Hey, you need to go and build a machine learning model based on this data. We'll reserve some of that data back and we'll test it with that, and you have six weeks to build a model that scores an accuracy level of seventy percent or more." And what was interesting about that was, as the competition was running, each time you submit an entry it appears on a leaderboard or scoreboard. And we kind of assumed that once people got past seventy percent they'd be done. No, people are really determined to get to the top of that leaderboard, so it was a great driver. I think the students got a lot out of that competition just because it kind of drove them to try to improve and tweak, and I think the process of doing that forced them to learn for themselves by actually doing the thing we are teaching them to do, which is great.

Carlos Chacon: You know, I think it's a great idea. Again, you've kind of gone through this process of several months of the course. You've built up relationships, hopefully, with the other students in the course, and then to be able to have this project and then be able to try it again, to say, "Well, that was attempt one, let me see if I can do that a little bit better. Where is an area that maybe I didn't understand quite as well as I could have or should have. Let me try that again, and that will just make it easier the next time around when I am facing a data set and need to carve it out." Again, applying that hands-on approach to all of this theoretical, I don't want to say theoretical, but to this point it could be theoretical understanding.

Graeme Malcolm: Yeah. That was exactly what it was. I think what I loved about it was how real-world that was. If I were genuinely a data scientist and I had a problem at work and I couldn't optimize it, I would reach out to the community. I would reach out and say, hey guys, anybody got ideas on how I could do this? And that is what we saw happening in the forum. So people were applying the skills that they learned in the courses. They were doing their best to get the model to where they needed to get it, and then they were cooperating with one another to try to move the ball further down the field, as you guys over here say. So it really was a very positive experience, and I think for any future programs that we do, having that capstone… it might not be a competition. But having a capstone project at the end where you put into practice the things you've learned, I think that's a key thing. A key element that we want to maintain.

Carlos Chacon: Again, after having spoken with Patrick, one of the things we talked about was the Ranger or the Masters program having gone away. Right, again, that was kind of focused on Microsoft people and then they opened it up for others as well. Are Microsoft people going to be taking courses like these? Or is the intent purely for outsiders?

Graeme Malcolm: Actually both. I mean, the first thing to say, categorically, is this is not a replacement for the Masters program. This is not a bait and switch thing that is going on. This is a whole new idea that we are trying. In trying that internally, we ran it as a pilot for Microsoft employees but also some invited partners. So we had eight hundred people that went through the initial wave. They were actually forced to try and complete it in three months rather than six. So we really pushed them on that, which was challenging. So we've got that. It's going to remain open. We've got a lot of internal interest. A lot of people across Microsoft are saying hey, this is really applicable to what I do, and they want to take that. So we are offering that to our employees. Also to partners. It's now available for general registration, so you can go straight to academy.microsoft.com and sign up. It will remain internal. It will remain used by FTEs, but it's a public offering.

Carlos Chacon: So you mentioned the registration process. But because of the gated start, if you will. I mean, everybody's starting at the same time, or other times there is a wait list. I guess when can people expect to get in if they were to register? You know, here we are in October. When are the next classes and when would people realistically be able to start? Are you gonna… I guess is there any cap? You mentioned eight hundred in the first group. I've heard several thousand are on the waiting list, so to speak. I guess what's that process?

Graeme Malcolm: So as of this week it's open. You can go in there, you can register. You can start straight away. The next batch of classes that runs is obviously going to be for the final quarter of this calendar year. So from the 1st of October til the 31st of December. So you can sign up and get in there right now and get started on the courses. Like I said, we've only just opened registration. We had fifty two thousand people waiting to get in there. So at the moment we don't intend to cap any sort of entrance. We're happy to have everybody go in there. The platform is just out there, so it might take a little while to get registered and started right now. But the process is reasonably simple. You sign up at the Microsoft… sorry, academy.microsoft.com. You will be linked from there to your edX course. The first edX course in the series. All the courses are offered from a platform called edX.org. So you can go to the first course. And the only thing you then have to do is link your edX account to your Microsoft Professional Program account. You've got two different identities and we need to link those so we can track your completion. So there is a little bit of a kind of logistical hurdle to get through to begin with.

Carlos Chacon: Now that sounds like the Microsoft training I know.

Graeme Malcolm: Exactly, yeah. So there is always the reality of how you do this, but it's fairly straightforward. We've got a pretty good engineering team here working really hard on making that as smooth as we can. Once you've done that once, that's it, you are then registered in the program. Your edX account is then linked to your MPP account and we will track your progress as you go through.

Carlos Chacon: So as we mentioned, there's currently one program. It's kind of just now opening up. What's the future? What's on the horizon, what are other topics you are looking to tackle?

Graeme Malcolm: So a couple of the immediate ones that we are looking at doing next year are big data engineering, which again is kind of related to the data science world. It's not exactly the same, but we are looking at a big data engineering track, and also a front end web developer track is the other one that we are looking to launch. So those two for definite. We are discussing all sorts of ideas with all sorts of people to try and figure out what there is demand for out there. I guess a key thing for us to consider is what's appropriate for this audience and what's appropriate for the certification audience, the MCP audience. Some things would be appropriate for both. Some things are very much about people already in a role just learning, you know, product skills, and we've done that with MCP. We're very successful there. We will continue doing that. Other things that are longer term or more generic in terms of their skills kind of belong in the Microsoft Professional Program. So as we are ready to announce those, we will.

Carlos Chacon: Very nice. Yeah, I think based on the initial reception, fifty one thousand people waiting on the registration list to get in, there's lots of people interested in a program like this.

Graeme Malcolm: That seems to be the case. We are thrilled at the reception we've had. The people who have been through, the eight hundred pilot participants, by and large I would have to say the feedback was hugely positive. And there are things we are changing as a result of the feedback, and it will continue to evolve, and that's great. But pretty much most of the people I've spoken to had a really positive experience and found it useful, including existing data scientists. There were people who went in there and had years of experience and still came out saying, "Hey look, I learned stuff. It was valuable to me." And that's great.

Carlos Chacon: Very nice. Well I guess we are looking forward to more things coming out of the academy.microsoft.com.

Graeme Malcolm: Watch this space.

Carlos Chacon: Should we do SQL family?

Graeme Malcolm: Sure, let's do SQL family.

Carlos Chacon: So while not a SQL person per se, we will adopt you into the SQL family. We're a big family. So we can always use a few content developers. So our first question is, what's your best piece of career advice?

Graeme Malcolm: Over the years, the thing I've learned is make sure that you are doing something you really enjoy. That you really love doing. It can be very easy to get kind of wrapped up in the ego of, like, I'm the best guy at that particular thing, or the monetary, competition side of things. In the best jobs I've had, I've been really passionate about what I'm doing and I'm just really enjoying it. I think you will get more out of life if that's what you focus on.

Carlos Chacon: Again, I do appreciate you coming on the podcast and chatting with me today at the IGNITE conference. And our last question for you today is, if you could have one superhero power, what would it be and why do you want it?

Graeme Malcolm: One super power. Given my experience in the past year or so building this program, I'd love the ability to be in two or three places at once. That'd be great. That would make my life much more efficient, I think.

Carlos Chacon: Okay, So kind of the duplicating ability. Very nice. Well Graeme, thanks so much for being on the program today.

Graeme Malcolm: Oh, thank you Carlos, it's been great.

 

Episode 68: SSIS Administration

For all of us working with data, we know that we work not only with SQL Server but with the suite of tools around the SQL Server engine.  In this episode, we share some thoughts around administering SSIS, the most popular ETL tool in the Microsoft stack.  Just like with the SQL Server engine, many times the defaults are taken, and in this episode we give some thoughts around logging, package execution, and notification with our guest Ravi Kumar.  With the ever-expanding functionality of SSIS, Steve and I found this discussion insightful as we think about supporting an SSIS environment and ways we can educate package developers.

Episode Quote

“That’s one of the things I preach to everybody: keep your data types as narrow as possible. It’ll make the SSIS package run a lot faster.” – Ravi Kumar

Listen to Learn

  • Why you want to consider SSIS Buffer sizing
  • The differences between package deployment vs project deployment
  • The benefits of package deployment
  • The various logging types for SSIS packages
  • Two parameters to tweak for SSISDB cleanup
  • The validation steps a package goes through for execution

About Ravi Kumar

Besides being a former SQL Cruiser, Ravi is the BI architect for LMO Advertising in Baltimore, Maryland.  He is the co-leader of the BI PASS chapter and a frequent contributor to SSIS-related discussions on the East Coast.

Resources

Ravi on Twitter
Buffer Sizing
SSISDB database maintenance
Project Deployments

 

SSIS Administration Transcript

Carlos:       Okay. Ravi welcome to the program.

Ravi Kumar:       Thank you sir.

Carlos:       It’s great to have you. One of the things that you’ve been talking about, and are doing professionally, I guess, is working quite a bit with SSIS. I guess shame on us. We’ve never actually had a topic on SSIS in the history of the SQL Data Partners podcast. I don’t know why it’s been that long, but I thought that you would be a great person to bring on and talk a little bit about, from the DBA perspective, what we need to know about SSIS. Some of those moving parts and maybe some of the gotchas that people can kind of get stuck with. Why don’t you start us off and talk a little bit about, can you give us the overview of SSIS and what people are mostly using it for?

Ravi Kumar:       SSIS is an enterprise-level tool that has to do with moving data in and out. It’s an ETL tool, to extract, transform, and load the data. People have been using it for a while. It’s been around since 2005. Before 2005, back in SQL Server 2000, it was called DTS. It became a lot more interactive and enterprise-level in 2005 and they have been improving it ever since. Most organizations use it for moving their data in and out, to load data warehouses, and to do migrations from maybe Oracle to SQL Server, or some [inaudible 00:02:14] to SQL Server, or SQL Server to some other database.

Currently, how we use it, we have at our organization data coming from probably 15 different sources and it goes out to about 5 different other companies. We ingest all the data, we do manipulations on it. We apply business logic to it and it gets sent out. We have certain processes that bring in data every 5 minutes, 24/7. With SSIS, it could be very minor, where you have 1 or 2 processes running a month, or it could be a really highly scaled process, where it’s running 24/7 constantly.

Steve:         Okay. Ravi could you share with us a little bit about your background and how you got started and then how you’ve ended up working with SSIS today?

Ravi Kumar:       Sure. I started as a SQL database developer back in 2004, I think. Then I slowly moved into the BI world. Business intelligence usually has 3 parts. There’s reporting, which is SQL Server Reporting Services; data warehousing, which is SQL Server and [inaudible 00:03:31] Services; then moving the data, ETL, which is SQL Server Integration Services. After I started with BI, I started with ETL, moved to data warehousing, moved to reporting, moved back to data warehousing and reporting, and now I’m back to reporting and Integration Services again. It’s the ETL part. Currently I’m a BI manager and I manage a team of 3 other developers who exclusively work on SSIS.

Carlos:       One additional area in thinking about SSIS and some of the ways you can use it, is it gets baked in. It’s actually kind of baked into SQL Server Management Studio in a sense as well. Even something as simple as, I want to export an Excel spreadsheet into a table or move data from one database to another. They have that, you can right-click, import/export data. There is 1 for import and 1 for export. You actually go through the process and that will actually create a little SSIS package that doesn’t get stored anywhere or saved unless you take that option. Folks can actually use that to move data in and out.

Ravi Kumar:       Again, SSIS comes with SQL Server, right? You’ve got a SQL Server license, standard or enterprise, it comes with it. Once you go through that option it actually creates an SSIS package in the background and executes it and then gets rid of the package. You can save that and execute it again and again and again if you wanted to. Actually, I use it for the same thing. Let’s say somebody says, “Hey, I just want the data from this table to go to this table and I want to do it every week.”

If I try to manually create something like that, it’s going to take me maybe 20 minutes, whereas if I go through SSIS and go through the wizard, it could take me 3 minutes and after that I automate it and I’m done. It has its place but it’s a really, really small part of SSIS.

Carlos:       Sure. Fair enough.

Ravi Kumar:       Yes.

Carlos:       As we start using those saved packages and we want to put them into our processing or into our systems, there are now 2 modes or 2 ways in which we can deploy or save those packages.

Ravi Kumar:       Right. Now we’re starting to talk about package deployment vs. project deployment. There was only 1 way to deploy a package up until SQL Server 2008 R2. In 2012, Microsoft introduced the Integration Services catalog. When you’re developing a package in Visual Studio, starting in 2012 you can have package or project deployment. A package is basically 1 unit, a subset. It’s basically, in the background, an XML file. You can just deploy it by itself and you can use SQL Agent to automate just 1 package by itself.

There are lots of benefits that you don’t get using the package deployment model because it’s just one package. If you are looking at SSIS from a developer’s point of view, you’re looking at… let’s say you need to download one file, put it in your database, do some transformation on it, and do an export out of it. If you’re a good developer, how you would develop it is you would have 1 package just to download the file, 1 package just to load the data, 1 package just to transform it somehow, 1 package just to export the data into a file.

If you are using the package deployment model, you can still do it, but since there are different subsets of it, it just makes sense to deploy everything. Let me go over some of the advantages of project deployment. Benefits. First of all, if you’re using package deployment, you’re going to have to store passwords somewhere because you’re going to move from dev to stage to production, right? You’ve got to have some sort of account that needs to access different resources, maybe some file systems. You’re going to have to put that password somewhere. What a lot of people did, they used a configuration file up until 2008 and they put that password in plain text within the file, which is a big security hole.

Carlos:       Sure.

Ravi Kumar:       Once you deploy to the catalog you can have those passwords, that sensitive information, in a parameter, and then you can change it directly on the server as you deploy from server to server. Let’s say you deploy from staging to production. The first time, you change it on production, and it will remember that the next time, so you never have to change it again. It uses AES-256 encryption to encrypt everything on the server, so there are a lot fewer security holes.

Steve:          Then in that example for a developer working on the project, they could deploy to a development server or staging server, then whoever you have doing your configuration management or your deploys could then push from that staging server directly to your production server?

Ravi Kumar:       Yes.

Steve:         Then you know you’re getting exactly what’s on staging and not some files that might have got mixed up along the way.

Ravi Kumar:       Exactly.

Steve:         Thanks.

Ravi Kumar:       You’re going to need to make sure that developers, anything that can change from environment to environment, they put that into the parameters. What happens is that your connection managers are already parameterized, but anything else, let’s say the name of a file or the location of a file, if that’s going to change we just put that in a parameter and then it’ll get changed once you move from dev to stage to production.
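
For reference, a rough sketch of what that looks like on the server side once a project is in the catalog. The folder, project, and parameter names below are placeholders; the stored procedure is catalog.set_object_parameter_value in SSISDB.

```sql
-- Set a project-level parameter (for example a connection password) directly
-- on the production server, so it never lives in a config file in plain text.
EXEC SSISDB.catalog.set_object_parameter_value
     @object_type     = 20,                -- 20 = project parameter, 30 = package parameter
     @folder_name     = N'Finance',        -- placeholder folder
     @project_name    = N'NightlyLoad',    -- placeholder project
     @parameter_name  = N'SourcePassword',
     @parameter_value = N'************',   -- stored encrypted in SSISDB
     @value_type      = 'V';               -- 'V' = literal value, 'R' = reference an environment variable
```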

Carlos:       Is that exactly one for one? I almost think of it in very simple terms, almost like an ad hoc query versus a stored procedure, right? Where I can encapsulate some logic and I can do some different things with the parameters that I pass in, right? That I can’t just do, that I have to manually change, if I was doing that with an ad hoc query.

Ravi Kumar:       Yes. That’s a good analogy. You don’t need to change the code, you just change the parameter within that stored procedure.

Carlos:       Right. Now there is some additional overhead, I’ll use that word overhead, or additional items that we need to have for the project deployment. I think one of the things you will notice is that new database, the SSISDB. I think this is where, again, you’ll see the option to set those parameters that you talked about, in changing things from environment to environment.

Ravi Kumar:       Right. SSISDB is a whole other beast and we’ll get to it. Yes, that does need to be managed at some level. That’s one of the benefits of project deployment. The other benefit is obviously encryption. Your whole package and everything is encrypted right away. Also, one of the huge things that I noticed with project deployment is, when you deploy to the catalog, it automatically creates logging for you. Up until 2008, what developers had to do, they had to go in and say “okay, this is…” You had to log when the package was starting, you had to log when it was ending.

So they kind of knew when everything was executing. It does that automatically through SSISDB. The Integration Services catalog doesn’t get created automatically on SQL Server. Once you create that, it creates the SSISDB for you. Everything gets logged to the SSISDB and there are views that you can query off of. We actually have a Power BI dashboard for our SSIS environment, so we query SSISDB and we know exactly how everything is running in the background.

Carlos:       Interesting. I think the full out-of-the-box behavior is that you then have to go into the reporting and… I’ve forgotten the name of what that’s called. I think it’s …

Ravi Kumar:       That catalog?

Carlos:       Is it the catalog in SQL Server Management Studio? I go into the catalog and then I can see my packages there, then I can right-click and view those reports. Like view the execution times or whatever that is.

Ravi Kumar:       Yes. You can go into the catalog and you can see the reports. It’ll tell you what failed and it’s really …

Carlos:       You’re actually using something on top, you’re not using those default reports?

Ravi Kumar:       No. Okay, I do use those reports when something fails, right?

Carlos:       Okay.

Ravi Kumar:       I go in and I say, okay, what failed, and then it’ll give me a detailed message. But if I want to know from a [inaudible 00:13:26], because you have 20, 30 different packages running, some of them running every 5 minutes, I want to know exactly what’s happening, how many times it’s executed, if a certain process is turned off or not. Then I have a dashboard as a high-level view and I query SSISDB directly for that. That’s one of the huge benefits of having that logging built into the project deployment model.
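
As a rough illustration of the kind of query such a dashboard can sit on top of, the SSISDB views Ravi mentions include catalog.executions and catalog.event_messages. The folder name and execution id below are placeholders.

```sql
-- Recent executions: one row per package run, with status
-- (1 = created, 2 = running, 3 = canceled, 4 = failed, 7 = succeeded).
SELECT TOP (50)
       e.execution_id, e.folder_name, e.project_name, e.package_name,
       e.status, e.start_time, e.end_time
FROM   SSISDB.catalog.executions AS e
WHERE  e.folder_name = N'Finance'          -- placeholder folder
ORDER BY e.start_time DESC;

-- Drill into the error messages for one failed execution.
SELECT em.message_time, em.message
FROM   SSISDB.catalog.event_messages AS em
WHERE  em.operation_id = 12345             -- placeholder execution_id
  AND  em.event_name   = N'OnError';
```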

Carlos:       Okay.

Steve:         Okay. When you talk about that Power BI dashboard querying the database there, is that something that is specific to your environment? Or is it something that would work with pretty much any SSISDB?

Ravi Kumar:       That will work pretty much with any SSISDB, but currently it’s just the way I have it set up. I have it set up where I have a certain WHERE clause in it, and that WHERE clause directly relates to the projects I have. You could build it in a way that it would work with any Power BI, I mean any SSISDB, because all the views are the same on SSISDB.

Steve:         Right. Okay.

Carlos:       Other than the jobs failing, because sometimes the logging can get fairly… depending on the number of items you have in your package, it can get quite large… I guess other than package failures, what do you use logging for?

Ravi Kumar:       Beyond package failures, it’s just execution: that it executed, what time it executed, so you know what’s actually happening with the package. There are different levels of logging within the project deployment model. We’ll cover that in a later section, which is logging in itself.

Carlos:       Okay. I thought we were there. Okay.

Ravi Kumar:       Yes. Since we’re already into it, let’s talk about logging, since we’re pretty much done with all the benefits. Logging automatically comes with it and there are 3 levels. There is performance, basic and [inaudible 00:15:43]. In my shop, since we have so many packages running, I run it on basic, which is normal, which is the default. I’m sorry, I said that backwards. Okay, I’m going to redo that. There are 3 different logging levels: performance, basic and [inaudible 00:16:02].

Performance is the leanest. That’s what you would prefer your production server to be on. Performance only logs your warnings and failures. Basic logs more than performance. It logs when a package is starting, when a task within a package is starting, when it’s failing and what happened with it. [inaudible 00:16:26] logs everything. It even logs the row counts of what’s coming through. In 2016, the latest version, they introduced 2 more logging levels, which are RuntimeLineage and Custom. I’m not going to go much into it, but Custom I am excited about because you can customize exactly what to log out of the box.

My preference is, keep your production on performance, your dev on verbose, and your staging on basic. SSISDB, if you’re running packages constantly, creates a lot of logging and your SSISDB usage goes high. It does create a lot of data, so even with me being on performance and only 15 days of data, my SSISDB is constantly around 45 gigs.
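
The logging level can also be set per execution when a package is started through T-SQL; a hedged sketch with placeholder folder, project, and package names:

```sql
-- LOGGING_LEVEL: 0 = None, 1 = Basic, 2 = Performance, 3 = Verbose.
DECLARE @exec_id BIGINT;

EXEC SSISDB.catalog.create_execution
     @folder_name     = N'Finance',        -- placeholder names
     @project_name    = N'NightlyLoad',
     @package_name    = N'LoadSales.dtsx',
     @use32bitruntime = 0,
     @reference_id    = NULL,
     @execution_id    = @exec_id OUTPUT;

EXEC SSISDB.catalog.set_execution_parameter_value
     @execution_id    = @exec_id,
     @object_type     = 50,                -- 50 = execution (system) parameter
     @parameter_name  = N'LOGGING_LEVEL',
     @parameter_value = 2;                 -- Performance

EXEC SSISDB.catalog.start_execution @execution_id = @exec_id;
```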

Carlos:       Okay.

Ravi Kumar:       For 15 days of data. Go ahead.

Steve:         Then that 45 gigs of logging out there, is that something that you have to manually clean up over time, or does SSIS clean it up automatically for you?

Ravi Kumar:       That’s another part of this. When you create the SSIS Integration Services catalog, it creates 2 jobs for you within SQL Agent and automatically schedules them to run at 12 am. What that job does is clean up SSISDB for you. If you go to the Integration Services catalog, right-click on it and go to properties, within the properties you are going to be able to see different things. There are 2 properties that you should change. One is the logging history, how long you want to keep the logging history for. The default is 30 days. In a production environment I don’t recommend 30 days, because do you really need to look back 30 days?

In production, what I use it for is to make sure something ran, or if it failed. If it failed I need to go look at it, so I only keep it for 15 days. If you only have 1 or 2 packages running every day, it’s not going to create a lot of data, so you might be okay with 30 days or even a couple of months.
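
The retention window in that properties dialog maps to a catalog property you can also change with T-SQL; a small sketch:

```sql
-- Drop the SSISDB log retention from the default 30 days to 15.
EXEC SSISDB.catalog.configure_catalog
     @property_name  = N'RETENTION_WINDOW',
     @property_value = 15;

-- Check current catalog settings, including RETENTION_WINDOW.
SELECT property_name, property_value
FROM   SSISDB.catalog.catalog_properties;
```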

Steve:         Really depends on the volume it sounds like.

Ravi Kumar:       Yes. It really depends on the volume and how much data or how much space you have.

Steve:         Okay.

Ravi Kumar:       The second property I like to change on there is how many versions of the project you want to keep. Once you deploy to the catalog, it actually saves versions for you. By default it’s up to 10 versions; I have it set to 20 versions just in case. Once you change that, what you can do is go to that project. Let’s say you just deployed something and it starts breaking. You can easily go to, let’s say, versions, click on the version and you can revert it back, go to your production and you’re done, and you tell the developers, “hey, something broke, fix it”.
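
A sketch of those same settings from T-SQL; the folder, project name, and version LSN are placeholders you would look up first in catalog.object_versions:

```sql
-- Keep 20 project versions instead of the default 10.
EXEC SSISDB.catalog.configure_catalog
     @property_name  = N'MAX_PROJECT_VERSIONS',
     @property_value = 20;

-- List the versions that have been kept for a project.
SELECT object_version_lsn, object_name, created_time
FROM   SSISDB.catalog.object_versions
WHERE  object_name = N'NightlyLoad';       -- placeholder project

-- Revert the deployed project to an earlier version.
EXEC SSISDB.catalog.restore_project
     @folder_name        = N'Finance',     -- placeholder folder
     @project_name       = N'NightlyLoad',
     @object_version_lsn = 123456;         -- placeholder LSN from the query above
```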

Steve:         With that, the version of the projects does that roll back the entire project or just specific packages?

Ravi Kumar:       It rolls back the entire project.

Steve:         So it’s in 1 step you just undo that last deploy effectively or you could even go back.

Ravi Kumar:       Exactly.

Steve:         1 or 2 or further if you needed to?

Ravi Kumar:       Yes. Up until 2014, you could only deploy the whole project. In 2016, Microsoft gave 1 more option of package deployment where, even with project deployment, you can just deploy 1 package to the catalog, even if you only changed 1 package. Up until 2014, you were required to deploy the whole project.

Carlos:       Okay. Got you. That’s a nice new feature then; you only have to deal with the one if you have lots of packages.

Ravi Kumar:       Yes. Let’s say you have a project and you have 60 packages within the project; people were saying they don’t want to deploy the whole thing. It just takes too long.

Carlos:       Got you.

Ravi Kumar:       Yes. If you use versioning it goes back to that previous version.

Carlos:       One question I had on the logging, and I ask because I was met with kind of blank stares. We could talk a little bit about execution and performance because it ties in. One of the things that it will log: if you’ve used SSIS, let’s just say you have a source table, let’s just say it has 10 columns. Then your destination only has 5. You’ve selected all 10 from the source, and SSIS will put in the log, I’ve seen, that little warning. It says “hey dummy, this column is not being used”. You may want to exclude that from the package so that you’re not using as many resources to do this.

Ravi Kumar:       Yes.

Carlos:       One time when we were… just like everything else, SSIS packages sometimes experience performance issues. They wanted us to take a look, to say “hey, will you take a peek at this and see if you can fix it?” When I pointed that out, nothing seemed to change. I guess my question is, should we be taking that warning with a little bit more severity, or is that truly just like, ehh, forget about it, just another record?

Ravi Kumar:       You don’t need to, but it’s not a good development practice, because what happens is, let’s say a developer is developing that. It’s using their system to bring in all 10 columns into memory and only write 5, but once you deploy to SSIS, SSIS is smart enough to say “okay, I’m not using these columns. Why do I need to bring the data into memory?” and it just ignores it.

Carlos:       The warning really is then a red herring?

Ravi Kumar:       Yes.

Carlos:       I see that the error message says “hey, you selected it”, but I’m actually not impacting my SSIS packages by it?

Ravi Kumar:       Yes. It’s more of a development annoyance and developers shouldn’t be doing this anyway. One of the other talks I do is SSIS internals. This is one of the things I stress in SSIS internals: don’t bring in these columns, because it’s just going to hinder your development. Let’s say one of the columns is a blob; it’s going to slow down your package considerably.

Steve:         Just to confirm with that. If that column is a blob of some type, you’re going to bring that back to the SSIS server, and it’s going to see the data and ignore it?

Ravi Kumar:       No. What SSIS is going to do, when you’re developing it, it’s going to bring it, but it’s going to bring it to your machine in development. Once you deploy to the SSIS server, the SSIS server is going to say, wait, I don’t need to use this data. It’s just going to ignore it and not bring that data into memory.

Carlos:       I think we should be even more specific. When I run the package locally, I’m in SQL Server Management Studio or Visual Studio and I hit the run button, I’m going to have the effect of having these extra columns hit my memory buffer.

Ravi Kumar:       Right.

Carlos:       When I save the package out and have it execute through an automated process, the process of saving the package out to the server if you will, that’s when the columns get ripped out of the package?

Ravi Kumar:       Correct. Steve, let’s say you open Visual Studio on a server, your production server if you want to, that blob is there and you execute in Visual Studio, that data will be brought into memory.

Carlos:       A double whammy, you’re on the production server and then you’re running packages that you shouldn’t be. Yikes!

Ravi Kumar:       Exactly.

Steve:         When it’s running in that mode then, it sounds like it’s very similar to the age-old debate over select star versus selecting explicit columns. SSIS is smart enough when it’s running in production mode to rip that out and strip it down to what it needs, it sounds like. I think that answers my question. Thank you.

Ravi Kumar:       Okay.

Carlos:       I guess let’s go in and talk a little bit about that, about package execution. We’ve talked a little bit about setting up parameters; we can change the context of who it executes as, what connections it makes, things like that. Other considerations when we think about executing the packages?

Ravi Kumar:       As far as execution, from the point of view of what I usually suggest, we’re going to tackle it 2 ways. First is what do you need in order to execute it, and then what happens when it’s go time. It’s a little bit about internals. With SSIS, you deploy to the SSIS catalog, then you have to execute the packages using SQL Agent. What I suggest is that you create a credential for it, so you have an Active Directory account, and have an Active Directory account for each business area.

Let’s say if you are executing a package for sales or you’re executing a package for finance, they all need to have different resources. I would suggest, per business area, have an execution account for SSIS, create a credential for it, and give it proper access to any file shares or any servers that you need. Then create a proxy in SQL Agent for that credential, and it’ll be available under the run-as account when you go to schedule a job.

That way the SSIS package will run a lot more smoothly. Many times, everybody runs into this problem. They deploy a package and they’re saying “okay wait, it can’t execute” because, well, you’re executing it as the SQL Agent service account.
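
A hedged sketch of the credential and proxy setup Ravi describes; the account, secret, and names are placeholders for whatever your shop uses:

```sql
-- A credential wrapping the per-business-area Active Directory account.
USE master;
CREATE CREDENTIAL SSIS_Finance_Cred
    WITH IDENTITY = N'CONTOSO\svc_ssis_finance',   -- placeholder AD account with access to the needed shares/servers
         SECRET   = N'************';

-- A SQL Agent proxy on that credential, allowed to run SSIS job steps,
-- so it shows up under "Run as" when you schedule the package.
USE msdb;
EXEC dbo.sp_add_proxy
     @proxy_name      = N'SSIS_Finance_Proxy',
     @credential_name = N'SSIS_Finance_Cred',
     @enabled         = 1;

EXEC dbo.sp_grant_proxy_to_subsystem
     @proxy_name     = N'SSIS_Finance_Proxy',
     @subsystem_name = N'SSIS';
```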

Carlos:       Sure. It may need elevated privileges to get out, access files or other systems, things like that. If your SQL Agent account won’t do that, that’s when the proxy account will come into play.

Ravi Kumar:       Right. You shouldn’t be giving the SQL Agent account access to everything.

Steve:         I’ve seen that done where because it’s running as a SQL agent they just give that agent account access to everything. Then you’ve got a huge security hole. The proxy option seems like a much better way to do it.

Ravi Kumar:       That’s how we have it set up at our shop. Per business area, we have 1 Active Directory account and it has access to those particular resources. You create the credential and now it’s time to execute. Kind of what SSIS does in the background, as soon as you hit execute, is it loads everything into memory. With SSIS, everything is about memory. It reads the XML, it decrypts the XML. Here’s 1 very interesting thing: when it’s loading it, it checks. Let’s say you’re running an old SSIS 2008 package on SQL Server 2012. It’ll check the version and automatically upgrade it and load it into memory without telling you anything. Go ahead.

Carlos:       I remember this. The version of the execution of the container might play a role in changes to your package.

Ravi Kumar:       Yes. It doesn’t tell you that there’s a problem. It just upgrades it and runs it. That’s it.

Steve:         That upgrade step, does it take a while? Does that slow things down or is it a pretty quick thing compared to the rest of it?

Ravi Kumar:       It’s pretty quick for the most part, but if you have a package that’s like 22,000 lines and somebody made one of those “one package to rule them all” packages, then it could take a while. But I haven’t seen many problems with that, and then it …

Steve:         To get around that then you’re probably best off using the same version to edit the packages with as you are when you deploy it, where you’re deploying them to.

Ravi Kumar:       Correct.

Steve:         Okay.

Ravi Kumar:       Then you apply any configuration on any expression. This is where your configurations comes in play, if you have any. Remember we talked about parameters, that if you change from server to server you have different parameters? The next step is, it loads a parameter and then it says “okay just go validate yourself”. The packet goes through validation so you must have seen it in the past that a package fills that says it couldn’t validate and then everybody’s wondering “what data got executed? I don’t know”. Well, nothing got executed because it was just at a validation step. It never read any data at all.

Steve:         Then once it’s validated itself, is that where it starts to process the data next?

Ravi Kumar:       Yes. Once it’s done with validation, that’s when it’s execution time. That’s when it goes into execution. That’s kind of how the package gets started. Once it’s running, that’s when we start getting into buffers, how the data is brought in and how much memory it’s taking, things like that.

Carlos:       What are some of the common issues that you’re seeing there? [inaudible 00:31:54] At this point we’re talking about performance. How much work can I do in as little time as possible?

Ravi Kumar:       Sometimes validation does take long, but everything needs to get validated. One of the issues I’ve seen there is that sometimes the package needs to create a file halfway through before another component can access it. If the file is not there, it’s going to fail validation and it’s going to say “the file is not there” because the package hasn’t created it yet. There’s a property that you need to set there, called delay validation. That’s one of the gotchas I’ve seen many times, where they wouldn’t set that property and the devs would never be able to figure out why it was failing.

Carlos:       I see.

Ravi Kumar:       It can kind of go and load. Steve, any questions?

Steve:         No. I think that makes a lot of sense from the experiences I’ve had, understanding how those steps go through now.

Ravi Kumar:       Then SSIS goes and does it’s thing and then it moves to … One other big thing is data flow. Most data flow and said okay now I’m going to bring in some data. This is where we kind of get a little bit of performance tuning and we start talking about buffers. There different kind of buffers, physical buffers, virtual private flap but what it basically comes down to it is, it’s taking up memory on to your server. We touched on this a little bit when you were talking about if you have 10 columns and you’re only bringing in 5 columns.

This is one of the examples of that. Also, if you want the package to run really, really quickly, use proper data types. This is where you can go in and tell developers, “you’re bringing in data as [inaudible 00:34:06] 255 when you’re only loading a 2-digit state code. It should be char(2).” What SSIS does, if you’re reading it as [inaudible 00:34:21] 255, if your data type is that, then it’s going to take up 255 bytes in your buffer. It creates a lot of overhead and it reserves that memory for itself.

Steve:         It sounds like it will take up the maximum size of those variable columns?

Ravi Kumar:       Exactly.

Carlos:       Then for each record that comes in, it will take up that size?

Ravi Kumar:       Exactly. It can inflate your data a lot.

Carlos:       I think it’s so easy to overlook that because it happens a lot even in stored procedures. You create a temp table of [inaudible 00:35:05] or an [inaudible 00:35:06] and then you’re comparing or updating the opposite data type. You get a data type mismatch on the engine side, then of course the engine has to do that conversion for you. It doesn’t do that very efficiently. On the SSIS side we have that same kind of problem, except now we’re actually going to get hit a little bit harder potentially with memory allocations.

Ravi Kumar:       Right. Just imagine if you only have 10 columns and you have [inaudible 00:35:38] 255, how much space it will take even though you don’t have that much data.

Carlos:       Sure.

Ravi Kumar:       That’s one of the things I preach to everybody that, keep your data types as narrow as possible. It’ll make SSIS Package run a lot faster. This problem gets inflated a lot more later on when I start talking about buffer sizing. Within 2016, there is a new property call Auto Size Buffers. Before that you could tune the buffers if you wanted to and there’s a whole thing on … On internet you can read about tuning the buffer sizes but it comes down to usually each buffer size to about 10 max.

You can increase it if you want, but it's a balancing act. What it means is, let's say you can only put 10 megabytes in 1 buffer at a time: the package is going to run for a longer time because it can only fit so many rows in 10 megabytes. It's going to read the rows, apply the transformations, and write the rows, and then it's got to do that again. If you can fit more rows into memory at one time, your package is going to run a lot faster.

That property can be changed, but there's another caveat to it. If you increase it way too much, SSIS is going to say "well, I don't have that much memory" and it's going to start spilling to the disk. That's when things slow down considerably. There is a function in the SSIS catalog you can use, catalog.dm_execution_performance_counters, and it'll tell you if the data is spilling to the disk. That's one of the things you don't want: SSIS spilling to the disk, especially if you have a SQL Server with 64 gigs on it, 45 gigs allocated to SQL Server, and the rest for the OS.

Your SSIS doesn't really have a lot of memory to execute in, and it's going to start writing to the disk.
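Show note: a minimal T-SQL sketch for spotting that spill while a package is running, assuming the project is deployed to the SSISDB catalog. The "Buffers spooled" counter climbing above zero is the sign that SSIS has started writing buffers to disk.

-- Counters for all currently running SSISDB executions (pass an execution_id to narrow it down)
SELECT execution_id, counter_name, counter_value
FROM   SSISDB.catalog.dm_execution_performance_counters(NULL)
WHERE  counter_name IN (N'Buffers spooled', N'Buffer memory', N'Buffers in use')
ORDER  BY execution_id, counter_name;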

Carlos:       I was going to bring that up as well, in the sense that you then need to start taking a peek at what your SQL Server wants for memory as well.

Ravi Kumar:       Exactly.

Carlos:       As well as SSIS. I think a lot of times, at least in the beginning stages, they're on the same box because there's no licensing implication to make that happen. Then all of a sudden you can have a lot of contention at the same time, and you kind of have to have a marriage there: decide who's going to get the main resources while those processes are happening, or they'll step on each other's toes.

Ravi Kumar:       Exactly. SSIS can take up a lot of memory at one time. We know about the execution paths, but SSIS also has what's called an execution tree. Within one execution tree you can have up to 5 buffers, and within one data flow you can have multiple execution trees executing at the same time. It could be 5 buffers times 5 or 10 trees. That can easily start hogging up your server. There are two properties that I usually suggest my DBAs have the developers change. They're called BLOBTempStoragePath and BufferTempStoragePath. SSIS doesn't handle BLOB data really well and most of the time it ends up writing it to the disk.

What I suggest is that you DBAs have some sort of fast disk, preferably an SSD, on the SQL Server. Have BLOBTempStoragePath and BufferTempStoragePath mapped to that drive. It needs to be done on the data flow, or it can be set through a configuration property as well if you want. Then that path is directed to the fast disk and SSIS will start writing there. Otherwise, you'll eventually figure out that SSIS is taking up too much memory and writing to the disk when you see drives starting to fill. It's a nightmare for a lot of DBAs.
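Show note: a minimal sketch of that setup, with a hypothetical path. On each Data Flow Task, set both BLOBTempStoragePath and BufferTempStoragePath, either directly in the property pane or through an expression fed by a package parameter, to a folder on the fast drive, for example E:\SSISTemp, so that any buffers or BLOB data SSIS has to spool land on the SSD rather than the system drive.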

Carlos:       It should clean up after itself, though, so you're not going to see the implications. You might see the warning come through, but after the package is finished those files will get thrown away.

Ravi Kumar:       Right. There are 2 caveats to it. What if your C drive gets full and you don't have that much space on it? The second thing is that it's going to be slow, unless you're writing to an SSD. That's why I suggest developers change those 2 particular paths.

Steve:         It sounds like at a minimum you want to make sure that you get those off of the C drive, so that you’re not going to take the system down if it runs out of space and while you’re doing that you should get them on fast storage like an SSD as well to get the best performance out of it?

Ravi Kumar:       Correct.

Steve:         Very similar to the temp DB type recommendations that we have with SQL server as well.

Ravi Kumar:       Yes. Very similar.

Steve:         Okay. Great.

Carlos:       As we wrap up here, Ravi, we've talked about different parameters and ways that administrators can take a look at their environments. Any other best practices or things that DBAs can review in their SSIS environments to make sure they're not shooting themselves in the foot?

Ravi Kumar:       Carlos, that is a great question and I do have an answer for it. I've put together a small list of things I've learned over time for SSIS. Number 1: have developers make multiple packages. Don't let them follow the "one ring to rule them all" practice of 1 package doing everything. Have 1 package do 1 single thing, so they might have 10, 20, 30 different packages within a project.

Carlos:       Now I'm going to have to stop you, because then there's this idea of chaining. Even when that doesn't happen, I feel like the natural instinct is to break that up into jobs. I've actually seen environments where it's, well, this set of packages we expect to take an hour, so we're going to start it at 1 am. Then the next set of packages gets scheduled at 2 am, because we have to wait for the first ones to finish.

Then there's that day when things are running slow, something happens, you have [inaudible 00:43:26- 00:43:27]. Things run a little bit long and my 1 am job goes until 2:15. Then that second wave fails because, "oh hey, was it ready?" Then I'm going to get yelled at, jobs fail, the warehouse isn't loaded, and now all of a sudden everybody is like "Ravi!"

Ravi Kumar:       I'm going to finish what I was saying, Carlos. Thanks for interrupting me. Once they have multiple packages, then you have 1 package that executes the rest of the packages.

Carlos:       That sounds like the one package to rule them all.

Ravi Kumar:       No. One package to rule them all is when you have 1 package do everything. Here you have multiple packages that each do 1 single thing, but you have 1 package that executes the rest of the packages. There are a couple of benefits to it. One benefit is that you only need to schedule 1 package. Let's say you deploy to your catalog: you just execute 1 package and it will execute the rest of the packages. The second huge thing is that you can reorganize how the packages are executed.

You can take advantage of parallelism like crazy. There are certain packages that are dependent on each other, and certain packages that are not dependent on each other. You can say, "these packages I'm going to execute in parallel," and they'll execute at the same time. Then you have what are called precedence constraints, and you can have all of those packages connecting to one package so that all of them need to be finished before moving on. You can parallelize it.

Let's say you deploy it and you say, "well, it's taking up too many resources and we cannot run it in parallel." All you need to do is go into that 1 package and rearrange how everything executes. Now it's not taking up so many resources anymore.
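Show note: once the child packages are wired together inside one orchestration package, that single package is the only thing you schedule. A minimal T-SQL sketch for kicking it off from the SSISDB catalog; the folder, project, and package names here are hypothetical.

DECLARE @execution_id BIGINT;

EXEC SSISDB.catalog.create_execution
     @folder_name     = N'DataWarehouse',   -- hypothetical catalog folder
     @project_name    = N'NightlyLoad',     -- hypothetical project
     @package_name    = N'Master.dtsx',     -- the orchestration package
     @use32bitruntime = 0,
     @execution_id    = @execution_id OUTPUT;

EXEC SSISDB.catalog.start_execution @execution_id;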

Steve:         Interesting. That 1 main package just knows what set of parallel tasks can run, and then you can use that to really throttle what's going on to get the best use, or to not overload your system at that point?

Ravi Kumar:       Right. I'm going to use an example from one of my friends. I know him through our local user groups; he works for Pandora Jewelry in Baltimore. They have a huge data warehouse that runs around 2 terabytes. They had one orchestration package just like this, and it was bringing the server to its knees; stuff would just stop because it was taking so many resources. All they did was go into that 1 package, rearrange it so things weren't so parallelized, and redeploy it, and that took care of the problem. It makes things very flexible.

Steve:         I like that and I think I will certainly use that next time I’m working with SSIS.

Ravi Kumar:       Awesome. Also, sometimes DBAs are not sure what an SSIS package is doing. They're like, okay, "I'm not sure what it's doing on my system." Well, if you don't know what it's running in the background and what it's doing on the database, run SQL Profiler. I've had to do that many times, where somebody deployed something and I'm like, "okay, I don't know what it's doing." Instead of going into the package, I looked at what it was doing on the server side, which is a lot quicker. Obviously you can open a package and figure out what it's doing, but that's going to take a lot longer.

Steve:         I can definitely see how the profiler would really simplify figuring out what the package is doing because the package can certainly be confusing if it’s your first time looking at it.

Ravi Kumar:       Yes. Also implement …

Carlos:       I would think that seems [inaudible 00:48:02] to me. I like the visual of the package; it at least will point me in the right direction. The concern I would have with the Profiler is being able to narrow that down. Is there an event class or application name that we could filter on to make sure that only the SSIS packages get in there?

Ravi Kumar:       I think it's the user that executes it. If you have one user executing it, then it'll log in as that user and you can filter by the user.

Carlos:       Then it should be our proxy account?

Ravi Kumar:       It should be a proxy account, but let's say you're not sure what it's doing in your development environment, or somebody wants to know what it's doing in development. Usually I don't try to run Profiler on production, but this is one of the tips and tricks for development: if you want to know what a package is doing, filter on the user who's executing it, or maybe you're executing it yourself at one point, and you can see what it's doing in the background.
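Show note: if Profiler feels too heavy, an Extended Events session filtered on the executing login gives a similar view. A minimal T-SQL sketch, assuming the package runs under a proxy or service account; the login name and file name are hypothetical.

CREATE EVENT SESSION ssis_activity ON SERVER
ADD EVENT sqlserver.rpc_completed
    (WHERE sqlserver.server_principal_name = N'DOMAIN\ssis_proxy'),
ADD EVENT sqlserver.sql_batch_completed
    (WHERE sqlserver.server_principal_name = N'DOMAIN\ssis_proxy')
ADD TARGET package0.event_file (SET filename = N'ssis_activity');
GO
ALTER EVENT SESSION ssis_activity ON SERVER STATE = START;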

There are certain things that sometimes happen in SSIS that are hidden in expressions, different expressions where it wouldn't be apparent to you why exactly it's doing what it's doing. Sometimes you need to go into the background to see what it's doing. SSIS by its nature can be programmed in a way that everything is hidden, and you want to know what's going on. It really depends on the developer and how they developed it. This has come in handy before.

I'm like, okay, I have looked through everything and I don't know why it's happening this way; let me go look at the Profiler. It's more of a development or troubleshooting tool than anything. Moving on to the next point, one of the things I have found huge is implementing standards within SSIS: your naming standards, or how you do things. One of the things I talked about earlier is having multiple packages for a project. That could be a standard, along with naming conventions.

That helps a lot. Also, SSIS doesn't do everything out of the box. Something as simple as zipping up a file, it doesn't have a task for. Many times I've come across developers saying, "oh, I'm a developer, I can just build that task." They'll go spend days trying to build a task, then us DBAs have to run an EXE on the server and deploy it on every server just to do that one thing. It drives development costs really high, and it's a pain in the butt to deploy.

It's better to buy a standard set of SSIS extensions. One is Task Factory from Pragmatic Works; the other is COZYROC. With those two, you install them on every server and you're done. They cover almost everything developers can think of, and you know they've been tested by the company.

Carlos:       Full disclosure COZYROC has been a previous sponsor of this podcast. We’re familiar with those guys.

Ravi Kumar:       COZYROC tends to be a bit cheaper than Task Factory, but at our work we use both of them, because I like the Salesforce connector from COZYROC and there are other connectors in Task Factory that I really like. I have both, and they pay for themselves over time.

Carlos:       Ravi’s just rolling in the dough. Over there in the advertising.

Ravi Kumar:       The last point I have for DBAs is how to annoy the BI team. What you do is create an email group for the BI team. For us we have [email protected]; it could be [email protected], whatever. Have an email group for them, then use that email group for Database Mail as well. Any time a package fails, it sends everybody an email like "hey, this package failed." Have that email go to the BI manager also, so everybody freaks out that something failed in production.

Carlos:       What would you have us do instead Ravi?

Steve:         I think what you’re saying is do that so that there’s the peer pressure on that team to go fix it.

Carlos:       I thought don’t send that email. I see.

Ravi Kumar:       Make sure you have a database mail group for the BI team. Everybody is responsible for their own tasks. If something failed, they need to know that something failed. It's better for them too, because if some of their reports rely on that process running, they don't want those reports going out to the business automatically with the wrong data. They need to be notified whether something has failed or not.

Steve:         Basically, expose the success or failure of the jobs to the entire team, so that everyone knows what's happening and what needs to be fixed?

Ravi Kumar:       Exactly. If it’s a success don’t worry about it. If it’s a failure, everybody needs to know.
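Show note: a minimal T-SQL sketch of the "everybody gets the failure email" idea, assuming the packages run from the SSISDB catalog and Database Mail is configured; the mail profile and group address below are hypothetical. In catalog.executions a status of 4 means failed, so something like this could run from an Agent job on a schedule.

DECLARE @failed INT =
    (SELECT COUNT(*)
     FROM   SSISDB.catalog.executions
     WHERE  status = 4                                     -- 4 = failed
       AND  end_time >= DATEADD(HOUR, -1, SYSDATETIMEOFFSET()));

IF @failed > 0
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = N'DBMailProfile',                 -- hypothetical Database Mail profile
         @recipients   = N'bi-team@yourcompany.com',       -- hypothetical BI group address
         @subject      = N'SSIS package failure',
         @body         = N'At least one SSIS execution failed in the last hour. Check the SSISDB catalog reports.';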

Carlos:       I feel like there's a little bit of culture potentially in there. Who fixes it might vary, but at least having everybody on board, I can see that being helpful. Ultimately, particularly in data warehouse environments, if those jobs do fail you're going to want to engage those folks because, hopefully, they'll be the ones fixing the packages for you. Very good. Should we do SQL Family?

Steve:         Yes. Let's do it. A new question I think we've added this week is on SQL Saturdays. Where was the very first SQL Saturday that you attended?

Ravi Kumar:       Richmond was actually the first SQL Saturday I ever attended.

Carlos:       [inaudible 00:54:35]

Ravi Kumar:       It was funny, I have a story about that. I went on SQL Cruise not knowing that there was a huge SQL Server community out there. Why I went on SQL Cruise is because I heard Chris Bell, whom I met one time at a SQL user group, mention SQL Cruise training, and that stuck. I talked to him and he convinced me to go on SQL Cruise. Then I learned all about how big a deal SQL Saturdays were. We went on the cruise in January, and a SQL Saturday was coming up in March in Richmond. That was the first one I attended.

Carlos:       Very nice. Technology is constantly changing. We’ve talked a little bit about you and some of the 2016 features that have changed in SSIS. How do you normally keep up with technology?

Ravi Kumar:       SQL Saturdays, PASS Summit, and if there's a problem I try to go find the answer online, and Twitter. I usually follow the people who are leaders in the technology and tweet about these things, as well as the SQL family, and I keep watch on what's coming and what's not.

Steve:         Okay. Great. If you could change 1 thing about SQL server, what would it be?

Ravi Kumar:       If I could change 1 thing about SQL Server, what would it be? I wish they would increase the SQL Express database size limit from 10 gigs to at least 50 gigs, and that it included SQL Agent.

Steve:         They have that. It's called Standard Edition, right?

Carlos:       There you go. We’ll see what happens with the SQL server on Linux. Maybe you can get your gigs that way.

Ravi Kumar:       We’ll see what happens.

Carlos:       Best piece of career advice you've received?

Ravi Kumar:       Best piece of career advice I've received? Keep your resume updated and go interview at least once a year, even if you don't have intentions of taking a job, just to know how the job market is out there.

Steve:         Interesting. That’s the first time I’ve heard that one. I like that. If you could have 1 super hero power, what would it be and why would you want it?

Ravi Kumar:       That’s a tough one. What’s the 1 super hero power I can have and why would I want it?

Steve:         Just 1.

Ravi Kumar:       I think, does Dr. Manhattan count as a superhero?

Steve:         I don’t know.

Carlos:       Yes. We’ll count him.

Ravi Kumar:       Dr. Manhattan is this blue guy from Watchmen. He's basically a god. He can do anything: he can close his eyes and in a blink he'll be on Mars. I would like to have his super powers so I can visit Mars and any other planet, and be next to the sun and not get burned.

Carlos:       There you go. You want to explore space?

Ravi Kumar:       Explore space.

Steve:         Nice.

Carlos:       Well, Ravi thanks so much for being with us on the program.

Ravi Kumar:       Thanks for inviting me guys.

Episode 67: Dear Developer

Compañeros! The good organizers of SQLSaturday in Pittsburgh allowed me to take a slot on the schedule and open up the podcast for a little Q&A session, and I was super happy with the results.  We had several SQL Server pros come, and our conversation centered on one issue: testing in your development environment.  We came at it from a setup perspective, but also from a how-do-you-compare-apples-and-oranges perspective when your hardware is different.  Because a developer asked the question, I opted to call this episode "Dear Developer". Hear Microsoft MVPs and SQL Server consultants such as Allen White, Kevin Feasel, Jonathan Stewart, Tim McAliley and others give their thoughts on this subject.  At the very least you will have a few more ideas on where to start looking for potential options for your testing after listening to this episode.

Episode 67 Quote

“Containers are the same thing. Learn Docker, learn Docker or eventually you will lose your job. Windows Server 2016 will have Docker support. Windows 10 professional edition has Docker support. We’re seeing this already on the Windows side. We’ve seen it on the Linux side for several years now. It will get more popular on this side of the house as well. It is time to get ahead of that because otherwise these containers will come up, you will have no clue what you’re doing with them, and bad things will happen.” – Kevin Feasel

 

Listen to Learn…

  • The importance of Docker and containers
  • Why Allen says you have to “re-tool” yourself every five years
  • Why being the master of everything is no longer necessary
  • How to pick a professional specialization
  • Which essential database tools they recommend

 

Resources

Microsoft Ignite
What are Containers and Why Do You Need Them?
What is Docker?
Wait Statistics, or Please Tell Me Where it Hurts
SQL Sentry (Sentry One)

Episode 67 Transcript

Carlos:                  Okay. You can stay seated. Here we go. Okay companeros, we are here. It's SQL Saturday Pittsburgh. We've got a nice little discussion brewing. I think we're going to title this session Dear Developer. We've opened it up to invite people to come in and ask questions of our panel. Actually, as we go through, because of the transcript … Poor transcriber, I'm sorry, whoever you are that's listening to this now; there are a lot of people on this one. Hopefully you can make heads or tails of it. When you get the mic for the first time, please say your name so the transcriber knows who's speaking. Anyway, we have David here and David has our first question.

Dave:   Hello. My name is Dave [inaudible 00:00:46] and I work for Community Care Behavioral Health, which is a subsidiary of UPMC. In a prior lifetime I was a DBA, but that was 20 years ago when things were so much simpler. Now, as a developer, I find that when I'm having production issues, or when I'm developing code and trying to test something out, I can generate the volume of production records that I have on my test system, but I can't determine whether my code is really as bad as it seems, or whether there's an issue with the development box having been set up wrong by a system administrator who doesn't know SQL Server, or whether my query is that bad, because my response time is just really bad sometimes on some of my solutions. I don't always have time to use all the tools that DBAs use. I wondered if there were some easy tools that would give me quick pointers as a developer to say, "Hey, look, if you do this or that then you can change it. After that, you have to call in the cavalry."

Carlos:                  Okay. Ultimately, I'm having problems in my dev environment, what do I do? Right? We're going to start. Again, introduce your names. I'm going to start over here with Allen. Let us know.

Allen:                    I'm Allen White and currently I'm the business development manager for SQL Sentry. That plays into the answer that I'm going to give you. Just this past week we released our free tool Plan Explorer in what we were going to call the Ultimate Edition. The thing that is tremendous about this tool for you as a developer is that you can not only look at the query plan in far more detail than you're going to see in Management Studio, but you can actually have a DBA with full access to the production environment run the query from Plan Explorer, save the actual query plan and all the data that our tool collects into a special PE session file, and then send you that PE session file. You can pull that up in Plan Explorer and not only get the execution plan and all of the data related to that execution plan, but also the detail on all the indexes it uses, or could use, and what we might recommend as an index to solve the problems with the query.

There are multiple tabs that you can look at on a query plan: the visual plan that you're familiar with from Management Studio, but also the top operations that are going on, a diagram that shows the various joins in that particular query, and the waits that the query is waiting on, all within Plan Explorer. If they run this in production, even though you don't have access to production, they can save that session file, send it to you, and then you can do a detailed analysis and find where the issues are in that query. The tool is now free. They can install it in their environment where they can access the production system, and you can install that same tool at no cost to your company. That's the approach I would take.

Carlos:                  Okay. Getting deep down into execution plans already. Tim, what you got for us?

Tim:                       [inaudible 00:04:48].

Carlos:                  No, no, no. I'm saying that's where Allen was going. What's your thought? Response to "my dev environment is slow, what do I do"?

Tim:                       I think Allen's got a really good response in general, but I always take the onion approach and start broad. If it's a performance problem related to how your workload is going, I make sure I take a look at storage, take a look at the network, take a look at the components of the application tier that are hitting the data tier, then work my way into the server and up through the engine to see what's making the query run slow.

Carlos:                  That’s Tim [inaudible 00:05:24].

Speaker 5:           [inaudible 00:05:25].

Carlos:                  Okay. Hold on. We want to stay on this topic just for a little bit longer. On this topic. Okay, here we go. Make sure you state your name.

Speaker 5:           [inaudible 00:05:44]. Independent contractor from Baltimore. I have a question for the team.

Carlos:                  I don’t want to do another question just yet. It’s the same question?

Speaker 5:           Question, yeah. For me it seems like you don't have access to the host machine, so you have access only to the guest box, right? It seems like something [inaudible 00:06:04] or in the cloud generally. You have a machine there, you run a query there, and it runs slowly and you don't know why it is slow. Is it your query, or is it maybe a noisy neighbor in the cloud? Maybe that approach will lead to an answer. Any thoughts about how to [inaudible 00:06:30] in the cloud?

Allen:                    Azure just released a new product. We’re pitching products here.

Carlos:                  That didn’t take long.

Allen:                    We've got these lovely sweaters with kittens on them. They're in triple X now, then we'll get … Anyway. Excuse me. We're in Pennsylvania. Yeah. [inaudible 00:06:47]. The Azure product team announced this past week at Ignite, I believe, something called the Azure Tuning Advisor. It does a real-time workload capture to make assessments on how the queries are running and your response times, and it gives you actionable tuning information that you can take and either apply or save off to test on another workload or another system. Maybe in a situation where you're abstracted from being able to do some really deep troubleshooting, where you couldn't get to all the components but you did have something running out in Azure, which I'm very pro, you'd have a tool like this Azure Tuning Advisor.

Tim:                       It hasn't been announced per se; however, talking with the PM for Query Store, they are going to be looking to enable Query Store by default in Azure SQL Database. That'll be another one coming.

Carlos:                  [inaudible 00:07:54].

Dave:                    Hi. This is Dave again. I understand what you're saying, Allen, about using SQL Sentry. I've used it before; Paul did a great job helping me out with one of my issues. In this case, I've got a production data load for a data warehouse running. Telling me what production is doing right now might help a little, but what I'm currently working on is making that load go faster. I've built new indexes that I believe are going to work better, but because I'm on a virtual development box I can't tell whether that index is helping or not, because I don't have the right performance numbers. I can't put it into production to get the actual result, because it might make things worse. I only have this development box. Like I said, if there were something quick I could use to say, "Oh, look, it's because I'm memory constrained," something like that. If there were any easy things as a developer that I could then point the administrator of the virtual machine to, to help me, because sometimes administrators of machines don't know where to look and don't want to look bad, so they just say, "We'll figure it out."

Allen:                    I will say one thing and then I'll hand it over. That is, at least the execution plan will tell you if the index is being used. Then you could potentially push that over to your production DBAs and say, "I'm looking at implementing this index, can you please tell me from the DMVs what SQL Server is using? What's the usage of this table, what are the reads and writes currently on the table?" That way you at least get a guesstimate of the expected impact of the indexes. Of course, that assumes you have good care and feeding of your indexes as well.
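Show note: one common starting point for that DMV conversation is sys.dm_db_index_usage_stats. A minimal T-SQL sketch of what the production DBA could run in the relevant database:

-- Reads vs. writes per index in the current database
SELECT OBJECT_NAME(u.object_id)                        AS table_name,
       i.name                                          AS index_name,
       u.user_seeks + u.user_scans + u.user_lookups    AS reads,
       u.user_updates                                  AS writes
FROM   sys.dm_db_index_usage_stats AS u
JOIN   sys.indexes AS i
       ON i.object_id = u.object_id AND i.index_id = u.index_id
WHERE  u.database_id = DB_ID()
ORDER  BY reads DESC;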

Tim:                       Yeah. I used to work for a company that had made the mistake of combining a transactional system with data warehousing in the enterprise. It's the classic SQL Server problem. I had to do a lot of work without any third-party tools to make a case, to do the due diligence to separate those tiers out. I used things like Task Manager, quickly spun out [inaudible 00:10:19], and did some [inaudible 00:10:20] 14-day log captures to show them spikes, to show the impact that things like index maintenance were having on the platform broadly, as well as the horrible customer experience. In the absence of really good third-party tools, even free ones, old-school Task Manager will tell you what's on fire. Then build some perfmon around it just to see if you can get some history.

Vladimir:              My name's Vladimir, and this goes back to my very first SQL Saturday session. It was by Kevin Kline: 10 Things Every SQL Developer Should Know. During that session he explained a couple of items available in T-SQL; one of them is DBCC OPTIMIZER_WHATIF. That allows you to generate query plans based on the production server's RAM and CPU. The optimizer will change the plan, so you can see how it would run in the production environment versus how it's running in your environment. If you have a two-core CPU but production has 64 cores, using that option you're able to say, run as if you had 64 cores and show me the kind of plan you're going to get. Besides that, there is a way to generate statistics and indexes from the production environment and put them in your dev environment, so that even though you don't have the same amount of data, you're using the same indexes and statistics that you have in production.
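Show note: the command Vladimir mentions, DBCC OPTIMIZER_WHATIF, is undocumented and unsupported, so treat it strictly as a development-only trick. A minimal sketch of pretending the dev box looks like production, using the property names commonly reported for it:

-- Undocumented, dev-only: compile plans as if the box had 64 schedulers and 256 GB of RAM
DBCC OPTIMIZER_WHATIF(CPUs, 64);
DBCC OPTIMIZER_WHATIF(MemoryMBs, 262144);
-- Setting a property back to 0 reverts the optimizer to the actual hardware
DBCC OPTIMIZER_WHATIF(CPUs, 0);
DBCC OPTIMIZER_WHATIF(MemoryMBs, 0);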

Carlos:                  Those are just regular DBCC commands packaged in with SQL Server, not an add-on?

Allen:                    Yes. We've taken a peek at 2016, no changes there to your knowledge?

Speaker 7:           I’m not sure.

Allen:                    Okay. We didn’t maybe [inaudible 00:12:13].

Johnathan :         This is Jonathan Stewart, SQLLocks. I had a follow-up question for you. You mentioned these are data warehouse loads. Are you just doing procedures in SQL Agent jobs, or are you using SSIS? A little bit more information, because your problem may not even be your selects. Your problem could be your transformations in your [inaudible 00:12:34] package too. If we can get a little bit more information on your load, that'll help us give you other ideas to help troubleshoot it.

Speaker 7:           [inaudible 00:12:47].

Dave:                    Actually, it's kind of a mix. I found issues with my SSIS package. It's probably up to 70 million claim lines now that we have to move, but I split them out because I end up joining different tables based on the type of claim. I'm in healthcare, so if it's an inpatient claim there are a lot fewer of those, but I also need additional data, like what the admission date was and what the admission diagnosis was. For the claim lines overall, 98% of them come out of one table. I'm using SSIS for the three types of sub-joins that I have, but I'm using a direct insert statement from a SQL SELECT for the fourth one, because it was so large it was crushing my SSIS throughput.

Johnathan :         With that, just with the little bit of information that we have, obviously we can't see your system. If that amount of data is crushing your system the way you have it, are you using certain types of components that are blocking, such as sorts and things like that? That can slow it down as well. There are certain things in your package you can tune, whether it's the buffers or the row counts. There's a lot of tuning that can be done for SSIS as well. You still want to do what Allen talked about too, checking the database stuff first, but that's just the first part of it. If you're using SSIS, there's a whole other world of performance tuning behind it that you may want to look at. One of the things I see with a lot of clients is that they don't realize they're doing a lot of blocking versus non-blocking transforms. With a blocking, asynchronous transform you literally have to wait for the whole data set. Like a sort: obviously you can't sort a data set unless you have the whole thing, so you have to wait for it, and it slows it down, and then there's your memory and all that type of stuff as well. Take a look at how your packages are configured. You can just Google synchronous versus asynchronous components in SSIS to see.

Carlos:                  Bing.

Johnathan :         We’re not looking for pictures.

Allen:                    Something I will say there from a setup perspective. I'll say two things. One is [inaudible 00:15:08] sp_Blitz. Not sp_WhoIsActive, that's Adam's. sp_Blitz, right? That's just for "is my system set up correctly?" That's one area. My partner Steve Stedman has DatabaseHealth.com. If you download his tool, it's actually more of a GUI interface, and he has what's called a quick scan report. Where I was going with that is, from a setup perspective in your development environments, particularly if you're running SSIS and your database engine together, we talk about memory constraints: if the amount of memory you're giving to SQL Server by default exceeds the box, and then you're trying to run SSIS on top of that, you need to compartmentalize those two so that they're not kicking each other.
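Show note: the memory compartmentalization being described usually starts with capping SQL Server so SSIS and the OS keep some headroom. A minimal T-SQL sketch; the 45 GB figure is only an example and should be sized to your own box and workload.

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Cap the buffer pool so SSIS and the OS keep the remaining memory
EXEC sp_configure 'max server memory (MB)', 46080;   -- 45 GB, example only
RECONFIGURE;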

Vladimir:              I would suggest, if you run your queries in SSIS, maybe use the option SET STATISTICS IO ON. Then you can run, for example, the query before implementing the index and after implementing the index and see the difference in IOs. Then look for the next bottleneck: if you do a lot of inserts, maybe there is a problem with GAM contention, or maybe there are a lot of page splits during the insert. It could be a matter of how you load the data. Do you clean out the data before, or do you always just do inserts?

Dave:                    It’s all inserts. [inaudible 00:17:02].

Vladimir:              Here is a problem. Here is what I'm leading to. Maybe [inaudible 00:17:06]. Maybe delete will work, yes. Yes. [crosstalk 00:17:11]. Wait, wait. Listen, listen, listen, listen. Okay. When you delete from the table, you just delete all the records. Yes, it'll take longer to delete, but it will be faster to insert, because all the pages are already allocated and the index structure is already built. When you have a clustered index, you can try that with the clustered index built on [inaudible 00:17:42], for example. If you [inaudible 00:17:45], it will slow the insert. If you just delete it, it will be much quicker with many fewer page splits.

Dave:                    If your records are already in order.

Vladimir:              There’s another thing. Maybe you lose your time on ordering records.

Dave:                    [inaudible 00:18:13].

Allen:                    [inaudible 00:18:13].

Carlos:                  That's something Jonathan brought up. That's interesting; I hadn't seen that on the deletes. I'm interested to find out more about that now. Yeah, that'd be very interesting. Again, just from an "am I set up correctly?" standpoint, those tools will help with that. [inaudible 00:18:31] contention means I don't have enough tempdb [inaudible 00:18:33], or files, things like that. Those are things I can be checking in those dev environments, which probably didn't get set up carefully, and that would help me there as well.

Kevin:                   Kevin [inaudible 00:18:47], professional podcast groupie.

Carlos:                  We’re not increasing your pay Kevin, just so you know.

Kevin:                   Okay. I want to take a step back and say we're getting into some pretty hairy things. I know we talked about GAM contention; we're talking about some fairly detailed things. At a higher level, if I have .NET code, for example, the code is slow, or it may not be slow, I don't know. The important thing is: let's get some metrics on what calls are slow, what lines are slow, when we're reaching out to external resources, and how long that is taking. Can we trace that out, can we log that information, and then figure out, "Look, this line of code is really slow. What's going on in there? Okay, well, it's going over a network and connecting to my Chinese data center, something's happening over there, and it's coming back three hours later." Can I at least get down to that level? Digging deeper and deeper in is interesting, and I think we have to get there. But when you're starting out, when you have a code base where somebody's just saying, "The code is slow," start at the top level and get those early metrics.

Allen:                    That's a great point. We were having an issue where the timeout was set for a minute and it was making "database calls." They're like, "Well, the database is timing out." Oh, really? We put some tracing on there. We're like, "Look, yes, there were some things taking a couple of seconds which maybe should have been faster, okay, I get it, but the cumulative transaction for what you're asking is not taking more than 20 seconds." There were several components in there. Then they went and did exactly what you said and put some "how long does this take?" timing in their code. They're like, "Oh, gosh, guess it's not a database problem. Let's keep looking at the code and see if there's something else, like some function that's spinning, that's causing the problem." That's a great point.

Tim:                       Well, first I was mad because there was a guy in a SQL Sentry shirt and my first thought was, "Oh, man, Plan Explorer is awesome and they did just make the Pro version free." Two other things I want to say. One, I think Kevin has a good point: part of the original question is, "Hey, I don't have a lot of time. There are all these different options. What's the quickest bang for the buck?" The first is execution plans, but another thing I think is worth looking into is wait stats. If you do [inaudible 00:21:29] sp_Blitz, it'll give you a little bit of data if there are some really bad ones, but I know Paul Randal. If you Google SQL wait stats, his blog will generally be the first thing that pops up [inaudible 00:21:39].

Tim:                       Sorry.

Carlos:                  The post is "Wait Statistics, or Please Tell Me Where it Hurts" by Paul Randal.

Tim:                       Yeah. It'll come up and you can just run that script, and it'll tell you what your biggest performance bottleneck is, what SQL Server is waiting on. You can clear out those stats right before you run your query and get an idea of what the performance issue is.
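Show note: a minimal T-SQL sketch of that quick-and-dirty workflow: clear the cumulative wait statistics, run the query in question, then look at what the server waited on. Paul Randal's script linked in the resources above filters out the benign wait types for you; this bare query does not.

-- Reset the cumulative wait stats (server-wide, so be careful on shared boxes)
DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);

-- ... run the slow query here ...

-- Top waits accumulated since the reset
SELECT TOP (10) wait_type, waiting_tasks_count,
       wait_time_ms, signal_wait_time_ms
FROM   sys.dm_os_wait_stats
ORDER  BY wait_time_ms DESC;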

Vladimir:              Since he mentioned wait stats, one thing that Plan Explorer does, and I don't sell it, it's free: Plan Explorer will actually capture the wait stats for your query when you run it. Now you get to see what's impacting your individual query right there in one of the tabs.

Allen:                    Okay. Very nice. I guess I'm going to have to update my version of Plan Explorer. One thing I would say with wait stats: in the dev environment, more than likely, are the big waits going to be on our disks? That's probably going to be a contention scenario where we're going to have problems, because we're on commodity hardware, probably over-provisioned VMs. Yeah, the age of the disks and all those things as well.

Speaker 7:           [inaudible 00:22:59].

Allen:                    Something I am interested to come back to; we went off on a little tangent. The question ultimately, or the thought, was: I had a licensing issue and that limited what I was able to use, what SQL Server was able to use.

Speaker 7:           That was the second half of the problem.

Allen:                    That’s right.

Speaker 7:           [inaudible 00:23:30]. Multiple times per row.

Allen:                    My question, I guess, is thoughts on finding licensing issues in your environment.

Speaker 7:           [inaudible 00:23:44]. We’ve been licensed [inaudible 00:23:44] since then.

Allen:                    I thought that was going away.

Speaker 7:           I hear you.

Allen:                    Is that still allowed.

Speaker 7:           I don’t have the money to change it.

Allen:                    I thought that was gone.

Speaker 7:           The newer license [inaudible 00:24:00].

Allen:                    Okay.

Speaker 7:           They don’t even allow that anymore.

Allen:                    Yeah, I was going to say, even the operating system is moving to core-based licensing. Okay, I guess that answers that question. Bye-bye, seats.

Speaker 7:           [inaudible 00:24:38] because you’re going to need it.

Speaker 10:         [inaudible 00:24:41]. We worked a deal so they got some systems I’m migrating.

Speaker 7:           [inaudible 00:25:12] to try to determine what the right numbers are for us to get the site license on that. I'm not involved in any of that. Like I said, I'm a developer. I don't talk about any of that.

Allen:                    As we hand this over to Kevin, what if any tools do you have to know if you’re still on seat licensing? Is that even a thing?

Kevin:                   I have no idea about that. I was going to say, in contradistinction to Jonathan's implicit bias against virtual machines, I will say that it is much easier to license VM hardware than it is to license bare-metal stuff. Hey, I don't care how many VMs I have spun up on this thing; we've licensed the rack, and once I license the rack I can just throw stuff on there. At least as far as I know. I am not a licensing person, nor do I pretend to be, even on podcasts.

Carlos:                  You're just a big fan of over-provisioning, is that what you're telling me?

Kevin:                   No, it is not over-provisioning. What it is, is saying, "You know, company, you buy this big rack here, we're going to license the rack." In our case that was actually cheaper than if we were to purchase physical hardware. We also get some nice benefits, like being able to vMotion machines across, being able to replicate machines, and very easily spinning up new machines. All those fun things you get with virtualization. But in the end our bill, and again your mileage will probably vary, ended up being a little bit less than if we did everything on physical hardware.

Carlos:                  Where's Neil Hambly when we need him, our licensing guy?

Speaker 11:         I believe I have an answer on how to determine the license type of the server: SELECT SERVERPROPERTY, license type. From that website called Stack Overflow.

Carlos:                  There we go. Okay, you closed it already. You select SERVERPROPERTY and then you pass in LicenseType as a parameter, or NumLicenses, and that will give you that information. There you go. Now you'll be better positioned. Yeah, there we go. We'll put that out on the tour.
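Show note: the query being described, for anyone who wants to copy it down. On recent versions LicenseType generally comes back as DISABLED, which is itself the answer that per-seat licensing is gone.

SELECT SERVERPROPERTY('LicenseType') AS license_type,   -- PER_SEAT, PER_PROCESSOR, or DISABLED
       SERVERPROPERTY('NumLicenses') AS num_licenses;   -- NULL when per-seat licensing isn't in use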

Speaker 7:           [inaudible 00:27:07].

Carlos:                  The transcriptionist is not going to know who's talking now. Okay. I'm not sure how much longer we have, but I want to switch gears slightly. I was at Ignite this week and something just clicked. I was in Dr. Sirosh's session; he's the VP of the data platform at Microsoft, right where SQL Server falls under. A lot of the advancements they've made in 2016 with columnstore, even some of the services in Azure, and the move to SQL Server on Linux: basically his focus was that they want SQL Server to be a transactional system and an analytical system as well. That's going to change things for what I'll just call the traditional folks. I guess my question is, thoughts on that, and how do we start preparing?

Kevin:                   I'm going to say I embrace our ant overlords. Yes, absolutely. I think it's a good thing on net, first of all. Technologies are fun; it's fun to learn these types of things. I like SQL on Linux. I like this idea first because of the licensing fees that I get rid of, where I don't have to pay for Windows Server licensing. Secondly, it helps pull in that marginal Unix shop, that Linux shop that needs a real database platform where MySQL isn't going to cut it.

Carlos:                  I guess I should say, Microsoft actually put out some information that the number one database platform for "big data or analytics" is currently MySQL. SQL Server's number two. I think Microsoft would like to provide that solution end to end, to be able to say, "Hey, I'm there."

Kevin:                   You'll see the same thing with this move to Azure Data Lake. You'll see this thing where we're moving toward [inaudible 00:29:33] clusters, we're moving toward things that 5, 6 years ago were scary. They're not as scary anymore. I think it's a good thing. It's a good thing for us in the community to learn as much as we can, embrace what we like, poo-poo all the things that are terrible. Sure, I don't get paid by Microsoft, so I'll trash the stuff that sucks. It's nice to see them moving in this direction. It's better than "we know what the world wants, and the world wants NT 3.5."

Speaker 11:         I was talking to one of the members of the SQL Server on Linux product team, and one of the concerns they had was, kind of to your point, that SQL Server has tied its future to the future of Windows Server. As more people move over to open source server platforms for their workloads, there was an intentional bifurcation move to get SQL Server uncoupled from a reliance on Windows Server. You're going to continue to see more and more improvements in the SQL Server on Linux features; you're going to see parity, basically, with what you see on Windows Server. They're still working on things around SSIS, SSAS, and SSRS, but all the core security features are there, and it's all quite intentional. Embrace your ant overlords.

Carlos:                  I just want to point out [inaudible 00:30:56], we've actually reached out to the PM, thanks to Tim, and we'll be having a session dedicated to that, doing more of a deep dive on it.

Allen:                    Not specific to your question, Carlos. Again, this is Allen White. I've been in IT for 42 years now. Over the life of my career I have found that technology changes enough that you almost have to re-tool yourself on average every five years. Now, fortunately, five years from now I'll be 67 and I can retire and spend my time on a golf course, but this is not anything different from what I've gone through. When I started in IT, personal computers didn't exist at all. You always have to be learning. You can never, ever stop learning if you're in this field. You probably should in any field, but especially in this one. Never stop learning. Right now the big things are services, right? The whole push toward analytics, just like you're talking about. Just be prepared to always learn and then you're fine. You just move along as things grow.

Tim:                       I think, if nothing else, one of the goals, to plug the podcast a little bit, is to help data professionals, or developers even, be aware of some of these other technologies and how they influence them. We're not necessarily looking to master all of these things. You want to at least be able to address them and to understand, "Oh, when I hear the word R, they're talking about a statistical language and not a letter of the alphabet."

Carlos:                  Not to put too fine a point on learning new things, but when you speak with Travis Wright he's going to talk to you about the importance of containerization. It gives them the flexibility to move containers around for platform support, or maybe I've used that inappropriately. He's going to ask you about the popularity of that in your user group.

Tim:                       Right. I think that's a whole new area. At least speaking personally for myself now: containers, I hear that in the application and dev space and I'm like, "Oh my gosh, I'm a little bit nervous." But I think it will come; it's going to come our way.

Kevin:                   Everything you heard from people for the past several years was learn PowerShell, learn PowerShell or you will lose your job. Containers are the same thing. Learn Docker, learn Docker or eventually you will lose your job. Windows Server 2016 will have Docker support. Windows 10 Professional edition has Docker support. We're seeing this already on the Windows side. We've seen it on the Linux side for several years now. It will get more popular on this side of the house as well. It is time to get ahead of that, because otherwise these containers will come up, you will have no clue what you're doing with them, and bad things will happen. Unless you're Allen and you're golfing.

Carlos:                  I guess I should say, being at the Ignite conference, Docker was a gold sponsor. They had a big booth at Ignite; they were a major player in the center section. Yeah, I agree, particularly because Microsoft has adopted it in Azure, it's going to make its way downstream. We're going to start to hear more about it.

Speaker 12:         Something to complement what Allen said. He said you have to re-tool yourself every five years. I've been working with SQL for about five years, which apparently means I'm going to have to learn everything new. That's what I was about to get to. I did two presentations on Power BI. You talk about the way SQL and data are going and it scares me a little bit, because Power BI is a perfect example: every week they've got new features they're releasing, and every month the desktop tooling is different. You look at Azure, you look at Power BI, you look at SQL Server Management Studio: it changes every month now. I haven't been in the field for 42 years, so I don't have the perspective to go, "Oh, this is going to be okay. This is normal." To me, as someone who's new, it feels like it's accelerating and I don't know how you keep up. I listen to five different podcasts and I watch Pluralsight and all this kind of stuff, but I just don't know how you keep up with how quickly things are changing.

Carlos:                  I think the days of being the master of everything are over. I think Jonathan's going to follow up with that.

Johnathan :         Yeah, that's actually where I was going to go with it too. Knowing everything is going to be reserved for MVPs like Kevin, maybe Allen, but he may be golfing, I don't know. No, I think the days of everybody knowing everything and being masters of everything are over, because there's just so much coming. It gives you the benefit, too, of being able to choose where you want to specialize. You can be a high-end BI professional and know nothing about [inaudible 00:35:53], because there's just so much other stuff as well. We're at the point now where you can specialize in master data management, have a great career, make a lot of money, and not know anything about anything else. Don't feel like you need to know everything. Find something that you like and run with it. Let that be your thing and just run with it.

Kevin:                   Following up on Johnathan I absolutely 100% agree.

Johnathan :         That’s right.

Kevin:                   Let me give you an analogy. You had [inaudible 00:36:19], who was probably the last mathematician who knew everything about math. Part of his knowing everything about math was the fact that he opened up so many doors that future mathematicians came upon. There will never be another [inaudible 00:36:35], because nobody will understand the totality of mathematics to the extent that he did. That is a good thing, because we've learned so much more since he opened so many of those doors. In this data platform world, yeah, you could be the guy who knew everything about SQL Server 7 or SQL Server 2000. You could have everything in there, but things have changed; we've gotten more things. You will not be the person who understands the totality of the platform, you will not be the guy who knows how everything functions under the covers. Don't try to be that guy, don't want to be that guy, don't aspire to be that guy. Be the guy who knows something, maybe a lot about something and a little bit about a lot of things. Don't think that you have to be omniscient about the entire platform.

Carlos:                  I think one potential new horizon there, and this goes back to the CEO panel we had, is that executives want IT to be more familiar with the business. That may be the next hurdle we have to embrace.

Speaker 7:           [inaudible 00:37:47].

Carlos:                  You're going to know these niche topics, but then you also need to understand how they apply to the business so that the business can become more profitable. You create value there, rather than just performing an IT function. Very good. How are we doing on time? We have 8 minutes. Final thoughts. Dave, you want to?

Dave:                    I don't know if it's a discussion for the podcast, but I'll ask it. Because of everything else going on in our environment, and trying to keep up with all the new stuff, I've been trying to look at Azure lately. Some of the stuff I found in Azure that would make it difficult for us is that if you try to use linked servers, the Azure SQL database, as opposed to a virtual machine, won't let you use linked servers. When you have data in disparate places that you're trying to pull together for a report, you're hosed. You end up having to put them all in the same instance or something. Then there are some issues where, if you refresh your data every week, if you want to take your data and load it into Azure, you have to create a whole new database each week and then rename the database or something in order to get all your old links to work again, because you can't reuse the same database name when you do a restore from an MDF or something. [inaudible 00:39:14].

Carlos:                  I'm not familiar with the restore issue, that's new to me. I guess I'd have to find out more about that topic. I guess-

Dave:                    [inaudible 00:39:29].

Carlos:                  Let's make sure that we're talking about the right things here. Are we talking about a VM that lives in Azure, or are we talking about Azure SQL Database?  There is no "DR" to Azure SQL Database; however, there are some HA-type features and you can make those connections, and you can take a database and push it into Azure. I guess I need to figure out more of that.

Allen:                    I thought when you do a URL-based backup and you do a URL-based restore on the same instance, you're right, it doesn't let you restore over the top with that URL-based backup.

Dave:                    You need a new name, and then you have to drop the original one and rename the new one to the original name, or something like that, is the way I believe I read it. Again, I just didn't know, because I do so much stuff that's reporting based. I have to report certain things to the department of human services, so I just restore databases that are production and OLTP onto a server where I can do something with them without completely destroying the OLTP environment. Now, do I restore to an Azure SQL database? Well, if I have to recreate the database each time with Azure SQL Database, that's a pain. That tells me I need a VM instead, and on a VM I have to copy the backup to the VM area, the blob storage area, whatever, and restore from that in order to not have to change everything. I didn't know if anybody had run into that or resolved that issue.

Carlos:                  I guess the pain point really is that the database relies on other databases that you'd have to set up networking for; that's really the pain point. Restoring to a different name shouldn't really be a problem. It's just all the other downstream components that you'd need, the connections.

Tim:                       Yeah. Last step, clear out the connections to the old one, drop it, and then rename. Yeah.

Kevin:                   This probably won't solve your exact problem, but this is an opportunity to plug Power BI. It's a solution for a lot of this stuff, because you've got the on-premises data gateway and you're able to pull in your data. As long as it's something where you're doing more analytics, and you're not trying to do all your OLTP kind of stuff, but you're pulling in data that's highly compressible, it's able to just pull in the data for you, send it off to Azure, run it in memory, all compressed and everything, and you don't have to do anything special. You don't need any inbound ports; you just set it up and go. You're going to see more and more hybrid solutions like this, where instead of trying to ship everything up they provide a bridge to work with the data that's on prem. I think you're going to see more and more of that instead of the old way of "I need to copy everything, put it on a bus, and ship it over to their data center."

Carlos:                  Good stuff. Okay, final thoughts? No. Awesome. Okay. Well, thanks everybody. Let’s get a big round of applause. Here we go. Everybody clap really loud, as loud as they can. Here we go. 1, 2, 3. All right.

Episode 64: Are Microsoft Certifications Worth It?

Have you ever thought that Microsoft certifications were a waste of time? Depending on where you are on the experience scale, certifications still play a large role in hiring and career development. You are probably in one of two camps–either you’re an employer and you want some peace of mind in the hiring process, or you think the certifications are bunk. There aren’t too many people in the middle ground, it seems. In Episode 64 of the SQL Data Partners Podcast I talk with Patrick Thomas, the program manager for the Microsoft MCP and MCT programs, about the role those programs play at Microsoft and the ways they continue to provide value, including what’s in store for data science learners and what he thinks of the Microsoft purchase of LinkedIn.

Listen to Learn…

  • What being a Microsoft “MVP” will do for your job prospects
  • The history behind the Microsoft exams and why they became so popular
  • How to use certifications to your advantage in job hunting
  • The buzz around data science certifications
  • How Microsoft exam online proctoring works
  • Patrick’s thoughts on Microsoft’s acquisition of LinkedIn
  • The story behind the now-extinct Master program
  • Plus, what’s coming down the pipeline to replace the master’s program

Episode 64 Quote:

“Right now, we don’t really have an equivalent of a Masters program, or that boot camp style learning. We did however, announce this summer, the Microsoft Professional program. It’s in pilot right now….” – Patrick Thomas (Listen to hear more)

This week’s Tuning Review: Visualizing VLFs

This week’s tuning review is on Virtual Log Files, also known as VLFs. High VLF counts can slow down the transaction log backup, the crash recovery, and writing to the transaction log. Reducing the VLF count usually speeds up the crash recovery “In Recovery” time. This post on DatabaseHealth.com shows you how to visualize VLFs in your log file.

About Patrick Thomas

Patrick Thomas is a Microsoft Program Manager for the MCT and MCP programs who has a background in aerospace. He has been with Microsoft for almost two years. Connect with him on LinkedIn.

Resources

Microsoft Certification Program Benefits
Microsoft Professional Program FAQ
EDx Data Science Program from Microsoft
Microsoft’s new online certification program kicks off with data science specialization
How to become a Microsoft MVP
Microsoft MVP Overview

Transcript

Carlos: Patrick, welcome to the program.

Patrick: Hey, thank you very much. Happy to be here.

Carlos: It’s great to finally get to talk with you. We’re here at Microsoft Ignite in Atlanta. Exciting to see all the new, different things that are happening. Glad that we were able to bump in, and talk to each other. Ultimately I guess, your role is the leader of the MCP program. Why don’t you just tell us a little bit about your role, and maybe a little bit more about the MCP program.

Patrick: Sure. My role at Microsoft is, I’m the program manager for both the Microsoft Certified Professionals and the Microsoft Certified Trainers. I get both of those. When it comes to the MCP program itself, essentially if you’ve passed a technical certification, you get that MCP, you’re part of the MCP program. You’re …

Carlos: Great.

Patrick: Yeah. You’re one of the millions. Now, obviously there’s different levels. We have our MCSAs, our MCSEs, but once you pass that technical exam, you are an MCP. From a program standpoint, we manage everything. Basics, like your transcript that shows what exams you’ve taken. Maybe if you have an exam scheduled, where is that located? Where’s your certification information? Also, there’s a benefits side. As a certified professional, you have access to what’s called an MCP profile. It’s a customizable profile that you can share. That would have your certification data. You can link it to your social sites, your blog properties, things like that. Essentially it’s a Microsoft database that is full of certified professionals vetted by Microsoft. You have a chance to publicly share and customize your profile within that.

Carlos:  If you wanted the links on your resume, what have you, you could link to that, and say here are my things, and it’s a Microsoft site. They can see, yes, I do this?

Patrick: Yeah. I think it’s a good way to validate that you have the skills and credentials that you claim you have. We’re not trying, with that profile, to replace LinkedIn. I would say most everybody uses LinkedIn. I know that I do, but for MCPs specifically, it really does highlight your achievements through certification. Some of the other benefits that we have, we have a discount for MS Press, both print and eBooks, which a lot of MCPs don’t know about.

Carlos: I think that benefit is going to be a little lessened, because Microsoft Press is basically being outsourced, or they …

Patrick: The way that we’re managing the business has changed, but the benefit, as far as what you’re receiving, will be the same.

Carlos: I see. Any books that do come off Microsoft Press, even though the inside organization has changed a little bit, you’ll still get that benefit.

Patrick: Yeah.

Carlos: Okay.
Patrick: Also, we have a partnership with a company called Guru, which is a technology job placement company. They’ve grown exponentially. They’re here with us at Ignite. They’re helping to sponsor the MCP party tonight. They’re just about to release a new hiring platform, where MCPs are preferred in job search results on their platform when employers come to the site looking for talent, which is a really awesome way to highlight the achievements of the certified professionals. They have a really outstanding site. I would encourage everyone to go check it out. It’s guru dot io.

Carlos: I guess let’s talk a little bit about that. I know that Microsoft training continues to evolve. I’m not sure if it was under the MCP, but at one time we had born to learn dot com, which I think is still there. There’s the Microsoft Virtual Academy, and what not. It sounds like there’s this evolution, like Azure, where things can just change so quickly, and now there’s another site or another point, through Guru, where people can get training.

Patrick: Yeah. I think that Guru is external to Microsoft. They have their own site, their own business. If a Microsoft Certified Professional came to the Microsoft slash learning site and logged in, they would go to their individual dashboard. From there, they could actually see the link to the Guru page, and it’s going to contain all their information. To your point about the web properties, it’s a little bit scattered. You have MVA, you have Born to Learn, you have the Microsoft Learning site. There’s a digital evolution effort going on within our organization to really make that more of a connected, cohesive experience. As we get into December, January of 2017, you should see some improvements there, to where it’s just easier to find what you need, whether that’s training, certification, or just information about a technology.

Carlos: I know there was an announcement earlier this year about online proctored exams. I think this is probably just getting out to reach other people that maybe don’t have a hosted proctor as near to them as well. Can you talk to us a little about how you continue to reach out to those in different areas? Particularly places outside the United States.

Patrick: Yeah. Online proctoring, it’s now available worldwide. I think it really gives everyone the opportunity to be able to take an exam, no matter where they’re at. Depending on where you’re located, if you’re far away, even if you are in the United States, you might be two hours from a testing center. With online proctoring, it’s a little bit more convenient to be able to sit and take an exam. We’ve seen good growth over the last year since it’s become available. We’re just really trying to remove that roadblock of having to physically be at a testing center. I think it is a really great benefit. It’s a good process. Essentially, it’s the same process as sitting for an exam in person, with the same requirements.

Carlos: Now, are you actually seeing somebody looking at you?

Patrick: Yeah. You do. One thing about online proctoring, if you’ve never taken an exam before, if at all possible, I would recommend people take an exam in person first, because the online proctoring experience is a little bit unique. In that there is somebody looking at you. I can’t remember for sure, but I think you have to stand up, so they can make sure you don’t have anything in your pockets, and

Carlos: You go through the same process you would, like there. The little e-pat-down, right?

Patrick: You’re prepared for it, if you’ve gone through it in person first.

Carlos: I see. I can appreciate that, because the little timer there is counting down. Again, I know they talk about time. I guess I have been one that likes to take my time. I’m not saying that I’m down to the seconds before I’m hitting the submit button, but I review. I do normally go through the test twice. I can imagine that having somebody watching you there could change it. You just need to be prepared for that, in the mental preparation for the exam.

Patrick: Yeah, for sure.

Carlos: Let’s see. Microsoft has purchased LinkedIn. What do you think … well, so no announcements have been made. They’re still working out some details. Maybe what would you see there, or what possibilities could we be expecting through the merger with LinkedIn?

Patrick: It is kind of tough, in that the deal is still pending. I know that, even though some of us within Microsoft, especially within Learning, have questions about what’s going on, the fact of the matter is we just don’t know yet. However, looking at LinkedIn, and Lynda dot com, which is a LinkedIn property, there’s a lot of great products that they have. LinkedIn itself is such a powerful tool; I know that I’ve used it in my career. The existing benefit that we have right now for MCPs is they can share their certifications on LinkedIn and add them to their profile. What I’m hoping is, there will be discussions as Microsoft learns more about LinkedIn, and they learn more about us. When the deal gets final, hopefully down the road, that brings new opportunities for MCPs. Right now, we just don’t know, but I definitely hope so, because they have a great platform.
Carlos: I did have a thought. Ultimately, we’ve been talking about MCPs, which is the one exam.

Patrick: Yeah.

Carlos: Ultimately, we have the MCSAs. We have the MCSEs. Other than having the combination of exams, are there any additional benefits to those programs specifically?

Patrick: For MCSE? That’s a good question. At this point, the benefit is really in the higher level cert itself. In that, hey, I have an MCSE, versus I’ve passed an exam. We are coming out with badges. Cert-specific badges that are digitally shareable badges that you can put essentially anywhere. On your own website, on your blog, on your LinkedIn profile. That would be an MCSE, MCSA, or an exam-specific badge. I think the MCSE brand itself, we believe, still holds its value. Other than that, the benefits for MCPs are pretty much similar at this point.

Carlos: Okay. Very good. Because our audience is SQL Server related, at one time there was a training course called the Microsoft Certified Master program. This was a week-long experience. You went out to Microsoft; it was on site. It was very intense. It was kind of like Ignite, but only SQL Server. At the end of it, you were supposed to pass an exam. Previous to that, there was something called the Ranger program. That was its predecessor. When it went away, some of the prestige, if you will, in the community went with it. Not that we’re looking for a replacement, but what is the thinking there, and are there new thoughts around it? How do Microsoft employees get that very specific and in-depth training to then do supportive things like that?

Patrick: I think, since I’ve been in the role for about two years. I’ve been at Microsoft for about two years. I run across people all the time that talk about the Masters program. Like, “Man. Why did you get rid of it?” I wasn’t here, so I don’t necessarily know, but I do know that something like that, that’s testing your skills. Especially in an intense hands-on way. If you work hard at something, you’re going to take pride in it. When it goes away, that’s kind of tough. I think that as an organization and as a company, we need to be a little bit more thoughtful about keeping things in market that have some staying power.

Right now, we don’t really have an equivalent of a Masters program, or that boot camp style learning. We did however, announce this summer, the Microsoft Professional program. It’s in pilot right now. There’s a data science pilot, and if you go out there, and you do a bing search on Microsoft Professional program. Formally known as the Professional Degree. You can find a lot on information, and there’s actually somewhere you can sign up to be on the waiting list to be part of the first official class. The Professional program is really focused on job skills validation, so data science. The data science field, and it’s a combination of learning through online video. Using Open edX, then the mentoring, the community space, with doing hands on labs. It’s a pretty intense program. I mean, it’s not in person for a week. However, it may be like six months. To where you’re investing ten hours a week.

Carlos: Sure.
Patrick: I think it’s not a one for one, but we’re really excited about it. Last I heard, we have tens of thousands of people on that waiting list, ready to get in.

Carlos: No doubt. I think it helps, at least in that case, it’s kind of a, I want to say a niche topic. What’s the word? It’s a buzz word, right?

Patrick: Yeah.

Carlos: I don’t think it hurts the cause there either.

Patrick: For sure. People are excited. Right now, it’s data science. There will be other tracks as we go.

Carlos: Since I have you here, I can’t help but ask this as we kind of get to the end here. There are those in the community, when we talk about certifications. The SQL community is very focused on training. We have PASS, who’s actually here; Microsoft gives them a booth. Connect, Learn, Share. That idea, in the SQL community, does get its fair share of time in helping others learn and what not. There are some voices in there that talk about certifications. Basically, they’re bunk. In that, A, it’s not worth it. There are basically ways to cheat. I guess, what’s your response there? What are your thoughts there?

Patrick: Yeah. No. It’s definitely not the first time that I’ve heard certifications come into question, as far as the value. I think a few things. One, from an exam security standpoint, there’s the data dump sites where people are doing exam cram type stuff. We do work very hard at policing those, to try to make sure the integrity of our exams is there. The exam process, to create certification exams, is very comprehensive, with SMEs coming in. There are internal reviews within Microsoft. I do think the process is sound. I think the biggest issue is not necessarily with the exam itself, or even people taking the exams. It has to do with the value in the market with the hiring managers. For instance, I went and got a bachelor’s degree, and I went and got a master’s degree. Yeah, I wanted to learn, but I also believed that those things were going to help me get to where I wanted to go.

Carlos: Almost like the key that opens the door.

Patrick: Exactly. I think at one time, certifications were very much that. People with a high school diploma who were getting their certifications, working toward that MCSE, that was solid gold in the market. I think the reality is, right now, we’re not in that same environment. How do we make sure that Microsoft is doing its part with the companies and the hiring managers to really push the value of certification, and highlight the right talent? To where the people that they’re hiring with those certs are doing, and filling, the jobs that they need. The last thing I’ll say about it is, when I worked with the Azure team this year, they had roles to fill in the Azure space. They came to me and said, we want to hire Microsoft Certified Professionals with Azure certifications. From a Microsoft standpoint, we still use that to validate the skills of our people.

Carlos: That’s an interesting point. I think, particularly the Azure certifications have changed that a little bit. Being that they’ve made, it’s harder because things change so quickly.

Patrick: Yeah.

Carlos: It can be hard to keep the testing in version control.

Patrick: Absolutely. We are making a lot of changes in how we manage both our training content and our exams, to keep up. The pace of technology, it’s a lot faster than it used to be. We need to …

Carlos: Thanks Satya.

Patrick: Yeah, right? We need to adjust. I think that we are very much in the middle of that transformation. I think with the simplified certification tracks that have just been announced, the shorter path to MCSA, the shorter path to MCSE, the recert going away, to where we have a time-stamped MCSE, I think we’re really headed in the right direction, but we’ve got to just keep pushing.

Carlos: Very good. Ultimately, we’re encouraging people to get certified. I agree that it will open those doors. It is a differentiator. I don’t think that we’re going to say, if you become certified, you know everything.

Patrick: No, definitely not. There’s not just one way to do something, or one way to validate something. I think that the certifications definitely have their value. Hands-on experience, and experience on the job, there’s probably nothing like that, having actually done the job, in role. However, certifications are one way that you can show really quickly that, hey, I have these skills.

Carlos: Very good. Well, awesome. Great conversation. Interesting to get some of those insights, and some thoughts around where certification programs are going. Let’s do SQL family.

Patrick: Okay, let’s do it.
Carlos: While technically not a SQL person yourself, you’ve never been a database guy, right?

Patrick: No. You wouldn’t want me managing your database.

Carlos: We will still ask you some SQL family questions. You get the inherited, not inherited. What’s the word? You get adopted.

Patrick: You get adopted, I love it.

Carlos: You’ll be adopted into the family. A couple of questions, we’ve tossed out, because they’re SQL server specific, but one of the things that we like to ask is, what’s the best piece of career advice, you’ve ever received?
Patrick: The best piece of career advice that I’ve ever received is that you interview for your next job every day.

Carlos: Interesting.

Patrick: It’s so true, because you never know, who’s going to be the person that’s going to give you the recommendation beyond the interview panel, or perhaps even be your boss.

Carlos: That’s true. Very good. Our last question fro you today, Patrick. If you could have one super hero power, what would it be? Why do you want it?

Patrick: I just want to fly.

Carlos: Especially home. Right now, right? We’re in Atlanta, and your home is Seattle.
Patrick:  It’s not really fair. It’s just so cliché to say Superman. I want to be able to fly, but it’s the truth. It would be awesome to just fly around. Especially last night, with trying to get out of here with the Beyoncé concert going on. Trying to get across town, I would’ve liked to have been able to fly.

Carlos: Yes. Luckily, I made it out before that became a huge crowd. Patrick, thanks so much for being with us today.

Patrick: Yeah. Thank you so much. Appreciate it.

Episode 65: Building a Great Data Architecture

Does your organization use a data warehouse? Maybe you have a cube but find it hard to maintain? This week on the SQL Trail, Steve and I chat with Jason Horner about building a good data architecture. When talking about a data warehouse architecture, or even just a data architecture, people often think about building the data warehouse and specifying the server hardware, building the ETL, things like that–they tend to lose sight of the larger picture of data architecture. Jason gives us his “four pillars of data warehouse architecture” and how each of them impacts what you are able to do from a reporting perspective.

Listen to Learn…

  • Jason’s four pillars of data warehouse architecture
  • Why most warehouses are organic, rather than planned
  • How building a data warehouse is like building (or rebuilding) a house
  • Important recovery factors for a data warehouse
  • Why Jason thinks you should start your data dictionary in Excel
  • Why Jason says your data lake can be an intermediary stop for your data warehouse
  • The value of using source control for your data schema

Episode 65 Quote:

“Whether it’s actual doing the dimensional modeling and making sure that you have a good dimensional model, that’s the number one thing that helps with data warehouse performance.” – Jason Horner

 About Jason Horner

Jason is a consultant for Pragmatic Works. He specializes in data warehousing, data platform, geospatial, and PowerShell. Jason is also a Microsoft Certified Master and Microsoft MVP. Follow him on Twitter at @JasonHorner. Read more about him here on his about.me page.

Resources

Follow @JasonHorner on Twitter
[Free Training] Data Warehouse Implementation with SQL Server Jumpstart
[Article] Building a Data Warehouse with SQL Server
[Article] SQL Server Data Warehouse Cheat Sheet

Episode 65 Transcription: Data Architecture

Carlos Chacon: Jason, welcome to the program.

Jason Horner: Qué pasó, compañeros.

Carlos Chacon: Yes, it’s good to have you on the show. We’ve kind of been talking about this for a little while, and it’s nice to have you here. Our topic today is kind of the idea of what would make a good data architecture. I think most of the folks listening, they have data responsibilities, whether they be database administrators, or even the reporting, whether it’s some transactional system that they have. Then kind of taking that, and migrating it into a reporting environment, or even eventually into something as big as a data warehouse.

You have some interesting thoughts on maybe ways that organizations could improve that process, and make it better so that they’re not kicking themselves later down the road with some of their poor choices, as they build out this infrastructure, this architecture rather.

Jason Horner: Yeah, absolutely Carlos. I’ve worked as a full-time employee at a lot of places, and I’ve been a consultant for about two years now. I’ve seen a lot of different environments, and you really hit on a good point. A lot of these data warehouses, they’re built organically, right? It starts off with somebody who wants some kind of number. Maybe it’s a top line number for sales, or some such thing. Usually it’s the application developers or somebody who are tasked with building up some reports. Then, as you guys both know as production DBAs, you start screaming, bringing pitchforks to their desk: “What are these report queries doing on this OLTP system? They’re slowing everything down. We’ve got locking and blocking chains.” These kinds of problems. Usually what people will do is they’ll say, okay, well, we’ll use transactional replication. We’ll move the data over into a second copy, and we’ll let you do whatever you want there. Then ultimately that kicks the can down the road a little bit further, right? They get some mileage out of that, but then the queries don’t perform, because the analytical questions get deeper, the business demands more. What I like to …

Carlos Chacon: The data itself just gets bigger as well.

Jason Horner: Yeah, absolutely right. You can’t use some of those same OLTP management techniques on that data, right? You have to remodel it and build it. I really like to think of it like a house. If you’ve ever gone shopping for a house, you’ll see a lot of houses, and maybe they don’t look too pretty on the inside, but as long as the bones of the house are good, as long as the inner two-by-fours and framing of the house are good, you can work with that. You can remodel it and you can build it. That’s really how I see setting up a framework for data architecture.

Steve Stedman: Okay so given that you’ve got that data architecture that’s built out for your OLTP process, and you’ve got the good bones, like you said there. What kind of things do you look at to figure out how do you get that into a data warehouse environment?

Jason Horner: My model kind of uses four pillars, right? It really boils down to a solid platform, information management, data integration, and application life cycle management. Those are really the components that I look at, and I try to figure out good strategies along all those axes.
Steve Stedman: Okay, so when you refer to the platform, where are you going with that?

Jason Horner: The big thing with the platform is obviously, as you know, I know backups and recovery and corruption are near and dear to your heart. It’s really defining an RTO and an RPO. So a recovery time objective, how long does it take you to recover, and a recovery point objective, to what point can you recover to? I think a lot of people don’t think about that for the data warehouse. It’s not as critical, some would say, because a lot of times you can recreate that data from the source system, or maybe you’re getting text files or other feeds, but in some systems the data warehouse becomes a de facto system of record.

Steve Stedman: That’s something I’ve seen as well, where we just say we can rebuild it from the source data, but the often times over time, you lose the source data, or it gets purged so that data warehouse is the only place that you might have some of it.

Jason Horner: Yeah, absolutely, and think about how long it would take to reload just 10 years’ worth of daily files. It could be a long haul.

Steve Stedman: It could be close to 10 years.

Jason Horner: Yeah, exactly. Other things to consider are high availability. Do you need to use failover clustering? Is Always On appropriate? There’s a lot of other technologies there. Maybe even the answer is you don’t need high availability. Maybe you want to do a hybrid scenario, where you have Always On into the cloud. That’s another possibility there. Then the last piece I would say is, oh, go ahead Carlos.

Carlos Chacon:  I was going to say, you said something there about availability groups in the data warehouse. Now I’ve seen it, a client I’m working with, they have it, but it seemed like, I never thought that was a good idea. Ultimately, I guess my thinking there is that while yes, you have the highly available solution, is the data warehouse the right place for that, with all of the benefits that you lose, I’ll say, by going to a transaction-log-replication type structure? It seems like when I load my data, doing table locks, bulk inserts, and reducing the amount of logging that gets generated to make that go faster, yes, that makes sense. I’m curious to know, how have you seen the data warehouse in an availability group scenario?

Jason Horner:  It’s going to be a certain profile of customer that’s going to do something like that, because like you said, it really impacts how fast you can go on some of these minimal logging scenarios, and what you can do with index maintenance. Some of the pains there of being a fully logged model. What I would say is if you think about a customer that’s integrated that data warehouse, and that information from there, into more of an operational side, where it’s actually feeding an application. Maybe it’s providing near real time analytic intelligence back to the application to make a decision on something, then they can’t afford for that data warehouse to go down.

Maybe it’s appropriate for a small section of the warehouse, maybe you take a slice of that data, and expose it via that. Then if you need something that’s more longer term, let’s say 5 years and back, or some similar time frame like that. Maybe that goes into a different architecture. A lot of companies, the data warehouse has really become the lifeblood, so you can’t have it go down. Maybe a failover cluster is a way around that, just to enable patching scenarios and upgrades. There’s certainly other more lightweight technologies that might be appropriate, that aren’t necessarily the new hotness.

Carlos Chacon: Sure I was not suggesting that an availability option wasn’t a good idea. I was just kind of interested more in the availability groups because it seems like in the data warehouse I’m seeing interest there as well, and I kind of feel like maybe this is not, your use case may vary

Jason Horner: Yeah, absolutely. It’s really set, and that’s why I think it’s important that people actually think about this. Often times it involves bringing in other people, not just the data warehouse team, but relying on your DBAs, relying on your infrastructure team, so on and so forth.
Carlos Chacon: Okay, so now as we take a look at that, we’ve kind of made some of those decisions, now we’re going to start actually building the nuts and bolts to make the ETL process for that reporting start to take form. What about information management?

Jason Horner: Yeah, absolutely. Information management is really key. I spend a lot of time talking to people at SQL Saturdays and other conferences about components of that. Whether it’s actually doing the dimensional modeling and making sure that you have a good dimensional model, that’s the number one thing that helps with data warehouse performance. If you’ve just taken and done a lift and shift, and moved an OLTP schema over and tried to build a data warehouse on top of that, it’s likely not going to work as the data approaches larger sizes.

I’m talking about 500 Gigs, Terabytes, those kind of size ranges. It’ll start breaking down. It’ll work pretty well for smaller sizes, given appropriate hardware, but it’ll really start to become painful. Now that being said there’s a lot of exciting technologies in 2016. Things like the operational analytics, which combines column store technologies with memory OLTP, which I think is really going to be a good enabler for some of those scenarios, to kind of bridge that gap.

There’s other things too. I think one of the key things, and you’ll see a lot of stuff in 2016 around master data services. Getting master data management, and then really empowering the business users to manage their reference data, manage their hierarchies, and other pieces of data that actually enrich that data model. When I say enrich, I’m saying data that’s not stored in the OLTP system, but that might be hidden in an analyst’s spreadsheet, or sticky notes in their cube for example.

Hand in hand with that, go things like data quality. Making sure that the results are [inaudible] and accurate. Then having a data dictionary so people understand how to use the model, and what the columns mean. Information management is a lot more than that. There’s some stewardship, and there’s some governance as well, but those are the key things that you really have to start thinking about.

Steve Stedman: Okay, so when you mention the data dictionary, and having that available. What kind of tools do you use, or how do you go about doing that, so that it can be consumed by the people trying to use the data?

Jason Horner: It’s a great question right. There aren’t a lot of things. I mean obviously there’s third party vendors that have come up with solutions to build data dictionaries. Often times customers or clients or places that I’ve been, they don’t have the money or they don’t want to fund it, so I often recommend just starting with Excel, or even using a SharePoint list to define some of these core attributes. It even goes beyond just a data dictionary. It could be something like a source to target mapping, which kind of specifies how do columns from the source system get transformed and then ultimately placed in the data warehouse.

Starting with something simple. An Excel spreadsheet, or maybe even using SQL Server. SQL Server has a lot of system table views that we can use to fill in some of that basic metadata. Then we can use extended properties, or we can use modeling tools to basically bridge that gap and bring that stuff together. I think it’s a really important thing to have, a data dictionary, because so many people will be interacting with that data warehouse over time.
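
To make that concrete, here is a minimal sketch of a starter data dictionary built entirely from the catalog views, joined to any MS_Description extended properties that have been added to columns. The object names are generic placeholders, not anything from a system discussed on the show.

-- Table, column, and type metadata straight from the catalog views,
-- plus the column description extended property when one exists.
SELECT s.name   AS schema_name,
       t.name   AS table_name,
       c.name   AS column_name,
       ty.name  AS data_type,
       ep.value AS column_description   -- NULL until someone writes one
FROM sys.tables AS t
JOIN sys.schemas AS s  ON s.schema_id = t.schema_id
JOIN sys.columns AS c  ON c.object_id = t.object_id
JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id
LEFT JOIN sys.extended_properties AS ep
       ON ep.class = 1
      AND ep.major_id = t.object_id
      AND ep.minor_id = c.column_id
      AND ep.name = 'MS_Description'
ORDER BY s.name, t.name, c.column_id;
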
Steve Stedman: Sure, and it really sounds like use whatever you have access to use there. That will give you something that you can work with, and then build on.

Jason Horner: Yeah, absolutely Steven. The other thing too, that’s important to remember is start small, and then as your home grown solution or your Excel solution becomes painful, maybe then look at buying a tool. A lot of times people put the cart before the horse, and they go out and they spend the money on the tooling initially, and that doesn’t solve the problem. It just gives you now two problems that you have to deal with. Your original problem, and then figuring out how to correctly implement this tool.

Steve Stedman: Yeah, good point, good point.

Carlos Chacon: That’s an interesting thought, because when you said Excel, my first thought was oh my gosh, that’s never going to get kept up, so any changes happen, it’s just going to get lost. I think it is interesting that you’re talking about, basically the institutional process. Like “hey guys, let’s get buy in, let’s make this thing happen”. Kind of starting small because even there I could also see people saying “well let’s just say I start with a single dimension”.

Again it’s a piece of the overall warehouse, they’re going to say “well it’s not complete, therefore I’m not going to use it.” That’s so frustrating when you’re trying to make changes, but ultimately I think you’re right. Being able to then give that as a data team member, I want to give something to the analyst, or to the other reporting people, so they’re not coming to me with questions about where these things are, or making bad assumptions about where the data is.

Jason Horner: Absolutely Carlos, and there’s a lot of interesting things that you can do there. I’ve seen places where they have Reporting Services as their primary reporting tool, and in each of the columns, they’ll turn that into a hot link that points back to either a wiki page, or even another report that actually surfaces the data dictionary that way, or they use SharePoint and they integrate SharePoint so that they give the user a soft landing spot to start exploring these reports. I think Power BI is doing some of that, especially some of the stuff with the natural language query, where you can build these more robust semantic models and allow the user to find and enable more self-service areas.

Carlos Chacon: Yeah, very cool.

Steve Stedman: Okay so Jason the third pillar that you mentioned was the data integration pillar.
Jason Horner: Yes, absolutely, and this is something that I think is key to get right, because what I often see is what I call data sprawl, or information sprawl, where people just start using replication willy-nilly, they don’t think about it, and data just ends up in six different warehouses. That’s not really what you want, right? You want to think about: what are my source systems, what systems do I have to go after, how am I going to consume that data?

Is it going to be a flat file dump, is it going to be transactional replication, is it going to be a direct SQL connection to that, am I going to do extracts? You’ve got to think about my delta detection, my incremental load process. I also have to think about bringing that data into a staging area, and oftentimes you want a persistent staging area. Are you going to use something like Azure Data Lake, are you going to use HDInsight or some other base technology just to store this data, because it’s going to be a lot of data coming in and the warehouse may only need a small subset of that initially.

The operational system may archive data out or it may get changed over time, and you just need to start accumulating it. That’s really the core of the data lake pattern. It’s just dump the data somewhere before you know what you want to do with it, and then you always have the ability to go back and carve out the pieces that you need to put in your warehouse. You can also enable exploratory data warehouse or data [inaudible 00:26:55] scenarios, where maybe somebody needs to just go in there and get a subset of the data that’s too big for the data warehouse, and that hasn’t been formally modeled, but they need to run a model on it, or an experiment on it, and pull out that data.

Those technologies are really great for enabling that without taking on the full cost of building a third normal form staging layer, as suggested in architectures that you see, or even Kimball, at the end of the day, says we build this staging layer in SQL. That’s really expensive, right?
Steve Stedman: Oh yeah, definitely.
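
On the delta detection and incremental load piece Jason mentioned, one common lightweight approach is a watermark: remember the highest modified date you loaded last time and only pull rows past it on the next run. Here is a rough sketch; the etl.LoadWatermark control table and the source and staging table names are made up purely for illustration.

-- Look up the last watermark recorded for this source table.
DECLARE @LastLoad datetime2 =
    (SELECT LastModifiedLoaded
     FROM etl.LoadWatermark
     WHERE SourceTable = 'Sales.Orders');

-- Pull only the rows changed since the last load into staging.
INSERT INTO stg.Orders (OrderId, CustomerId, OrderDate, Amount, ModifiedDate)
SELECT o.OrderId, o.CustomerId, o.OrderDate, o.Amount, o.ModifiedDate
FROM Sales.Orders AS o
WHERE o.ModifiedDate > @LastLoad;

-- Advance the watermark only after the staging load succeeds.
UPDATE etl.LoadWatermark
SET LastModifiedLoaded = (SELECT MAX(ModifiedDate) FROM stg.Orders)
WHERE SourceTable = 'Sales.Orders';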

Carlos Chacon: That’s a very interesting concept that you bring out Jason, is that idea that my data lake can be an intermediary stop for my data warehouse. I think traditionally, some of the feature sets that are coming out around that, because I’ve always thought about it in the “streaming analytics” quote unquote process, meaning I am a website like weather.com, getting hundreds of hits. I have all of this information that I may not want to keep but I want to get a real time feel for what’s trending for example, and I want to make a decision based on that rather than just information that’s going to stay with me for a year in report.

I think that is actually an interesting idea to potentially explore, is that if you were to go that route, you’d then give your analysts a place to go and play around, and kind of get used to some of those new tools that are coming out. You would reduce load on the transactional system, but it would also reduce the pressure on the data warehouse scheme, because the analyst will have access to the data and the stuff that actually needs to get reported on. Year over year, quarter- that kind of slice and dice information can then be processed as they determine what it is that they need.

Jason Horner: Yeah, absolutely Carlos, and that’s the thing, and now you’re not having people kind of building their little one off data warehouses. They’re going directly to where that’s stored. You don’t have multiple people hitting source systems, which is kind of an anti-pattern that I’ve seen. Where you may have three data ware houses. They each extract let’s just say for example customer data. They each extract customer data out of the different ERPs that you host, and maybe out of your CRM system. You’re doing kind of that once, move the data, and then letting people go in and get the data that they need.

Another thing that I see that’s kind of in line with that is people, not necessarily moving away from cubes, but going to a model where we’re going to prototype a quick solution in Excel. Using things like Power Query, and Power Pivot data models, and these things to build something up, and get a top line number out to management and the business users and get some really quick feedback on that. Then iterate again. Then once it’s ready and the requirements are fleshed out, and we’ve addressed all the data quality issues, then let IT take that over, or the data warehouse team, and productize it. Integrate it into the data warehouse.

Steve Stedman: Yeah, and I think Excel and Power BI is a great way to go about doing that, because it lets those analysts or business users have more direct access via that data lake option, without slamming your OLTP system.

Jason Horner: Absolutely, absolutely.
Carlos Chacon: Yes, very interesting

Steve Stedman: Then the fourth pillar we..

Carlos Chacon: We just have…

Steve Stedman: Go ahead Carlos.

Carlos Chacon: All right Steve, go ahead.

Steve Stedman: I was going to say, and the fourth pillar that we have there, is the application life cycle management that you mentioned. Do you want to touch on that a little bit?

Jason Horner: Absolutely Steve. This is something that I’m really passionate about, and it’s an area where the tools have kind of let us down. Historically, if you think about working with Reporting Services, Analysis Services, SSIS, the source control integration, the ability to do things like continuous integration, there hasn’t been a great story there. It’s getting better, it’s getting better all the time certainly, but the ability to use version control. A lot of times I’ve seen environments where their version control is the production database, and they’ll back up and restore cubes, they’ll back up and restore databases. One of the fundamental areas where I see people fall down, and it boggles my mind because it’s so easy to solve, is managing that relational schema for the data warehouse. Even if it’s for an OLTP system, there’s tools out there.

Redgate has SQL Compare, they have their SQL Source Control tool. Even SSDT is a solution I like, just because it’s free, for the most part. You can start building these database projects. It can help you support building out a dev, UAT, and production environment, where you can target builds of the database out. Then initially, DACPACs, right, they had a really bad rep. These data-tier applications. They didn’t support a lot of features, they were painful to work with, but now they’ve become more of a viable option.

There’s still some problems there, especially working with larger tables, but it’s definitely worthwhile to investigate these things and get those schemas under source control. I think that’s really key. You can’t rely on the production database backup copy to be the golden record for your schema.
Steve Stedman: No, I’ve seen how that can be dangerous. Okay.

Jason Horner: Right.

Steve Stedman: How, let me start over on that, so then how is all of this impacted, or is not impacted, by the newer cloud technologies out there?

Jason Horner: Well I think what it’s done is it’s allowed people to go faster and to choose what things that they want to set up internally, and you don’t have to deal with the IT management side of that, and what things that they’ll let somebody else manage for them. A classic example of this is the source control. You have VSO online, which is basically a cloud hosted version of TFS, Team Foundation Server. You get version control, you get project management, you get all these good things, and support for agile templates, and Scrum, and whatever process you want to follow.

You get builds and all these things, and it’s really great because a lot of times, I remember back in 2006, to set up TFS required 4 servers. You had to sacrifice a few goats, you had to do it on a full moon, and getting these things to work, just before you could make your first check in. That’s ridiculous, I just want to click a few buttons, and start checking in code, right? I don’t want to manage that, I don’t want to worry that oh my gosh, the TFS database is suspect. What do I do? Now I’ve got to go hire Steve to come in and fix the corruption. As much as I’d like to drive some work Steve’s way, I don’t want that headache. I don’t want to slow down my development process.

Steve Stedman: On that, as much as I love working on corruption, I don’t wish it upon anyone.

Jason Horner: Exactly. Some other things to think about: okay, maybe I’m getting data from the web anyway. Maybe, I’ve had customers that get real-time feeds from sensors out in the field, so SCADA data, from various sources. Well, if it’s already coming from the web, why not just make your first step to dump it into HDInsight or dump it into Azure Data Lake, and then once it’s there, you can enable some basic analysis from that standpoint, or you can then move it into a hybrid scenario where you’re bringing it on-prem, into a data warehouse that lives on a server that you manage, that’s maybe a little bit beefier to handle very specific reporting use cases, or you could just put the whole thing in the cloud.

The huge thing that came out last year at Ignite, was Azure DW. Not only does it give you an MPP solution, which scales very well, but it gives you a cloud based solution that you don’t have to manage. Then there’s not an initial capital outlay to get going on this. You spin it up, see if it works, if it does great, and you can manage that cost more effectively. If you don’t need it running 24/7, 365, you can scale it up while you need it, and then instantly shut it down and only pay for the storage.
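
The scale-it-while-you-need-it model Jason describes really is a one-liner on the compute side. As a minimal sketch, assuming a warehouse named MyDataWarehouse and example service objectives, the statements below are run against the logical server; pausing the warehouse entirely is done from the portal or PowerShell rather than T-SQL.

-- Scale compute up ahead of a heavy load window...
ALTER DATABASE MyDataWarehouse MODIFY (SERVICE_OBJECTIVE = 'DW1000');

-- ...and back down when the load is finished, paying only for what you use.
ALTER DATABASE MyDataWarehouse MODIFY (SERVICE_OBJECTIVE = 'DW100');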

That’s a huge win compared to something like APS, where you’ve got to figure out how many racks do I need, I’ve got to get data center space, there’s very specific cabling requirements. APS is a great solution, don’t get me wrong, but a lot of organizations don’t have the IT expertise in place, or they don’t have the bandwidth to take on yet another thing in the data center.

Steve Stedman: Yes, well having worked recently with the Azure data warehouse, I’d completely agree with what you’re saying there. I mean it’s so easy to get started and get going with it. Compared to other more historic options that we’ve seen in the past, definitely.

Carlos Chacon: Yes. I think it answers that question for the small businesses, because I was saying, well, we’ve thrown out some of these technologies, and data lake, and I think it could be very easy for a smaller organization to say, well, we don’t have all that stuff. I’m just going to stick with what I know. I think you answered that question, Jason, in that the cloud technology lets you try it for a day, a week. Yes, there’ll be some cost incurred, but you’re talking about a couple hundred dollars maybe, maybe.

To see if that’s something that will work for you, and you’ve gone through the process, and you’ve gotten access to, I guess we should define a couple of things. We’ve got Azure data warehouse, that’s Azure DW, it’s an MPP, with massively parallel processing. That basically is big iron, if you will, to be able to do things very very fast. One of the things we didn’t really talk about, that even the first platform to set that up, is how long am I going to have to be able to process all of that, and being able to do that in an MPP environment is going to be faster, or you’ll have more resources available to you, should you need to beef that up for your processor.

Jason Horner: Yeah, absolutely. The pattern I see people implement all the time is okay, we’ve got data coming in, we’re going to scale up to process this data. Instead of an extract, transform, and load or ETL pattern, you’re usually using ELT, extract, load, and transform. You’re using things like CTAS statements, so create table as select, and you’re just kind of doing a T-SQL-based pipeline there, but you’re scaling up while you need it. Then maybe you dial it back because the reporting’s been offloaded to, say, a farm of Analysis Services servers.
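
For anyone who hasn’t seen CTAS, here is a minimal sketch of that ELT step in Azure SQL Data Warehouse. The staging table, the hash distribution key, and the fact table names are illustrative choices only, not a recommendation for any particular schema.

-- Reshape staged rows into a new distributed, columnstore fact table in one pass.
CREATE TABLE dbo.FactSales_new
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT s.SaleId,
       s.CustomerKey,
       s.SaleDate,
       s.Quantity * s.UnitPrice AS SaleAmount
FROM stg.Sales AS s;

-- Swap the new table in once the load completes.
RENAME OBJECT dbo.FactSales TO FactSales_old;
RENAME OBJECT dbo.FactSales_new TO FactSales;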

That’s what I like about the cloud. I can come up with these great architectures, and not have to wait for IT to catch up with me. I don’t have to sit there and say, here’s this proof of concept where I’ve got a farm of AS servers, that are fronted by some reporting services servers, and I’ve got a SQL DW. I don’t have to wait for IT to open up all these firewalls, and order boxes, and get them provisioned. Then of course the big thing is we’re seeing more hyper convergence, we’re seeing more virtualization, and so what’s the number one problem with virtualization today. It’s over provisioning. If I use a cloud based platform like Azure, they’re managing that for me, so I don’t have to rely on an IT system admin whose knowledge of virtualization is 5 years in the past.

They don’t understand that you can’t over commit memory, and that IOPs aren’t free. You can’t just build all these virtual resources, there has to be physical resources behind it to support the load.

Carlos Chacon: Very interesting, yes, so lots to consider there.

Steve Stedman: I guess before we wrap things up then, oh go ahead Carlos.

Carlos Chacon:  I’m sorry, you mentioned some of the tools that we’re a little bit behind. There is one tool, we happen to be recording this during the Ignite conference, and one of tools, one of the announcements that had been made. Maybe not hand in hand with this, but it’s going to be a tool that, it will be leveraged, as you start looking at reporting, Jason do you want to talk about that for a little bit.

Jason Horner: Oh, yeah, sure, so one of the big announcements, something I’m really bullish on. I’m big into geospatial analysis, and I think a lot of data warehouses have location data, but they’re not leveraging geospatial, and so one of the things that Power BI announced at Ignite is this partnership with ArcGIS to provide some backing for maps. ArcGIS is a platform by ESRI, or E-S-R-I, depending on what your accent is. They’re a known leader, a first mover in the GIS industry, Geographic Information Systems.

They’ve done a lot of work with things like spacial statistics, and really showing how to analyze things beyond just plotting points on a map, or color coding maps. Doing these Porous puffs. You’re seeing a lot more of capability to build compelling dashboards that integrate maps and tabular data, you’re seeing more ability to kind of do mashups, so it looks like there’s capability there, where they can give you demographic information from ESRI’s ArcGIS platform. They have all sorts of things on zip codes, and what census information is there.

They’ve integrated that, they’ve built all that, and so you can layer your data in with that. It really shows that they’re starting to give some love back to some mapping. We know back in reporting services 2008R2 they introduced Bing maps component, and it was great for what it did, but it definitely had some limitations. There’s been other kind of stabs at that. Powerview had some mapping capabilities, extremely limited, but really great product. It used to code named GeoFlow, and is now I think Powermaps. It did 3D mapping right, so if you think about your Google Earth model, and being able to overlay your data on that, and create a video, that you can maybe embed in a PowerPoint presentation. That’s a really compelling thing for businesses.

It looks like Microsoft is continuing to kind of build on that quiver of mapping technologies. I’m excited to see that, and see where this goes. It looks like it’s in preview right now, so I don’t think they’ve completely finalized what they’re going to be offering. It’s definitely something I’m going to be following pretty closely.

Steve Stedman: Well that certainly sounds like an exciting new move there that I’d like to find out more about. As it becomes available. Shall we do SQL family?

Carlos Chacon: Very good.

Steve Stedman: The first SQL family question is on how do you stay up to date with technology? With things changing as fast as they change today?

Jason Horner: Do any of you guys watch Agents of Shield?

Steve Stedman: Oh yeah.

Jason Horner: Oh yeah, so do you guys remember the big quote from Season 2, Discovery requires experimentation? That’s true, that’s something that I’ve really started to live by, is I used to be a big blog reader. I would read every blog that was out there, and twice on Sundays. I would read a lot of books, I would reread books, but then I wouldn’t actually go and experiment with it, and so you wouldn’t find these little gotchas. I still read blogs, and try to keep up with things, but not as religiously as I used to.

What I do is, when I need to get up to speed on a technology, I pull it down and I start experimenting with it. Seeing where the limitations are, trying to come up with scenarios that might break it, or how things behave. That’s really the big thing. Microsoft Virtual Academy is a huge resource for me. I love taking those classes, there’s so much good content up there. It gets you kind of jump-started, and then I’ve got an Azure subscription, so I spin up the new technology, I can build out environments. It’s great for testing some of the high availability technologies. Great for even testing old stuff.

Things like log shipping, where you don’t necessarily want to install that, or you can’t install that all on your machine, and you don’t want to run VMs locally, because you don’t have the resources or whatever. Azure’s really my hot rod. I do all my development in there, I build things up, I tear them down as I need to. It just helps. You’ve got to experiment, you’ve got to get your hands on this technology. It’s the only way to become proficient.

Carlos Chacon: You’ve been talking about a gamut of tools, but if there’s one thing you could change about SQL server. We’ll allow you to open it up, to kind of a SQL server environment, as for some of the other tools they go along with that. What would you change?

Jason Horner: There’s so many things. The thing that’s really encouraging me, we’re starting to see the SQL Dev team really taking things from Connect, and from some of the other community trouble boards that people have set up, and really starting fixing these longstanding Connect issues. That’s what I’m very encouraged by, but where I think we really need some love is the SSMS, or the management studio project and solutions structure. It’s very outdated. The source control integration doesn’t work very well anymore. I always, even as a DBA, I want to have some way to manage my scripts.

Putting them in a project to me makes a lot of sense, and being able to have folders, and a solution. The way they currently handle connections across multiple projects is a real pain. It’s not ideal for sharing it across multiple developers because things get hard coded in there. It’s just an outdated model. I hope they can kind of give that some love going forward.

Steve Stedman: Yes, I hope they do to.

Carlos Chacon: That’s interesting. I agree with you on the projects, because that source control integration, if you’re used to using … if you’re using TFS, kind of switching to the visual studio while they’re making them more similar every day, is kind of a pain. You’d like to be able to work with your SQL server management studio and integrate with TFS and do that all with that one interface.

Jason Horner: Yes, absolutely, and I think we’re starting to see more and more of that. It always surprises me when, especially working with Azure DW, and APS a lot for different clients, you can’t use Management Studio, or you couldn’t as of some of the CU updates. Basically you lose all that and you have to work in SSDT, or Visual Studio basically, Visual Studio Shell. Seeing that they’re bringing the IntelliSense over, and it’s getting more parity with the Management Studio query editing and query execution. Where it’s really lagging, though, is it doesn’t really have a lot of the DBA capabilities built in.

I would hope that at some point we see a very lightweight SSMS kind of environment, where it loads very quickly and it just has the core things that the DBA needs. A lot of the kind of old, crusty people say, bring back Query Analyzer, and all that. I agree with that. Some of these tools are very heavyweight, and they take a long time to load. You start working with them for a while and all of a sudden you’re using 500 gigs of memory, and you’ve got to restart them. I just hope that they can get that source control integration back as well, because that’s something that’s sorely missing.

A lot of DBAs aren’t good with those practices. It’s because the tooling isn’t there. It’s so painful to have to then shell out to a command line and work with GIT, or Subversion, or whatever TFS, any of these versions control providers.

Steve Stedman: Okay, great. Jason what is the best piece of career advice that you’ve ever received?
Jason Horner: I’ve probably got a lot over the years. I think the one thing that’s enabled me to be so successful is really to learn the jobs of the people around me. I don’t mean that so you can tell them what to do, because that’s what a lot of people would try to do. For example, if I go into a client and I have to do some DBA work, well I’m going to have to interact with their infrastructure team, their SAN administrator, maybe their network administrator, and the developers, so I need to be able to talk them on terms that they understand. I needed to be able to talk to the SAN administrator about LUNS and about IOPS, and about all these kind of storage technologies. HBAs, so on and so forth.

I need to be able to quickly develop a rapport with them, and the best way to do that is to really understand pieces of that. Now that doesn’t mean that I’m going to go be the SAN administrator anywhere, or I even have that capability. It just means I can talk intelligently with them. Even things like the project managers. Understanding a little bit about Agile, or if you’re stuck with Waterfall, those kind of management processes, and how they work, how they operate. It just helps you.

It gives you a better rapport with people, it builds your credibility. In doing that, people will trust you. People will often say as maybe an interview question or just a general question. What’s the first thing you build when you start on a new project. People will say well, you’ve got to build the database, or I build the ETL process. Well the first thing I build is trust. I do that by being honest and straightforward with people, and speaking in languages that they can understand.

Steve Stedman: Wow, it sounds like very good advice there.

Jason Horner: Thank you.

Carlos Chacon: Our last question for you tonight Jason, if you could have one superhero power, what would it be and why do you want it?

Jason Horner: Oh man there’s just so many to choose from. I like the last X-men movie where you saw Quicksilver and how he went into the mansion and he had the super-speed, but it was just really cool how they kind of did that. I’ve been a big fan of a lot of superheros over time. Gambit’s ability to take and charge a card and make it explode, I probably could have used that at a few client sites occasionally. I don’t know, it’s a hard question to answer. I believe we all have a super power within us, and that’s the ability to become a better communicator. Ultimately that’s what enables IT. Is to be able to communicate with people, to be able to translate business requirements into technical solutions, and be able to work with other people on your team.

Carlos Chacon: Very good, awesome. Well Jason thanks for being with us tonight.

Jason Horner: Yeah, thank you guys so much. This has been great, it’s been really good talking with both of you guys, and I know it’s a bit of a journey to get this scheduled, but I’m glad we did it.

Steve Stedman: Yes, definitely. Thanks for being here. I appreciate it.

Jason Horner: Yeah, and Steve, thank you so much for all you’ve done. Your community corruption challenges have really inspired a lot of people to spend some time on that. I think it’s something that not a lot of DBAs run into. To give them these kinds of scenarios and have them work through it, sometimes collaboratively, has really increased a lot of the IQ around corruption repair, and perhaps even backup and restore best practices. Kudos to you for all you’ve done there.

Steve Stedman: Well I appreciate that. Thank you.