Episode 45: Service Broker

Compañeros, we are going to get back into the engine this episode and go back to a feature that has been available since 2005–Service Broker.  If you are wondering what Service Broker is–you aren’t alone.  In this episode, I chat with Wolf about how he uses it, the major components, and scenarios where you might use it.

Service Broker, like the early versions of Extended Events, doesn’t get much love due to the lack of a GUI.  If you are using Service Broker, I would love to hear from you below in the comments.  How are you using it?  What issues have you had?


Wolf on Twitter
Link to Wolf’s presentation

About Wolf

Wolf works as a database manager for RDX in Pittsburgh, PA.  He has been a DBA for 15 years, originally starting with Oracle and then moving over to SQL Server.  He loves to ride his motorcycle and is a regular speaker at local SQL Server events.

Transcription

coming soon

Episode 44: Azure SQL Data Warehouse

Compañeros, it is time for another episode of the SQL Data Partners podcast.  Episode 44 takes us into a new service in the Microsoft cloud–Azure SQL Data Warehouse.  While still in preview, the Azure SQL Data Warehouse looks to help level the playing field for organizations that want to analyze their data without the expense of creating a data warehouse.  Sound a bit counterintuitive?  Check out this episode to find out what the service is all about and what the future of data processing might look like.

Quote From the Episode

“It is an interesting time in the technology space with the convergence of different data types . . . into systems like SQL, Hadoop, MongoDB, and key value pair systems.”

Listen to learn . . .
  • What is the Azure SQL Data Warehouse service
  • What types of organizations might want to consider this service
  • Why you still need to look at the entire flow of data for ETL processes
  • How a laundromat company used analytics to increase business by 30%

About Matt Usher

Matt Usher is the program manager for Azure SQL Data Warehouse at Microsoft, where he has been for the last 8 years.

Links

Matt on Twitter
SQL DW Overview
SQL DW Blog

Transcript

Coming soon

Episode 43: Why is ETL So Hard?

Compañeros! Welcome back to the SQL Trail. If you’ve ever thought about improving your ETL game, this episode is for you. In Episode 43 of the SQL Data Partners Podcast I talk with Rafael Salas. Rafael is a Data Platform MVP and SQL Server Architect with over 12 years of experience building business intelligence solutions. We talk about the nuts and bolts of your data’s back-end and the data warehousing that powers the data-driven decision-making in your organization.

Episode 43 Quote:

“I always recommend that the first stage of any ETL project is to spend the time on very basic things. Understand your requirements and go and profile the data. Get familiar with the data.” – Rafael Salas

Listen to learn…

  • Why it’s hard for IT folks to learn ETL
  • Why ETL tools alone aren’t enough to get good data
  • How to transition from an ETL mindset to a data architecture mindset
  • Why the best data architects spend a lot of time learning their dataset
  • Where to start when you want to architect your own data
  • The data warehousing method Rafael Salas recommends
  • Where to find good ETL learning resources
  • How Big Data will affect ETL developers
  • The SQL tools Rafael uses daily

As always, please join me on the trail by starting a conversation below or connecting on Twitter. If this podcast is helpful to you, please leave a review on iTunes so others can find it as well. Subscribe via iTunes, Stitcher, and Podbean.

This episode is sponsored by COZYROC

Show Notes

About Rafael Salas
Rafael Salas is a Business Intelligence architect with more than 14 years of experience providing data architecture solutions. He is a SQL Server MVP, MCTS, and an active member of PASS and local user groups where he regularly speaks on Power BI, ETL Architecture, and SSIS. He’s also a part-time professor and program advisor for the Computer Technology Institute at Central Piedmont Community College (CPCC), participating in the Business Intelligence continuing education program and the Regional Effort to Advance Charlotte Information Technology (REACH IT) Business Intelligence program. Connect with him on Twitter and LinkedIn.

Resources
Rafael’s Blog
Follow Rafael on Twitter
SSIS: Design principles for robust ETL processes
Implementing a Data Warehouse with SQL Server Jump Start
Designing BI Solutions with Microsoft SQL Server
PowerBI
BIDS Helper
Red Gate SQL Prompt

Transcription: ETL

Carlos: So Rafael, welcome to the program!

Rafael: Thank you Carlos, happy to be here.

Carlos: Yes, it’s always nice to have an East Coast friend to come on the show with us today. So interestingly enough, and I’m not sure if I’ve ever shared this with you, not that I’ve done a ton of SSIS, but what I’ve started doing is, I’ve started looking out there on the internet. Your blog was actually one of the first that I came across and helped me on more than one problem. So if I’ve never said it before, thank you.

Rafael: [laughing] Thank you again. I’m glad that you find it helpful. I think that takes my readership up to around two people: my wife and you. So thank you.

Carlos: [laughing] Yes. I thought it was great. So that’s one of the reasons why I wanted to have you on the show today. Our topic, ultimately, is around ETL and why ETL is so hard. It seems like a lot of people struggle with it and I think that it’s changing a little bit. And so I guess, let’s go ahead and jump in. What are your thoughts? Why do you think ETL is so hard for IT folks or data folks to do?

Rafael: Yeah, that’s an interesting question. I think it’s a combination of things. For one, I believe many of us still probably underestimate the complexity of bringing data together. It goes like, it’s all about moving data. It’s all about moving data that we already have in house most of the time. We know where it’s going. So what could go wrong, right? And then at the next level of detail, when the first pass goes through, you find it’s truly coming from different systems, at different frequencies, at different granularities. And it’s not very clear that there is a common key between one table in one system and the table in the other system. And now we’re going to merge those two together. So I think it’s a compound problem that boils down to underestimating the complexity while not giving the right amount of importance to data quality as well. The assumption is, “Hey, the data is already in house, it’s running our CRM system, it’s already running our ERP system, so it should be good.” But the data is actually good for what it does on those systems, not necessarily for what you’re intending to do after you extract the data from that system.

Carlos: Yeah, exactly. The downstream, who knows what happens.

Rafael: Yeah, so it all just comes down to data quality as well. My last one, my favorite reason for why we make it harder is, again, people think, “Oh, all I have is data. And I have a good ETL tool, say SSIS, so everything downstream should be very easy.” And you know what, this tool doesn’t come with an architecture design and this tool doesn’t come with a system design. So there is a lot of engineering that has to come into place to make it sustainable.

Carlos: That’s an interesting thought, I think, when you think about data. Particularly from the traditional transactional model in that if I want to connect two pieces of data, it’s a join. Right? And I know that I have a foreign key relationship and I can join them and I can move forward. ETL is not necessarily like that, right? You mentioned that this data could be coming from anywhere, and so do you think we don’t spend enough time getting to know our data sources? Making too many assumptions there? I guess you kind of referenced that in the sense that there’s no architecture, right? We kind of have to figure that out on our own.

Rafael: Yeah, I think you are probably touching on and nailing it on the head with this. I think we fail to plan building this system. I always recommend that the first step or the first stage of any ETL project is to spend the time on very basic things. Understand your requirements and go and profile the data. Get familiar with the data that you’re going to be moving and conforming. Make sure that it’s gonna fit the bill. And you don’t need to spend a month on that. You spend a couple of days, you spend a week, and you come back with your tool to say, “Here is what you’re asking me to do on this ETL process and based on the requirements and based on this sample data and on this profile I think there is a huge gap. So why are we going to spend the time building a system when the data from the most fundamental level of quality is not where it needs to be?”
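
A first-pass profile like Rafael describes can be a handful of aggregate queries against the source. Here is a minimal sketch; the stg.Customers table and its columns are hypothetical stand-ins for whatever source you are evaluating.

    -- Rough profile of a source table before building any ETL (hypothetical names).
    SELECT
        COUNT(*)                                       AS total_rows,
        COUNT(DISTINCT CustomerKey)                    AS distinct_keys,  -- is the "key" actually unique?
        SUM(CASE WHEN Email IS NULL THEN 1 ELSE 0 END) AS null_emails,    -- how much data is missing?
        MIN(CreatedDate)                               AS earliest_row,
        MAX(CreatedDate)                               AS latest_row
    FROM stg.Customers;

A day or two of queries like this is usually enough to spot the gaps Rafael mentions before anyone builds a package.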

Carlos: Alright, so let’s dive into that just a minute. You mentioned, before we started recording, that you’ve moved a little bit more into the architecture role. And so now you probably have a little bit more experience with saying, “Okay, let me step back and think about how I want to put all this together.” What are some thoughts or some nuggets of wisdom, if you will, that you might share about making that mind switch? The mindset thought of, “Okay, I’m not just going from Point A to Point B. I’m getting to know that data.” And if I wanted to architect from that, where do I start?

Rafael: That’s an excellent point. So, I arrived at this point probably learning from my own mistakes. That’s obviously one thing. But there is also a lot of good material out there. I started researching because, well, there’s got to be a better way. An easier way. And I always had been designing data warehouses using the Kimball methodology. The Kimball method. Star schemas. And he’s known for that. The fact tables and the dimensions. But if you review the literature and their books, they go very deep into ETL architecture. They actually break the ETL system down into 34 subsystems. It’s probably more detail than you need, and there are a lot of those subsystems that you will never use and that will not apply to every data warehouse. But it will put you in the mindset of, “this is more than just copying, transforming, and writing data from one side to another.” There is a lot more to it and there are a lot more moving parts, so I found that article on architecture and the subsystems very helpful. I think it goes back to 2004 and was updated in 2007, but I think the principles still apply.

Carlos: Interesting, yeah. I think that is a good point. I’m not a data warehouse person per se, but you hear of the Kimball method and I think you mentioned those dimensions and the way that fact tables should be structured, but I don’t hear anyone talk about the Kimball method of ETL.

Rafael: That’s correct. Well, and sadly enough, ETL is probably harder to sell. It’s not as sexy. There’s not as much to show. Maybe for us, we’re a little more geek here and we enjoy looking at data flows and seeing the colors turning from yellow to green and all those things. But there’s not so much to show. There’s no pretty charts and graphs. It’s all about moving. It’s plumbing. When you show your house, you show the sexy thing. You show the front room, the living room of your house, not so much the back space. The crawl space.

Carlos: Some of that goes back to, you know, the limited training opportunities. You had mentioned that you were teaching an SSIS class at the college, and I admit that I haven’t looked at college curriculum in a few years. Because it’s not seen as the value-add, managers, especially as you get higher up, are definitely not concerned about ETL. And I’m not sure that workers, when they sit down for training, go, “Hmm. I’d like to do ETL training. I want to do this class or what-not.” Why is that so limited if it’s such an important thing? You know your plumbing, if your toilet backs up all of a sudden it becomes a big deal.

Rafael: Yeah, absolutely. I think you’re right, though nowadays it’s probably easier to find such training online. There are a few resources, but exactly: if you compare the amount of resources available on the ETL side versus what might go into, you know, writing queries, performance tuning, creating databases, or building reports and charts, obviously ETL may not have as many resources. So yeah, that’s an interesting point. However, I think there is still a lot of material. I think you have to dig it up a little bit in order to find it. I have found things that Microsoft has on the MVA, the Microsoft Virtual Academy. I think they have good self-paced training that probably will go into some of that, but to your point there is nothing specifically about ETL.  Just ETL pieces as part of an overall larger curriculum around SQL Server, for instance.

Carlos: Right. Here are the individual components, right? Not so much like, “Here is how you put it together.” Or, “Here are some common thoughts if you’re going from an Excel file into your database.” Or something like that.

Rafael: Right, right. A lot of it is, “Here’s how you can move the switches or change the properties within the tool.” Not so much, “What’s the best way to do it?” The answer to that probably requires a little more research on your own, you know, making your own mistakes and learning from them.

Carlos: So, we talked about making your own mistakes. One of the first things we hit upon is, we create the package, we test it in our test environment with a handful or a couple of hundred rows. And then we push it out into production and it’s like, “Whoa!” We start to have some performance type issues. So I guess I’m curious what you see as the common performance issues that people have. I think most of our conversation revolves around SSIS. When we talk about the ETL tool for SQL Server folks, SSIS is kind of a no-brainer, right?

Rafael: Yes, absolutely. Well, looking at two specific mistakes. SSIS, or any ETL tool for that matter: you open the toolbox of a modern ETL tool and you will find a component for pretty much every task. But just because it’s there doesn’t mean you should use it. Case in point, the typical mistake of sorting. Why would you sort the data in memory within the data flow or the ETL pipeline when you are reading data from a highly optimized database engine? You are just an ORDER BY away from pushing that piece of logic or that task back to the database. I imagine many times you still find people sorting the data thinking that if they sort the data before inserting it into the destination, it’s going to be stored in that order. And that makes people think it’s going to buy them some benefits. So, there is a little bit of misunderstanding about the responsibilities of your ETL tool of choice, SSIS in this case. You need to learn and understand the pros and cons of using each of those components and tasks that might be there. And also, look outside your ETL tool and understand the resources on your destination. What are the things your destination could help with as part of the overall ETL process? Try to have that push-down, push-up type of optimization.
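
To make the sort example concrete: rather than a Sort transformation in the data flow, let the engine order the rows in the source query. This is a minimal sketch with hypothetical table and column names.

    -- In the OLE DB Source, replace the table selection with a query
    -- and push the sort to the database engine ("just an ORDER BY away").
    SELECT CustomerKey, FirstName, LastName, ModifiedDate
    FROM dbo.Customers
    ORDER BY CustomerKey;

If a downstream component such as a Merge Join needs to know the input is sorted, you would then mark the source output as sorted (the IsSorted and SortKeyPosition properties in the advanced editor) instead of re-sorting in the pipeline.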

Carlos: Right. And I think to that point, because with SQL Server you get the license for SSIS, right? And it tends to be on the same machine. It’s one of the first things I notice when I’m in different environments and they’re having some SQL issues. Well, are you running ETL workloads on it and can we potentially move that off on another server?

Rafael: It comes from the old “you buy SQL Server and you get everything else for free” mindset, right? And that means store everything on that SQL Server and do it all on one server. And I think the fine print is, everything is included as long as you’re running everything on the same server. And that is not always a good idea.

Carlos:  Exactly.

Rafael: So you need to understand, I think it comes down to understanding the different workloads, right? The workload of a data warehouse is different than the workload of an OLTP system. And sometimes you have them coexisting. And you top all that with, oh, I also have my ETL process on top of the same server. And I also have my OLAP Analysis Services structures on the same server. Now you have a little bit of a mess and you’ve gotten yourself in trouble.

Carlos: Right. And I think if you’re coming from the programming side, another concept that you might be used to is kind of that, not so much looping, but working with a single record at a time. And then in ETL, that may or may not always work. And I think we talked a little bit about the role of temp tables in our ETL workloads as a performance bugaboo.

Rafael: Absolutely. It’s easy to visualize our ETL process like, hey, I’m moving rows. Maybe I’m moving one row or a few rows at a time, and we fall into that trap. And the truth is, SSIS and many of these ETL technologies benefit from in-memory engines. So they are not really going one row at a time. So sometimes we fall into the trap of designing our data flow thinking that’s the way it’s going to process. And we might start having an OLE DB Command transformation on that data flow. For those who are not familiar with it, that means you’re issuing T-SQL commands against the database for each record going through the pipeline. And if you’re moving a few hundred rows, maybe you would not see the difference. But if you are moving a few million records, you will see a lot of difference. So, taking into account those two things: you are using an in-memory pipeline and an in-memory engine, and you should be thinking in sets of rows. A transformation is applied to a set of rows, as opposed to rows going through the transformation one at a time. It requires a little bit of practice or time to visualize that, but once you get it you’re going to be in a better position to design your flows in a way that is better optimized.
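
To make that concrete, here is a hedged sketch of the set-based alternative: bulk load the pipeline rows into a staging table with a fast destination, then issue one statement, instead of an OLE DB Command firing once per row. All table names are hypothetical.

    -- Per-row anti-pattern (what an OLE DB Command does for every pipeline row):
    --   UPDATE dbo.Accounts SET Balance = ? WHERE AccountKey = ?;
    --
    -- Set-based alternative: land the rows in staging, then update once.
    UPDATE a
    SET    a.Balance = s.Balance
    FROM   dbo.Accounts       AS a
    JOIN   stg.AccountUpdates AS s
           ON s.AccountKey = a.AccountKey;

A few hundred rows hide the difference; a few million make it painfully visible, which is exactly Rafael’s point.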

Carlos: Sure. So we’ve already talked about some of the issues that ETL folks have with limited opportunities, but they’re also facing some changes to the landscape, the IT landscape. Big data is one, right? Like, “Why do I need to transform it?” Potentially, and even tools like PowerBI. How has that affected the role of the ETL developer, if you will?

Rafael: Yeah. So, interesting question. When we first started thinking about Big Data, we thought we were going to be out of work very soon, right? Because Big Data was taking over and relational database engines were going away. ETL, data warehouses, everything like that. Well, I guess with a few years and a little bit of better understanding of those technologies, we realized that that’s not the case. They’re here to help out, they’re here to complement.

Carlos: Sure, they have their place.

Rafael: Absolutely. One big promise that’s still there with big data, in terms of ETL, in my opinion is the whole schema-on-read concept. Land the data the way it is, and you have the flexibility to define the schema, or multiple schemas, on top of the same data set as you need to query it. If you think about it, a lot of the time that goes into ETL processes goes there: the data and the data types need to be well-defined, and the destination is likely a relational database. So if you say it’s a TIME and you try to pass a number, it’s going to blow up on you. And if you say it’s a NUMERIC data type and you pass a letter, it fails. A lot of the time and testing goes into making sure those little tiny details are in place. With big data technology, the promise to the user, and one of the selling propositions, is that you are able to land the data in its native format and you just define the different schemas as you need to read them. I see a big saving in that. I don’t think it’s going to replace traditional ETL and the traditional database engine, but it answers a need for certain use cases. I believe it’s just another tool that you have in your toolbox that might speed up the process in certain areas.

Carlos: Isn’t that funny, right? Even in the big data scenario, data types still matter.

Rafael: Oh absolutely. It probably matters a little later, but it still matters. At some point, when you’re going to read the data and you’re going to create that visualization or you’re going to create that report, or do whatever is next, that information needs to be defined.

Carlos: I was at a presentation with James Phillips, who is the VP of the Microsoft BI stack, if you will, and PowerBI was ultimately one of his things. And he talked a little about how that role might change a little bit and I guess we could dive into the PowerBI discussion a little bit here. So the thinking is, and again, I think there will still be a place for it because I don’t think that the data warehouse is dead, but some of that might change and that ability to glue separate sources is going to become more important as the self-service reporting kind of continues to evolve.

Rafael: Yeah, I think that’s something that resonates with me. Now you would have the ability to probably have two different sides of the house, if you will. The more structured, maybe the more curated side of the house, which might use the traditional ETL or the traditional data warehouse approaches that we know. And perhaps the less curated, or the high volume, or the most rapid data side on the other side. And the technologies now have so many different layers, and they allow you to, as you say, glue them together and bring value a little quicker to the business. So that’s definitely something very interesting to keep an eye on.

Carlos: Right, so if you could go back, you’ve been working in the data space for a while. If you could go back to the younger Rafael and give him some pointers for this journey, what thoughts would you share or what points would you want yourself to know if you could go back and do some things over?

Rafael: Hmm. For this side of the house that I’ve been working in, I think the earlier we can learn about writing better SQL, especially queries, the better. Writing better queries, I think, would help us. A lot of the times we get in trouble in our ETL system just because we fail to take a good approach. We rely on techniques we should not be using that are less performant. That goes along with understanding data modeling better. You know, avoiding creating those tables where the datatype is varchar(255) for every column. So, just learning. Yes, disk is still cheap and will get cheaper, but writing and reading data from disk is not cheap. That takes a lot of time. So understanding the internals a little bit more. When we’re younger, we’re just happy to see the data move from A to B, and when we query the data, it looks right and we declare victory, right? I think learning these things earlier would help. And I’m more now into trying to understand the bigger picture, the architecture. Think about this as an overall system: it’s not just moving data, but it needs to be maintained, it needs to be backed up, it needs to be secure, it needs to be expandable.

Carlos: So Rafael, let’s do SQL Family. So one of the things I always like to ask people is how they get work done. And I know we’ve mentioned SSIS here, but what’s your favorite SQL tool and how do you use it to get work done?

Rafael: Well, I spend a lot of my time querying the data, even in my new role as a data architect. The first step that I always take is, “Let me get familiar with the data.” So as I write queries in unknown databases, or databases relatively new to me, SQL Prompt from Red Gate is something that I have found very, very helpful. You know, one pet peeve of mine is that I hate when a query is not properly formatted. And properly formatted by my standards.

Carlos: It looks ugly.

Rafael: Exactly. I mean, I tend to think that maybe my SSIS package is running slower because the SQL statement hasn’t been properly formatted. So, I like SQL Prompt. And if you spend some time in Visual Studio creating anything that has to do with BI projects, not necessarily SSIS, I think BIDS Helper is another good tool that’s also free. It comes from CodePlex.

Carlos: The BIML stuff, all of those components. Very handy.

Rafael: Oh absolutely, absolutely.

Carlos: If you could change one thing about SQL Server, what would it be?

Rafael: The hard part of that question is you’re only allowing one. No, SQL Server is a really great tool and has been feeding my family for many years now. But of course, there is always room for improvement. In the SSIS world, things have improved a lot, especially since 2012. I always wish we had a better repository for executions. We are getting there, we have kind of a good way, but I think I would like to see more enterprise-class ETL. The ability to have clustering, mirroring, monitoring, better partitioning, a little bit more of the scale-up, scale-out capabilities in SSIS would be on my wish list.

Carlos: So, what’s one piece of career advice that has been helpful to you throughout your career?

Rafael: Wow, several things come to mind, but one thing that hit me early on and is of value: somebody told me, “Never stop learning. Don’t be afraid of learning new things.” Sit in the front row, and when someone asks, “Who can do this?” raise your hand and just go for it. I think that has been a big help for me. Don’t be afraid of learning and jumping into new opportunities. Sometimes it looks daunting or scary, or you think you’re going to look bad if you fail. But you know, that’s the way it’s going to go and it’s going to open you up to new opportunities.

Carlos: Yeah, especially in technology. You have to keep up. Our last question for today. If you could have one superhero power what would it be and why would you want it?

Rafael: Hmm. Yeah, I was driving today from work to take this call with you and then I got stuck in traffic. And I wish I could fly. I think I would like to be able to fly and not be stuck in traffic and see if I can get to work on time more often.

Carlos: [laughing] Very good. I’m sure your wife would enjoy that as well.

Rafael: [laughing]

Carlos: Rafael, thank you so much for taking some time with us today.

Rafael: Thank you, my friend Carlos. Thank you so much for having me. It’s such a pleasure to be here.

Episode 42: SQL Server Auditing

Who made that change? Have you ever gotten that question?  Today’s stop on the SQL Trail is all about SQL Server auditing. We all know we should do it, but is the default setup really going to keep you covered? In this episode of the SQL Data Partners Podcast, I talk with Brad McKuhen. Brad is a Senior Database Architect and PASS speaker who recently spoke about auditing at SQL Saturday #506 in Baltimore, Maryland.

Episode 42 Quote:

“…a lot of it has to do with knowing what you don’t know that you need to know until after you need to know it.” – Brad McKuhen

Listen to learn…

  • The difference between PII, PCI, and PHI
  • Database auditing versus server auditing
  • The role of change data capture in auditing
  • How to implement auditing so that you’re actually protected
  • The circular conundrum of auditing
  • The paid SQL Server tool Brad relies on
  • How to use auditing to add business value

As always, please join me on the trail by starting a conversation below or connecting on Twitter. If this podcast is helpful to you, please leave a review on iTunes so others can find it as well. Subscribe to the podcast via iTunes, Stitcher, and Podbean.

This episode is sponsored by

Database Health Monitor

Show Notes

About Brad
Brad is a Senior SQL Server Database Architect and SQL Server DBA with the Clutch Group. He regularly speaks at PASS events about SQL auditing and working with developers. Brad has worked with SQL Server in some form or another since SQL Server 2000, and has been in IT for over 20 years. In his spare time, you’ll find him spending time with his family and furthering his SQL Server skills. Follow Brad on Twitter, LinkedIn, and on his blog.

Resources
Follow Brad on Twitter
MSDN: Change Data Capture
SQL Saturday #506 – Baltimore, MD
SQL Auditing – Getting a View of Yesterday’s Activity Today
PASS DC on Twitter
PASS DC Website
BradMcKuhen.com
Spotlight

Transcription: Auditing

Carlos: So Brad, welcome to the show!

Brad: Thank you very much

Carlos: It was nice of you to be here! We find ourselves in Baltimore today at the SQL Saturday, so thanks for taking a little time to come and chat with us. So you’re actually presenting today on SQL auditing and that’s the topic of our conversation today. You’ve done that a couple of times.

Brad: Yep.

Carlos: And so I’m interested. I know that can be a very boring concept, although here in the United States we just finished our taxes, right? So some people might be a little bit concerned with auditing. But ultimately our conversation is around the SQL Server components there.

Brad: That’s right, because even in the SQL Server world, no one likes getting that letter.

Carlos: That’s right. So let’s talk about auditing. Why should we be auditing?

Brad: Well, a lot of it has to do with knowing what you don’t know that you need to know until after you need to know it. So it’s a circle, right? That’s kind of a goofy way to explain it, perhaps, but tomorrow or Monday, you’re going to get a note from somebody that says, “Hey by the way, what happened on Friday? We’ve got this question and there’s some instances…” and that’s one reason. Because you have a historical need or a need to know what happened historically. The other reason is to prove that you’re protecting. To prove that… it’s a sword and shield conversation. To prove that nothing was changed. That there’s consistency and constancy in the protection of the, let’s call it protected information, because that takes a number of forms. If you’re in the medical space, we call that PHI, protected health information, which would be something like your blood type, right? Then there’s personally identifiable information, PII, which is true in almost every business in today’s world: you have to guard the social security numbers. If you’re dealing with payment systems, then you’re dealing with PCI, which is the credit card numbers themselves. The sixteen digit number or the fifteen digit number plus the expiration date. So that has to be guarded. And sometimes it’s about proving the protections. So there’s a number of reasons to do it and a number of considerations.

Carlos: So when we talk about auditing in SQL Server, there’s, at least in my mind, a little bit of a misnomer, because to that point you want to know what’s changed. The auditing feature per se does not let you do that. You have to use Change Data Capture. Now you have an interesting philosophy in that those are one and the same in your mind. I misspoke there: in the sense that you’re going to implement both of them to help answer that question.

Brad: That’s right. It’s a comprehensive solution that’s provided by SQL Server and if you kind of look at it as a funnel. You can say, “Okay, so at the very high level, who touched my box?” You can get that from SQL, you can get that from Windows via what network connections were made and you can track it that way.

Carlos: Here’s a login. This person logged in at what time?

Brad: Right. Or even, “I got a connection request, a network connection request from this unsecure IP. This is an out-of-range IP, so I want to log it because it didn’t come from one of our junk boxes.”

Carlos: So let’s go into the SQL Server auditing components. What kind of things are you auditing and taking a look at from the, and again because I look at it as features, let’s look at the auditing feature maybe first, and then we can talk about change data capture.

Brad: Sure. So it is of some interest, actually, that you can record a lot of these things to the Windows log, right? And I started there for a reason. Because really what ends up happening is that people don’t understand what you can log to the SQL Server log, because it can actually receive a lot more of this information than people sometimes give it credit for. So if you turn a lot of these things on, you now have an obligation to protect your log files. Who thinks about where the server log files go? Who thinks about that? Have you ever actually tried to move them?

Carlos: It’s in the default location.

Brad: Right! They’re always in the default location and they’re always set to a depth of six. Everybody sets it. Nobody goes, “You can move those?” Yes you can. And when you go historical, you probably should. Because once it rolls over, now it’s a fact. It’s a protectable unit of information that needs to be protected. Because once you roll your logs, and you are rolling your logs on a regular basis, right?

Carlos: Well the default is, you get six logs and it rolls over once it restarts. That’s the only option.

Brad: But if you’re dealing with a cluster, or you’re dealing with a five-nines situation, you shouldn’t be restarting that often. So your log file can actually get massive. But it becomes kind of the first step in proving who did what, where, when, and how: the SQL Server error log.

Carlos: That’s interesting, because normally I’ve adopted, my philosophy is that I like to cycle my logs every night at midnight. Keep it consistent and easy. So Friday, I know what the log data is. And I’m only keeping thirty days. Your suggestion is that I should keep it longer than that and include auditing? So your idea of auditing is, “I’m going to push more detail to the log and then I’m gonna manage those in a different way.”
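
For reference, the nightly cycle Carlos describes is a one-line SQL Agent job step. A minimal sketch, with the retention tweak hedged as one common (if unofficial) approach:

    -- Agent job step, scheduled at midnight: close the current error log, start a new one.
    EXEC sp_cycle_errorlog;

    -- Retention (number of logs kept) is an instance setting: default 6, max 99.
    -- SSMS: right-click SQL Server Logs > Configure. The registry route below is
    -- widely used but not officially documented, so treat it as an assumption.
    EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
         N'Software\Microsoft\MSSQLServer\MSSQLServer',
         N'NumErrorLogs', REG_DWORD, 30;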

Brad: Well, there’s different levels. What I would suggest, and what I suggest in my presentation, is you find a number that works for your organization, which is based on the size of data per server. Because every machine is always different. Now you can go with the corporate standard: we do every Friday and we’re going to keep a depth of 20. So now I have twenty weeks. But then look at the other side of it, which ties to SQL Server auditing itself, which I’ll kind of drill into now. I have to know where the information is going. It’s the feature, right? Not database auditing, we’ll get there in a moment. But auditing at the SQL Server level. I want to know who makes a database and when, right? That’s kind of a basic auditing feature.
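
A minimal sketch of that server-level audit in T-SQL; the audit name, file path, and sizes are placeholders:

    -- Create the audit target (a file, which is why the landing zone matters).
    CREATE SERVER AUDIT DbaAudit
        TO FILE (FILEPATH = N'D:\Audit\', MAXSIZE = 256 MB, MAX_ROLLOVER_FILES = 10);
    ALTER SERVER AUDIT DbaAudit WITH (STATE = ON);

    -- Record who creates, alters, or drops databases, plus login changes.
    CREATE SERVER AUDIT SPECIFICATION DbaServerSpec
        FOR SERVER AUDIT DbaAudit
        ADD (DATABASE_CHANGE_GROUP),
        ADD (SERVER_PRINCIPAL_CHANGE_GROUP)
        WITH (STATE = ON);

    -- Read the audit files back.
    SELECT event_time, action_id, server_principal_name, statement
    FROM sys.fn_get_audit_file(N'D:\Audit\*', DEFAULT, DEFAULT);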

Carlos: Right. DDL changes.

Brad: DDL changes. But I’m going to use the SQL Server audit feature and not a trigger, because a trigger pushes to a table and I’m talking about auditing, which pushes to a file. So in SQL Server auditing, when I push this stuff to a file, I now have the exact same problem that I have with the SQL Server error log. It’s data on disk that can be manipulated by a sys admin or someone with rights. And insider attacks are a very real thing. So the first thing that even an entry level auditor learns to do, when it comes to electronic records or electronic communications, is ask: what are the rights associated with this? How do we know that it’s a valid record? How do you know that the record itself wasn’t changed? If you have some smarty pants who drops a 3 TB database and then knows the logs are there…

Carlos: …and covers his tracks…

Brad: … covers his tracks and does a little creative editing, and all of a sudden it’s your fault because the log says so. You have a problem, because the log says so. Not because you actually did anything, Mister or Missus DBA. But because the log says you did.

Carlos: Kind of a like who’s watching the watchers.

Brad: That’s right! That’s absolutely right. So when you first talk about auditing, you first have to start to talk about: where’s my stuff going, where are my files stored, what is the security surrounding that landing zone? And this comes back to the error logs. Because I may want to capture them as soon as they roll off as part of the same rollover script; I may want to fire a PowerShell step inside the SQL Agent job that rolls the things over, and I may want to catch them and move them someplace. So I roll it over and I take the most recent, which is now .1, and I put it someplace. And I put it someplace that sys admins can’t get to, right? So it’s controlled with AD permissions. Yes, a sys admin of a Windows domain can always go in and get to any network share and yada yada yada. But if I have tracking turned on in that directory and I have the necessary permissions turned on for that directory, then I should be able to see if somebody modifies the security of that folder. There should be a system monitor on it, or Microsoft network monitoring, or systemic logging when it pushes to the SAN. And sys admins don’t have the SAN permissions; you have to be a SAN admin, right? So there’s kind of a check and balance going on.

Carlos: That seems a little, and I don’t want to say nuanced, right? So let’s say that we are in a trustworthy environment…

Brad: And that’s all relative too. If you’re the only DBA, you know?

Carlos: What other kinds of things, to go back to the SQL Server auditing feature, am I going to be putting in there? You mentioned creating or dropping tables.

Brad: That’s right. So, well, at the server level it would be creating or dropping databases. Creating or dropping logins, modifying system configuration, creating an endpoint. It’s instance level. Am I creating an endpoint, right? Am I creating credentials? Am I altering credentials? Am I re-signing things using certificates? Am I re-creating certificates so I can sign it? Why did it get unsigned in the first place can be an interesting conversation. Because maybe you know about it. Like, we have this stored proc and we’re doing substitution and delegation, and service accounts are the only ones with the rights to run the thing or whatever. But we’ve got to mod the stored proc. And all the paperwork has been filled out because it’s a production level change. But when you modify the proc to get the latest business logic in, it’s gonna break the certificate signature, because it’s signed. So I’ve got to go back and re-sign it. Turns out, nobody knows the password to the cert, right? Look, these things happen. Like, DBA Bob who originally signed it five years ago has left the firm. But this is a known business case and it’s trackable. So when the auditors come through and they say, “Show me any security exceptions or exemptions in the last twelve months.” Well, here you go. We had to re-sign the certificate.
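
The re-signing step looks roughly like this in T-SQL; the procedure, certificate, and password are hypothetical:

    -- Altering a signed proc strips its signature, so after the change it must be
    -- re-signed, which requires the certificate's private key password.
    ADD SIGNATURE TO dbo.UpdateSalaries
        BY CERTIFICATE HrSigningCert
        WITH PASSWORD = N'only-DBA-Bob-knew-this';

    -- Verify which modules are signed, and by which certificate.
    SELECT OBJECT_NAME(cp.major_id) AS signed_module, c.name AS certificate_name
    FROM sys.crypt_properties AS cp
    JOIN sys.certificates     AS c ON c.thumbprint = cp.thumbprint;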

Carlos: You document the reason why.

Brad: That’s right. It all comes down to, “We know what happened.” This isn’t a fly by night operation and we know what’s going on in our organization and when there are exceptions, we know why and we can run them down.

Carlos: Yeah, that’s interesting. And I think, knock on wood, I’ve been lucky there that I haven’t been in environments where auditors have been busting down my doors wanting to know that level of detail. And I think that’s the trick or the gotcha, if you will, with auditing at the instance level. A lot more common question I get is, “Who changed this data?” My number was six today and now it’s seven. What happened? And auditing unfortunately doesn’t give us that granular a level of detail.

Brad: Well, that’s intended for change data capture. Which if you marry that with a database level database audit, but basically, you have to audit the database too. If those rights get elevated or get escalated, now all of a sudden I have a way of saying, “You shouldn’t have the ability.” Let’s talk about the HR database and the HR table, to get specific. If you have somebody who’s modifying the salary numbers of the people table, you want to know who, you want to know when. You don’t know why, but that’s why we turn these things on and ask why these things happen.

Carlos: And it would be helpful if you knew what the original values were.

Brad: That’s right. That’s exactly where change data capture comes in. Another interesting piece of information to know is: when did you get rights to do that? Or how did you get rights to do that? Because that’s why I would turn on database auditing. Because if somebody granted that login permission an hour before the change is made, and we’re talking production HR stuff, now I have a very interesting conversation. Because I’m going to have a conversation with the grantor and with the grantee. Like, “Hold on a second. This is people data in the HR system in production. This is not a paperwork free zone. To change this manually through SQL, and not through the application and not through the application logins, we need to see a paper trail. Because this is actually really serious.”
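
A hedged sketch of wiring up change data capture for exactly this case; the dbo.People table is hypothetical:

    -- Enable CDC for the database, then for the sensitive table.
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'People',
        @role_name     = NULL;   -- or a gating role controlling who may read changes

    -- Later: pull the changes in a window. 'all update old' returns both the
    -- before and after images of updated rows, answering "what was the original value?"
    DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_People'),
            @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_People(@from_lsn, @to_lsn, N'all update old');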

Carlos: That sounds like a better scenario for creating a new server, like I create a server for change auditing. I can capture all of that. I’ve got SQL Servers now that have users. What’s auditing doing for me other than capturing the changes? How can I get to that level? I guess it’s not going to help me inspect who currently has write access to those tables.

Brad: That’s true. This is sort of one of the logical fallacies of all of it. Which is, I have to know what I need to know before I need to prove it. And so that’s the real challenge of any existing environment. Or a COTS product, where I can’t necessarily turn change data capture on, because if I turn the shadow table on, now any failure in the COTS product makes the entire thing unsupportable. I can’t open a support case with vendor X because I added the change data capture shadow table. And I can’t add a trigger on the table.

Carlos: Right. Make modifications to the table. They frown upon that.

Brad: So the simplest of attempts to understand a performance issue, the vendor considers invasive. Lots of people are in that situation. Lots of DBAs find themselves the sole supporter of an application, or the sole DBA. You’re the lone DBA on the ranch. How are you then going to get meaningful information about your systems, because you can’t turn on change data capture, let’s say? And so, there’s sort of this in-between space that exists where you can turn on things like DDL triggers, and you define it and you can say, “Well okay, I can’t necessarily turn on change data capture, but what I can do is find out if anyone alters columns in the database. And I can find out if anyone adds databases, and I can push this to someplace so it’s non-impactful to the application. And I can stand up a database named DBAs Only, and I can have the trigger fire and push the stuff in there, and I can go look and review it every day.”
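
That lightweight DDL-trigger approach might look like the sketch below; the DBAs Only database and its logging table are the hypothetical landing zone:

    -- Log schema changes into a DBA-only database without touching the vendor schema.
    CREATE TRIGGER trg_AuditDdl
    ON DATABASE
    FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE, ALTER_PROCEDURE
    AS
    BEGIN
        INSERT INTO DBAsOnly.dbo.DdlLog (PostTime, LoginName, EventXml)
        VALUES (SYSDATETIME(), ORIGINAL_LOGIN(), EVENTDATA());
    END;

EVENTDATA() returns the full XML of the DDL event (statement text, object, login), so the daily review can be a simple SELECT against the log table.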

Carlos: Get an email when a record gets added.

Brad: There’s lots of lightweight stuff that you can do so that it doesn’t touch the COTS system. You can define an extended events session and then you can have a watcher watch the extended events session. And you can rotate that file and define a depth or a history or stuff like that. And that would be a way, because you can actually watch the inserts on the database. Like say, “I’m really concerned about the people table in HR. We believe somebody is monkeying with the people table.” Okay, so I’m going to define an extended events session and use that as a form of sort of lightweight watching.
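
A minimal sketch of such a session; the session name, filter string, and file path are placeholders:

    -- Capture completed statements that touch the People table, to a rolling file.
    CREATE EVENT SESSION WatchPeopleTable ON SERVER
    ADD EVENT sqlserver.sql_statement_completed (
        ACTION (sqlserver.client_hostname, sqlserver.username, sqlserver.sql_text)
        WHERE sqlserver.like_i_sql_unicode_string(sqlserver.sql_text, N'%People%'))
    ADD TARGET package0.event_file (
        SET filename = N'D:\XE\WatchPeople.xel',
            max_file_size = 64,          -- MB per file
            max_rollover_files = 4);     -- the "depth" Brad mentions

    ALTER EVENT SESSION WatchPeopleTable ON SERVER STATE = START;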

Carlos: Now that has to be table-by-table, right? That’s one of the downsides. You have to know what you want to watch. It’s the same with change data capture, don’t get me wrong, I have to know what I want to watch. Starting with the end in mind. What questions do I want to answer and again, so the developer folks come to me and go, “Hey, how did this change?” It’s like, “Well, that wasn’t on my list of things that I was going to give you an answer about.”

Brad: That’s exactly right. It’s that sort of you have to know it before you need to know it, that you need to know it in order to answer it. I know, isn’t that crazy, right?

Carlos: Isn’t that the definition of experience? Experience is the thing that you get right after you need it.

Brad: Yes, yes, yes.

Carlos: So, let’s talk a little bit about costs. The costs of putting these in?

Brad: Almost everything is available as a part of Standard Edition. Which means that automatically it’s included in Enterprise Edition. Then there’s common criteria compliance, which is, in my mind, a wonderful feature for QA.

Carlos: I’m not familiar with this. Do tell!

Brad: Common criteria compliance is…

Carlos: Are we starting in 2012 here?

Brad: It might be 2008 R2. I don’t remember when it was first implemented.

Carlos: Yet another feature I’m not familiar with.

Brad: It’s very cool, because it turns on, in terms of auditing, unsuccessful logins inside of sys.dm_exec_sessions, so it starts to record the unsuccessful and successful logins of the sessions inside the DMV.

Carlos: Isn’t that the default?

Brad: They’re there but it’s not recorded. The columns are there but they’re all null by default.

Carlos: You’ll forgive me. In the agent log I get unsuccessful login records.

Brad: Which is configurable. So you go into the server properties, under the Security tab; that’s configurable. You set it to successful only or unsuccessful only. This actually goes per session, right? So once the session is established I can start to see, “Oh well hey, hang on a second. This guy keeps bashing the server.”

Carlos: So they’ve created a connection and they’re allowed public access, if you will. They’re trying to get access to a table and they don’t have permission on it. That’s what it will give you?

Brad: Right. It’s that sort of, if you’re going to implement it, you need to probably test it and make sure that I’m not remembering it incorrectly. You always got to test. The other thing that it does is turn on residual information protection, which is where SQL Server grabs a new block of RAM and wipes it before it uses it. You can kind of be forgiven for thinking, “Why wouldn’t you do that?” And the answer is, “Performance.” Skipping the wipe is faster: you just assume the memory is good and you start writing your stuff into it. With this on, it wipes every block of RAM before it uses it. Again, you have to test.
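
Enabling it is a single configuration option followed by a service restart; the DMV columns below then start populating:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'common criteria compliance enabled', 1;
    RECONFIGURE;
    -- Takes effect after the service restarts. And, as Brad says, test first.

    -- Once enabled, these sys.dm_exec_sessions columns stop being NULL:
    SELECT session_id, login_name,
           last_successful_logon, last_unsuccessful_logon, unsuccessful_logons
    FROM sys.dm_exec_sessions;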

Carlos: That’s a big change.

Brad: For some systems. Like for a data warehouse, I may be willing to accept the overhead for the security. But for an OLTP system, I really wouldn’t suggest it necessarily. Now, it depends on your OLTP, because if you’re sitting on some insane amount of RAM, say you’re dealing with a TB of RAM and your databases are 100 GB, yes, you’re okay. But the reality is that most of us have a database size that is closer to or exceeds our RAM count. So that’s where, again, you gotta test this. Then the other thing is that the default behavior in SQL Server is that a column-level grant overrides a table-level deny. So back to the people table. I can deny you access on the people table and grant you permission on the zip codes. So you can then go over here and run the stored proc or run this view and group by zip code: gimme a count and show me across the entire world, or the country, or the state or whatever. Drill it down, and show me all the counts. So you have no permission to the rest of the table, but you can still see the zip codes. That’s just the default behavior in SQL Server.

Carlos: I always thought that the denies overrode everything. If there is a deny, there’s no way you’re gonna get in.

Brad: When you turn on common criteria compliance, that is exactly what happens. A table-level deny overrides a column-level grant. Which is, when I started my database career, exactly what I expected to happen. When I pulled up my big boy junior DBA pants and said, “But I denied you! You can’t get there.” Wait a minute there, junior, not so fast. The default behavior is that the column-level grant overrides the table-level deny.
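
A quick way to see the behavior for yourself, with hypothetical names:

    DENY  SELECT ON dbo.People           TO JuniorAnalyst;  -- table-level deny
    GRANT SELECT ON dbo.People (ZipCode) TO JuniorAnalyst;  -- column-level grant

    -- Default behavior: the column-level grant wins and this query succeeds.
    -- With common criteria compliance enabled: the table-level deny wins and it fails.
    EXECUTE AS USER = 'JuniorAnalyst';
    SELECT ZipCode, COUNT(*) AS people_per_zip
    FROM dbo.People
    GROUP BY ZipCode;
    REVERT;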

Carlos: Kind of let’s circle back, right?  When you are setting these up are you sort of doing this by default on all your servers? Auditing as part of your server setup? Or is it still kind of case-by-case?

Brad: In my current environment where I am, it’s definitely case-by-case. Because we are dealing with significant IO considerations in terms of the way the application behaves.

Carlos: A lot more records writing to disk, right? I mean, even outside of SQL Server, if you’re going to put them in the log, you’re still writing to disk.

Brad: It’s still going to disk, and it is sequential, right? But if you haven’t moved your log’s default location, your log’s default location is in your C drive. So you start to get into this consideration of “hang on a second. How fast is my C drive?” Sometimes it’s the fastest drive in the thing.

Carlos: So that was going to be my question. You’re using a third party tool and you’ve implemented auditing and you turn on the triggers, if you will, but then you are using a third party tool to capture it and monitor it. Maybe capture is the wrong word; take the logs that are there and monitor them or review them.

Brad: No, no, that’s exactly right. And we have enabled the monitoring, and the system that it’s on, to be historically aware, right? So that it doesn’t just roll off the next day. There’s a depth there to the records. So if we need to answer the question of, “Hey, what happened yesterday?” we have enough depth there to be able to say, “Okay, not only did this happen yesterday but I can see it X number of times in the past period of time.”

Carlos: And I think we’re going to talk about that tool in a second in SQL Family. Let’s wrap up our auditing conversation. Any other thoughts or gotchas that you’ve been working with auditing?

Brad: You know, the only thing worth mentioning really is that it all depends. I know, isn’t that the classic DBA answer? “Well, it all depends!” What I mean by that is that you have to know what you’re trying to capture. In suggesting something, I would really just suggest that you start someplace. Start with the lightest weight thing that you can to answer the questions that you get. Whether that’s how many people are connected, how much RAM we are using, or this database has this thing that we need to know. Whatever the lightest weight thing is, for the most frequent question, just start there. Make it the smallest, simplest. But just get started. Because what will happen over time is you will find questions that you didn’t know you needed to answer using the data you’re capturing. Figure out the questions you get the most and then get the smallest unit of data that lets you gather that answer for the business, and then start taking that back to the business.

Carlos: Well, there you go. We’ll see where it goes then.

Brad: Right. And the business then will tell you and they will guide you and they will say, “Well that’s useful, but what we need is, find the number of widgets we make a day and then we need to know what color the widgets are.” Awesome. Now I know what two data points I need to get, and then you can start to talk about, “Okay, how are we going to protect the information?” Then it leads to other conversations that let the DBA be seen as a proactive, business-focused individual instead of just a resource or just an expense. Now you’re adding value to the business. Now you’re protecting the lines of business. Now you’re enhancing the agility and the flexibility of the business to respond to the market. All these things that the business wants from its IT resources, I think you can find value for inside of the auditing records. Just by capturing little bits of information about what’s happening inside the server.

Carlos: Very good. As always, just get started and see what’s there and move forward. Brad, let’s do SQL Family.

Brad: Sure.

Carlos: So one of the things I always like to talk with people is how they work and how they get their job done. So what’s your favorite SQL tool?

Brad: Spotlight.

Carlos: Spotlight. So a paid tool. Why do you like it and what’s it doing for you?

Brad: It does two things: one, it draws pretty graphs.

Carlos: There you go, that’s always nice.

Brad: It sounds really off-the-cuff or really flippant, but those I’m trying to communicate to, specifically in my environment or in general, respond very strongly to a visual representation of a technical problem.

Carlos: Our next question: if you could change one thing about SQL Server, what would it be?

Brad: [sighs] Especially surrounding auditing, I would love to have a control panel. A security-related control panel, so that I don’t have to go digging into Security. Think tree view, but for users and logins. So, you go here and you say, “I want the pretty method.” But again, this goes back to the ability to screen capture. What I want to do is go into security and grab a principal, whether that’s a service account or a Windows login or a SQL login, and be able to drill down, in a graphical format that I can easily communicate to people, into how much access this login or principal has and what securables it has access to. I would love to have the ability to graphically represent the cascading nature of a change that was made.

Carlos: What’s the best piece of career advice that you’ve received?

Brad: Don’t mix your honey with your money.

Carlos: [laughing] You’re going to have to help me understand here. Don’t mix my honey with my money.

Brad: Don’t mix your honey with your money. It gets sticky.

Carlos: Are you talking about work relationships?

Brad: Be very careful.

Carlos: Stay away. Keep a ten-foot pole, especially if you’re married already.

Brad: Well yeah. Yeah. Just stay away, that’s all we can say.

Carlos: Our last question for you: if you could have one superhero power, what would it be and why would you want it?

Brad: The whole thing embodied by Wolverine, the whole super healing. It would be possible to do a lot of good in the world if you were able to go places with less fear of self-harm, and to do things to help people at a greater scale, because you could go places and do things that ordinary firefighters can’t do. You’d be able to solve problems that others can’t because you know that you can bounce back.

Carlos: Interesting. Very good. Well Brad, thanks so much for being with us today.

Brad: Thanks for inviting me.

Carlos: We do appreciate it, and it was great to have you on the show.

Brad: Thanks again.