Carlos: So Brad, welcome to the show!
Brad: Thank you very much
Carlos: It was nice of you to be here! We find ourselves in Baltimore today at the SQL Saturday, so thanks for taking a little time to come and chat with us. So you’re actually presenting today on SQL auditing and that’s the topic of our conversation today. You’ve done that a couple of times.
Carlos: And so I’m interested. I know that can be a very boring concept, although here in the United States we just finished our taxes, right? So some people might be a little bit concerned with auditing. But ultimately our conversation is around SQL Server components there
Brad: That’s right, because even in the SQL Server world, no one likes getting that letter.
Carlos: That’s right. So let’s talk about auditing. Why should we be auditing?
Brad: Well, a lot of it has to do with knowing what you don’t know that you need to know until after you need to know it. So it’s a circle, right? That’s kind of a goofy way to explain it, perhaps, but tomorrow or Monday, you’re going to get a note from somebody that says, “Hey by the way, what happened on Friday? We’ve got this question and there’s some instances…” and that’s one reason. Because you have a historical need or a need to know what happened historically. The other reason is to prove that you’re protecting. To prove that… it’s a sword and shield conversation. To prove that nothing was changed. That there’s consistency and constancy in the protection of the, let’s call it protected information because that takes a number of forms. If you’re in the medical space, we call that PII and PHI. Protected personal information, protected personally identifiable information, which would be something like your blood type, right? Personally identifiable information, which is true in almost every business in today’s world, you have to guard the social security numbers. If you’re dealing with payment systems, then you’re dealing with PCI, which is the credit card numbers themselves. The sixteen digit number or the fifteen digit number plus the expiration date. So that has to be guarded. And sometimes it’s about proving the protections. So there’s a number of reasons to do it and a number of considerations.
Carlos: So when we talk about auditing in SQL server, there’s at least in my mind a little bit of a misnomer, because to that point you want to know what’s changed. Auditing feature per se does not let you do that. You have to use Change Data Capture. Now you have an interesting philosophy in that those are one and the same in your mind. I misspoke there: in the sense that I’m going to implement both of them to help me answer that question.
Brad: That’s right. It’s a comprehensive solution that’s provided by SQL Server and if you kind of look at it as a funnel. You can say, “Okay, so at the very high level, who touched my box?” You can get that from SQL, you can get that from Windows via what network connections were made and you can track it that way.
Carlos: Here’s a login. This person logged in at what time?
Brad: Right. Or even, “I got a connection request, a network connection request from this unsecure IP. This is an out-of-range IP, so I want to log it because it didn’t come from one of our junk boxes.”
Carlos: So let’s go into the SQL Server auditing components. What kind of things are you auditing and taking a look at from the, and again because I look at it as features, let’s look at the auditing feature maybe first, and then we can talk about change data capture.
Brad: Sure. So it is of some interest, actually, that you can record a lot of these things to the Windows log, right? And I started there for a reason. Because really what ends up happening is that people don’t understand what you can log to the SQL Server log, because it can actually receive a lot more of this information than people sometimes give it credit for. So if you turn a lot of these things on, you now have an obligation to protect your log files. Who thinks about where the server log files go? Who thinks about that? Have you ever actually tried to move them?
Carlos: It’s in the default location.
Brad: Right! They’re always in the default location and they’re always set to a depth of six. Everybody sets it. Nobody goes, “You can move those?” Yes you can. And when you go historical, you probably should. Because once it rolls over, now it’s a fact. It’s a protectable unit of information that needs to be protected because once you revolve your logs, and you are rolling your logs on a regular basis, right?
Carlos: Well the default is, you get six logs and it rolls over once it restarts. That’s the only option.
Brad: But if you’re dealing with a cluster, or you’re dealing with a 5-9 situation, you shouldn’t be restarting that often. So your log file can actually get actually massive. But it becomes kind of the first step in proving who did what where when how is the SQL Server error log.
Carlos: That’s interesting, because normally I’ve adopted, my philosophy is that I like to cycle my logs every night at midnight. Keep it consistent and easy. So Friday, I know what the log data is. And I’m only keeping thirty days. Your suggestion is that I should keep it longer than that and include auditing? So your idea of auditing is, “I’m going to push more detail to the log and then I’m gonna manage those in a different way.”
Brad: Well, there’s different levels. What I would suggest, and what I suggest in my presentation, is you find a number that works for your organization which is based on the size of data per server. Because every machine ois always different. Now you can go with the corporate standard: we do every Friday and we’re going to keep the depth of 20. So now I have twenty weeks. But then look at the other side of it, which ties to SQL Server auditing itself, which I’ll kind of drill into know. I have to know where the information is going. It’s the feature, right? Not database auditing, we’ll get there in a moment. But auditing at the SQL Server level. I want to know who makes a database and when, right? That’s kind of a basic auditing feature.
Carlos: Right. DDL changes.
Brad: DDL Changes. But I’m going to use the SQL server audit feature and not a trigger, because a trigger pushes to a table and I’m talking about auditing which pushes to a file. So in SQL Server auditing when I push this stuff to a file, I now have the exact same problem that I have with the SQL Server error log. It’s data on disk that can be manipulated by a Sys Admin or someone with rights. And insider attacks are a very real things. So the first thing that even an entry level auditor learns to do is, when it comes to electronic records or electronic communications, what are the rights associated with this. We know that it’s a valid record. So how do you know that the record itself is valid, that the record itself wasn’t changed? If you have some smarty pants and drops a 3 TB database and then knows the logs are there…
Carlos: …and covers his tracks…
Brad: … covers his tracks and makes a little creative auditing and all of a sudden it’s your fault because the log says so. You have a problem, because the log says so. Not because you actually did anything, Mister or Missus DBA. But because the log says you did.
Carlos: Kind of a like who’s watching the watchers.
Brad: That’s right! That’s absolutely right. So when you first talk about auditing you first have to start to talk about: where’s my stuff going, where are my files stored, what is the security surrounding that landing zone? And this comes back to the error logs. Because I may want to capture them as soon as they roll off as a part of the same rollover script, I may want to fire a PowerShell step inside the SQL agent job that rolls the things over and I may want to catch them and move them someplace. So I roll it over and I take the first most recent, which is now .1 and I put it someplace. And I put it someplace that sys admins can’t get to, right? So it’s controlled with AD permissions. Yes, a sys admin of a Windows domain can always go in and do any network share and yada yada yada. But if I have tracking turned on in that directory and I have the necessary permissions turned on that directory, then I should be able to see if somebody modifies the security of that folder. There should be system monitor on it or Microsoft network monitoring or systemic logging when it pushes to the SAN. So sys admins don’t have the SAN permissions so you have to be a SAN admin. Right? So there’s kind of a check and balance going on.
Carlos: That’s seems a little, and I don’t want to say nuanced, right? So let’s say that we are in a trustworthy environment…
Brad: And that’s all relative too. If you’re the only DBA, you know?
Carlos: What other kinds of things, to go back to the SQL Server auditing feature, what other kinds of things am I going to be putting in there. You mentioned creating or dropping tables.
Brad: That’s right. So, well at the server level it would be creating or dropping databases. Creating or dropping logins, modifying system configuration, creating an end point. It’s instance level. Am I creating an end point, right? Am I creating credentials? Am I altering credentials? Am I resigning things using certificates? Am I recreating certificates so I can sign it? Why did it get unsigned in the first place can be an interesting conversation. Because maybe you know about it. Like, we have this stored proc and we’re doing delegation, we’re doing substitution and delegation service accounts are the only ones with the rights to run the thing or whatever. But we’ve got to mod the stored proc. And all the paperwork has been filled out because it’s a production level change. But when you modify the stored proc, when you modify the proc to get the latest business logic in it’s gonna break the certificate because it’s signed. So I’ve got to go back to sign it. Turns out, nobody knows the password to the cert, right? Look, these things happen. Like, DBA Bob who originally signed it five years ago has left the firm. But his is a known business case and it’s trackable. So when the auditors come through and they say, “Show me any security exceptions or exemptions in the last twelve months.” Well, here you go. We had to resign the certificate.
Carlos: You document the reason why.
Brad: That’s right. It all comes down to, “We know what happened.” This isn’t a fly by night operation and we know what’s going on in our organization and when there are exceptions, we know why and we can run them down.
Carlos: Yeah that’s interesting. And I think, knock on wood, I’ve been lucky there that I haven’t been in environments where auditors have been busting down my doors wanting to know that level of detail. And I think that’s the trick or the gotcha, if you will, with auditing that’s the instance level. A lot more common question I get is, “Who changed this data?” My number was six today and now it’s seven. What happened? And auditing unfortunately doesn’t give us that granular of detail.
Brad: Well, that’s intended for change data capture. Which if you marry that with a database level database audit, but basically, you have to audit the database too. If those rights get elevated or get escalated, now all of a sudden I have a way of saying, “You shouldn’t have the ability.” Let’s talk about the HR database and the HR table, to get specific. If you have somebody who’s modifying the salary numbers of the people table, you want to know who, you want to know when. You don’t know why, but that’s why we turn these things on and ask why these things happen.
Carlos: And it would be helpful if you knew what the original values were.
Brad: That’s right. That’s exactly where change data capture comes in. Another interesting piece o information to know is when did you get rights to do that? Or how did you get rights to do that? Because that’s why I would turn on database auditing. Because if somebody granted that login permission an hour before the change is made and we’re talking production HR stuff, now I have a very interesting conversation. Because I’m going to have a conversation with the grantor and with the grantee. Like, “Hold on a second. This is people data in the HR system in production. This is not a paperwork free zone. To change this manually through SQL and not through he application and not through the application logins and we need to see a paper trail. Because this is actually really serious.”
Carlos: That sounds like a better scenario for creating a new server, like I create a server for change auditing. I can capture all of that. I’ve got SQL servers now that have users. What’s auditing doing for me other than capturing the changes. How can I get to that level? I guess it’s not going to help me inspect who currently has write access to those tables.
Brad: That’s true. This is sort of one of the logical fallacies of all of it. Which is, I have to know what I need to know before I need to prove it. And so that’s the real challenge of any existing environment. Or, a cots product. Where I can’t necessarily turn change data capture on, because if I turn the shadow table on, now any failure in the cots product makes the entire thing unsupportable. I can’t open a support case with vendor x because I added the change data capture shadow table. And I can’t add a trigger on the table.
Carlos: Right. Make modifications to the table. They frown upon that.
Brad: So the simplest of attempts to understand a performance issue, the vendor considers invasive. Lot of people are in that situation. Lots of DBAs find themselves the sole supporter of an application or the sole DBAs. You’re the lone DBA on the ranch. How are you then going to get meaningful information about your systems, because you can’t turn on change data capture, let’s say. And so, there’s sort of this in between space that exists where you can turn on things like DDL triggers and you define it and you can say, Well OK, I can’t necessarily turn on change data capture, but what I can do is I can find out if anyone alters columns in the database. And I can find out if anyone adds databases and I can push this to someplace so it’s non-impactful in the application. And I can stand up a database named, DBAs Only and I can have the trigger fire and push the stuff in there and I can go look and review it every day.
Carlos: Get an email when a record gets added.
Brad: There’s lots of lightweight stuff that you can do so that it doesn’t touch the cots system. You can define an extended events session and then you can have a watcher watch the extended events session. And you can rotate that file and define a depth or a history or stuff like that. And that would be a way, because you can actually watch the inserts on the database. Like say, “I’m really concerned about the people table in HR. We believe somebody is monkeying the people table.” Okay, so I’m going to define an extended events session that I’m going to use that as a form of sort of lightweight watchering.
Carlos: Now that has to be table-by-table, right? That’s one of the downsides. You have to know what you want to watch. It’s the same with change data capture, don’t get me wrong, I have to know what I want to watch. Starting with the end in mind. What questions do I want to answer and again, so the developer folks come to me and go, “Hey, how did this change?” It’s like, “Well, that wasn’t on my list of things that I was going to give you an answer about.”
Brad: That’s exactly right. It’s that sort of you have to know it before you need to know it, that you need to know it in order to answer it. I know, isn’t that crazy, right?
Carlos: Isn’t that the definition of experience? Experience is the thing that you get right after when you need it.
Brad: Yes, yes, yes.
Carlos: So, let’s talk a little bit about costs. The costs of putting these in?
Brad: Almost everything is available as a part of Standard Edition. Which means that automatically it’s included in Enterprise Edition. The common criteria compliance, which is in my mind, a wonderful feature for QA.
Carlos: I’m not familiar with this. Do tell!
Brad: Common criteria compliance is…
Carlos: Are we starting in 2012 here?
Brad: It might be 2008 R2. I don’t remember when it was first implemented.
Carlos: Yet another feature I’m not familiar with.
Brad: It’s very cool, because it turns on, in terms of auditing, unsuccessful logins inside of sys.dm_exec_sessions, so it starts to record the unsuccessful and successful logins of the sessions inside the DMV.
Carlos: Isn’t that the default?
Brad: They’re there but it’s not recorded. The columns are there but they’re all null by default.
Carlos: You’ll forgive me. In the agent log I get unsuccessful login records.
Brad: Which is configurable. So you go into system and you go into properties and you go under the security tab, that’s configurable. So you set it to successful only or unsuccessful only. This actually goes per session, right? So once the session is established I can start to see, “Oh well hey, hang on a second. This guy keeps bashing the server.”
Carlos: So they’ve created a connection and they’re allowed public access, if you will. They’re trying to get access to a table and they don’t have permission on it. That’s what it will give you?
Brad: Right. It’s that sort of, if you’re going to implement it you need to probably test it and make sure that I’m not remembering it incorrectly. You always got to test. The other thing that it does is it creates residual memory production, which is where SQL Server goes to grab a new block of RAM and wipes it before it uses it, which you can kind of be forgiven for thinking, “Why wouldn’t you do that?” And you say, “Performance.” Because it’s faster. So you just assume the memory is good and you just start writing your stuff into it. What it does is it wipes it before it uses any block of RAM, again, you have to test.
Carlos: That’s a big change.
Brad: For some systems. Like for a data warehouse, I may be willing to accept the overhead for the security. But for an OLTP, I really wouldn’t suggest it necessarily. Now, it depends on your OLTP, because if you’re sitting on some insane amount of RAM and you’re dealing with like a TB of RAM and your databases are 100 GB, yes you’re okay. But the reality is that most of us have a database size that is closer to or exceeds our RAM count. So that’s where it starts to, you gotta test this. Then the other thing is that the default behavior in SQL Server is that a column over grant overrides a table deny. So back to the people table. I can deny you access on the people table and grant you permission on the zip codes. So you can then go over here and run the stored proc or run this view and group by zip code, gimme a count and show me across the entire world, or the country, or the state or whatever. Drill it down, and show me all the counts. So you have no permission to the rest of the table, but you can still see the zip codes. That’s just the default behavior in SQL Server.
Carlos: I always thought that the denies overrided everything. If there is a deny, there’s no way you’re gonna get in.
Brad: When you turn on common criteria compliance, that is exactly what happens. A table-level deny overrides a column-level grant. Which is, when I started by database career, exactly what I expected to happen. When I pulled up my big boy junior DBA pants and said, “But I denied you! You can’t get there.” Wait a minute there junior, not so fast. The default behavior is that the column-level grant overrides the table-level deny.
Carlos: Kind of let’s circle back, right? When you are setting these up are you sort of doing this by default on all your servers? Auditing as part of your server setup? Or is it still kind of case-by-case?
Brad: In my current environment where I am, it’s definitely case-by-case. Because we are dealing with significant IO considerations in terms of the way the application behaves.
Carlos: A lot more records writing to disk, right? I mean, outside of SQL Server I mean if you’re going to put them in the log, you’re still writing to disk.
Brad: It’s still going to disk, and it is sequential, right? But if you haven’t moved your log’s default location, your log’s default location is in your C drive. So you start to get into this consideration of “hang on a second. How fast is my C drive?” Sometimes it’s the fastest drive in the thing.
Carlos: So that was going to be my question. You’re using a third party tool and you’ve implemented auditing and you turn on the triggers, if you will, but then you are using a third party tool to capture it and monitor it. Maybe capture is the wrong word; take the logs that are there and monitor them or review them.
Brad: No, no, that’s exactly right. And we have enabled the monitoring and the system that it’s on to be historically aware, right? So that it doesn’t just roll off the next there. There’s a depth there to the records. So if we need to answer the question of, “Hey, what happened yesterday?” we have enough depth there to be able to say, “Okay, not only did this happen yesterday but I can see it X number of times in the past period of time.”
Carlos: And I think we’re going to talk about that tool in a second in SQL Family. Let’s wrap up our auditing conversation. Any other thoughts or gotchas that you’ve been working with auditing?
Brad: You know, the only thing worth mentioning really is that it all depends. I know, isn’t that the single DBA answer? “Well, it all depends!” What I mean by that is that you have to know what you’re trying to capture. In suggesting something, I would really just suggest that you start someplace. Start with the lightest weight thing that you can to answer the questions that you get. Whether that’s how many people are connected, how much ram are we using, this database has this thing that we need to know. Whatever the lightest weight thing is you can, for the most frequent question, just start there. Make it the smallest, simplest. But just get started. Because what will happen over time is you will find questions that you didn’t know you needed to answer using the data capturing. Figure out the questions you get the most and then get the smallest unit of data that lets you gather that answer for the business and then start taking that back to the business.
Carlos: Well, there you go. We’ll see where it goes then.
Brad: Right. And the business then will tell you and they will guide you and they will say, “Well that’s useful but what we need is, find the number of widget we make a day and then we need to know what color the widgets are.” Awesome. Now I know what two data points I need to get and then you can start to talk about, “Okay, how are we going to protect the information?” Then it leads to other conversations that lets the DBA be seen as a proactive business-focused individual instead of just a resource or instead of just an expense. Now you’re adding value to the business. Now you’re protecting the lines of business. Now you’re enhancing the agility and the flexibility of the business to respond to the market. Like all these things that the business wants from its IT resources, I think you can find value inside of the auditing records. Just by capturing little bits of information about what’s happening inside the server.
Carlos: Very good. As always, just get started and see what’s there and move forward. Brad, let’s do SQL Family.
Carlos: So one of the things I always like to talk with people is how they work and how they get their job done. So what’s your favorite SQL tool?
Carlos: Spotlight. So a paid tool. Why do you like it and what’s it doing for you?
Brad: It does two things: one, it draws pretty graphs.
Carlos: There you go, that’s always nice.
Brad: Which it sounds really off-the-cuff or it sounds really flippant, but those who I’m trying to communicate to specifically in my environment or in general, respond very strongly to a visual representation of a technical problem.
Carlos: Our next question: if you could change one thing about SQL Server, what would it be?
Brad: [sighs] Especially surrounding the auditing, I would love to have a control panel. A security-related control panel so that I don’t have to go into Security. Think tree view, but for users and logins. So, you go here and you say, “I want the pretty method.” But again, this goes back to the ability of the screen capture. What I want to do is I want to go into security and I want to grab a principal, whether that’s the service account or login or SQL login, and I want to be able to display down in a graphical format that I can easily communicate to people how much access this login or principal has and what securables it has access to. I would love to have the ability to graphically represent the cascading nature of the change that was made.
Carlos: What’s the best piece of career advice that you’ve received?
Brad: Don’t mix your honey with your money.
Carlos: [laughing] You’re going to have to help me understand here. Don’t mix my honey with my money.
Brad: Don’t mix your honey with your money. It gets sticky.
Carlos: Are you talking about work relationships?
Brad: Be very careful.
Carlos: Stay away. Keep a ten-foot pole, especially if you’re married already.
Brad: Well yeah. Yeah. Just stay away, that’s all we can say.
Carlos: Our last question for you: if you could have one superhero power, what would it be and why would you want it?
Brad: The whole thing embodied by Wolverine, the whole super healing. It would be possible to do a lot of good in the world if you were able to go places with less fear of self harm and to do things to help people at a greater scale because you could go places and do things that ordinary firefighters can’t do. You’d be able to solve problems that others can’t because you know that you can bounce back.
Carlos: Interesting. Very good. Well Brad, thanks so much for being with us today.
Brad: Thanks for inviting me.
Carlos: We do appreciate it, and it was great to have you on the show.
Brad: Thanks again.