Episode 11: SQL Server Statistics

1400So it turns out an old dog can learn new tricks–or at least that is what happened to me when I chatted with Grant Fritchey from Red Gate software about SQL Server Statistics.  Always taking an opportunity to teach, Grant talks about why statistics are important and we go over some fundamental items you may just want to know about.  As always compañeros, have fun on the SQL trail.

Show Notes

Grant on Twitter
Grant’s blog
Grant talks about Minion ReIndex
SQL Server Statistics Questions We Were Too Shy to Ask
traceflag 2371
DBCC Show_Statistics

Transcription: SQL Server Statistics

Carlos L Chacon: Welcome to the SQL Server data partners podcast. My name is Carlos L Chacon, your host, and this is episode 11. Today were talking about statistics, with Grant Fritchey from Red gate software. Grant is a Microsoft SQL Server MVP, and product evangelist for Red gate.

Now, you can teach an old dog new tricks, and I found out with my conversation with Grant that there’s a few things about statistics that I still didn’t know. Even after my preparation for our conversation. I hope that you will find it enjoyable. Grant is always a funny guy to chat with.

We’ll have show notes available for you at sqldatapartners.com/podcast. We are going a little bit longer than normal. We’re actually going to push about 40 minutes today, so hang in there till the end. It will be well worth it.

You may notice that the sound quality is a little bit better here in this opening, giving you a little bit of preview if you will for the new microphone that I’m using. This will be going full time in the next episode, episode number 12.

Hang in there with us for one more episode and we’ll start getting quality better in episode number 12. Again, this is episode number 11, compañeros, as always, welcome to the show.

Children: SQL Data Partners[music]

Carlos: Grant, welcome to the show.

Grant Fritchey: Hey, thank you. Thanks for having me.

Carlos: I remember a job interview that I had several years ago. One of the things they asked me was about statistics.I don’t know that I remember my exact comment, but I know that I was little bit nervous about it because I had grabbed something online, and I think I gave an answer about helping the query, but I didn’t necessarily have a really, really good grasp.

Over the years, some of the content that you’ve put out has really helped me to understand statistics better. I’m super glad that we can have you on the show here today.

Grant: Thanks, I appreciate it.

Carlos: Let’s go ahead and jump into that. What would we give as a better description of statistics than I gave in that interview those many years ago.

Grant: [laughs] The best way to put it, is that statistics are the principal driver for the decisions made by the query optimizer. They’re the first thing that’s going to look at it, in addition the objects you’re referencing, it cares about statistics, because the statistics are what allows it to make its row target.Its guesses as to how many rows it’s like to retrieve based on the indexes, the columns, your where clause, all that fun stuff, and how you’re doing your joins. All that plays into a factor, but it all feeds off of statistics.

If you’ve got good statistics, the optimizer’s going to make better choices. If you’ve got no or questionable statistics, the optimizer might make really, really bad choices. It all comes back to statistics driving the optimizer.

Carlos: That’s right, so I think I’ll go with that. The statistics are used or a sample data if you will. In the table that I have, how many of this value do I have? I’m going to keep a record of some of that. I think we can go as high up to 200 records? Now that hasn’t changed, right?

Grant: No, no that hasn’t changed yet. You’re talking about the histogram.

Carlos: The histogram? That’s right. It keeps that information up to 200 lines of what my estimated data is. Then it uses that, as you mentioned, to then you formulate those queries and be able to know based on what you’ve just requested of me, how my best able to give you back that data.

Grant: Absolutely. The funny things is, because you’ve just hit the most important thing about statistics. You created index, person ID. You expect the person ID index to be accurately reflected in the statistics and it will be.Let’s say you create another index. You create this index on entry date and person name or something. The interesting bit is that the histogram, which you already nailed on as being extremely important, is only going to be on that first column.

Even though you a compound index, even though you got a high degree of selectivity and possibly a very accurate representation of data possible, the histogram is only on the first column.

It limits…I don’t want to say limits the capability, but it certainly means that that first column and column choice in your indexes is extremely important because of how statistics deals with it.

Carlos: That’s an interesting though there, and that may be why I think sometimes we can craple with our indexes and which ones we should be using. You mentioned column choice there.We’re getting into that. Why these statistics are then important is because ultimately they’re going to affect our queries, right? They may affect our index choices or the way we create those? That order…

Grant: Yup.

Carlos: …that you mentioned. That’s why we’re caring about them. Now you mentioned that they’re only going to be created on that first column of my index. How can I look at the statistics of my index or my table to see what statistics are there?

Grant: It’s actually funny, you’re going to be looking at the database consistency check, or, no they don’t call it that anymore. It’s the database control console or something, DBCC, I forgotten what the new name is. It was database consistency check for 20 years, so I’m old.[laughter]

Grant: Anyway, DBCC…

Carlos: I hadn’t realized they renamed it, so I guess that shows how much attention I’m paying to there.

Grant: Yeah, there is a new name. DBCC has a command called “show statistics.” That’s your baby. It’s really simple command that calls DBCC show statistics, pass it in an object name, table.Then it will show you the statistics on the table. Passing an index, it will show you statistics on that index. You can also look at just statistics because statistics are created on a index automatically.

But, assuming you’ve got the defaults on SQL server when you reference a column in a where clause, if it does not have an index, SQL server will go and create statistics for the column. It’s auto-creative statistics.

It’s a good thing, don’t panic, but it is something going on. You’ll get these funky system statistics names on your tables if you go and look at the statistics there. You can look at the management studio, they’ll you which statistics you’ve got.

Carlos: Those auto-generated ones are the ones that start with WA.

Grant: Yeah, WA, which means the state of Washington.

Carlos: State of Washington. Good old Washington. My wife’s from that state, so I can’t complain too much.[laughter]

Grant: I still don’t know quite why it’s in a system name, but there it is.

Carlos: There you go. Naming choices can be important, right? You’ll never know when that’s going to come back to show up in millions of people’s environments.

Grant: Some developer having a fun one day, now you live with it forever.

Carlos: Yeah, that’s right. We’ve talked a little bit about those statistics, where they live, how we want to see them. We’ve talked about changing those statistics or maybe updating them. What kind of influence do we have over those statistics?You mentioned they get created by an index. They get created…excuse me…on an index by default. Do we have any control over when they get updated or how to change.

Grant: No, none at all. [laughs]

Carlos: I’m like, “Ugh, I should think I’ve got one.”[crosstalk]

Grant: [inaudible 9:05] . Of course you could. You’ve got quite a few switches and knobs that you can turn. Probably not as many as people would like. As important as statistics are, the level of control and degree of control we have is somewhat limited.As you said, statistics are created automatically on one indexes or created automatically on those columns. They’re also maintained automatically. As your data’s modified, add, delete, edit, you cross certain thresholds inside that data and then it updates the statistics.

The rules are simple. If there’s no data at all and you add data, you get new statistics. That one’s easy. If you are less than 500 rows in a table and you modify up to, you modify more than 500, by modify I mean add, edit, delete…

Carlos: Changes, 500 changes.

Grant: Five hundred changes. Then you will get a statistics update. Then above 500 rows, it’s…

Carlos: I think it goes to 20 percent then, 20 percent of change?

Grant: Yeah, 20 percent. Once it’s above 500 rows, it’s 20 percent of the table’s changed. Thank you, gosh. I need a new brain.That’s where things get fun. This is the automatic process. Two things to know about it. One, that’s the automated process by which it goes on. Those are going to be occurring while you’re working, but two things to know.

One is that 20 percent, when you’re looking at 500 or 1,000 rows, it’s not that many rows that have to get modified. Ten thousand, hundred thousand, it’s still not that many modifications.

When you get to a million rows, 10 million rows, 500 million rows, suddenly 20 percent gets extremely, excruciatingly painful to wait for a statistics update. That’s an issue, and we’ll talk about that in one second, because it’s one of the knobs you can’t tweak.

The other issue that you’re going to run into is that the statistics update, the automatic statistics update is sampled. Meaning, it doesn’t read the entire data set like it does when, let’s say you create an index, it’s going to read the entire data set, come up with a distribution of that data and give you what the statistics look like.

Whereas, the automatic updates are sampled, so you are going to see a slightly different data values, because it’s going to just take 200 random points inside you data set and come up with what you’ve got. That could be problematic.

Now you’re going to talk about the controls that we have. We do have a few controls, and the first one addresses that 20 percent.

Carlos: Yes.

Grant: What we can do is we’ve got a traceflag 2371. It’s available in 2008 R2SP1 or greater. [laughs] No I don’t remember that, I looked it up.

Carlos: You have to look it up every time?

Grant: Yeah, I have to look it up every time. What it does is after 25,000 rows, it turns the percentage down over time. As the number of rows you have grows, the percentage of changes required before an automatic update occurs is reduced and reduced and reduced.I think somebody calculated what their calculation is, but the calculation’s not published so I don’t bother telling people. I just say, “It works better.”

Carlos: It works better, there you go. You get beyond that 20 percent quicker and…

Grant: Yeah.

Carlos: …It enable you to have better stats there.

Grant: Absolutely. Then when you get into other controls, you can do is you get into…this stuff gets a little bit more…how best to put it…esoteric and arcane. There are other traceflags, 4137. For 2014 and greater, 9471 and 9472.These affect the way the optimizer makes calculations based on compound predicates. Meaning, I’ve got two columns. I’m doing “and”, or I’m doing an “or”. In that case, you can affect the way it makes those calculations by changing these traceflags.

They’re used pretty rarely. You’ve got to be in a pretty particular situation and most of the time, these traceflags are only…They’re not set on the server, most of the time. Most of the time, they recalled it as a quitter end.

It’s pretty rare to use them. I like to bring them and mention that they’re there, and talk to you, “Hey when you’re hitting possible selectivity issues in your and or clauses, they’re going way…I don’t know why it’s doing this. Maybe try some traceflags, maybe. Experiment anyway.”

Carlos: Very good. I haven’t used those traceflags, to be honest at all, in doing what I’m doing. Maybe working with a little bit smaller potatoes. [laughs]

Grant: Yeah. I’ve not personally used them, I’ve seen them used.

Carlos: Ultimately, that discussion was…

Grant: Knob twisting.

Carlos: That’s a little bit of knob twisting. We can’t influence the way SQL server chooses to do the…build those statistics. For example, you talked about sampling versus full scan.

Grant: That was the other thing I meant to bring up. Go ahead, or you want me to go ahead?

Carlos: Oh no, go ahead.

Grant: I did mention knobs, right? The first knobs you have are these traceflags to affect how these things are consumed, or indicates the 2371 how they’re generated. Past that, you’ve got a command called UPDATE_STATISTICS.Direct, create…and you guys also create statistics for that matter, or drop statistics. But the UPDATE_STATISTICS is the interesting one, because the UPDATE_STATISTICS lets us determine how we want to do our statistics changes.

You can say, “UPDATE_STATISTICS,” and then define a sample rate say, “Oh, 50 percent of the data. Look at half of it. I can’t afford you look at all of it because I’m feeling It affect my performance when I do that.”


Grant: Yeah, we can do that, or you can say, “Hey, UPDATE_STATISTICS against this particular table or against this statistics with full scan,” meaning look at all the data I need. Accurate, accurate, accurate statistics. You can make adjustments up and down there from that.You can also turn off Auto Update Statistics individually. You can have it on for everything, but then you can do an UPDATE_STATISTICS against a particular table, and then tell it NORECOMPUTE. What that does, it disables the Auto Update Statistics.

If you’re doing the end memory tables by the way, you can…there is no automatic update on statistics. None. You do need to maintain your statistics yourself. However, I’m going to test this on 2016, but on 2014, the end memory tables, you can’t run DBCC SHOW_STATISTICS.

Carlos: Interesting.

Grant: You can’t see how out of date or off your statistics are, you just have to make a guess and say, “Yeah, we’ll update them now.”

Carlos: My DBA aura is telling me that, “Yes now is the time.”

Grant: Yeah, right, exactly. It’s just feeling it one with the universe.

Carlos: That’s right.

Grant: Yes, now we should update our statistics.

Carlos: Interesting. That’s something that I was going to wear off, is being able to disable the automatics statistics on an individual table, or I guess I’m assuming, an index as well.

Grant: Or just a single statistic, yeah.

Carlos: Or a single statistic. What scenarios would that be most likely?

Grant: You’re talking high end stuff, right? You’re getting up in the “We’ve got millions of rows and we’ve set the statistics during what we consider out downtime or our maintenance window. We’re so big, statistics updates are painful.”For most people, they won’t even know the statistics update. You won’t even feel the fact that they’re occurring, but you’re in that situation where statistics update’s painful so, “I’m going to disable it.”

Or, you’re in a situation where, “We need to sample on a very particular rate, whether it’s a full scan or reduced value, and we don’t want to rely on the auto stats update,” which is uncontrolled to a very large degree.

“We’re going to control it. It’s going to do exactly this way, exactly like this, but we’re going to disable it.” But again, that’s where you either got a very odd data distribution, or you’ve got big, big, big data. That’s usually what it is.

If you got really odd data distribution and small data, then just update the statistics a lot. It’s your best… [laughs] We had a situation that was a horrible database design. Horrible. I freely admit this, it wasn’t my design, but I was trying to fix it and failing. I freely admit.

I didn’t know what to do to fix it. The situation was so bad, the data coming in, it would age the statistics and then we would get a bad parameter sniffing issue would occur, it would make really horrible gasses on the statistics and we would get bad execution plans.

The queries were running…normally running, you know 100 milliseconds suddenly running in 10 seconds, it was screaming bad. We were doing update statistics with full scan every five minutes on one table.

Carlos: I was going to say, that should probably be a very unique situation in a one off, I guess on one table, right?

Grant: It absolutely was, it was a horrible, horrible compromise but it actually fixed the problem that we were having because we got clean statistics on a regular basis. There’s a little bit of locking, a little bit of blocking every five minutes but not as noticeable as when that query suddenly became a 10-second query.

Carlos: That’s an interesting thought there. You mentioned the influence that would feel on updating statistics.On an individual instance, I think I would agree that the one area that I have seen where the statistics can potentially…or updating them and this is your maintenance strategy is if you have even 20, let’s just say even 20 instances in your environment and you have your maintenance to kick off on all 20 at the same time.

You are starting to update those statistics and let’s say you are doing the full table scans. Your SUN operators are going to come back and say, “Hey, you know I’m noticing this huge spike one o’clock in the morning.”

Grant: We need to talk.

Carlos: We need to talk, that’s right, you are overloading my buffer with cash and everything. Now as we talk a little bit about maintenance and doing that…

Grant: Sure, that’s one of the issues.

Carlos: One of the issues that can come back to bite you. I think we need to take a look at updating statistics. How often should we be updating them? Luckily SQL Server does that for us in a sense automatically. There’s some care and feeding involved that automatically is done with the statistics.

Grant: In most systems most of the time that’s actually adequate.

Carlos: I would say combined with my index rebuilds. If I’m doing index rebuilds, my statistics get automatically updated.

Grant: When you rebuild an index it does a full scan on the statistics that it recreates because rebuilding the index is basically a new thing, it’s literally rebuilding the index. When it does that, it also does a full data scan for the statistics that it creates.You get the most accurate statistics after an update like that which brings up…one of the classic, classic problems that people have is that they will say, “OK, well, I’m going to rebuild indexes then I’m going to do my statistics maintenance.”

They will rebuild their index, that’s takes a period of time or whatever, it finishes and all great, you’ve got all of these wonderfully new up to date statistics on any of the index that were rebuilt.

Then you run sp_updatestats which is the shorthand that everybody uses for doing update statistics instead of doing the update statistics command. Microsoft gave us a tool sp_updatestats, it works.

Carlos: It’s simple, straight forward, we will get everything.

Grant: It’s all clean, it helps you out, it’s all great except that they run that immediately after rebuilding all of their indexes and there’s a threshold that sp_updatestats has, is that if you cross that threshold it would then sample update your statistics. The threshold is really simple, if the ROMAD counter is greater than one.

Carlos: [laughs] I did not know that.

Grant: If you’ve done a full scan statistics update as part of your index rebuild, now you’ve got the most accurate statistics you could have, then you run sp_updatestats against it, everybody has touched a row [inaudible 22:17] you now get sample statistics in place of your beautiful perfect set of statistics.

Carlos: All right, yeah. Let’s talk about how do we get around some of that? I think part of that is not running my update statistics or not running update stats every night as a nightly process.

Grant: That can [inaudible 23:46] a dance. Most of our systems we found we needed to do index rebuilds once a week, needed to. I’ve heard people say, “Oh, our index rebuild doesn’t help that much,” well it can, I’ve seen it help a lot.We were doing that about once a week but then statistics were aging a lot faster than the indexes were fragmenting. We were doing stat updates once a day

Carlos: There was enough data being added that your stats were becoming a little bit worked and you needed to update those but your indexes from a fragmentation perspective still looked OK.

Grant: Yeah, no big deal. We were doing much more frequent stats updates than index rebuilds.

Carlos: I think again that’s probably a little bit more of knowing your environment. I would think it goes to further that that’s probably on a table by table basis potentially as well.

Grant: Potentially, yeah. You may have hotspots right, like we had that bad design, I bring that up because at this point it’s 10 years ago when we had the problem and it ultimately got fixed by redesigning the entire structure

Carlos: Very good.

Grant: [laughs] That was the solution. The issue there is that we had a hotspot, we had a particular area that needed tender loving care.You do have to be aware of your systems and how they behave to understand that, this area most everything is going to work fine. We do an index rebuild once a day, it’s not hurting any performance issues, the SUN guy is not complaining, everything is fine.

We probably don’t need to do statistics maintenance that often, you can look into other alternatives or you may find that you are not doing those updates very often on the indexes but you still got a lot of data volatility, either it’s getting modified, deleted, inserted, you need…

Carlos: This is going to lead us to…you mentioned the control that we didn’t have and I know that…actually talking with some of my Oracle buddies.We had a statistics problem one time…anyway he mentioned…the Oracle guy was laughing at us because they can actually import and export statistics, I was like, “Whoa! Whoa! That’s pretty cool, we are working at their mercy,” right but one of the things…

Grant: We can do that.

Carlos: You can import and export statistics?

Grant: Yeah.

Carlos: Then you’ll have to point me to where I can find that.

Grant: You just gave me a good idea for a blog post.

Carlos: There you go, that’s right, because I have not seen that. That would be interesting to know. One of the things that we can do to that effort is if we see our data changing in a specific area is that we could also then start looking at either filtered stats or…

Grant: I left out filtered stats, thank you.

Carlos: Potentially, I guess I’m assuming that’s maybe a date parameter or something, that that new data is coming in and that’s where you are having your problems or your stats are going to have a date. Filtered stats could potentially help you with that.

Grant: Absolutely, that’s what they are there for. They were created the same time as they created filtered indexes. It’s the same idea, the filtered index is going to have automatically filtered statistics but you can’t just create filtered statistics independent of indexes.That will give the optimizer more tools for figuring out how many rows you are getting back. That’s a huge win depending on your data distribution, it’s not something that everybody runs out and starts applying, statistic filters.

There are a lot of tools, it’s just that it’s not always straightforward what the problem is, it becomes one of those things of being able to identify what the query itself is fairly well written, the structure is in place, the indexes are in place.

Why does it think it’s going to get a million rows when it’s only going to get one or vice versa, why does it think it’s going to get one row when it’s going to get a million? It’s making that determination of why is it making those choices lead you into statistics and then you can figure out what you have to do to fix them.

We do have quite a few tools, we don’t have…like we do have the 200-steps histogram, Oracle can adjust that. They can say, “Well, for this index we wanted to have 500 steps.” I’m not an Oracle guy, I don’t know what the limits are but they can make adjustments on that.

They do have more knobs to tweak than we do. Believe me there are occasions where I would love to be able to go, “Yeah, I don’t care if it costs extra maintenance overhead, I want 1,000 rows on my histogram”

Carlos: Yeah I want a little bit more. That’s a question for you there, off the cuff. When we have a…let’s say we have a medium sized table, let’s just say 10,000 rows, 15,000 rows, we know that we can have up to 200 steps in our histogram.Why would there be occasions when there’s only, let’s say 150 and it’s not using all 200?

Grant: It’s actually funny, it largely depends on the data and the distribution of that data. If you’ve got a lot of duplicates in your data it’s not going to use 200 steps to define it.If you’ve got constraints in place that says only certain data types can go in here, it actually uses those in front of the calculation and it can figure out how many rows it puts in. I have a demo I do where I create statistics and I get two steps inside the histogram on a million rows.

Funny thing is it’s not because I put it as a [inaudible 30:12] , I would have done that too. It has to do with the fact that I put a constraint in place that it could only ever have one value but I still got two rows in the histogram which I thought was hilarious.

Carlos: Interesting.

Grant: If you look at it, it’s determination is the same, it knows that it’s only going to ever return one row but for some reason it had to give me two steps, I don’t know why.

Carlos: Because one just wasn’t enough in that case.

Grant: It should have been able to do with a single step in the histogram but I think it needs to have a start and stop point, so it created two.

Carlos: There you go compañeros. I think you should be able to tell the statistics while maybe in principle it’s a fairly straightforward idea, lots of different considerations to keep in mind and we are not going to be able to get to all of them today.

Grant: No.

Carlos: There are a couple of opportunities for you to learn a little bit more about statistics and we’ll invite you to check out sqldatapartners.com/podcast for all the show notes from today’s episodes.One of the articles that we will point you to is an article Grant wrote for Simple Talk called “SQL Server Statistics Questions We Were Too Shy To Ask”. It goes through some of the things we talked about today but also includes some additional detail.

For those of you who are headed to the PASS Summit in Seattle this October, Grant will be giving a session there on statistics for the new data pro. That maybe of interest to you, and we’ve talked about several different things, but again there will always be more.

Before we let you go Grant, just a couple of standard questions, one of the things we would like to do is to try and provide some value to folks. We want would like to talk about favorite tools. What’s your favorite SQL tool? Could be a page tool, free tool, why do you like it and how do you use it?

Grant: I’m going to come off prejudiced, but my favorite SQL tool is Red Gate SQL Prompt, and I work for Regus Software, I feel bad saying that, but I can’t write TC Code without it.[laughter]

Grant: That’s a fact. If I stopped working for the company, if they fired me, I would still buy the product, because it makes it so that I can write TC Code.

Carlos: We would like to hear one of your favorite stories. Ultimately, we would like to hear about why you enjoy being a database professional?

Grant: Why do I enjoy being a database professional? I think those are personal issues. [laughs] because I do enjoy it, I really do. I don’t remember the specifics on this one, but I’ll tell the story anyway because it illustrates why I enjoy being a DBA. The DBA’s job is dull and invisible, until the emergency happen.

Carlos: Till it hits the fan.

Grant: To a large degree, a good DBA is an invisible DBA although, I don’t believe in that, you should be engaged with everyone and very much involved in what you do.What the company does is a business and what the developers are doing and all that stuff, but by and large you can lie around in the background and be a happy camper, until everything goes south.

It was Friday, at five o’clock, I swear, walking out the door with some friends, and we hear a long stream of curses, coming from the stand hand maintain.

You are saying, “OK, whatever,” and you keep walking, until every single one of us…we were all two DBA’s, myself and another DBA, a couple of admin guys, everyone’s phone went off at once.


Carlos: Oh boy, database down.

Grant: You know that, Oh, there is a problem. [laughs] I’m looking at the door, it’s right there. I turned around going back, turn on your computer. What the heck is going on? Wow, we can’t get to like half, more than half of our servers are offline.They’re all gone, and we don’t know why. The SQL Server Instances are offline, the Windows Instances are offline, we’re going round and round.

I miss Friday afternoon, I’m punched drunk [inaudible 35:04] , but I start cracking jokes, I’m having a good time and I’m really enjoying myself, I’m getting very excited. We are figuring out what happened just like the SAN guy turned off the SAN. [laughs]

We really quickly had to get all our servers back online, we really quickly had to go through all the databases and do all the DBCC consistency checks, because they had a hard crash on the disk. We had to have script written and all this stuff done, very quickly for the short recovery. It was just so much fun.

I had a really good time, because it’s the adversity seems to be the moment when things are entertaining and shining, you get to step up and do stuff that you just don’t normally get to do. It was a lot of fun, and it’s the one thing that makes to me, I’m a database professional because, when the stuff hits the fun, I get pumped.


Carlos: Pump starts raising and things start happening. Compañeros, we do have another opportunity for you to learn about SQL Server. You can’t see it, but Grant’s actually wearing a SQLCruise shirt at the moment. Grant, you want to talk about SQLCruise for a second?

Grant: Yes, I take every opportunity I can to talk about a SQLCruise. Let’s put it this way, SQLCruise is a very intense and long-form classes SQL Server training with I would consider some of the better trainers out there, and somehow, I’m in the list too.[background music]

Carlos: [laughs] That’s right.

Grant: It’s great classroom time, some very serious training from very serious people, Kevin Klein, [inaudible 36:51] ,Tim Ford also does some of the training, Tim Ford of SQLCruise. Some of the best people out there are going to teach your classes, and that’s wrong.

Carlos: [laughs]

Grant: What you end up with is literally changes people’s lives because, you get intense classroom time and you learn stuff and you develop your skills.But then you get the networking time with people like Kevin Klein, who 20-minutes talk with the man and you could change your life, you could change the direction, the path, the approach things that you do, he is that inspiring, he really is.

I’ve watched people come back and not change their jobs, but get more involved in the job that they are in, step up and become technology leaders within their communities, technology leaders within their company.

The last time we were there, it was 100 degrees warmer [laughs] in the Caribbean, than it was back here in Massachusetts and Michigan. It’s a glorious time to go and get your SQLserver learning on. [laughs]

Carlos: If you are there from the North East, and you want to get out of the cold, this might be a way to do it. In fact, you and I, met on SQLCruise and it’s probably a big reason why you agreed to come on the podcast here.If you’d like to learn more information about how you can actually save a $100, by going to SQLdatapartners.com/SQLCruise.

The team has afforded us that opportunity to give a $100-discount, and there are some directions on there if you can find out more about the Cruise and get the $100-discount as well, that’s something you wanted to do.

Grant, we do have one last question for you. That is, if you could have one super hero power, what would it be and why would you want it?

Grant: Of course, the one I’ve always gone with. I’ve always wanted to be the flash, I’ve always wanted to be the flash.

Carlos: Interesting.

Grant: I want to be able to move quickly from place to place, hey, if I can get there as fast as flash does, I don’t have to fly anymore and that would be great.[laughter]

Carlos: Very good, very cool. Grant, thanks again for being on the show, we do appreciate it.Grant: No, thank you, thanks for having me. I appreciate it.

Carlos: Compañeros, we will see you on the SQL trail.[music]

Children: SQL Data Partners.

Episode 10: Test Anxiety and Test Preparation

1400We mix things up a bit as I chat with Thomas Frank from collegeinfogeek.com about test anxiety and test preparation.  Taking a test can be a big deal and it is always interesting to get other options on preparation.  As data professionals, of course we want a way to be able to advance our careers and a certification can play a role in that, but we might have a bit of anxiety around it.  The biggest take away for me from this session was Thomas’s tool recommendation and how he was able to start better habits.  I took his advice and have seen some results I may share in a latter episode.  Have fun on the SQL trail compañeros!

Show Notes

Thomas on Twitter
How To Beat Test Anxiety And Stay Cool

Transcription: Test Anxiety and Preparation

Carlos L Chacon: Welcome to The SQL Data Partners podcast. My name is Carlos L Chacon, your host, and this is episode 10. Super excited that you joined in today, we’re glad to have you.

Today we’re going to change things up a little bit, and talk about Test Anxiety when it comes to Certification Testing. We’re also going to discuss some preparation techniques that we can use as well. Before we get into the guest for today’s show, we do have a few changes coming, and that is, I have a new microphone.

I know that some of you have been struggling with the volume on the podcast. Hopefully we have a resolution there and so the next couple of weeks, we’ll start hearing episodes that will be recorded with new mic and that will get better.

Today, my guest is Thomas Frank. He is the founder of collegeinfogeek.com, and he’s a 2013 graduate of Iowa State University. We are super excited to have Thomas on the show.

He provides a slightly different way to approach some of these things and I’d like you to listen into the end. He’s favorite tool today is actually something very different. It’s not a SQL Server tool.

It’s something that I’ve started using and has actually been able to help me to increase the things that I wanted to do or I should say, “It has helped me to begin develop better habits.” I hope you all check that out and listen. Of course we’ll have the link in the show notes at sqldatapartners.com/podcast.

Again this is episode 10 and Compañeros, welcome to the show.

Children: SQL Data Partners.[music]

Carlos: Before we get into the meat of the show, give us a little insight on to what you are doing there at College Info Geek and maybe how that’s changed a little now that you’ve graduated?

Thomas Frank: Sure thing. I guess to start this website began as an expression of a failure I had. I applied to work for a different site.I had written them this article and they rejected me, short story. I put up my own site because wanted to not have this article go to waste.

Throughout college I was just writing about the things that I was doing to be a better student. I focused a lot on career stuff, and money stuff and not a whole lot on the actual academics. I was one of those students who was more interested in finding internships and getting outside of the class experience than getting amazing grades.

Interestingly enough and I was actually very scared that when I would graduate I will become this old fogey who would not be relevant to students anymore. The complete opposite has happened actually.

I decided to focus on academic content for the most part in the past year. Alongside that, I also started focusing on making videos instead of articles. Right now, I typically do a podcast every week, and then a video every week.

I don’t write a whole ton anymore. I want to get back in writing a little bit, but mainly video and audio. In addition, I also do the “Listen Money Matters” podcast.

I’ve branched out a little bit from my own stuff. It’s a lot of fun. I wrote a book that came out in the beginning of this year about earning better grades and made it free.

That made the site just take off, for whatever reason. I guess the next project will be to start a second book.

Carlos: Wow, very nice. One of the reasons I wanted to have you on the show today was a post you have on your site that starts with “Death is coming, death is here. Death is outside, there is no escape.”

Thomas: Who will be next. [laughs]

Carlos: “Who’s that knocking at the door? Yes, it’s death.” [laughs] You take that, you’re like, “OK.” If you exchange that for finals, or in this case, testing, paints a pretty good picture about how we may feel about testing.[crosstalk]

Thomas: Finals are coming for you.

Carlos: That’s right. Particularly for those folks who have already graduated college and now have to go back and take certification tests.

Thomas: I have heard that some of those certification tests will make college tests look pretty easy by comparison.

Carlos: A lot of them can be very, very specific and a little bit nuanced. Just like with your chem finals, it can be a little bit challenging to know, one, what to study, then two, just getting through.

Thomas: Yeah, for sure. I’ll tell you a secret. I actually never took any chemistry.I feel bad about it.


Carlos: Lucky you.

Thomas: In high school, I think we’re required to take two out of the three — bio, physics and chem. I took physics after bio, instead of the typical path chemistry first because I thought physics would include hitting things and… [laughs] …moving cars, and stuff, which didn’t.

Carlos: You mean you didn’t get to drop the egg from the stadium? The top of football stadium?

Thomas: We did that with physics class. That was great. I’ve put my egg in the center of a giant box and it had rubber bands coming from all directions.It worked pretty well. Then senior year came around and I literally could not take chem because the only time period open was this early morning, before school even started slot that they were going to offer. I think I was the only one who signed up for it because canceled it.

It’s just like, “Well, I’m not going to know anything about the elements or H2O, hydrocarbons, whatever. I’ll just read books about that later on.”


Carlos: There we go. Let’s go back to the Greeks, right? You’ve got earth, wind, and fire.

Thomas: Yup. The last “Airbender.”[laughter]

Carlos: I wanted to talk a little bit about maybe some of the reasons why we have an anxiety about testing, in general. A big one, obviously, is a fear of failure.You’ve get through college, you’re a professional now, and to go back and to think, “Oh gosh, I don’t know this material,” or, “I’ve spend this amount of time trying to prepare for something, and now I’ve failed.”

Thomas: Exactly. For me, I break test anxiety down to several different factors, but definitely fear of failure is one of them. I consider that the fear of inadequacy, that you personally are unprepared to meet the challenge of the material.It is also the fear of the unknown. Anything that’s unknown is going to be scary to us. Tests by definition are full of specific questions that we don’t know.

We know that the material that will be on the test is some subset of what’s been presented in class or in our learning materials, but you don’t know what’s exactly going to be there. I think, also, especially with the certification tests where it’s your career on the line, and you’ve pointed up a pretty big chunk of money to take them, for most of them.

It’s the fear of mistakes. You got a lot running along the line for this, so there’s a lot of anxiety that comes through from that.

Carlos: Particularly, if you’re in a company who’s agreed to reimburse you for that testing.

Thomas: Yes, definitely.

Carlos: You’ve already let them know that you’re going to take the test, then you have to come back and say, “Well, I didn’t pass.” [laughs] You may be the only one of your co-workers who are taking that exam, who are willing to do that, but again, there’s kind of that, “Oh, we didn’t pass the test.”

Thomas: You want to look like you’re competent. That you are able to take on the challenges you accept.

Carlos: From a preparation perspective, what are some of the tools that we might have available to help us to deal with that anxiety, or take a little of bit of that anxiety out of it?

Thomas: There are actually a lot of ways to cut down on test anxiety. The biggest ones are preparation. I think anybody listening to this is going to be very familiar with the fact that you need to prepare for these tests.My only experience with certifications is I studied for the A+ at one point, which is probably the easiest of all tech certifications. That very bare bones entry level one. There’s a lot of very specific material and you need to know it.

Upgrading your study methods and studying in the most efficient way possible, but also just putting in time and making sure you’re not cramming up close to that test date is going to cut a lot of anxiety. Thing about test anxiety is it not only heightens your nervousness and your fear, but it also prevents the brain from making connections that otherwise would have been able to make.

When you’re in a nerve-racking situation, biologically, you are not able to make those connections and recall as much information because the stress keeps you from doing it. By over preparing, you’re forging more connections.

I guess to make it very simple and probably very dumb-down and not accurate, it’s almost like there are more paths to get to that information. If stress is cutting off a few of them, you got more that you can go around that stress. For sure, prepare.

Another cool thing, I’ve read about study done by University of Chicago couple of years ago, where they had students do an experiment. All they did was pull a piece of scratch paper before test. In 10 minutes, they would just spend writing all of their fears.

Demonstratively, they got better grades. Just the simple act of pouring out your anxieties and placing them in some solid piece of paper, getting them out of your head, it allows your brain to not dwell on so much. You’ve taken care of it before you actually go on to the test.

Carlos: Kind of acknowledging them to say, “Yes, I recognize I have these fears, but I’m not going to let that stop me. I’m going to plow ahead.”

Thomas: The first step to recovery is acknowledging you have a problem. In terms of test, I think most of us have that same problem, so it’s a good thing to do.

Carlos: I think from that preparation perspective, in our college days, in theory we were in class. That was our preparation. If we did no other studying, we were at least there to absorb some of that.As professionals, we need to make a commitment to do a little bit of studying each day. One of the advantages that we have as professionals, hopefully, if we have a career in that same vein is that we should be trying to apply the things that we’re learning in our job. That will help reinforce what it is that we’re trying to learn.

Thomas: Exactly. Learning is context-based. It’s analogy-based.It’s learned better through practice, than through just routine memorization, especially with professional types of tests. Like, the SQL examination, or any sort of tech examination. It’s less fact-based.

There are definitely a lot of facts to it. You have to learn the libraries of commands and everything. It’s also, you’re forging those neural pathways simply by using those commands in a context, where you’re doing something useful.

You create some useful, final product. You can remember what it is you did with those certain commands.

Carlos: If you can try to solve a problem, that helps increase the learning, because you have completed something. Now you know the steps, in order to complete it.

Thomas: As a good example, I never took the Cisco…what is it? The CCNA exam. I definitely poured throughout forums. I was a tech support guy.My first year college, my first job was to pick up the phones, and tell people how to [laughs] connect their Internet backup and everything. I always grant aspirations of learning everything there is to know about technology at the time. I was like, “I’m going to take all examinations. A+, Network+, CCNA, MSDNAA, all that stuff.”

Time and time again, the people that had the CCNA would say, “Go out and buy yourself old Cisco matter equipment. Set up a network in your home that somewhat resembles a corporate network.” We’re not talking about your like little Walmart linksys router with a very simple firmware.

Set up something that you’re going to have to configure things on, and then play, because doing that is going to massively help you understand how they work, as supposed to just sitting in reading the CCNA study book and trying to memorize all the terms.

I’m like the furthest away from an ESCO expert. I know how to do a join, and that’s about it.


Thomas: I would have imagined that you can set up any sort of SQL, MySQL, or oracle-type database on your own computer, and start trying to build something out of it.

Carlos: One of the dings that technology certifications will get is that, there’s a lot of brain dump information out there. There’s kind of a debate, if you will, as to how valuable those certifications really are, from a technology perspective.Ultimately, if you are willing to take the time to take the certification, it will only help you in your career, if you’re willing to take that time to learn the information. Brain dumping it, you might have that certification. From an advancement perspective, not a lot of opportunities [inaudible 13:52] .


Thomas: Exactly. A tech certification is valuable in the same way that a college degree is valuable. It’s got signaling value.The signaling model of education says that you are more valuable as an employer or a potential employee candidate, if you have something like this, because it signals that you are a kind of person that’s able to put in the work you have determination to learn a certain body material.

It doesn’t guarantee that you know everything there, but it signals that you probably do. It signals that you have the tenacity to learn. You probably will be able to learn further material that’s going to help the company.

Even if that certificate’s not going to necessarily make you a better SQL programmer, maybe actually building something and your free time would make you better in a strict sense that certification is going to signal to the person who wants to hire you or promote you that you got the jobs.

Carlos: Anyone who wants to stay in technology for any length of time is going to have to make a dedication to learn. You will quickly fall behind, if you’re not willing to learn new things and strategies and technologies.

Thomas: That’s something that I think about a lot, because a lot of people, they get out of college, or they get their initial certification or initial job. Then, they do that job.

Carlos: They’ve arrived. [laughs]

Thomas: They get good at that job. The people, like their immediate circle of peers and maybe superiors, knows how good they are at that one thing. That’s a very deep level of concentration.I’m all about being a T-shaped person. I want to have few levels of deep specialization that I’m very, very good at. I also want to be competent in multiple different levels.

I want to stay abreast of all the things that are happening in my industry. Because if whatever reason I find myself out of my current job, that inner circle of peers who knows how good I am at that one thing, might not help me, because that one thing might not be relevant anymore, or maybe nobody’s looking for it or I just can’t find it. I need to go to something else.

Carlos: That’s a great point. Even to extend on that is by being comfortable in different technologies or areas is it gives you the ability to network with different folks in those different areas. As you mentioned is that, down the line they might say, “Hey, we remember Thomas. He helped us in this specific thing, which may have been your forte, but let’s talk to him and see if he can help us again.”

Thomas: There’s massive value of just being able to talk to people, even if it’s at a shallow level about something that they’re interested in. Like completely away from technology, but I recently got into cars. [laughs]I immediately noticed that a lot of guys like cars. Previously, I didn’t have a whole lot of interests that were relevant to most normal guys, because I play video games and read very weird books. [laughs]

When I meet a person on the street or at a conference, it be kind of like a small talk, and that’s it. I don’t have football now, but at least I have cars. A lot of guys are into that.

There’s a conversation topic. It’s the same with being familiar with different kinds of technology. You can talk shop with people who aren’t in your own area of specialization.

Carlos: Another idea on the preparation front, and it’s actually a suggestion you make on your blog for graduates. An idea that you have on your site for graduates is to start a personal blog. I think this idea could apply to test preparation, although may not be the easiest route.This is definitely an A game type approach. As an example, Colleen Morrow over at colleenmorrow.com, published a series of articles about the Microsoft 70-462 exam, which that exam is kind of a doozy.

From a preparation perspective, she just put those out and said, “This is what I’m learning, and I hope it helps you. But I really hope it helps me to pass the exam.” [laughs]

That was an interesting idea.

Thomas: Super cool. That’s analogous to one of the people, who’s been the biggest and one of the biggest inspirations for my professional career. There’s a guy named Pat Flynn.I’m not sure if you’ve heard of him. He is the person who got me in the podcasting. He has got a site called Smart Passive Income.

He got his start, by basically starting a blog, where he posts his notes for the lead architectural exam. It’s examination for green practices in architecture. It was pretty new.

He wanted to take this test to get ahead in his firm, so he can skip a lot of the years of slogging that most architects have to do, to become senior architects. He just put all his notes out there, barely even formatted them. They noticed a couple of years later that thousands of people were looking at them a day.

There’s definitely the potential that building a blog about whatever you’re trying to specialize in will turn you into a thought leader. Also just the act of publishing something and maybe even making just a token effort to format the content in such a way that you can teach somebody else, will solidify in your mind that much better because you have to think about it from the opposite side.

I’m no longer learning it. I’m now teaching it to somebody who doesn’t know anything.

Carlos: Exactly, and trying to express what it is that you’ve learned will help you understand that further. I don’t know what Colleen’s situation is, but I would think again, from that fear of of failure, you could at least go back and say, “There’s a plethora of work here that I’ve put into that. No one can charge me of not putting an effort in that route.”

Thomas: It also makes you a better writer.

Carlos: That’s right.

Thomas: I see a lot of benefits to it. The drawback is that it does take more time. There’s extra aspects of having a website that do take up time you could be using to study in other ways, so it’s all just whatever works best for you I think.

Carlos: The other component, one of the suggestions you make, is again, just to know the sakes of the exam. Ultimately, we’re talking about 150 bucks and a few hours at the testing center depending on the exam. If you go in and it just doesn’t go your way that day, they’re not going to throw tomatoes at you. [laughs]

Thomas: It’s a speeding ticket, and there’s no bad mark on your driving record. Maybe you got to wait two weeks before you can take it again. I’m not sure what the restrictions are for the test that your audience is taking.

Carlos: I think there’s different requirements for each one.

Thomas: That’s a good thing to think about. The test doesn’t define you as a person. I read a book as a kid.It was called “The City of Ember.” When every kid in this weird underground city became 13 years old, they would pick a job out of hat, and that was their job for the rest of their lives. That is a test that defines you.

Carlos: No question.

Thomas: This isn’t that. This is I want professional certification that’s going to help me down a certain specific path, but if I fail it, I can try again or I can pivot.Don’t let the pressure get onto you so much that it’s crushing you, but at [inaudible 21:40] token, I think you should feed off the pressure. Let it hone your senses and focus your mind because pressure is the antithesis of apathy. If you’re feeling a little bit of anxiety, it’s probably a good thing.

It means you care about what you’re doing.

Carlos: That’s an interesting point. Another idea there is that many of the exams will offer, irregularly, but they’ll offer what they call a second shot or an opportunity to take it again for free should you not pass it the first time. If you can arrange your testing around one of those offers.I’ve used that I know in the past, and that’s what helped alleviate a little bit of that stress as well because at least I know I can take it again.

Thomas: As somebody who has never actually taken one of these exams, do you get your results back in the form of the actual questions that you got right and wrong, or is it just the number?

Carlos: No, unfortunately it’s just the number. Almost like the scantron exams in college, you take it in, you submit it there, and, in a moment, they tell you whether you passed or failed.

Thomas: That decides my next tip here then. Since you can’t see the questions that you got wrong individually, what I would do is, if you go into the test and you don’t feel like you knocked it out of the park, the moment you get outside the testing center, pull out your laptop and just brain dump all of the questions you weren’t sure on, that you were having a lot of trouble with.Then, if you have practice tests or old test material that you’ve been studying from, see if you can find similar questions and focus on those, because you already did well on the ones you did well on. If you can focus on those ones you weren’t sure on, then the next time you go in, maybe if you could even do that little freebie retry, then you’re going to have a much easier time.

Carlos: That’s an excellent idea. I would add just a little bit onto that is that normally they give you a scratch of paper or actually a white erase type board. Start there writing those things down if you have enough time while you’re in the test.You can’t take it out with you, but at least you started writing it down, here’s X topic that I’m pretty sure I didn’t get this one right because I totally guessed. Then you can do the brain dump again so at least you have a little bit more of that information there with you.

Thomas: Since you can’t take it out with you, maybe, at the end, you could write a little summary box where it’s just like the bullet list of the topics that I really wasn’t sure on. Then you can go Google around and find them later.

Carlos: Ultimately with these exams, if you fail an exam, you’re not alone. Lots of folks have bombed the test. I actually wanted to tell a little story here.From the certification perspective, the Querying Microsoft SQL Server 2012. I have been a DBA for 10 years, and I looked at the name of that exam, and I saw Querying Microsoft SQL Sever 2012. I’ve been doing that for 10 years, so I said, “Let me just take a look, look at some of the topics.”

There were some new material which I covered. I went in, did not pass that exam. You’re not alone there.

Everyone’s done that. By taking advantage of some of the things that we’ve talked about from a preparation perspective I think can hopefully help you achieve the result that you’re looking for.

Thomas: For sure. You heard it from Carlos. Even the 10 year DBA could fail an exam because there’s a lot of new material and there’s things you don’t expect.It happens.

Carlos: Thanks for the discussion, Thomas. I think it’s been great.Before we get into some of the other questions, here’s another way for some of my compañeros to learn and take advantage of a unique opportunity.

[background music]

Carlos: Hello there, compañeros. I want to tell you about a unique SQL server training opportunity that is unlike anything you’ve encountered. As a listener of this podcast, you’re entitled to a special offer.SQL Cruise is premier training experience set in the most unlikely of places, a cruise ship. Tim and Amy Ford have created a wonderful culture on SQL Cruise.

With five or six technical leads from various industry sectors, you and roughly 20 other students will talk shop in classes while at sea, and while you are in port, you can enjoy the wonderful world we live in, either by yourself or with the group. The beauty of SQL Cruise is the unique relationship you will form with other SQL cruisers.

Worried about leaving your spouse? Bring them along! I did, and she had a great time with the group.

In fact, I have been on SQL Cruise twice, and I’m looking to go a third time. You may be wondering if this is serious training, and I can assure you it is as good as conference you will attain on land. It never crossed my mind that I could talking about SQL server with Microsoft MVPs on the beaches of St. Johns.

I know you must have other questions, so I suggest you head over to SQLCruise.com to check out the site. Once you are satisfied and you want to sign up, when you get to the question, would you be so kind to share the name of who referred you and what source material led you to us? You enter SQL Data Partners, and Tim will ensure you get $100 off the training registration. This is a great offer!

I hope this $100 savings will entice you to check out the site. I did and went on the Caribbean cruise, and I had a great experience, and I know you will, too. Maybe I’ll see you on board!

So head on over to SQLCruise.com and use the code SQL Data Partners to save $100.

[background music ends]

Carlos: To kind of wrap up, one of the things I like to do here in our podcast is try to provide some value to our listeners. One of the ways that I try to do that is to look at some different tools or some different things that our listeners may not be using at the moment.My question for you is what’s your favorite tool? It could be a paid tool or a free tool, but why do you use it and why do you like it?

Thomas: The one thing that I want to suggest to your listeners is an app called HabitRPG. This is basically a habit tracker that utilizes video game elements to help you build habits more successfully.If you’ve ever played any kind of RPGs, like Final Fantasy or Pokemon as a kid, I know I did, they were amazing at getting you hooked and building habits of playing the game because you had all sorts of progress indicators and all sorts of small rewards. HabitRPG does the exact same thing.

You can basically create a list of daily habits you have to do every day. When you check them off, you get experience, you get gold, you can buy gear, all that good stuff.

The cool thing though is you can get in a party with people, and if you don’t do your habits, then they will get hurt as well as you.

Carlos: Yikes.

Thomas: It’s sort of like putting social pressure to it. It’s super geeky, but I love it because it helped me build ridiculous habits that I had never been able to do before, helped me start waking up at 6:00 AM, working out everyday, meditation.It’s very closely correlated with my business basically taking off last year. I was very stagnant for about two years and then, in the last eight or nine months, it’s like a hockey stick.

I used it to write my book as well. I put just a simple “I must write 500 words a day” habit on there, and I found myself at 11:50 PM some nights writing 500 words, just dumping them out, getting them done because I had to do it. I got a book out.

If you have trouble motivating yourself to study on a consistent basis, it’s definitely something to look into.

Carlos: Very cool. I’ve not used it. Is it a paid tool or a free tool?

Thomas: It’s free. I subscribed for $5 a month mainly to support the developers, but it’s actually open source.

Carlos: I’ll have to check that one out.I was hoping you’d share with us a story from your experience with the College Info Geek, maybe something that would help illustrate why you continue to enjoy doing what you do and putting out all that material.

Thomas: Good question. I think the most recent kind of adventure I’ve embarked on was building the YouTube channel, which is interesting. I guess I can tell you the story of how I almost set my room on fire. [laughs]The thing with video is, I’ve been blogging since 2010 and looking up to mentors in that space since 2010. I would see these people who are awesome bloggers start to make the occasional video, like the Fizzle guys are a good example, Steve and Pat Flynn. The thing about those guys is they make a lot of money, so they’re able to buy $3,500 Canon Mark III 5D cameras.

Carlos: High end stuff.

Thomas: Super amazing quality gear. I watch these videos and I’m like, “Well, I can’t do that because I can’t afford that, so there’s no reason to do videos.”September last year I think it was, I started just watching YouTube way more than I was reading blogs for whatever reason, and I found one particular guy who just kind of stood in front of a cheap camera and talked, but he had a million subscribers, and his videos were super engaging. I realized that as long as my content is engaging and I put some creativity into it, I can do it on a budget.

I put my own little camera up on a stand. My camera’s not terrible. It just isn’t the $3,500 camera.

I was lighting myself with reading lamps, and it was just terrible. I’m like, “I got to build myself a proper film light.” I go out to Target and I buy this hamper cardboard.

Then I sawed a hole in the bottom of it and installed one of those work site flood lamps you get at the hardware store and then covered the inside with tin foil and then covered the top with a curtain. I’m like, “Cool, film light, built it.”

Then I smelled smoke. I realized these work site lamps, they’re high watt…

Carlos: Yes, very intensive.

Thomas: …lamps and there’s heat visors on them, so putting a cardboard and tin foil, I basically built an oven. That was a learning experience.

Carlos: You wanted to get that Easy Bake Oven you never had as a kid.

Thomas: Yes. I think the oven a little safer. I was like, “Hey, I’m going to do maybe one a month to just, sort of, mix in with the blog posts and podcasts.”People really liked them. The first few, I wanted to integrate my love of video games. I would try to be like, “Here’s the top five note-taking tips and each one is going to be representative of a Street Fighter character.”

I unwittingly attracted this video game playing audience who, I should have known this before…

Carlos: [laughs]

Thomas: …they’re are all in school.

Carlos: There you go.

Thomas: So it’s perfectly matched, and the channel’s just exploded. I’m having fun making videos every week. It’s great, like we said earlier, when we talked about the actual deliberate practice, getting better at something.Making a video every week gives me the opportunity to improve, in some way, every single time. Every week’s just like, “Oh! I can’t believe I was able to make that. I was never able to make that before.”

Then the feedback on it is really cool, too. Whenever I get an email from somebody who said, “Hey, my grades are better this semester, thank you.” That’s amazing to hear.

It’s ridiculously inspiring.

Carlos: Very nice. Well good. Thomas, thanks again for being on the podcast.We appreciate it. I think you’ve shared some great information.

Thomas: Yeah, for sure, thanks, Carlos.

Carlos: We do have one last question for you. That is, if you could have one superhero power, what would it be and why would you want it?

Thomas: The ability to change my molecular density.

Carlos: [laughs] OK.

Thomas: Because then I would be able to make myself really light and fly, or make myself really heavy and crash through walls.I found this out…my brother and I had lots of superhero cards growing up. We had all the X-Men cards. There was one guy and I think he was the brother of some really powerful mutant.

Carlos: [laughs]

Thomas: His only power was changing his molecular density. I’m like, “That’s a cool power! It’s not super crazy, over the top, but if I want to, I can walk through a wall.”

Carlos: There you go. [laughs]

Thomas: Or fly.

Carlos: [laughs] Very cool.

Thomas: Either that or I always wanted to be able to touch books and know their contents, but I think that’s kind of cheating at life.

Carlos: There you go.[laughter]

Carlos: Have the power of osmosis.

Thomas: Learn by osmosis. I think I would get bored, though, after I went through my university library. Just running my finger down the shelves. [laughs]I’m going to go with molecular density.

Carlos: You’d make an amazing tutor, I guess.[laughter]

Carlos: Very good. Thanks again. We do appreciate it. Compañeros, thanks for tuning in.We’ll see you on the SQL trail.


Ryan J. Adams for PASS Board

Voting for the PASS board has opened and I ask you to vote for Ryan this election.  Our SQL community is the BEST on the planet and I know Ryan will do everything in his power to help grow the community and help new members feel right at home.

Volunteer positions are difficult and can be demanding.  They take time.  Ryan has PROVEN he will devote the required time in his role as regional mentor.  He is regarded as a thought leader among his peers and regularly helps other mentors with ideas to help their chapters.  He has high marks from the chapters he serves and the praise is well deserved.

Ryan has enjoyed the same position with Verizon for over 10 years and plans to stay with them.  This is important as the he has a solid personal foundation that enables to have sufficient time and resources to help support the community.

I hope you will join me and vote for Ryan this election cycle.  Our community will only benefit from it.

Episode 09: Database Automation

1400Joey D’Antoni has been doing quite a bit with database automation and I caught with him to discuss his experiences about how database automation has helped him be more efficient and have fewer support problems.  We discuss strategies for starting your automation techniques and he gives a few pie in the sky examples if you are running a large environment.  Compañeros, I hope you enjoy the program.


Show Notes:

Central Management Server
Building Perfect SQL Servers, Every Time
Build scriptsT-SQL Tuesday: Managing the Enterprise
September 2015 T-SQL Tuesday
Joey’s Personal Blog

Transcription: Database Automation

Carlos L. Chacon: Welcome to the SQL Data Partners podcast, my name is Carlos L. Chacon, your host. This is episode 9, and today we’re talking about automation.Interestingly enough, this September 2015 T-SQL Tuesday theme was, “Strategies for Managing an Enterprise.” I ended up writing a little bit about automation, and SQL server setup in particular, and we’re going to continue with that topic or that theme today.

If you’ve never heard about the T-SQL Tuesday theme, this was brainchild of Adam Machanic, where several years ago, he thought of the idea of asking bloggers to host a topic or a theme, and bloggers could submit to this page, and the community could increase their knowledge about that specific theme.

The September theme was hosted by MidnightDBA, and again, they chose strategies for managing an enterprise. Today, we have Joey Dantoni with us. I’m super excited to have Joey. He is well-versed in performance tuning and blogs, over at Joeydantoni.com.

He talks about several items, or several scripts that he makes available, several sessions that are available from some of his different conferences that he attends. All of those will be available on the show notes, at sqldatapartners.com/podcast. You can go there and get information about today’s show.

As always, thanks for tuning in. If you like what you hear on today’s episode, we invite you to give us feedback via iTunes or Stitcher. You can also leave some comments on the website. We’d love to hear from you.

If there’s some potential topics you want us to cover, please let us know about those as well. You can hit me up on twitter at @carloslchacon. Lots of information in today’s session, and as always, compañeros, welcome to the show.

Children: SQL Data Partners.[music]

Carlos: Joey, welcome to the show!

Joey: Thanks, Carlos.

Carlos: One of the things we wanted to talk a little bit about today was automation. I know you’ve shared some stories, and I’ve heard some things about one of your previous employers, working at Comcast. Kind of a large environment.

Joey: Yeah, we had roughly close to 1,000 servers total. I was the architect, but I had 4 DBAs working on a team under me. We didn’t have a whole lot of people to maintain that kind of environment. Realistically, when you get into larger environments, you never do, because people are in scale, and your budget is in scale.We had to automate a lot, just so we didn’t have guys working 24-7. We had our build process, and our trade deployment process, fully automated. Trade deployment was interesting. We used SSIS for that. From a build perspective, it was pretty straightforward.

As part of our personalization infrastructure, we built a service management layer, where business users or developers could come in and request a new virtual machine. We had that process build that from end to end, from the Windows perspective.

Using a lot of power shell, particularly in and around Windows, to do things like do an offline domain join, to get the server joined to the domain, grab an IP address. The only issue we had with that was, we couldn’t get…The DS is fully automated. It’s a long story.

Carlos: [laughs]

Joey: It was a long political story. SQL-wise, what we did was, we automated the installation. It’s a two-faced approach that I took to doing SQL automation. You do the install. SQL has a ton of add defaults. We all know that.The install process itself is pretty vanilla. The only thing that matters before you start using SQL, is that you get the right drives configured. You want the right volumes configured for [inaudible 04:33] sets, and where AF files are going to be set.

After, what we did was…and I still use this with my clients, have the script that does the best practices after the fact, and goes in and sets max memory, sets max [inaudible 04:51] parallelism. All plays on what exists in the server, and what is [inaudible 04:54] the DMVs.

Carlos: The ability to automate, because there is lots of things that we can automate. The idea of why automate, you hit on a little bit of that. Simply your scale that require that you do that. Another reason I can think of to help us automate, or why you would automate, is to have a consistent experience.As we get into, you mentioned 1,000 servers, I don’t want to have to have my data files, for example, even in three different places over 1,000 servers.

Joey: Totally. I don’t even like it with 10.

Carlos: [laughs]

Joey: That was really the other goal of our whole project, client project. We had our build process, for physical servers, pretty automated. There were only a couple of manual steps. Even then, still you had weird things happen, on the Windows side, some hash cache doesn’t get applied, or server has the wrong DNS address. We went through a DNS migration at some point.The other part of automation is, you need to build in some intelligence in your health check process, to make sure everything…I read an article about this the other day, and it really wasn’t related directly to what we’re doing in terms of SQL service in Windows, but was more talking about automation, using [inaudible 06:26] and things like that.

One of the messages that the article said, was you still need to have a process to go back, and make sure nothing’s changed, everything is good, everything is happy in your environment.

Carlos: As much time as we spend, trying to set up our automation process, we still need to have someone to check in, and make sure that everything has done what we think it has.

Joey: We were using a system center configuration manager when I was at Comcast. It was really good, except for one problem. It does not know about the existence of SQL server clustered, several clustered instances, which were 80 percent of my production environment.It can only connect to a default instance, which is a shame, because it worked great for Windows. The tracks were very easy to build, for any kind of Windows task, anything you need to look in the registry, anything you get from [inaudible 07:20] line, about the [inaudible 07:21] .

Since we couldn’t connect to SQL clusters, it was virtually useless to do that. What we ended up doing was, and I like this as the core of any real automation strategy. We had an infrastructure server that the DVA team used, and that was the hub for all our scripts. They were all mounted off of there.

There was also our metadata repository for our SQL servers. Since everything went back to that, that’s where my health check ran. I had a process that ran every morning, that would grab a list of all of our instances, and then dynamically create a batch file that would run SQLCMD.

Which if you’re working on automation, you need to get familiar with SQLCMD, and then run that batch file, and I put the results. As part of that process, it would dynamically create a link server back to our infrastructure server.

This was done that way. I didn’t log, doing that, but we had some network issues. We have different networks in various things, and that infrastructure server was…Any SQL server can always talk back to it. It was clear. I didn’t have to worry about it.

It ran through all of our instances, run our health check script that we put together, and then returned it. The other [inaudible 08:42] was we have a nice create a tabular report that the managers and VP’s to look at and say, “Hey, we’re in compliance.”

We created a scorecard. It gave something that was a measurable goal, the kind of crap I used to have to do when I worked in big companies.


Carlos: That’s an interesting process. Even add to that, that automation process is then providing a way to report on that, so that you can ensure that, “Yes we’re in compliance.”

Joey: Yeah.

Carlos: Interesting. You described what I’m going to consider a fairly complex, or at least robust environment. Some of our listeners are thinking, “Wow! That could be a little bit tough.”Before we get to some of the specific things that we could do to help start our automation processes, let’s talk a little bit first about, what it is that we want to automate.

Joey: There’s a few things, to be honest, to go a little bit on the side, we were stuck down that path, strictly because of our scale. There’s a lot of great tools that are built in, like central management server and policy-based management, that worked really well.None of these will scale beyond 200 servers. Don’t be scared off if you have to build your own thing. Microsoft has some stuff that’s built in. The number one thing I always tell people to automate is…I do a lot of health checks with a consultant, and I do a lot of the same write ups every time.

Automate your build process, and automate the “best practices” script. It’s not that hard. You can download it off my blog, or watch my presentation from Summit, last year, it’s on YouTube. You don’t even have to pay for it.

It’s pretty easy. SQL server generates an INI file for you. If you go through the install process, and set everything up the way you’d want it and click “Cancel” at the end, there’s an INI there. All you have to do at the point is run setup.exe from the command line, and it passes the config file.

If people did that, they would eliminate many percent findings I find in a lot of health checks, the common ones. The temp [inaudible 11:06] is going away in 2016, but one temp.dat file maxes off at zero, cross ratio [inaudible 11:12] at the five.

Data files and temp.dat on the same drive that one is…If you want, this really depends on your SAN configuration, but a lot of the common ones. You hear the guys say to make sure they’re friends with their Windows team.

Because a lot of these automation things derive from making sure you’re a Windows guys. Even if it’s inconsistent, it’s harder to automate. You want to make sure you’re drives are always configured the same way on every server.

You want, typically like some of the other things that are associated with build, that are Windows related, or configuring an instant file on initialization, and a lot of pages and memory if you’re wanting to do that.

If you can get those things configured AD level, that’s one last thing you have to worry about when you’re configuring your servers.

Carlos: Compañeros, listen to that. Go grab your Windows guys, grab them, sing “Kumbaya” and make it happen. [laughs] That’s mostly on the build side, then you mentioned putting in some process from SQL Server Standards, or the default settings.When I tend to think about this as, “These are all my jobs and things as well that I’m going to put out there.” For example my backup routines and things like that. After my build process, I have another set of scripts that run, that will go and do all of that index maintenance, things of that nature.

Joey: I throw all that in my build script, because the one I did at Summit last year, I messed around with it. I use a lot of online scripts. I heard there are a few other ones out there, many, and a couple other things. I’ve been really happy with all the stuff.What I do with that is, I have a part in my build script. It’s a batch file. The batch file does a few things, a call setup that EXE with the config file. Then I ran a PowerShell script, that’s an always website, grabs the latest version of these scripts, which he promises me is a consistent URL.

Writes these to a SQL file, and then after I ran the best practices script, that it sets max memory and a lot. I do all the script. The one thing that’s a little challenging with that, I haven’t taken the time to do. I do it for one point, but I haven’t done is, automating the schedules.

It’s all a dozen schedule which are…he doesn’t schedule the jobs by default. Building a script for that. That’s one of those things that you have to customize for your environment, because not everybody is going to be doing their index rebuild on Sunday morning at 3:00 AM. Most people aren’t.

Carlos: That’s an important process, is then building in your own, if you will, customizations into that environment. Even in something like that, like a schedule, your larger environments might be different from the single instance. The ones it uses.

Joey: Definitely. Then you have things like that that may or may not be consistent across the environment. You may have to get a different places, or different servers, and things like that.

Carlos: My thought there is that I generally have…it is a couple of buckets for those schedules, and then I can then deploy those schedules depending on, “This is a single install. It’s going to be for some small department,” and you’re looking at deploying schedule one.If this is the cost for my production environment, where the whole company uses it, LR, HR, then I am on bucket three, and I can use those processes there.

Joey: That’s a really good approach. You can easily deploy or lay as many schedules as you like. There’s no real negative to that, other than making the table in a CDL a bit bigger. I really have bought it in.I like that approach because you can have…and really from an automation perspective, if you had something in your naming convention, and tie back to the environment. I hate naming convention debates. They’re my least favorite thing in IT, because everybody is going to get point.

This is one of those places where it matters, because if you’re using some sort of regular expression in your naming convention, even in the case you’re talking about. I like the number idea. That you want that number to be like save a third character, or whatever the server was [inaudible 15:50] .

You can parse on that, and then as part of your automation you can apply those schedules likewise, because you string out the server name, and a case statement, and then ruin the following [inaudible 16:03] .

Carlos: You’ve talked a little bit about some of the things that you’ve used. You mentioned the SQLCMD, and of course, PowerShell has come up. The thoughts on getting started with that process, I’m not the PowerShell guy, and you’ve probably forgotten more about PowerShell than I know currently.If I was starting out, start with the automation process, and you’ve mentioned even the install process of getting an INI. What would be some of the steps, or what was your experiences going through that process of, “Here’s how I’m going to start automating some things?”

Joey: Good question. First of all, PowerShell, especially with SQL Server, is like this weird religion, and I’m definitely not a part of. The guys that really like to do PowerShell, really like to do PowerShell. I really like to do it from a Windows perspective, I find to me it’s a painful process, the SQL Server.I do it when I have to. Microsoft does a really good job with instrumenting SQL Server, with T-SQL to do about everything. It tends to be, especially for DBAs, you can go to T-SQL and live in T-SQL every day. It tends to be a lot easier for them to live in that world.

What I typically tell people, and Michael and Nelson is if you do something easily in T-SQL, do it in T-SQL. It’s pretty easier to automate there. However, for tasks that you’re doing with the operating system, or you’re making calls…

One of the things that I worked on last year, was this script for a client, where we would dynamically extend a volume. Call that using a PowerCLI, which is VMware’s implementation. VMware set up command lines for PowerShell.

Basically if we detected that a database volume was getting too full, making OS call, extend the volume, which would in turn make a [inaudible 18:14] call. These kinds of things, are the things I focused on. It’s figure out a task that you…It’ll tell you the exact way on PowerShell.

I forget what it was, but I had to do some pass on about 150 servers when I was at Comcast. It was…


Carlos: [laughs] I don’t want to do this 150 times.

Joey: Yeah. It was like changed DNS, or something stupid like that. Yeah, I went and figured out how to do it. The language for the basics is pretty simple. Then the other thing I’d say, especially with cloud becoming so prevalent, and this is slightly changing. The new Azure portal doesn’t really work for half the stuff they say it does.A lot of the stuff in Azure isn’t exposed through the portal, and you can only do it in PowerShell. I will say Azure’s PowerShell is probably a little bit simpler than regular Windows PowerShell. It’s the same language, but a list of the things I had to do with Azure or PowerShell, are pretty basic.

The basic syntax of PowerShell is, get dash something. Get process will get you a list of process for your desk services, will get you a list of services. There’s a crazy language involved there. It’s really finding a couple of tasks that you want to do.

One of the things I do that will be a nice entry to PowerShell is, I wrote a script to automate patching. Applying service to [inaudible 19:47] is the SQL server. There’s two conditions that Microsoft tells you that you need to meet for a service pack to be applied successfully, or sealed.

It says the build number of SQL server matches the expected build, and that the service is online. I wrote a quick script that sets the expected build number as a variable. You have to pass that [inaudible 20:16] , because it had time to grab it. I’m sure I can get it from setup somehow.

Take that as a build number, run the patch, I did it really inelegantly. I had it sleep for roughly how long the patch takes to apply, and then do a check again. SQL Server, make sure the service is running, if the service isn’t running, email me.

Skip that check if the service is running, and then go on to do a build check, and make sure it all matches, and then email me the builds, and the expected builds. Find a task you want to do. That’s the best way to learn a language I find. Is find something you need to do, and then do it.

I forget a lot. It’s funny, I did a lot of 1.x and Oracle, and UNIX. That whole scripting is a little bit more intuitive to me. They’re [inaudible 21:11] and stuff. I was working on Nobel UNIX client the other day, and I had hadn’t been on a UNIX for like eight months, and I was pretty quick to give out the scripting. PowerShell, I had to do crazy time.


Joey: I knew the concepts, but there’s still a little bit of subtleties.

Carlos: Anytime you have a problem to solve right, it makes it a little bit easier, because you know that there is something that you’re solving. There’s a goal there.Something that you said was important, is that you want to do that process once, figure out how the process can be done in a language like T-SQL, which a lot of our listeners are going to be using, or wanting to be using.

Even inside something as simple as SQL Server Management Studio, you can set that up, script out that process, take a look at it, and start to pull it a part to say, “OK. What are these pieces doing here?” Now that you have that script, you can save it and use it again.

Joey: Definitely. That’s the whole thing with scripting. It’s the repeatability that you don’t have to deal with that going forward. One of my favorite scripting techniques, if you want to do a quick little automation task, is building three SQLs.Using [inaudible 22:33] Nation to build SQL statements. The example I showed in the presentation, I do have [inaudible 22:37] , a real basic back up automation script. I say, “Select ‘Back up data base.'” Plus sign is the [inaudible 22:47] automation operator in SQL Server.

The only thing that gets a little dicey there, is you have to convert any [inaudible 22:54] values to…


Joey: …because SQL Server will try to add the [inaudible 22:59] to the…[crosstalk]

Joey: You create a script, or you create a file that’s a bunch of back up data base statements that are successful. With SQLCMD having an output flag, you can create a quick back strip to say, “SQLCMD run generate backup. SQL, output the results to run backup.SQL.”Then in your next script you say, “Run backup.SQL,” and output those results to a log file. If you want to take that one step further, you can use PowerShell to read the log file, and look for any errors.

Carlos: [laughs]

Joey: That’s a real pretty quick example, but…

Carlos: A step through that. I’m taking the script to create the backup statements. I have another script that executes those statements, and then puts them into a log.

Joey: Right. No. See, you write the scripts that would generate the statements, you output that result to a SQL file, you then run that SQL file, and then you output that result. It’s two lines of SQLCMD. SQLCMD line one, gen.backup.SQL, and that outputs to run backup.SQL. Then the second line is run backup.SQL, and that backup that log.

Carlos: To start automating, we talked a little bit about SQLCMD. Assistance from the community, you mentioned you had some scripts. Are those scripts available to share? You mentioned one was on your website.

Joey: Yeah. I’ll find the URL for that. Definitely my bill process is on my website. Like I said, the presentation, I did a Pass last year on automating your bill process. It’s on Pass TV, which I forget if you have to register for. I know you don’t have to pay for it.

Carlos: We can put those in the show notes. We’ll make sure that those get available.

Joey: Allen Light’s got some pretty good sessions on doing some automation with PowerShell. I’m not sure if any of those are available online for free, but he’s helped me a lot with some of the PowerShell stuff. I still docile or still winded. [laughs]

Carlos: There you go.

Joey: I still use batch files. Anytime you’re combining three or four things, like SQLCMD. I don’t run these through PowerShell. One of the things that SQLCMD does, that say for example we are talking about, one of the challenges to it is headers, and counts, and things like that.Because if you’re automating the building of a file, and you don’t want to have headers in your results, when you are going to run that, SQL is going to show up, it’s like it’s got nothing to do. In SQLCMD, you can pass a -h -1 command to strip the headers out, and there’s also a no count option.

You add that no count in your SQL strip. You’re getting the results you’re looking for. In those, SQLCMD doesn’t have that remove headers option.

Carlos: Ah.

Joey: Some people have rewritten it. There up in like SQLCMD four that people have written themselves, but I generally don’t like to recommend people third-party stuff, because you never know what you can do in client [inaudible 26:28] . I try to make stuff work with the core products stuff.

Carlos: It’s another thing you have to keep up with in your environment.

Joey: Yeah. If Microsoft’s auto updating it, that’s fine, but if it’s something I have to go download from somebody, it’s not as good.

Carlos: Great. Joey, I appreciate the conversation. It’s been some interesting information. We’ll make sure we can share some of those ideas in the scripts that you mentioned there in the show notes.We have a couple more questions for you, one of the things we want to try and do is provide a little bit more value for folks who are listening. We’ve mentioned some great ideas now, but one of the things we’d like to talk a little bit about is your favorite SQL tool. This could be a paid tool, a free tool, but why do you like it, and why are you using it?

Joey: This is dicey, because I’ve done work for every vendor this year.

Carlos: [laughs]

Joey: I really like SQLCMD, for the automation value.

Carlos: There you go.

Joey: I’ll name two other ones, Red Gate…I’m going to fall down at this, they’re steam and compare tool. I don’t think it’s called steam and compare. The tool that I use more than any other tool would be SQL Sentry Plan Explorer. It’s really nice, especially the complex execution plans.

Carlos: Much easier to digest.

Joey: Yeah.

Carlos: Very good. Thanks for sharing that. One other thing, you’re now working with Denny Cherry and Associates…

Joey: Yeah.

Carlos: …or with them. This doesn’t all have to be with them, but I we’re looking for an experience that you’ve had that helps illustrates why you enjoy being a data base administrator, or a data professional.

Joey: It’s helping deliver business value to my client’s. The feeling you get when you take a query down from…Actually specifically around this warehouse we architecture, the client’s seeing their run time speed a 100 to 1,000 times better for some things.They got under our hardware, but some of it’s…we’re using 2016 with column index, and it’s really fast. Seeing those kinds of benefits. This a data company, they sell data as their business. Them being able to deliver results to their customers faster, is adding value.

Carlos: Very good. Before we ask you the last question, let’s take a look at one more opportunity folks have to learn about SQL Server.

Announcer: Hello there compañeros. I want to tell you about a unique SQL Server training opportunity that’s unlike anything you’ve encountered. As a listener of this podcast, you’re entitled to a special offer. SQL Cruise is a premier training experience set in the most unlikely of places, a cruise ship.Tim and Amy Ford have created a wonderful culture on SQL Cruise, with five or six technical leads from various industry sectors, you and roughly 20 other students will talk shop in classes while at sea, and while you’re in port you can enjoy the wonderful world we live in, either by yourself or with a group.

The beauty of SQL Cruise is, the unique relationships you will form with other SQL Cruisers. Worried about leaving your spouse? Bring them along! I did, and she had a great time with the group. In fact, I have been on SQL Cruise twice, and I’m looking to go a third time.

You may be wondering if this is serious training, and I can assure you it is as good as any conference you would attend on land. It never crossed my mind that I could be talking about SQL Server, with several Microsoft MVPs on the beaches of Saint John’s.

I know you must have other questions, so I suggest you head over to sqlcruise.com, and check out the site. Once you are satisfied and you want to sign up, when you get to the question, “would you be so kind to share the name of who referred you, and what source material led you to us?”

You enter Sequel Data Partners, and Tim will ensure you get $100 off the training registration. This is a great offer, and I hope this $100 savings will entice you to check out the site. I did, and went on the Caribbean cruise and had a great experience, and I know you will to. Maybe I’ll see you on board.

Head on over to sqlcruise.com, and use the code “SQL data partners” to save $100.

Carlos: Joey, our final question. If you could have one super hero power what would it be, and why would you want it?

Joey: To bend the laws of physics so data goes faster.[laughter]

Carlos: That’s very interesting. You’re the first one…

Joey: Actually, my other super hero power that I’d want would be a private jet. That’s not really a super hero power, I know, but to greatly simplify aviation.

Carlos: [laughs] There you go. You’ve been the first one who’s indicated that their power would go to helping the client, with the getting data moved faster. That shows your commitment there. Hey, thanks for another great episode. Joey, it’s been great having you on the show, we’ve really appreciated it.

Joey: Cool.

Carlos: Compañeros, thanks again, and we’ll see you on the SQL trail.[music]

Children: SQL Data Partners…

Episode 08: Database Design

1400I chat will Louis Davidson about database design and get his thoughts on some strategies about how to go about making your design and what tools are available to help you.  It may surprise you that, according to Louis, the data type is not the most important component of database design.  I hope you enjoy the show.

Show Notes

Louis on Twitter
SQL Server 2012 Relational Database Design and Implementation
Codd’s 12 Rules
Embarcadero Data Architect

Transcription: Database Design with Louis Davidson
Carlos L Chacon: This is SQL Data Partners podcast. My name is Carlos L. Chacon and your host and this is episode eight. I am super excited to have Louis Davidson on the show today. We’re talking about database design and architecture and Louis has got some interesting things to share.

As always, we are interested in your feedback and you can leave that on iTunes or on Stitcher. If you would like to reach out via twitter, I am @CarlosLChacon. Please be sure to check out all the show notes at sqldatapartners.com/podcast and as always, Compañeros, welcome to the show.

Carlos: Today our guest is Louis Davidson. Today we are talking about logical and physical database design. Louis has been in the IT industry for about 18 years, as a corporate database developer and architect. The majority of his experience has been with the Microsoft SQL Server in every version that has been released since 4.21a.Now I didn’t take the time to look that up, but I have no idea when version 4.21a came out. I’m going to take a guess that it’s been a while ago.

Louis Davidson: It was a while ago.[laughter]

Carlos: Currently he is the senior data architect for the Christian Broadcasting Network. Louis hails from Nashville Tennessee and he’s claimed fame as he can put his pants on two legs at a time. Louis, welcome to the program.

Louis: How’s it going?[laughter]

Carlos: If you want to connect with Louis, he’s on Twitter @drsql and he blogs at SQLblog.com. Today Louis like we mentioned we want to talk about database design and this can be a big topic. One that I’m not sure that a lot of our data professionals are paying as much attention to as maybe they should.One of the reasons I wanted to have you on the program today was you’ve written a couple of books on this subject, the most recent being, “Pro SQL Server 2012 Relational Database Design and Implementation.” You even invited Jessica Moss who hails from Richmond Virginia.

She’s near and dear to our community as our heart to write a chapter in there about reporting design. I can only assume that bringing those books together is quite a bit of work?

Louis: It’s not easy. It’s really hard. I had lots of different co-writers. I really liked Jessica.

Carlos: Very good.

Louis: Some others and I actually invited her to write more, but I couldn’t get her to do it. You really need more than one chapter on reporting design if that’s going to really inform you.

Carlos: Very good. That book is available on Amazon.com. It’s also available from Apress if you’d like to go check that out. Let’s start off with talking about those two items, right? Logical versus physical design, what those differences might be, and why folks might want to use both.Now my experience is; done some development work for a dot com, right, for a start-up, the business folks seem to understand better the logical model, and by the logical model I think we mean all the words are spelled out, trying to save syntax or space, kind of talking about big picture stuff, and really those containers.

Louis: Yeah, and let’s even start before that.

Carlos: OK.

Louis: There’s a conceptual model, is probably the best place to start, and there’s a lot of discussion about what that means, and whether it’s just a napkin design or if it’s something more formalized. I like the definition that it’s basically tables and relationships. You sit down with your user, or your documentation, where ever you’re getting your design; your requirements from, magical requirements that everybody has, beautifully documented.

Carlos: Right, that’s right. That’s exactly what you want.

Louis: [sarcastically] Yeah, it’s in like two or three pages of prose someone’s put together for you nicely, and then the Easter Bunny brings you lunch, because all sorts of magical things are occurring here.You sit down with somebody, and you say, “What do you have?” “We have customers, and they order products.” You write a little box, and you write “Customer,” and you connect it to “order” and you go OK. What does that mean? How do you do that? How do they pay for it?

Can they pay for it with multiple ways? Can they do this? How do you invoice people? How do you ship people? What do you make, or what if it’s partial shipment? You start drawing that out, until you have this picture of what the database is going to look like, you know 80% right?

Carlos: Sure.

Louis: You test it, you have your requirements, you have your scenarios that customers want to do, and you say “Customers needs to order three products, this way, and they need to pay for it with two credit cards and a cheque”, and you just say “Can I do that?” OK I would put this in this table, this in this table, this in this table. So once you run through all these scenarios, you have that. You said wireframe earlier, you have that frame right?

Carlos: Right.

Louis: For how the database is going to look, so once you get to that, you then go to the logical model, and if you’ve done a really, really great job at doing the conceptual model, and working with the customer, the logical model’s easy, it’s just decorating it with attributes about things you want to store.You have a customer, how do you know their name? They have first name, last name, middle name? Oh well, only Americans have that one, Only Americans have that one, but that’s our standard and there’s other standard around the world.

There’s different ways you would do that. Some people might have a table for each name, or you might have 15, what the customer wants to get you start modeling that out and start dealing with the details, and that’s where you get a lot of annoyances with the people who want to build and the people who want to see something is…life is messy.

And all those messiness are going to come out sooner or later.

Carlos: I think one important concept that you mentioned there was the scenarios.Too many times we do maybe try to get in there and start throwing some structure around some things, but being able to understand how a user is going to interact with this application, all of the…maybe not all but at least 80% was what you…the 80/20 rule that you mentioned.

Here are the major ways in which this person is going to interact with them and the things that the business is going to want either capture or be able to report on, describe, offer to that end user and that will help dictate how that goes…

Louis: …because guys who are building this are starting to think about an architecture and if you’re used to our procedures one of the things I might do is go to those scenarios and say user clicks web page, web page causes this store procedure.I don’t necessarily have all the tables designed or ready to go or anything, but I know we need an interface that looks like this and it needs to have these answers. When you have all the scenarios that we’re going to implement, you have all the interfaces you need to build, everything else is just, details.

Carlos: We have a little bit of a chicken and egg scenario here a lot of times and I know a lot from the data professional side or those who mostly deal with data, and then we have those the writers, the developers and then the business, they kind of all each have their unique mindset of how they want to start putting all this together.Again as being a data person I thought or had the opinion at one time that, yes we should get all of the data structure complete before we then pass on or engage the business or the developers because once it kind of gets out, once the cow gets out of the barn, you can’t get it back in that thing.

Louis: Wouldn’t that be nice? OK, here’s done. Then we move on the next project, you guys just implement my design and you can’t do that. The argument between the people that have the waterfall myth, I find horrifying, in some respects because other steps of the waterfall method are necessary.You can’t build something if you don’t know what the requirements are and do a little design. Obviously we can’t spend too years designing something either, so we do little bits and pieces along, but you still need to think about where you’re going or you do end up…because nobody destroys, it’s as bad as building a house.

You want to build a house, you don’t start with one bedroom and then add a bedroom and then add another bedroom, and then now we need a bathroom, so you add a bathroom. I’ve seen houses that are built like this and they’re a nightmare.

The walls are all a different shade because there’s something going on, and so what ideally you do is just tear down the house and build another one. You can’t do that because of the state of the data and the code.

We already had that code written, why would we replace that? Just add another table, add another or that, and so you end up with this mess because we didn’t realize we’re going to have more than 100 customers. We didn’t realize that they needed to order multiple products.

When we built this thing we only had one product so we didn’t have to worry about other multiple product line orders or something. You can go too far with that, because you might also want to say, “Maybe one day we’ll have thousands of products and that we’ll need to be able to order thousands of products on a single order.” Nobody does that.

There’s a reasonable real limit to what people do. You have to balance the future versus getting something done, but there’s the level in there that’s the right level, that feels right, you don’t have to completely rebuild everything every spread if you are using agile or whatever methodology you’re using.

Carlos: I think it’s a little bit more of an art than a science. I think the important thing is to get some agreement right from the different groups that you’re willing to at least move forward with these things, but I think as someone who’s responsible for design, you always kind of want to be looking for the growth or for the new opportunities or new requirements that might come your way and don’t paint yourself into a corner so to speak.

Louis: It’s a lot like sports. Let’s take basketball. If you have five great players and none of them are working together, you’re going to lose. If you have five mediocre players playing those five great players, but they’re all working together on the same page. And they’re thinking the same way, they’re going to do a lot better.The real problem I find is every DBA makes fun of developers. Every database developer makes fun of the developers and back and forth, “The DBAs are evil and we need to go around them and get things done.” If you have people who work together, “I know what to do with the database. I know how to make it work for you so that you have to do less work.”

While they know how to make the UI do things. I don’t want a bad UI. I don’t want to go back to the clunky days where we had these green-blue screens that had a bunch of…And if you had five tables, you had to go to five different screens to get to the data.

Nobody wants that. Just because you can have two addresses, doesn’t mean you have to shove that to everybody. 99 percent of your customers have one address. Build a UI for that, you can’t not build through the second address if you allow it as a business.

You can’t have to have some really horrible method of going to the back end and entering it, and then every report’s messed up. I’m bringing up reports because that’s the number one reason why we store data. It’s to report it.

Carlos: Yeah, get it out and report on it. It’s like we were saying…

Louis: We just take cash and…Give us your product, give us some cash and move on. [laughs]

Carlos: It’s interesting that I get that idea, which I agree with, is the siloed nature of our development teams. They can be very territorial. Instead of sharing and passing the ball, to use your basketball example, they want to hold it. They’re like, “No, this is mine and you can’t have it.”There’s probably a little bit more soft skills that are needed in the database design than, “Here’s the ABCs of how to put that together.”

One of the components that I’ve seen, again working at a dot com was, before I wanted to have all the data, have all that completed. We used the word “wire-framing.” I think of wire-framing as drawing up your application on a napkin type. You get the business has a little bit of input to say, “OK, I think it should flow like this,” or, “This is what would happen when they click on this screen.”

A lot of that comes because we don’t necessarily have our scenarios well-defined. Doing a little bit of that wire-framing with the screening and what it would look like. Putting the data together at the same time helps things move forward and I think it also helps force the teams to talk to each other, to work together.

If being a data modeler is not my full-time position, I’m a DBA, or, I’m even a developer. What are some things that you as a database developer or database design modeler might recommend that these folks know or become a little bit familiar with? So they might better contribute to the application for being better?

Louis: I always suggest people understand normalization. I know it’s a boring topic and it sounds arcane. It’s been around for longer than I’ve been doing databases. If you understand that and you get a taste of why databases are built the way they are, you’ll probably do OK.You’ll know that storing five bits of data in the same row, that should be in five different rows. Is going to mess you up, is going to be a problem. To the way you work.

Carlos: When you want to get it back out, like you mentioned with a Georgia reporting.

Louis: Yes. I always like cards’ 12 rules, the card rules for the database, to just talk about. That’s one of the things I do when I do a pre-count on this, on the database design subject. You talk about what the idea is behind building databases.There’s two fundamental properties that I hope I’ll get right here because I’m pulling this out of my head. One is logical data independence, and that is, I shouldn’t have to know anything about the data, other than what SQL can tell me. In physical data independence, I shouldn’t know the physical attributes of the data.

I should just be able to ask it. At one level, you should be able to write an application that runs on the Express edition or the Enterprise edition of SQL server and they both just run, depending on how much data you have. There are some features in enterprise edition that it doesn’t quite fit, maybe partitioning.

Most of the things are physical attributes that ideally you should be able to change it run-time. If you were to not have any hands, or any of that stuff and that’s your primary goal. Then you’re in much better shape than if you have…If you thought like a program. I was writing a cursor today to run through a procedure, run this procedure that’s built.

I was thinking, “This is so much easier in some respects than writing SQL because it’s iterative, I get to see each row, every row I was putting out, I was sticking it out to a bunch of variables and I’m thinking, “This is why people do this because it’s easy, and it’s natural as a programmer.”

It was something I couldn’t write in regular SQL because I had to call an interface server procedure. If you think like that, you sort of write like that. Really understanding that is the power of SQL. To be able to put things in one statement and have it do millions of things per your one statement is a fantastic thing.

Understanding how SQL works and why SQL is written the way it is. Understanding that while SQL server lets you do things in many different ways, they keep adding features to the product that say, “Let’s make it easier for the cursor. Let’s make it easier for people to do this, let’s put documents in there in a certain way.”

That’s not the fundamental choice to go for a really normalized database first. There’s things you can’t do in that paradigm. Sometimes, it would be better to just use Excel, sometimes it’s probably better to use no SQL.

Don’t try to use SQL Server for everything. Understand what it is your target is and how it works and the best way to use it. Just learn everything and do it good.

Carlos: [laughs] No problem, I know that’s been a bit of a challenge. As an SQL Server guy, if you will, I think SQL Server should be able to do it all. I have learned, particularly if you mention no SQL. That big data approach…That there are instances where SQL Server’s not the right place.I think this kind of goes back to our team approach is that if you can get a few more heads into the mix and those with some different experiences then you’ll be able to make the best of that judgment there. Ultimately I think we’re doing all these database design, we’re trying to hold hands with everybody, sing kumbaya.

Ultimately, a good database design will lead to good performance, a good application. Or at least, an application that we can tune a little bit easier. Is that…?

Louis: Yes. Once you have a properly normalized database, if you have one table for every concept, generally adding information about a concept is simple and normal. Or we discover that customers have such and such an attribute.If you have to go in and say, “Oh, we have a new concept,” if it’s really a new concept. Then adding a new table makes sense, adding new screens and changing your business makes sense because it’s a new concept.

If it’s not a new concept, and it’s just something you didn’t think of like, “Oh, customers have more than one phone number? Oh, that’s interesting, we didn’t even think about that.”

Maybe you may have said, “The rules said, ‘No, don’t do it,’ this is exactly how we want it, it would be more problem.” If it fits the requirement, and it’s too much to implement, then OK, fine. That’s a change in business. You shouldn’t be smacked up with just little changes to your process. This is true for any design.

Carlos: What happens when you, and let’s use that telephone example just because that’s fairly straight forward, what happens when you come across that instance where, “Oops, I didn’t know about cell phones.” [laughs] “I didn’t know I could get a number through Skype.” And these changes come…

Louis: Then you’re stuck; you want to say, “OK, now we have a customer, we should have a customer phone number table.” A lot of times you end up in a compromise, and they go, “No, we can’t build another table, just add another phone number column.” OK great. Now we want to want them to have three phone numbers. OK, great.We’re allowing them to have one home phone, one cell phone, one fax number, OK great. We can put those. Then they go, “No, you know, sometimes people have two phone numbers.” Now you need to make some data about that.

You end up with phone number one, phone number one status, phone number on type, phone number one this. You end up with fifty, sixty columns out there, we actually had this in a database we work on. It was a third-party thing, with email addresses. This is this email address, and it’s this status and this status. It does this and you can use it for this purpose but can’t use it for that purpose.

It just becomes a mess instead of having just one table that it kind of flows into. Again, you should be ready to…If you’re going to change the business, be ready to change the database, change the data structures.

Carlos: I’m curious to get your take. Fortunately, I think the business…A lot of times we get into a jam because they see the level of effort as the percentage change of a screen, “Well, it’s just another button,” right?

Louis: Yes.

Carlos: How hard can that be? Any ideas or any thoughts on how you can approach the business and say, “You know, what you’ve requested here is a fairly large change. We’re happy to do it, but we’re going to have to create the phone numbers table and makes some changes there.”

Louis: That’s tough. Usually, I’m thinking…The business users I can build that table, I can have that thing changed. I can have those structures set up in about a couple of hours. Or it may take a couple of days, but it won’t take weeks.

Carlos: There’s two components there. One is the data component which…Sure, we can add the table, move it around. But now we’re putting the developers in a pinch because they may have to change one of their processes or store procedures or things like that, reporting.

Louis: Yeah, I’m always wondering why it takes long truthfully. A lot of times just because you have…It’s just like a bad database. If you have a poorly-built infrastructure for your application, you’re going to have the exact same problems as a poorly-built database. It’s unchangeable. It’s too rigid.

Carlos: My thought there is kind of going back to the team approach. You’ve got to make sure that you’re not trying to throw anybody under the bus. [laughs] That you state, that “This is what I’m seeing, I can make these changes, but I also think it will affect this other group.”Instead of saying, “I can have my changes done by the close of business. I’m not sure why they can’t have it done by close of business, but I can do that.” I think, again you don’t want to silo yourself to saying that, “Yes, I can do this”, not including the whole team. Is that fair?

Louis: If I’m in a happy place, of never having to actually be in those conversations.

Carlos: You must just be in a very happy place Louis. [laughs]

Louis: Those are tough conversations, the ones we have to deal with either the politics of it all…I really hate politics, you have to deal with the timing and the…

Carlos: When everything’s going to happen.

Louis: The money and all. I’m a developer, an architect/developer. I’m not a politician.

Carlos: Getting back into some of that design, let’s talk a little bit about how users of SQL Server can start looking at their logical designs, their data models. In SQL Server, we have database diagrams. My question is, are you using database diagrams or you’re using a third-party tool?

Louis: [laughs] I would never use database diagrams.

Carlos: [laughs] Never use database diagrams, Oh boy.

Louis: I’m transitioning from one database diagram to another. We’ve had a bit of time getting it set up and using the high-Faluting…Where you could save it in a back-end repository.We’ve had a little issue in getting that setup. I’m actually building a database right now using what I call the Paul Neilson method. I don’t know if you’ve ever met Paul.

Carlos: No, I’m not aware…

Louis: He’s used to be big in the community. He wrote the “SQL Server Bible Series” up to 2008. He was the lead on those up until 2008, I think it was. That is just, build one script, have a bunch of drops at the top, a bunch of creates come next, load some data, have some tests go, all in one script.The database diagrams don’t show you enough and you don’t feel…I’m not saying that people can’t make use of them. You can’t connect to the database you’re using because they’re it’s a live editor. The worst part about them is, everything you want to show, you have to put in there.

When you use a proper data modeling tool, you can put all sorts of stuff in the model that’s never going to be implemented. Documentation for starters, if you’re stuck in a place where they think foreign keys are bad, and they do exist.

Or maybe you’re using the newer memory stuff, and you can’t actually implement them. You can still put them in the model and see that they need to exist, and know you have to write code so you can make that change.

You can put all that stuff in the model and see it, and edit it, and refine it over and over and over until you get to the place where you want.

Carlos: If I’m hearing you correctly, if you’re serious about modeling or that’s something that you’re doing more and more of, you will have to use a third party tool. You mentioned you were changing…I use the Embarcadero Architect’s Tool. That’s my tool of choice at the moment.

Louis: I was trying no to be.

Carlos: [laughs] No, we can throw it out there, right?

Louis: That’s where we’re actually going to because they have a, SQL Server centric product. That is…

Carlos: Maybe that’s why I’ve been so happy with it. It’s worked out really nice for me. Not cheap but it does the job. My impression of database diagrams, and you actually went a little bit further that I’ve thought, was more of, If you don’t know what you’re data model looks like, you can use database diagrams to at least put some pieces together.Become more familiar. Either you’re a new developer on the team, or heck you don’t have any documentation to this point so you might as well throw something together, you can use this diagrams for that.

Louis: I mean it’s a solid viewer of the structure, but it’s not what I would use as a design tool.[crosstalk]

Carlos: …The viewer is probably the only way that I use Database Diagrams now. If I can’t reverse-engineer, into tool like Embarcadero.

Louis: You can use it on a clean database. Just generate the structures there, see it. You could use it that way if you wanted to. Not everybody has access to those Embarcadero tools because some of them are not exactly free or anything.

Carlos: They’re not cheap, no. I think your point was valid, we should reiterate that again. That the database Diagram tool is a live editor. You make a change to that. You’re changing your database.You can’t do that in your production environment unless you…That’s just the way you roll. [laughs] You have instant access to your developers like, “Hey, by the way, just made this change, FYI.”


Louis: That’s the DBA thing. Every DBA has done it at least once I think.

Carlos: That’s true. In a previous episode, I did admit to obfuscating changing some code around in production that I’d been working on at a test. I actually never made changes to SQL Server using database diagrams at a broad environment. Luckily, I guess I managed to figure that out in test soon enough, luckily for me.

Louis: That change manager would yell it’s all anyways because they add some objects into the database, [laughs] “What are these new objects in the database? Who did this?”

Carlos: “It was me probably. I hope you can’t audit that back to me.” [laughs] Thanks, I do appreciate the conversation Louis, again. For those who want to take a deeper dive into the “Pro SQL Server 2012, Relational Database design, and Implementation” book available from Apress or Amazon.Before we let you go, I wanted to ask, what’s your favorite SQL tool? This can be paid or free. Why do you like it and how do you use it?

Louis: The one that’s changed our lives the most has been SQL Source Control from Redgate. I always had a problem trying to figure out how to check things in. How to do a really good source control?You try to make scripts and then you check in the files. It does that for you in a way that works with the person whose used management studio forever.

You just go into management studio, you check it out, you make the changes, it sees the problems. Everything’s right there in the tool the way you like it.

Carlos: You can use it right there in SQL Server Management Studios without having to change anything or go to a different application to get your stuff. Your favorite experience as a database administrator, I guess, a story that would help illustrate why it is that you like to do what you do?

Louis: That one’s tough. I’m really that bad at telling stories or remembering what happened yesterday.

Carlos: [laughs]

Louis: I like being in DBA because I like the ability to shape the data, and get things right. The best experiences have been working with another designer who’s building the front end, and we built these stored procedures. We built this UI, this whole system where the screens…The thing just came right off the stored procedures. We generated those out of a data modeling tool. It just, “Boom!” Everything worked with very little override.

Carlos: Very nice, very nice. Getting to work with people that know what they’re doing, and can help move things forward, always good experiences.

Louis: Yes.

Carlos: To close out, if you had one superhero power, what would it be, and why would you want it?

Louis: [laughs] So many terrible answers to this question.[laughter]

Louis: I don’t know. For me, if I wanted to be realistic about it, it would be the ability to go to these conferences that I go to and actually enjoy the entire time. [laughs] But, I guess it’s not superhero because I see more than my superheroes, Caitlin, Brent, and Itzhak, all out doing these things. I don’t guess that’s a superhero. That’s just me not being old.[laughter]

Louis: Old and crotchety.

Carlos: OK. So, unlimited energy, or something like that.

Louis: Unlimited energy would be it, right?

Carlos: There you go.

Louis: Super speed, unlimited energy.

Carlos: Louis, thanks so much for being on the show. We do appreciate it. It’s been great having you.  Compañeros, we’ll see you on The SQL Trail.