After 170 episodes, we bring Louis Davidson back to talk about database design and get his thoughts on how we can get better at planning for our data.
Is Normalization Still a Relevant Topic? by Louis Davidson
I have been in the IT industry for more than 18 years as a corporate database developer, architect, as well as serving on the permanent unofficial committee to make sure the Internet still works. The majority of my experience has been with Microsoft SQL Server, in every version that has been released since 4.21a. Currently I am a senior data architect for the Christian Broadcasting Network in Virginia Beach, Virginia. I actually reside in Nashville, Tennessee, sometimes supporting our call center there.
I have a bachelor’s degree in computer science from the University of Tennessee at Chattanooga, with a minor in mathematics. I have been volunteering with the Professional Association for SQL Server (PASS) for way over 10 years now. I have spoke at most of the PASS conferences, and a large number of SQL Saturday events.
In October 2004, I was awarded the Most Valuable Professional (MVP) award for SQL Server by Microsoft, an honor I am very proud to have been given and have so far continued to receive. In my “free” time, I write articles for my blog on SQLBlog.com, and write a blog on Simple-Talk’s website that is themed to What Counts For A DBA, and speak at SQL Saturday events. I also slept in one weekend, and I didn’t hate it.
I have written a book on database design for each of the last five major releases of Microsoft SQL Server, and am (always) starting to gather information about the next version of SQL Server as information starts to come out. I also wrote a co-book with Tim Ford called Performance Tuning with SQL Server Dynamic Management Views that I am very proud of, but it was the most difficult project I ever worked on.
Never Miss An Episode
Subscribe to get podcast notifications by email.
“Normalization matters even for databases that aren’t relational.”
Read the full transcript of this episode.
Carlos: Compañeros! Welcome to another edition of the SQL Data Partners Podcast. It is good to be back. Of course, Louis doesn’t know this, but we’ve been taking a respite from the podcast this summer, and so this will not be available until August. We’re taking a little bit of time, so it’s nice to be publishing these again. I may have said August, I meant September, actually late September, but this one’s not going to go ’til October.
Eugene: Through the magic of editing–
Louis: Are you sure it’s not November?
Eugene: Julian will just fix whatever you said.
Carlos: It’s one of those months.
Louis: One of the 12 months.
Kevin: The year is 2021.
Eugene: This won’t come out ’til December 2019.
Carlos: Yes, the good news is, it’ll still be sometime in 2019. So, now Louis, you’re in my top 10. So, my top 10 are the first 10 people that I interviewed on the podcast, and I was looking back through the notes. Louis, you were episode number 8, so you hold a special place in the podcast lore, for being one of my very first interviews. So, we appreciate you taking this time to come back on.
Louis: And you asked me back, that’s the scary part.
Kevin: But it took however how many hundred episodes.
Carlos: It’s 170 episodes later. This is Episode 178.
Louis: Wow, that’s incredible.
Carlos: Well, thank you. Luckily, I am joined by Kevin Feasel.
Carlos: And Eugene Meidinger.
Eugene: Howdy everyone.
Carlos: And they have been helping me press on. So yeah, it’s fun to do this. So ultimately, we’re going to be talking with you about database design. We have some questions and some thoughts, particularly as Microsoft has announced things like Java integration and even Polybase to a certain degree, JSON in SQL Server, so database design all of a sudden becomes kind of an interesting question again. Before we get there, however, I do have a couple of shout-outs that I want to give. I want to give a shout-out to Frank La Vigne, Peter Sammut and Rachel Laula. So, thanks, everyone, for connecting with us on LinkedIn. Our show notes for today’s episode is going to be sqldatapartners.com/databasedesign. All one word of course, because hey, it’s a URL, or at sqldatapartners.com/178. Now, we’ve been talking a little bit about database design and someone, I won’t tell you who just yet, wrote the following. “Normalization. It sounds like a cold, dry concept, and in fact…it is. Yet, at its core, normalization is simply about making sure a relational database design makes sense, represents what the designer expected, and follows a pattern of implementation that matches the need of the RDBMS engine. When we normalize a database, the process works to eliminate redundant and overcomplicated data in multiple ways, making sure your rows and columns have a single meaning.” Do those words sound familiar? So that was taken, actually, Louis, from a write-up that you had posted, not all too long ago, which we’ll of course link to in the show notes for today’s episode. But when we think about database design, I guess where I want to start with, right, again, this idea of there are so many different designs, but ultimately, when it all comes back to it, we’re talking about SQL Server, which is a relational database engine. Does normalization still play a role in today’s databases?
Louis: Have we gotten to the point where I’m supposed to answer now?
Carlos: Yeah, this is–
Louis: Okay, I wasn’t sure.
Eugene: I’m waiting for you to shout YES really loudly.
Louis: I’d say wait, who wrote that? I’m not sure.
Carlos: Yeah, welcome to the program.
Louis: Is that me? I’ve taken a lot of drugs recently for a knee replaced about 4 weeks ago. I waited the proper amount of time.
Carlos: So does that mean it might depend on what the answer is?
Louis: It depends.
Kevin: The answer is currently purple.
Louis: So, normalization matters even for databases that aren’t relational, right? If you ask someone to build a spreadsheet to catalogue their Lego collection, they’re not going to put one big blob of text, right, and just write it out in text?
Carlos: I don’t know.
Louis: I mean, they’re going to make a list, and a list is a bunch of rows and columns. And then, if they find themselves repeating the same text over and over, like you have 1000 Legos and 1000 Duplos and you want to write something about what Legos are, you aren’t going to repeat that thousands and thousands of times, right? Redundancy is bad in almost any type of storage. So yes, it obviously makes a difference. We don’t want to duplicate anything we can, not duplicate, right?
Carlos: Well, it seems like it’s specific to the relational side, because I feel like our analytics folks or our analytics brethren and sisters would say, “no, no, we want to duplicate.” It’s okay to have that Power BI. We’ve talked about that, right, Eugene? That sometimes that duplication makes things better? But I feel like we have to remember the purpose of where the data’s stored and how that data’s going to be used, plays a role, here.
Louis: So, what I meant by ‘it matters’, for a fully un-normalized database, you’d have one table and one column. In a dimensional structure, you’d have a table for each of your facts and each of your dimensions. You may repeat data, because you’re using a different pattern to maintain that data. You build in the tools to handle that duplication to make sure the redundancies don’t blow up your– like if you’re supposed to update a thousand rows with that description of what a Lego is, you build that into your tools. But you still would not have just one table with one– you don’t start rolling up data, for example, in a dimensional structure, you try to have one row per event. The more you go away from that, it becomes a problem. Every kind of database has its own patterns. I know a lot more about relational than I know about anything else, and dimensional, really well. I mean, I’ve seen XML, JSON, all these others. They’re all storage and there’s redundancy in all of it. Even the best relational database is probably going to have some redundancy, but the goal is to either have no redundancy or understand how you’re dealing with it.
Kevin: Yeah, on the Kimball model, bouncing off of what Louis said, that whole Kimball model of facts and dimensions, it’s very explicit in the books that the reason that it can work is because there is a central, single process, which is allowed to modify the data. If you have multiple processes that are allowed to modify the data, now you’re going to set yourself up for failure; the types of failures that proper normalization prevents.
Louis: But even that’s going to be in first normal form, right? You don’t usually have an array of data in a column in a dimensional model.
Kevin: This is true.
Carlos: So, it’s interesting, and again, in this piece, you mention a couple of the goals or the ideas for normalization. So column value should represent only one value and should not need to be broken down by the user for use. I guess as I was thinking about this, preparing for this episode, the goals came down to, you still have to know what you want to capture and then how you want to then give that back to the user, or in the application or how it’s going to be used, ultimately. And if you can describe that, then you’re going to be in good shape.
Louis: Sounds easy, right?
Carlos: Sounds easy.
Louis: Because luckily, customers know exactly what they want, at all times, which is completely not true, right? They change their minds, they want something else. I mean, occasionally it’s okay to break those rules. If you have a paragraph, it’s okay to look in the paragraph for the letter V, if you’re trying to find, or like when you use things like sentiment analysis. You know, you’re not going to necessarily break down every paragraph into tables and put the words in different columns to do that sort of stuff. There’s algorithms, there’s tools that would work well with blobs of text. It’s a really complicated thing to do right.
Carlos: Right, so do you think– is part of that complication, rather, that it is then hard for us to update tables once they’ve– like they’re in the wild or in production, and that’s why people maybe don’t get good at defining some of this?
Louis: Yes. Backwards compatibility is the destroyer of humanity, right? Every time you build something and the next second, they want to change it. I always give an example of an email address. You have a customer, you put in an email address column in the table. Then they want to know things about the email address. What’s the email address used for? Is it still good? And then you want to have another email address. And so you go, “should I pull that email address out and make another email address table? Nah, we’ll just put an Email Address 2.” And then they want three, and then they want five and then you have 10 columns, 10 email addresses in the table, and you have 20 columns each to describe how the email is used. And then you’re like, “why did I do this to myself? Because now every time I want to look for an email, I’ve got to look through 10 different columns.” But it’s so much easier to do that while you’re building, and programmers don’t care so much at times. Not all of them, but you know, many programmers, their goal is to do what the customer wanted immediately.
Carlos: That’s right, to build it.
Louis: And that’s good, because business gets done, but it’s bad when you are trying to optimize as you grow larger and larger and larger. CPU can only outpace stupidity just a little bit.
Eugene: So I guess I have a question there, because I think we all agree that database normalization or good database modeling is like motherhood and apple pie. I think we all agree it’s a good thing. But I guess my question is, which of these things do we optimize for the most, or at least what are we concerned about the most? Is it the performance of the server? Is it making the database developer’s life easier or is it avoiding incorrect data? You know, cause I’m just thinking there’s certain scenarios where, okay, let’s imagine that you did just have one big table instead of maybe some headers and line items or something like that. That would make report writing a pain in certain scenarios, but it may not be tremendously difficult for reading data out of the database or that sort of thing. So I guess, what’s the biggest pain point you think you’re trying to avoid with promoting good database modeling?
Louis: You said quality–
Eugene: I think like are we optimizing for performance, are we optimizing for making the human performance, the database developer or are we trying to avoid like incorrect data?
Louis: So, it’s quality performance or ease of development?
Eugene: Yeah, yeah, I would say so.
Louis: The problem with ease of development is you’re catering to the lowest denominator. What’s your worst programmer on the team? We need to make sure it works for them. Well, that’s easy, I can do that. But yeah, that one’s always tough.
Carlos: I feel like a good database design will help you with all three things.
Eugene: Yeah, absolutely.
Carlos: Because it makes the data much more approachable, so from a usability perspective. Again, I think about from a relational perspective, you’re trying to describe it in such a way, and particularly from a transaction perspective, that I can transact that with exactly what I need and no more. Or if I could separate those two things, then I can, then I will. And then of course, making sure that the data is correct is again, because I can understand it, I can then ensure that I report it correctly.
Eugene: Yeah, so I wasn’t trying to frame it as like the iron triangle of tradeoffs of fast, cheap, and easy or something like that, but more like, okay, what’s our biggest concern? What’s the biggest pain point or issue that we’re trying to avoid? Because it has a lot of benefits, but from listening to Louis talk, it sounds like there’s a lot of pain points he’s run into with people not following good practices when it comes to their database modeling.
Louis: Yes, and I will admit I’ve worked for the same company for 20 some years, now, and yes, through the many, many hundreds of people I’ve worked with, that’s always kind of the problem is, you build something that works well enough for the situation you need, and it doesn’t grow well. And it’s true, it is hard, because if you don’t get it right in the design phase and again, the system I’m working on– one of the systems we work on was designed in the 90’s and it was ported over and then it was ported over and now it’s in Microsoft CRM, but it could very easily be the same system that was there in the 90’s, with the same structure that was in SQL Server 1.0 or something. Because they were trying to put it in a Java– into Sun version of SQL Server. Was it Sun? It was Sun, right?
Carlos: Yeah, yeah.
Louis: In SQL Server? Yeah. And so you make mistakes and you don’t quite get it right, but then you try to add things to it. One of the things I always notice about bad designs is if you can’t figure out where to put the next bit of information, there’s a problem. If you have a book table and I put the author in the table and I realize now I need two authors, and I just make Author 1 and Author 2. And now when it’s inside, how are you going to add data about what they did in the book? And now I have to add more columns to the table? This feels wrong. I need to add one column to the table for one bit of information. That’s when you know it’s right. But getting the time to create the new screens and get the programming done and change the EPL. You know, it’s hard. I was just thinking about this a second ago. What’s really strange is back in the late 90’s and early 00’s, I guess they’re called, we did a lot of denormalization for reporting and we did all of our work on the same server. And you know, because okay, we had like 50MB of disc on our first SQL Server. We ran the entire (unintelligible) International headquarters. We got rid of a mainframe, we were running on this 50GB thing with like 16MB of RAM. 16mg. My cup here that I’m drinking out of is like 16MB of RAM, probably. But we would do all of our reporting in the very same place and we would denormalize for it. And now, we have these paradigms where we know we pull the data out, we make reporting systems for doing that reporting, but we still had the problem of how do we make the changes? And kind of those dependencies make it even harder to make changes, because you’ve already used that data somewhere else. I would have to change EPL, I would have to change structure, then I’m going to have to change reporting. And it seems like we’re trying to get rid of that, kind of bring it all back into the main server and just use more power, machine learning and things. Which is good, I mean I’m all for that. If you get good data, you can do good machine learning and good reporting.
Carlos: Well, and that’s an interesting concept of walking through that and asking yourself, “okay, well, as I describe this data,” so you mentioned books and authors. “Okay, right now, I don’t have to answer any questions about authors,” but it’s not a super big giant leap to think that, “you know what? At some point, they’re going to ask me something about the author.” And then trying to decide, “how would I then describe that?” Now, granted, it’s easier when you have the large concepts. I’m sure it’s the smaller stuff, email is probably a better example, where it’s a description of something else, that then all of a sudden becomes its own, I’ll use the word object, although that’s not quite the best word. But it becomes its own piece of information that you want to collect information about. So yeah, tough.
Louis: And then we came up with this idea of agile, because waterfall was terrible.
Carlos: Too long, too expensive, yeah.
Louis: I literally don’t know how anybody got away with that, thinking about it. It’s like telling your dad, “I’m going to go clean my room, but I’m going to spend like 6 years designing how I’m going to clean up my room, and I’m just going to live in the squalor until I get done.” That really doesn’t quite make any sense. But at the same time, we would then say, “okay, I need to model books and I’m going to do that, and that’s all I’m going to think about. I’m not going to think about anything else.” And then they would say, “okay, now what’s next?” “Well, now we need to talk about something about books, which requires a complete, total change to the database.” “We can’t do that. It’s already in production, already people are using it.” So you kind of need this kind of architect mind to go, “this is the thing someone’s going to probably want.” It’s kind of like playing chess. If you’re not exactly thinking a few moves ahead in chess, you die pretty quick, because the other person’s thinking ahead. “Oh, he’s going to move here, and here and here, and then I’m going to take that, and we’re done.” The people that are running organizations have to think, “where do we need to be? What are we trying to accomplish, for real? Even if we’re going to tell the programmers, ‘do it the best you can, but get to market tomorrow.’”
Kevin: Right, and also keeping in mind that many of the techniques that were designed are around stateless services. I can change a stateless service, pop out a new version of it and nobody cares, because the state only exists in a transient fashion. So, for example, web application, stand up a new version of it, I can start pushing people over to the new version and it’s just there. And by contrast, in a stateful service is quite a bit more of that architectural design necessity, because you can’t simply pop someone over to the new version. You have to think about the process of getting to that new version and design choices you made years ago, or somebody before you made years ago, can possibly constrain your answers. Where somebody who’s done something that’s particularly bad, but it’s heavily engrained, you might not be able to fix that.
Louis: So do you think there’s ever a case where a schema-less design for customers makes more sense, then? Or even books or stuff like that, where you kind of version the schema inside a document?
Kevin: It can make sense, I think, in certain cases. In general, I don’t like that idea, but if you’re at a startup, or you’re doing prototype code, “okay, I can throw this out, I just want to have something there.” But the problem is that people tend to take prototypes and make them production, and then you’re stuck with this awful schema where it’s like the customer also has favorite dog breed, and then off of favorite dog breed, we have all of the dogs of that breed, regardless of who owns them. And now you have this really terrible design that doesn’t work and when somebody asks for something slightly different from what the schema was designed for, that’s four months of effort.
Louis: And it’s really hard to argue, sometimes, because the thing I said in my head about document database is it kind of makes sense. You know, why not just put all of it in the document and when we’re building our code, we’ll just interrogate it and see what’s there and put it on the screen and let them edit it and use it. And you realize that when you start using this, unless everybody follows a set pattern, it’s going to fall apart and be a mess, so the best way is to set up a pattern.
Carlos: Yeah, that’s right, that’s a fair point. So having people can be– I can hear them out there saying, “yeah, that’s what, you know, non-relational systems are for and it’s just so easy.” I had an experience working with Snagajob, well, it’s now Snag. But they’re based here in Richmond, Virginia, so they started out on SQL Server and then went to MongoDB because that was so much better, but then tried to start making changes to their documents, or, because they didn’t necessarily– again, this idea of the stateless, nobody was driving the ship on what the document structure should be, so developers started changing them. And so then all of a sudden you had documents of different types and it didn’t help at the end. Sure, it made it easier to roll out a new feature, perhaps, but then when you wanted to get reporting and all the downstream pieces, yeah, it was still as challenging as if it had been a relational system.
Louis: Do developers realize that the reason we store data is to report on it?
Carlos: It’s all IoT, man, it just goes into a data lake, you know?
Louis: There’s literally no value in capturing that a customer made an order, other than taxes and to get them to do it again, right? “Hey, you’ve bought this before, want to buy it again?” “Yes.” And it used to be done by mom and pop in the shop. You walk in, they go, “hey, Louis, how you doing?” You go, “ah yeah, I’ll take a couple bags of seed.” “You know, your cows are doing–” well, no, never mind cows. But you know what I mean? It was done by people and faces. When you went into the store, they knew you, because you lived in a small town and such, but now we’re all online, they do it by scraping all the data you possibly do. We do marketing for our work. It’s one of the things– I’m on the marketing team and where you capture what people do on the internet and try to stitch it together and serve them better.
Carlos: To offer them something, yeah, that’s compelling.
Kevin: Right. Now, this isn’t the fault of DocumentDB or CosmosDB now, or MongoDB or any of the other schema-less systems. Because in the relational world, you have people coming up with the entity attribute value pattern, where, “I’m just going to do this same thing, but in a relational database, because I don’t want to have to think about relations.”
Louis: I’m absolutely not against any of the patterns. I mean, every pattern has a good value, has good use. It’s just if you use it wrong, that’s bad. Like, what was the example someone was giving me about gaming? Like if you’re building a gaming system where you’re wanting to capture the people as they’re playing the game, (unintelligible) document sounds perfect, because you can throw that away, that’s the throw-away data. My process through the game, all the bits and details. That’s great, you can throw that away. You lose it, who cares? That’s a perfect use for it.
Kevin: Yeah, or YouTube comments. You lose those, who cares?
Louis: But if you lose someone’s order, but you took the money.
Kevin: They care.
Louis: They care. They really care if you take their money.
Carlos: Yeah, they tend to frown on, like, “what do you mean you don’t have my order?”
Louis: We have no record of this.
Carlos: Okay, so I guess we’ve defined some of the challenges and I would also say one that we haven’t really talked about is then organizational structure. Because then the question is “well, who is responsible for this?” And it’s oh, man, you know, it will depend. The answer to that question is as varied as the organizations of the people that are listening to the podcast. But I would definitely recommend speaking up. Saying, “hey, we need to think about how this is going to be used in the future.” Take some of that time to do that. Are there any positives? So instead of, maybe beating everybody up on why they’re not doing it well, what are some tactics they might employ to help themselves or help the future Louis’s that they’re going to work with that are going to stay at the company for 28 years after they’ve gone?
Louis: Yeah, you see a lot of bad code you’ve written in your past. There’s still management tools that just come up that, “who wrote this?” I’m like, “I wrote that 20 years ago.” That was a long time ago. I’ve slept since then. And new body parts have been replaced since then. Yeah, if you have that mindset that you’re going to be there for the rest of your life, not that you have to be, and I only stay because I like it. When you were saying, “what are some things about the structure files, like, “I can’t answer that question about how much I hate my employer, because I don’t really.” There have been times in the past where we had less good management. They’re all gone now. Most of the people I work for have been there for 10 years as well. But if you have that mindset that, “I’m going to be there forever” and think about what you’re doing to yourself– it’s hard to put right. Make sure when you’re naming things, you’re naming it so you will understand it and everybody else will understand it. Make sure that things are common. I mean, comments are a really good thing. Make sure when you’re designing, you’re thinking about that. “What if I have to add to this?” And we all get short-timer’s disease when we think we’re going to go somewhere. I’ve done that many times in my life. I get short-timer disease when I go on a vacation. “I’m not going to make it back from vacation. I’m going to stop worrying about things.”
Carlos: So when you think about ways to start or low-hanging fruit, naming conventions are probably the easiest one, because it’s something you can do now. Don’t short-change yourself, because you don’t want to type 20 more characters. We don’t have the restrictions anymore on the whatever it was, 30, was it 20 character column names?
Kevin: Eight characters on those mainframes.
Carlos: Or eight characters, even.
Louis: Yes, the eight character names are the worst. Thirty character names were super hard, but eight were–
Carlos: Yeah, we can go big time. You can put, maybe not quite a novel, but you can be descriptive, there.
Kevin: 128 NVARCHAR bytes, so double bytes.
Carlos: Yeah, with the tools that we have now, it’s fairly straightforward. I’m not saying you’re never going to have to retype that again, but can we all agree that it’s of benefit to go ahead and spell that out, rather than saving yourself 10 or 15 keystrokes?
Louis: Yes, I’ll agree to that all day long.
Kevin: Yeah, yeah, now naming is unfortunately also one of the big holy wars, because, “well, I want to use ID with a capital D and somebody else wants to use it with a lowercase d and I’m like, that’s an Id. That’s totally different.” And then someone else wants to call it a code and somebody else wants to call it something else.
Louis: Yeah, so it’s better to have one bad naming standard than to have many good ones that you’re running at the same time, right?
Carlos: There you go, so getting some consensus, and throwing it in and being, “okay, here’s the winner. How are we going to decide to do that? Arm wrestling?”
Louis: Because there will always be somebody who wants to put spaces in their name because they think it looks better that way.
Kevin: Yeah, or even something that I’m dealing with currently, actually this sprint, which was that I designed the entire database myself. I did every statement myself, so I can’t even blame anybody on my team. And they’re like, “you do realize that you called this thing something here and something else here and something else in a different place?” I’m like, “yeah, of course I did, because that was three different Me’s.”
Carlos: Yes, three different iterations. Yeah, that’s tough, because then there’s no, I shouldn’t say value, but from like making an enhancement or like a sprint, you can’t cross something off on the sprint. It’s technical debt, if you will, at that point, and that generally doesn’t make the top of the list for the sprint items.
Kevin: Don’t tell anybody, but it does for my team.
Louis: The worst thing is spelling. You ever misspell a word?
Kevin: Oh yeah, oh yeah.
Louis: And you don’t notice it, because– well, Intellisense is awesome. I could mention other company’s products, but there’s other company’s products that are super awesome at this. You type in and it just fills it in for you, joins everything and a couple of days later you’re like, “wait, that’s not how you spell solicitation.” That one’s the worst. I’ve spelled that a hundred times different wrong. Is a different wrong a right? I don’t even know. I’m a little confused at that point. But yeah, the spelling, naming things, yeah, using a capital D or a little d. And it’s all really annoying stuff that will get you yelled at by people because they don’t really care. “Just get it right and leave us out of this. We don’t care.” Until you– then you try to make them type something really long. They say, “no, don’t make me do that.”
Carlos: I think what we’re suggesting, perhaps, is if you don’t have a standard, create one, suggest one, put one forward.
Carlos: So experiences doing that? I’ve had mediocre experiences, but at least I can say I’ve gotten the ball rolling.
Louis: Yeah, I have a naming standard that I put in my presentations, and it’s certainly in my book. It’s kind of hard to describe in words, but it follows a certain pattern. I’m told there’s an ISO standard that it mimics, by the person who told me about this like 20 years ago. By you know, you just have certain parts of the name that you always use and the class word goes at the end and the purpose of the attribute goes before that and you follow the same standard, even if it feels weird. For example, let’s say you have a customer ID. The customer’s the attribute and the ID’s the class. We don’t know if it’s an integer or a unique identifier, but we know it identifies something, because ID means it’s an identifier. So then what if you have an IS customer, the Boolean IS customer? That is a different pattern, the IS is the class word telling you that it’s– so if you do it like that, then you don’t have this kind of standard lexicon that follows along. Where a flag is another way to say IS, so you may say Customer Flag, and then you know, ON is yes, you’re a customer, OFF is you’re not a customer. It’s kind of like your mailbox. There’s mail in my mailbox for the mailman, mailman’s been, kind of flag thing. So following a set way of building the words to the name is really helpful. It gets really complicated, because you have these naming standards that have been around forever that have good points and bad points. I’m a IDEF1X model kind of guy with the lines with a big dot on the end and dashes. But I have a naming standard there where every name should be different. So if you have a name column in one table, you can’t have a name column in any other table, so you end up prefixing it with an entire table, which I find really annoying. But on the other hand, I like the way you name tables singular, because it allows you to make sentences. So you say “I have one customer row, I have two customer rows,” and so you never have to pluralize the name when you’re writing these documents, so you can generate documents describing the relationships. So you have the one row in this table is related through a phrase, to another table, and then you have the rows and you pluralize it as you go. So having a naming standard and kind of understanding all the points of the naming standard is also incredibly important, because when you look at somethings initially, if feels dumb, like saying “Customer Flag” instead of “is a customer”. But that sentence “is a customer” gets confusing if you put it into a lexicon that follows a pattern that you can then read with a computer, because ideally, if you can read your name, but with a computer it’s even better, right?
Carlos: Ideally, but it’s almost like you’ve gone to another plateau there, because now you’re talking about documentation on that structure, which–
Louis: Yeah, the perfect database would need no documentation on it.
Carlos: Yeah, well, I don’t know, that’s interesting. That’s an interesting thought.
Louis: “What does the value of one mean in this column?”
Carlos: Yeah, that’s fair.
Louis: And I’m sure we’ve all come across that; the status column. If we look at the SQL Server metadata, sometimes there’s columns that have ‘value’ in it. It’s the status. Is it 0, 1, 12, 2, 2 of 46 or 13 and you’re, “I don’t know what any of that means. I’ll have to look it up.” And then you realize you have to use an ampersand to AND the values together to find out what all the different settings are packed into that one value.
Kevin: Yeah, yeah, let me just come out strongly against bitwise notation in SQL Server and say, if you’re doing that, you’re probably doing a lot of stuff wrong, so this is your Public Service Announcement: If you have to use bitwise operations to get your data out of your table, you’re probably doing it wrong.
Louis: I’m going to go ahead and say, “you’re definitely doing it wrong.”
Kevin: This is true.
Louis: I don’t know if they still have this in SQL Server. They probably have it in some tables, but now they have the system catalog views, they unpack all the values for you. So much better with a– extended the object property, column property functions, but yeah, that used to be a thing. I remember there was some pattern you could solve better with a bitwise iteration than you could with anything else.
Carlos: Okay, so now, for all the knuckle-dragging Neanderthals out there, or maybe just me, help me make sure that I know what bitwise– what you’re talking about here?
Kevin: Yeah, so the idea here would be you have, for example, a tinyint. A tinyint ranges from 0-255. There are 8 bits in that tinyint. If you set the first bit, the value’s going to be 1 and so that represents that one thing is turned on or not, so ‘the customer has blue hair’. The second bit may represent ‘the customer walks with a gait’. So if the person walks with a gait, but doesn’t have blue hair, I use 2. If they have both, the value is 3. But you add the next bit, so now you have 8 separate flags, Boolean flags, that are stored in one tinyint value.
Carlos: One column that you then have to decipher what the heck that thing is.
Louis: Because you can save a lot of space like that. Or you could just use a bit data type, which actually does the exact same thing.
Carlos: Exact same thing, yes. All those bits, because we don’t have enough NVARCHAR maxes around. Okay, so last thoughts on database design?
Louis: Do it. And understand what engine’s best for the job you’re trying to do. You know, understand what– if you’re starting to build a new system, find out what SQL can do, find out what all these other systems can do, if you’re that kind of architect. SQL Server does some stuff great, and it does some stuff not so great. I wouldn’t use it to capture every single click on your webpage unless it’s my webpage. I have a Disney webpage and I get one or two clicks a week. that’s okay, I can do that in SQL. But you know, our corporate webpage, we get millions of hits a day, and that gets really complicated, and you get so much data that it becomes unusable in normal kind of queries. And it’s not very structured, so SQL doesn’t work great with not structured data. It works okay, but there are tools that do work great with that.
Carlos: That work better.
Louis: Break out the tools.
Carlos: So, use the right tool for the right thing. Who would have thought?
Kevin: Yeah, and also, there are several levels of this you can get down into. So I mean, I’d love to be able to just say, “look, go learn the math of set theory and relational algebra and then life gets so much easier, because you’ll actually understand what’s going on.” I understand a lot of people aren’t going to want to do that. Louis’s book is outstanding for database design. I’ve got at least two versions– two editions of it, here, going back– I want to say I had a– was it a 2005 version. But a couple different versions, really good, goes very into depth on design, on normalization, on dealing with a relational database in what I would say is a production-style environment where it’s not just as easy as, “oh, well let’s see, all you have to do is find all the determinants and then that combined with the candidate keys is going to be able to give you the understanding of how your relation should be designed,” which is technically accurate and sometimes workable, but it gets a little tough if you don’t know what a determinant is.
Louis: I can say this just took a nerdy turn, didn’t it?
Kevin: It’s math. Again, going back to that relational algebra, like oh, determinants, part of set theory, and if you understand that, then Boyce-Codd Normal Form makes a huge amount of sense and gets really easy to do, and then you have the generalized form of third-normal form.
Louis: But I would like to say, you don’t have to understand relational algebra to buy my book.
Kevin: That is true.
Louis: I do not get into any of that. I actually make a C in my database class in college because of how crazy that stuff is. I have a minor in math, I have no idea how. I got a C in every last math class that I took, but yeah, that stuff just goes right over my head.
Kevin: Admittedly, part of my answer is because I’m getting knee-deep into category theory now, so I’m going even more overboard in crazy math land.
Louis: Feasel is a little smarter than I am, to say the least. I’m not even somewhat kidding, and it doesn’t hurt my feelings whatsoever if someone says, “you know he’s smarter than you?”
Carlos: We know who we go to on the podcast when we have to clarify a question. “Judge, is that correct?” Okay, shall we do SQL Family? (silence) The correct answer is, “yes, Carlos, we should.”
Louis: Are you asking me?
Carlos: I am, as a matter of fact.
Louis: Okay, then, sure.
Carlos: As the guest, my questions generally fall to you.
Louis: I thought you wanted to see if anybody else had any other torturous questions? And by the way–
Kevin: Please leave this un-edited, including the awkward silence. Oh, and Eugene waking up.
Carlos: Yeah, we’re done with the math, Eugene.
Eugene: Yeah, after the category theory, once we start talking about like commutative groups, you know, I just– I get a little tired.
Carlos: Okay, Louis, these questions are for you. All of them. All-time favorite movie?
Louis: So I think at one time I had an all-time favorite movie, and it was Star Wars. But, nowadays it’s more like the Star Wars universe and the Marvel universe. I can’t get enough of those things. Every time they come out with a movie, I’m there the night of.
Louis: Like when The Force Awakens came out, I was there at 12:30 in the morning. We had tickets at 8 o’clock and we cancelled them, because we had a concert on that night. and we were driving home from the concert and my wife said, “we’ve really got to go to the movie tonight, don’t we?” I was like, “yes,” so we veered off and went to the movies. I stayed awake, too, that was kind of scary.
Carlos: There you go. Okay, so the Star Wars universe. What are they calling it? They don’t call it the universe. What do they call it? Family?
Louis: The Galaxy.
Carlos: The Galaxy.
Kevin: It’s the expanded universe, yeah.
Carlos: There we go, okay. City or place you most want to visit?
Louis: Do you mean like places I haven’t visited before or places I–
Carlos: No, that you– here’s a ticket to anywhere you want to go, where would you want to go?
Kevin: It’s on Carlos, don’t worry.
Carlos: You can have been there, before.
Louis: Because I literally have three places I go. I go to Orlando to DisneyWorld, I go to Pigeon Forge to Dollywood, and I go to Seattle. So, I would choose to go to Orlando, first. Not consequently, I have blogs and Twitter feeds about each of those topics. Disney and Dollywood, I tweet every day on the two fun topics, and occasionally on conferences. I go to conferences and things on the SQL one.
Louis: Yeah, and I mean I went to California this year. That was good, too, but I don’t know, I’m not a big travelling guy, just because it’s a hassle. I like to go to places where it’s not a hassle, because you know, I have plenty of work to do when I’m here at home, not trying to figure out where I’m at and what’s going to happen and what’s the exchange rate and am I going to be stuck in this country forever, because I drove on the wrong side of the road or something? I don’t know.
Carlos: Man, that’s quite a bit of anxiety, there.
Louis: I’m lame, that’s all I’m saying.
Carlos: A food that reminds you of your childhood.
Louis: So, weirdly, it’s seafood. My mom and dad did not take me out to restaurants. I don’t know if it was me. It probably was, but the only place we ever went to was a seafood restaurant, like once or twice a year. Everywhere else we were eating in the car. I mean, my parents were, you know, very southern, nice people and didn’t want me to annoy the other people in the place.
Carlos: When you got to go out, it was seafood.
Louis: It was seafood, and there was this big salad buffet that I would go to town on.
Carlos: There you go, interesting. So now how did you first get started with SQL Server?
Louis: So, back in the mid- to late-90’s, I worked for a company and we had a mainframe that was humongous and I was a LAN administrator. I occasionally had to do things with the mainframe, like change tapes or something, and it was costing us $15,000 a month. Which, we were non-profit and $15,000 a month is a lot, to say the least back then, for sure. And we went to COMDEX and we saw a presentation on SQL Server 1.0, you know, this new thing that they were building. And we came back and they put it together and I went back to being a LAN administrator. The guy who was writing all the stored procedures and triggers, he quit pretty much halfway through. Got a better job, I don’t know. It happens, I think. I remember that happening once or twice in my life. And I took over, and I wrote and changed like 800– I used to put this in my resume and astound people, but it was like 8 or 900 triggers and stored procedures. And they were all terrible, because they all assumed one row at a time. You know, if it was more than one row at a time, it would crash, but I got involved there, and I just never turned back. I loved every minute of that work.
Carlos: If you could change one thing about SQL Server what would it be? And we have lots of variations now, so we won’t hold you to 2019, yet, but what would you change if you could change something?
Louis: I’d go back to the days where it was just a relational engine and life was so much easier. There was a time, I think it was, probably 2008 was when it started changing. You know, there was a time when you could know everything about SQL Server. Books online would come out and you would read it, cover to cover, and then it was webpage to webpage, and then it just– it’s insane now. If I know half of what it does, it would be amazing. And all of this new stuff with Polybase and what’s the new thing they’re doing?
Listen to Learn
00:38 Intro to the guest & team
02:20 Compañero Shout-Outs
03:52 Does normalization still play a role in today’s databases?
07:04 Know what data you want and what you’re going to do with it
09:30 Avoiding pain points with good database modeling
14:22 Think ahead, like chess, for possible future data needs
16:19 Getting stuck with a bad schema
20:13 Every pattern has value and use if you use it correctly
21:14 Tactics to employ in your modeling
23:15 Naming conventions are a good place to start
26:24 If you don’t have a standard already…
29:43 Bitwise operations? You’re doing it wrong.
31:31 Use the right tool for the right thing
32:18 Kevin gets math nerdy
35:00 SQL Family Questions
45:45 Closing Thoughts
Leave a Reply
You must belogged in to post a comment.