They’re just numbers, right? A date’s a date. It’s just string data, who cares? I admit I have posed those questions before and only much later did I understand how complicated it can get. Angela Henry, our guest for today, has heard these same phrases many times before and she chats with us to enlighten us on the issue and ways we can get around it. Perhaps you have heard about terms like SARGable and implicit conversions. This episode seeks to tie these terms to one of the very first decisions you have to make about your data — the data types.
Episode Quotes
“The biggest peril in using the incorrect data type is that you can actually get incorrect query results.”
“A number can always be converted into character data, but character data cannot always be converted to a number.”
“Some of the things that can really trip you up is if you leave it in that character data type and then you’re trying to do comparisons against that.”
“When [SQL Server] doesn’t have to do any of those conversions, when it just can do the straight COMPARE for the exact same data types, then your queries are going to be SARGable and you’re not going to take that extra performance hit by having to convert one data type or the other.”
Listen to Learn
00:40 Intro
01:39 Compañero Shout-Outs
02:50 Conference
04:46 Intro to the guest and topic
06:22 If you use the wrong data type, you’re going to get incorrect query results
11:06 Converting data types – it comes at a cost
13:59 You want your queries to be SARGable
16:52 You’ll find the information about a performance hit in your execution plan
19:02 Avoid the implicit conversion? Change the data types on the tables
22:18 Data types to avoid using when possible
25:46 Do you really need 8000 characters?
28:42 Why not change characters back into integers?
32:19 Fat tables – how to avoid them
36:04 How compression options can change data type decisions
39:45 SQL Family Questions
48:44 Closing Thoughts
About Angela Henry
Angela is a DBA/BI developer living in High Point, NC, and loves what she does. She's worked with all versions of SQL Server and worn all the hats that come with dealing with SQL Server throughout the years: developer, administrator, data architect and BI developer. She has her MCSE: Business Intelligence and is a Data Platform MVP. She volunteers with PASS in various capacities, ranging from the Program Committee to cleanup at local SQL Saturday events. She is also the chapter leader for the PASS Triad SQL User Group in Greensboro, NC. In her spare time you can probably find her in or at the pool; she's an avid US Masters swimmer, coach and instructor.
Blog: https://sqlswimmer.wordpress.com/
Twitter: @SQLSwimmer
LinkedIn: https://www.linkedin.com/in/angelahenrydba/
*Untranscribed Introduction*
Carlos: Compañeros, welcome to another edition of the SQL Data Partners Podcast or the SQL Trail. This is Episode 140 and it is good to have you again. We have World Cup fever on our hands and so for those around the world who are listening and tuning in, thanks for taking a little time to step away from that soccer game or that football game just for a moment to tune into this episode. We have Angela Henry with us this week. We're excited to have her. She is a DBA living in High Point, North Carolina, very close to where I grew up, as a matter of fact. We're going to be talking about why data types matter and some of the issues that go around data types or some of the problems we can have. This is going to affect both our operational environments and our ETL or warehousing environments as well, so it has a cross-spectrum feel or components to it.
We want to give a couple of Shout-Outs. One to Eric Wentz for giving us some feedback on Episode 136. That was the episode with Tom Norman talking about using Visual Studio for Source Control and deployments and whatnot. It was interesting, originally Tom approached me, he's like, "well, do you think this topic will be interesting? Do you think people will enjoy it?" I think I've said this before, but I have stopped trying to gauge whether I think a topic will be interesting, because there's always interest to somebody. We don't know what other people don't know, and I think having a platform like this, we get to have that conversation, learn a couple of things, see different experiences, and at least in Eric's case, he said, "well, now I have some ideas that I can take away and try to apply in my area." I always think that's pretty cool, and so thanks Eric, for giving some feedback, there. Another shout-out to Linghan Song, Angela Tidwell, and Devon Leann Wilson for reaching out, or connecting on Twitter, rather. Although, I do apologize, I'm not on Twitter quite as much as I used to be, thanks compañeros, for connecting with me, there.
I want to give a little update on SQL Trail. We continue to plug away, there, as far as getting the schedule. We know that it’s going to be October 10th through the 12th. We’re going to start at 2pm on Wednesday. SQL Trail is the event. I don’t want to call it a conference, we’ve talked about this, if you’re a long-time listener to the podcast. I don’t want to call it a conference, but it is the data platform event where we’re going to invite you, compañeros, to come and connect with others. We want it to be a very intimate environment, one where you can connect with the speakers, find your people, connect with others that might be able to help you. Whether that be in places that you might be employed at, if you wanted to make a move, just your learning, problems that you’re having, and again, this idea that you can have another friend besides just Google to reach out to when you’re in a jam. I can’t tell you how beneficial that’s been for me, and so I’d like to extend that opportunity to a few more of you. Again, we’re going to cap this at about 40 people. We still have space. We have just made arrangements at the Omni Hotel, so we do have a discount available there. Please check out sqltrail.com. But compañeros, we know that many of you don’t use your training budgets. Particularly for those who are on the East Coast, for a $1500 budget, you can come to the conference, connect with others and get that approved. The content will be great, it will be actionable. It won’t just be all the new stuff, although the MVPs that we have are working with the new stuff and if that’s something that you want to work on, it’ll be available. Even to that point, we are having the Analytics Workshop on Friday with Melissa and Meagan. That’s going to be a lot of fun, so that is pure Azure play. So yeah, we hope you’ll come join us. Check out sqltrail.com for more information.
The information for this episode will be at sqldatapartners.com/datatypes, with an “s”, all one word, datatypes, or at sqldatapartners.com/140. With that, let’s get into the conversation with Angela and Kevin.
Carlos: Angela, welcome to the program.
Angela: Well, thank you. I’m happy to be here.
Carlos: Yes, it’s nice chatting with you. Today our topic can apply to a lot of different folks, but I think it’s going to be especially pertinent for the developers out there and for folks who are moving data. You know, bringing in data from other sources, and then loading it, so the ETL, ELT, whichever you prefer there. Ultimately, we’re talking about data types and why they matter. We should throw out the caveat as well, that this conversation is specific to SQL Server and how SQL Server works with those data types or doesn’t work so well, if you will, or some of the challenges that it will have around the data types. I guess just to kick things off, big perspective, why do we care about data types? What’s the big fuss? I’m bombarding you with questions, here, but why do we have so many data types, as well?
Angela: That’s a good question, and for those that didn’t know, SQL Server actually has 35 data types, now, so we are not going to talk about all of those today. So, that’s a lot, and it’s a lot to choose from. To me, the biggest peril in using the incorrect data type is that you can actually get incorrect query results. It affects your data quality, so to me, that is the big, big thing that people really need to be aware of. Sure, it affects your storage and your performance, but who cares about your storage and performance if you get the wrong data.
Carlos: Wow. Yeah, so let’s jump into that one first, because admittedly, I thought we were going to focus on performance, but now this incorrect data. Give me an example of that. Is that when we’re working with dates and we’re trying to compare dates to each other?
Angela: Exactly. Dates and numbers. I do a lot of BI work and one of the things that I see quite a lot is when I'm on a project and I'm building a data warehouse and I have to take data from a source system and dump it into my warehouse. The most common thing I see is that people love those character fields and they can't figure out which date data type to use or which number type to use, so they just shove it into a character data field and say, "oh, I'll just convert it at a later date."
Carlos: Because they want that import to work and step one is I need the data to get into my database.
Angela: Exactly. Some of the things that can really trip you up is if you leave it in that character data type and then you’re trying to do comparisons against that. Because character data types are treated completely different than date data types, if you are comparing two dates that are stored as characters together, you may not always get the right answer. One of the common things that I like to talk about when I do present this session to a live studio audience, is in the United States, we and maybe the country of Myanmar are the only people who store dates in the month/day/year format. Everybody else almost in the entire world, probably the universe, but I haven’t met them yet, store it in the day/month/year format. One of my favorite examples is my birthday is August 8th, so that’s just 8/8/year. I’m not giving the year, I’m not giving anything away on that one. So that’s a fairly easy one but what happens if say my birthdate was August 7th? Well, we would store that as 8/7/year but if you are anywhere else, you would store that as 7/8/year. This is, again, a common example that I use, so if you were building a system and you are storing that date as a character field instead of a date field, if you have somebody from the UK entering a birthdate and yep, it’s August 7th but they put 7/8/year and then you have somebody in the US entering that same birthdate, when you try to compare those, which one is valid? How do you know which one is correct, because it’s stored as a character data type? When you go to look for anything that is August 7th, you may not get the right answer.
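To make Angela's point concrete, here's a minimal sketch of the ambiguity she describes. The table and values are hypothetical; the point is that two entries for the same August 7th birthday, stored as character data, neither compare equal nor sort chronologically:

```sql
-- Hypothetical table: birthdates stored as character data instead of a date type.
CREATE TABLE #People (Entry varchar(20), BirthDate varchar(10));
INSERT INTO #People VALUES
    ('US entry', '8/7/1990'),   -- August 7, month/day/year
    ('UK entry', '7/8/1990');   -- August 7, day/month/year

-- Same birthday, but only one row matches a character comparison:
SELECT Entry FROM #People WHERE BirthDate = '8/7/1990';

-- Character dates don't even sort chronologically:
-- '12/1/1990' sorts BEFORE '2/1/1990' because '1' < '2' as characters.
SELECT CASE WHEN '12/1/1990' < '2/1/1990'
            THEN 'string order, not date order' END AS demo;

DROP TABLE #People;
```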
Carlos: Yeah, so importing, I guess, that’s a tricky question. I know that on displaying, I have seen that it does make a lot of sense to do the day dash abbreviation for month, so don’t put in an integer field for the month, actually go ahead and give the abbreviation as text, and then the year and that helps cut out a lot of that ambiguity. But you’re right, particularly if you’re not the one who imported that data, or you’re working with a system that it’s unknown to you and you have to migrate some of that data, that’s a choice you’re going to have to make, there.
Kevin: Carlos, I’m going to let you finish, but the only valid date format is the iso format of four-digit year, two-digit month, two-digit date. Everything else is just wrong.
Angela: It is wrong. Well, I actually agree with Kevin on that one, because you can convert that to a number and then life is good everywhere.
Carlos: Yeah, so I guess we talk about conversions, and that's one of the areas that I know we want to talk about because I feel like that's one of those low-hanging fruits. Particularly as we get into performance and looking at query plans and things of that nature, getting those conversions right can sometimes be, I won't say they're bullet-proof. Obviously, it depends on the amount of data you have and whatnot, but they can be that easy button to help improve the performance of your queries.
Angela: Absolutely, absolutely. Let's just keep going with this. You've got this data that is a date, but you've stored it as a character field, now you need to convert it. SQL Server specifically has some functions that are built into the engine that you can use to convert to a date data type. You think, "yay, that's great, that's what I need, that's going to solve my problem." But again, it comes with a cost; there are two costs associated with that. The first one is what happens if the data that's stored in that character field can't actually be converted to a date, so if somebody actually typed in the word August, the number 7, followed by a comma, followed by the year? SQL Server, the engine looks at that, the CONVERT function or even the CAST function, they look at that and they say "yeah, I don't know what that is. I can't do anything with that." It's actually going to error out, so your query won't even work if you're doing that. Microsoft got very nice to us because everybody complained about that for the longest time, "well, you know it won't work, so why don't you just give us something that tells us that it doesn't work, but it doesn't cause the query to actually error out." And that's when they gave us the TRY_CONVERT and the TRY_CAST functions that you can use so that you can see what that data is. So now, if it can't convert that data to the data type that you're trying to convert to, it actually gives you a NULL, which is great, because it doesn't error out, but again, not so great because your data is wrong now. You still don't know what that is, so you could still be getting the incorrect results for your queries. Your data quality is again, very affected, even if you're trying to convert it.
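A quick sketch of the difference between the two behaviors Angela describes, using a deliberately unconvertible string:

```sql
-- CONVERT/CAST raise an error on data they can't interpret as a date,
-- which kills the whole query:
-- SELECT CONVERT(date, 'definitely not a date');
-- Msg 241: Conversion failed when converting date and/or time from character string.

-- TRY_CONVERT (and TRY_CAST) return NULL instead of erroring out:
SELECT TRY_CONVERT(date, '2018-08-07')          AS good_value,  -- converts fine
       TRY_CONVERT(date, 'definitely not a date') AS bad_value; -- NULL, no error
```

The query survives, but as she notes, every NULL in `bad_value` is a row whose real value you still don't know.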
Carlos: Yeah, there is no easy button if the data’s a mismatch. The whole premise of garbage-in, garbage-out.
Angela: Exactly. But if it can convert it, that's when your performance is going to take a hit. There's this concept called SARGability, so it's the search argument. You want your queries to be SARGable. SQL Server, the engine, is not smart enough to say, "I can compare a character to a number." It's like, "ah, nope, I can't do that. You have to be the same data types or I can't compare you together. If you're passing me a character and you want me to compare it to a number, something's got to give here. Somebody has to be converted from one to the other." So, of course SQL Server has a conversion precedence that you have to follow. You convert to the highest order of precedence. A number can always be converted into character data, but character data cannot always be converted to a number, because what if you have A, B, C? That's not a number. It's going to convert that character data into the number and then it's going to do that comparison for your query and your predicate. When it does that, it's called an implicit data type conversion.
Carlos: That’s an interesting thought that it’s actually, I was going to say make things more difficult, so the numbers, that’s math, makes it very easy to compare numbers. Then all of a sudden you get into characters, a little bit harder, and then you’re only going to go up from there. So, as you have those different types, it’s going to require a bit more effort because of that, and again, getting into that implicit conversion idea.
Angela: Yeah. So, if possible, it will convert, but like I said, it does come at a cost and that typically comes in the form of extra CPU cycles because it has to go out there and it has to say, “oh, you’re a number, oh, you’re a character. I can’t do that, I can’t do that.” It gets a little weird on that, so it says, “okay, I can convert this number to a character,” and that takes CPU cycles, and it has to do that for every single row that’s in your query. That can be a huge, huge performance hit. If you have a small table, you’re going to say, “yeah, not that big of a deal, it only costs me a couple extra microseconds, not a big deal.” But when you have millions of records or even billions of records in a table, that is a huge, huge performance hit. That’s where this idea of SARGability comes in, that’s what it’s talking about. When it doesn’t have to do any of those conversions, when it just can do the straight COMPARE for the exact same data types, then your queries are going to be SARGable and you’re not going to take that extra performance hit by having to convert one data type or the other.
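Here's a minimal sketch of the SARGability problem just described. The table is hypothetical; the idea is that an account number stored as character data forces a per-row implicit conversion when compared to a number:

```sql
-- Hypothetical table: the "number" is stored as character data, and indexed.
CREATE TABLE #Accounts (
    AccountNumber varchar(20) PRIMARY KEY,
    Balance       decimal(19, 4)
);

-- Non-SARGable: the integer literal forces CONVERT_IMPLICIT on the
-- AccountNumber column for every row, so the index can't be seeked
-- (and any row like 'ABC' would make the conversion blow up):
SELECT Balance FROM #Accounts WHERE AccountNumber = 12345;

-- SARGable: same data types on both sides, straight compare, index seek:
SELECT Balance FROM #Accounts WHERE AccountNumber = '12345';

DROP TABLE #Accounts;
```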
Kevin: If I’m concerned about that performance hit, where might I find information on when implicit conversion happens?
Angela: You’re going to look in your execution plan. When you send a query into SQL Server, the engine says, “you know what? I’ve got to figure out how to get this data.” I think of it as it’s building a road map, and just like when you go on a road trip in a car, you want the shortest distance to get somewhere. You don’t want to have to go off the beaten track to get there. The shortest distance between two points in a straight line, so you want to stay as close to that straight line as possible. With SQL Server in the engine, when you’re doing that and it’s trying to compare those two different data types, it can’t go in a straight line from point A to point B. It has to go up and say, “oh yep, those data types don’t match. I’ve got to put this function in here called implicit conversion and then I can keep going.” That’s a little blip that you have to take a little side road to get that and you find that in your execution plan.
Carlos: You’ll normally actually see it right there in the very beginning on that SELECT icon. It’ll give you the little yellow warning box, and if you hover over that, it will be like, “hey, you have an implicit conversion, congratulations.”
Angela: Exactly, yes. The whole point of SARGability is that there are no additional functions required to do that comparison. Now whether that’s a function that is done for you, behind the scenes, or whether that’s one that you provide directly in your query itself. When you do that, you’re still going to take that performance hit and that comes in because when we talked about SARGability, because you’re not going to be able to use any Index Seeks. You’re going to, you know, if you do have indexes, you’re not going to be able to use those Seeks. It’s going to end up doing a scan because it has to do that function for every single row.
Carlos: I’m going to cut in here for a second, compañeros, and this is a moment where, for some reason, I was thinking that if I explicitly made the conversions, then I could get away from some of the damage of this issue. And I’m not sure where that thought had come from, but it’s been something that I had been putting around in my mind. I kind of threw it out there to Angela, and she said, “no, no, no, my friend.” Through a bit more research, I’ve come to that same conclusion as well. Again, I don’t know why I had it in there, so that kind of broke up the conversation a little bit. So now, the idea or the question we were going to lead off with is, at the end of the day, what we need to do for those conversions is either to make the conversion before, so through temp table, potentially, if there’s a limited number of rows, we could do that, or we need to change the tables types.
Angela: If that’s possible, absolutely. One of the things that I do when I’m working on an ETL project, and I’ve had to pull things off from mainframes and AS/400s and I’ve had, with those types of systems, every bit counted, so they would try and shove as much stuff into a field as they could to try and save space. One of the things that I’ve run into is that they do that quite a lot and they end up storing numbers as characters and all of that kind of stuff. One of the things I quite frequently have to do is compare character fields to number fields. I have the freedom, when I’m doing my ETL process, that I typically dump all of my data into a staging area. After I do that, while I’m working on my project, I take a look and see where I’m going to have to create JOINS for my queries so that I can get everything dumped into my warehouse, and if I find a column that does not have the same data type when I’m doing a JOIN, I will actually add an additional column so that I can JOIN on that column that I’ve just added that is the correct data type, when I’m comparing it to another column.
Carlos: Okay, so that’s the work-around, then, is basically column one is of character type INTEGER, but I need it to be a character, so I’m going to create column 25, which is the last column, there. It’s going to be the same data, I’m just converting that to character.
Angela: Exactly. And I’ll do that, typically during my ETL process, so when I’m pumping that data into my staging area, that’s when I add that additional column, and I add the exact same data, but in the different data type column.
Carlos: Yeah, so then that is a very proactive approach having to deal with some of that, so then you have to make sure that you’re creating some mappings to say, “okay, now column one is now column 25” or whatever. And then, from a data lineage perspective, that’s how my data moves through the system. Interesting.
Angela: Yes.
Kevin: You mentioned earlier that there are 35 data types in SQL Server. Are there any of them that you just don’t use?
Angela: There’s a lot of them I don’t use.
Kevin: Can you give me a full list?
Angela: A full list? I avoid the XML data type, if at all possible, because trying to use the XML built-in parsing functions, whatever you want to call them. I personally call them torture. I just avoid XML if at all possible. I also avoid smallmoney. To me, it just doesn't have any purpose whatsoever. The money data type can sometimes have a purpose. That's not on my Never Use list, but it's on my Very Seldom Used list. Another one, text and ntext. I avoid those like the plague.
Kevin: Yeah, those were deprecated quite a while ago.
Angela: They were, but when you work on BI projects and you’re dealing with older systems, you do quite often have to deal with those data types, so that’s just past experience. Going forward, obviously I don’t use them in new stuff, but I still do have to deal with them. Up until last year, we still had an instance of SQL Server 2000 in my office, so sometimes you just have to take your lumps.
Carlos: There you go. So, beyond that conversion to varchars or nvarchars, whatever you want to do. Thoughts around, or pointers for converting text data?
Angela: Are you talking the actual text data type?
Carlos: Yeah, the text data type.
Angela: Typically, what I’ve done in the past is I’ve asked, “why are you using the text data type? Why is that there?” Typically, the answer I get is, “well, because we just weren’t sure how much data we were going to put in there.” And then I was like, “well, then sure, that makes perfect sense.”
Carlos: That’s a legitimate fear, I feel like, because when they don’t know the answer, they’re like, “well, we just want to store as much as possible, because we never want to have the situation where a user can’t put in the information that they want to.”
Angela: Exactly. That is a very common thing that happens, so, what did Microsoft do? They said, “oh, we’re going to deprecate that text data type. But now we’re going to give you varchar(max).” And everybody said, “yay!” And there was much rejoicing in the universe. And then everybody started using varchar(max) for everything, because you know, “hey, we don’t know how much data we have to store, so let’s just use varchar(max). It’s, you know, who cares?” That comes with a whole different set of problems in how SQL Server actually stores that data. Again, that affects your storage and your performance, so far as your execution plans and being able to use indexes and all of that kind of stuff. So, you’re just trading it for a different set of problems, in my mind.
Carlos: Right, so let me pose this question to you. The other downside to using the varchar, they stop at 8000 characters, right? So, you’re like, “okay, well, do I take the gamble that I’ll ever need more than 8000 characters?” Guidelines and thoughts on how to make that decision besides just, “well, let’s wait and see what happens.”
Angela: I’m not sure if everybody who is listening is familiar, but Paul White, who is sqlkiwi on twitter, has some really good White Papers that actually talks about the varchar(max) and how it’s stored and the performance implications of using varchar(max) and I would highly, highly recommend, if you want to really deep dive into that, because Paul White is one of the smartest people I know when it comes to SQL Server. He just has a plethora of information out there that I would actually suggest listeners dive into, if they do want more information on that.
Carlos: To get them jumpstarted, there, one of the basics of that architecture is that, if you think about how data is stored and you think about that 8K page, SQL Server says, "okay, I have a varchar(max) field here and that value could be really, really big, easily bigger than 8K in size. Then we'll just store you separately." So when a varchar(max) value won't fit in the row, SQL Server stores it off-row, and now reading that record means touching extra pages, if you will, because it has to go get that varchar(max) data separately. And that is an example of some of the implications, and I'm sure his white paper then jumps into all the ramifications, there.
Angela: Yeah, and that is really the biggest thing, because again, it’s a blip off of your roadmap. It’s not a straight line between those two points. It has to go out, storage-wise and do other things. It’s not a straight-forward thing.
Carlos: Yeah, it’s grabbing more stuff, and then in theory, now I’m throwing more stuff, because SQL Server reads everything from memory, now I’m throwing more stuff into memory, and now what can I no longer hold in memory because I had to throw in this extra stuff?
Angela: Exactly, and then your page life expectancy goes down and like I said, you’re trading it for a whole other different set of problems.
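If you want to see where a table's data actually lands, the catalog exposes page counts per storage type. A sketch, with a hypothetical table name:

```sql
-- Page counts split by storage type for one table:
--   in-row pages, LOB pages (where large varchar(max)/nvarchar(max)
--   values go), and row-overflow pages.
SELECT p.index_id,
       p.in_row_data_page_count,
       p.lob_used_page_count,
       p.row_overflow_used_page_count
FROM   sys.dm_db_partition_stats AS p
WHERE  p.object_id = OBJECT_ID('dbo.MyWideTable');
```

A non-zero `lob_used_page_count` is exactly the "extra pages off the roadmap" being discussed: those reads happen on top of the in-row pages.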
Carlos: Right. Okay, so we've talked about dates, we've talked about strings, and you made an interesting comparison; you said, "I have to compare a lot of my integers to characters, so I'm just going to make those conversions." What's the downside, or any reason why you're not trying to change those characters back into integers? When ultimately you know that one of them is an integer, so you feel like to have a comparison, if it's not an integer, then kick it out. I guess I'm curious to drill in on that a little bit more.
Angela: Typically, when I do that, if it is a column that I’m actually going to need in my warehouse, then I will make it the correct data type when I pump it into the warehouse. But my philosophy, when I’m working on building a warehouse and I’m building a staging area is, I like to pull the data down directly from the source, kind of unmodified, first.
Carlos: Right, so the ELT method?
Angela: Exactly. So, because I don’t want to always have to go back to that stupid AS/400, oh, did I say stupid out loud? I didn’t mean to. Because you know what, I don’t want to have to have all those IBM tools installed on my machine that take up 5G of space and all of that. And I don’t want to have to have a linked server so that I can query it, either. I don’t want to do that. I just want to be able to pull that data in as it is into my staging area, and then transform it, and then load it in. Does that make sense?
Carlos: It does, but I guess, if you know that you’re comparing integer data, I guess why go up to character? Is there a reason why you don’t stay trying to convert the other side back down to integer?
Angela: I guess I must have misspoke. A lot of times, my integers, or my numbers are stored as character data and so I can look through the data and there’s this great function out there built into SQL Server called ISNUMERIC and people love to use that. That is not always going to tell you whether something is numeric.
Kevin: It has a very broad definition of numeric.
Angela: Exactly. I have my own code that I have been using for the last, oh geez, I’m going to date myself here, almost 20 years, so I can figure out whether everything in there is a number and can be converted into an actual numeric field.
Carlos: Okay, so you’re probably running through a few cycles, then? Try this first, and then try this second?
Angela: I actually use a regular expression to do all of that comparison and it actually works really, really well. But I figured out a long time ago that ISNUMERIC is not really the best way to go and so I came up with my own that I use.
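Angela's exact code isn't shown here, but a minimal sketch illustrates both halves of the point: ISNUMERIC's broad definition, and a pattern-based alternative. T-SQL has no full regular expression support, but a LIKE character class covers the common "digits only" case:

```sql
-- ISNUMERIC says "yes" to plenty of things that won't CAST to int:
SELECT ISNUMERIC('$')   AS dollar_sign,   -- 1
       ISNUMERIC('1e4') AS scientific,    -- 1
       ISNUMERIC('.')   AS lone_dot;      -- 1

-- Pattern-based check: keep only values made entirely of digits.
-- The sample values are hypothetical.
SELECT val
FROM   (VALUES ('123'), ('12a'), ('$5'), ('')) AS t(val)
WHERE  val NOT LIKE '%[^0-9]%'   -- no character that is not a digit
  AND  val <> '';                -- an empty string has no offending character either
-- only '123' survives
```

On SQL Server 2012 and later, `TRY_CAST(val AS int) IS NOT NULL` is another way to ask the same question, with the engine's own conversion rules as the referee.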
Carlos: That is interesting. I’m curious. I have to take a peek and see if there are other, what do you call it, community approved or community options out there to help you with the ISNUMERIC function. I don’t know that I’ve been paying attention enough, but I’d be curious to see.
Kevin: Yeah, you could use TRY_PARSE or TRY_CAST to check to see if it's the data type that you're expecting. Angela's idea of a regular expression is a great one. Wrapping that in a CLR function could be nice within a SQL Server instance.
Carlos: Yeah, if you’re down with that kind of stuff. I, myself, would not be able to write a CLR.
Kevin: I’m down with CLR.
Carlos: I’m at a disadvantage, there. the last things I want to talk about here is fat tables, which I’m curious to hear about because you said, “hey, one the ways I deal with getting around this data is add more columns to it with the correct data type.” But then, this idea of fat tables makes me think, “okay, do I have too much stuff in there?” How do you juggle that? Tell me about fat tables and why are they a problem?
Angela: Fat tables, what I consider a fat table, and I see this very frequently, we've already touched on the fact that SQL Server saves data in 8K pages. If the sum of the fields in your table is more than 8K, it's going to take more than one page to store that data. Does that make sense? Am I saying that clearly enough? If I take all the data types and what their storage requirements are and I add them all up for a table, that's something that you have to really pay attention to, because you don't want one record to spill over onto extra pages, because your data ends up everywhere. Sometimes that's unavoidable. You just have a really wide table. And in those cases, what I do is I typically go back and I look at my data and I say, "okay, can we make two tables out of this?"
Carlos: Make a relationship somehow.
Angela: Exactly. Is there a different relationship? Are we really looking at a single entity or are we really looking at multiple entities? If it’s a single entity, can we break it into two tables and then make a relationship between those two tables so it stores it more efficiently? When you are creating tables and when you’re doing your design work, you do have to take that into consideration. And I do take that into consideration when I am adding those additional columns, so I will look at my source system when I pull it in and see how big of a difference that’s going to make. Based on yes or no, I’ll either create an additional table and break that original table up, or if it’s going to keep me under the limit, then I will go ahead and put it in that same table.
Carlos: This is probably something that I should be paying a bit more attention to. We’ve all seen tables that have hundreds of columns in them and you’re like, “huh, that’s probably too big.” That might be an easy guess for a fat table. But is there an easy way to calculate how big your table is, a single row is?
Angela: Well, what I do, and I’m sure there’s probably an easier way to do this, but I typically take the data length of every single column. I just write a SELECT statement with the data length, then I add them all together and I get a sum and then I can see that. That’s kind of what I do. If you’ve got data in your table, and you have lots of varchars, that will kind of give you an idea of how big one record will be. If you don’t have data in your table yet, I do it the old-fashioned way with cut and paste into Excel and I add them up that way and see how wide my table ends up being.
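[Editor’s note] Angela’s DATALENGTH approach can be sketched as a quick T-SQL query against a table that already has data in it. The table and column names below are hypothetical stand-ins for your own schema:

```sql
-- Sketch of the row-width check Angela describes: sum the per-column
-- byte lengths for each row and look at the widest rows first.
-- dbo.Customer, FirstName, LastName and Notes are hypothetical names.
SELECT TOP (100)
       ISNULL(DATALENGTH(FirstName), 0)   -- ISNULL so NULL columns count as 0
     + ISNULL(DATALENGTH(LastName), 0)    --   bytes instead of NULLing the sum
     + ISNULL(DATALENGTH(Notes), 0) AS RowBytes
FROM dbo.Customer
ORDER BY RowBytes DESC;                   -- compare against the ~8,060-byte in-row limit
```

For an empty table, the “old-fashioned” route she mentions amounts to adding up the declared storage size of each column’s data type by hand, which a spreadsheet handles just as well.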
Carlos: Okay, there you go. No, that’s good, because I think one of the challenges is that you hear some of this advice, but “okay, not that we can always have an easy button, but give me a way to make it a bit more approachable to determine some of that,” so all of those are interesting ideas. Good deal. Let’s see, Kevin, do you have any questions or other things you want to add before we wrap it up?
Kevin: Yeah, the last thing that I want to hit is to ask how compression options can possibly change data type decisions? We have row level compression and page level compression and especially with row level compression, if I have a bigint, then let’s say I have the value 2 in that bigint, row level compression is going to drop that down to the minimum number of bytes needed to store the actual value itself, the 2. In that case, is there still a benefit from me using a tinyint instead of a bigint?
Angela: If you’re using a bigint, I’m going to assume, I know that’s dangerous on my part, but I’m going to assume that you need to use it for a reason. That you have done your due diligence, you know your data domain and you know that you’re going to need up to 6 commas worth of numbers. Whereas, your tinyint, it only goes from zero to 255, so you only have 256 values that you can store in there, so my assumption is that if you are using a bigint, then you probably need to be using a bigint. But if you’re using a bigint just because it’s there, you probably need to reexamine to make sure that you really do need to be using a bigint. So as far as compression goes, I haven’t done any comparisons in any of my demos or testing to figure that out. But I can tell you in practical, real world life, page compression helps significantly with performance. A company I used to work for, when compression first came out, we used page compression and our performance went up by 40%, just by adding compression. We didn’t tweak any queries, we didn’t do anything else, just by adding that and that was back in 2008 when they added compression? I haven’t done any direct comparisons, that’s my only real-world experience in pre-compression, post-compression. But I’m going to add that, because that is actually a very interesting question.
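[Editor’s note] The page compression Angela credits with that performance gain is a single rebuild option in SQL Server 2008 and later, and there is a built-in procedure to estimate the savings before committing. The table name here is hypothetical:

```sql
-- Estimate the space savings first, without changing anything
-- (dbo.SalesHistory is a hypothetical table name):
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'SalesHistory',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- Then rebuild the table with page compression applied:
ALTER TABLE dbo.SalesHistory
REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Row compression, as Kevin notes, already stores a bigint value like 2 in the minimum bytes needed, which is why the remaining argument for tinyint is about declaring the data domain rather than saving space.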
Carlos: My thought would be that from the storage perspective, obviously you make a great point there. In the conversion perspective, that implicit conversion perspective, I wouldn’t think there would be a benefit. Our benefits are really coming mostly from the storage and utilization-type perspective.
Kevin: Yeah, from storage and from data domain definition, where if you’re saying, “I know that the values will be between 0 and 200,” being able to expressly identify that with a data type does have value.
Angela: Yeah, so less storage means less pages and less pages means less trips to pick up data.
Carlos: Yeah, and I think to what Kevin was saying, it helps to be able to identify that, so if you know that you’re only going to have 1 to 200, having a bigint in there doesn’t tell me that. If I’m coming at it and don’t necessarily know that history, I’m thinking, “well, gosh, that number could be anything.” Whereas, understanding that, no, here are the limits of this range, provides some value, some insight there, that may or may not be useful. Well, awesome. Shall we go ahead and do SQL Family?
Angela: Sure.
Carlos: Okay. Your all-time favorite movie?
Angela: My all-time favorite movie is Stripes, with Bill Murray.
Carlos: Okay, I have to admit, I’m not sure that I’ve even seen that one.
Angela: Oh my gosh.
Kevin: The first half of that is great. I think it’s about the time when they get to the– what was it called, the vehicle that they were in?
Angela: Germany. Yeah, the souped-up RV. I can’t remember what they call it, either.
Kevin: Yeah, pretty much, yeah. That’s where the movie falls off a bit, for me.
Angela: It does fall off a bit, but it’s still, to this day, one of my all-time favorite, favorite movies. You could turn the sound off and I could speak everybody’s part. That’s a fact, Jack.
Carlos: That’s a good candidate, then, for favorite movie. A city or place you’d most like to visit.
Angela: This is tough, because it’s a whole big world out there, and I’ve been to a lot of different places. I’ve done a lot of travel. I guess I would probably have to say, I need to go to a new continent, because I’ve been to North America, I’ve been to Europe, but I’ve never been to Africa, I’ve never been to South America and I’ve never been to Australia. I’ve never been to anywhere in Asia. So, if I had to pick from any of the continents that I’ve never been to, I would probably say it’s got to be Australia. Probably anywhere in Australia, I would be perfectly happy with.
Carlos: You’re just dying for some Vegemite, huh?
Angela: I guess so.
Carlos: Okay, so speaking of food, a food that reminds you of your childhood or comfort food?
Angela: This is going to sound kind of odd. When I was a kid, we moved around a lot. My dad was going to school, so we didn’t live anywhere for very long. But one of the things that my family always did is we would always go to Kentucky Fried Chicken, and then we would take it to the park and we would have a picnic in a park with Kentucky Fried Chicken. And this is in the days before they had biscuits. It was back when they had rolls, not biscuits. They were still hot and you could put the butter and the honey on them and then the honey would get kind of crunchy on there. That is some of my fondest memories is eating KFC in just about any town that we lived in.
Carlos: There you go. Now you’ve made kindred spirits with Kevin, he being a fan of fried chicken. And my connection to KFC, and I feel like they’ve changed their recipe, but the coleslaw was always really, like that was always one of my favorite things about KFC.
Angela: Oh yeah, oh yeah.
Carlos: It doesn’t seem to be quite the same, anymore.
Angela: I found their recipe and I can make it like it used to be. Whenever we have Eat Day at work and there’s an opportunity to take coleslaw, I always do. I always bring home an empty bowl.
Carlos: Nice. Now, tell us about when you first started working with SQL Server.
Angela: Okay, I started out a long time ago in a galaxy far, far away as a programmer doing Visual Basic and Power Builder. We were a big client server shop. That was in the pre-internet days. I was working for a consulting company and we went to work on a project in, of all places, Norman, Oklahoma. We did not have a DBA on staff, so we had to bring a third party in to be our DBA. This person actually ended up being my mentor as far as for my entire professional life. I was first exposed to SQL Server and I was like, “oh my gosh, I want to be a DBA!” I liked that so much better than doing all of the coding work that I was doing. That was my first exposure and that was with SQL Server 4.21a.
Carlos: Wow, there you go.
Angela: So, that’s a long time ago, on Windows 3.11.
Carlos: So, there have been many changes in SQL Server since then, but if you could make one change to SQL Server, what would it be?
Angela: That’s a tough one, because I’ve seen how far it’s come since it split from the Sybase-based code, when you used to have to use devices for backups and you used to have to keep track of every time you increased a file size or added a file, otherwise you wouldn’t be able to restore it. Yeah, those were the good old days. I don’t know, I think something that really needs some love is probably the SQL Server Agent itself. It’s still kind of cumbersome to have to drill down and see job history. You have to get through so many clicks before you can actually see that. If there were some way to get to that in fewer clicks, I think that would make me very, very happy.
Carlos: That’d be a welcome improvement. What’s the best piece of career advice you’ve received?
Angela: Best piece of career advice would probably be “don’t worry about attending all the sessions when you go to a PASS Summit.” Stop and talk to people. Most IT people are not extroverts. They’re all about sitting in front of their computer and not having to talk to somebody. I am like that, to a certain degree, but then I also like to talk to people. I actually went to the original Summit back in 1999 in Chicago and I didn’t talk to anybody. I didn’t know anybody, I didn’t talk to anybody, and I took a little bit of a break in my professional career and I moved away from SQL Server for a little while. And then I got back into it a few years later and the community had grown so much in that time. It wasn’t until I attended one of those subsequent PASS Summits that somebody said, “yeah, don’t worry about going to all the sessions, you know, you can get the recordings. Talk to people, because that’s where you’re going to get the most bang for your buck.” Whether you’re talking technology or whether you’re talking personal interests or hobbies, that, to me, is the best, best career advice I’ve gotten. I’ve made life-long friendships and life-long professional friendships as well, so if I have a problem with something and I just can’t quite figure it out and I feel like I’ve been banging my head up against the wall, you know, I can hit up one of my friends that I’ve met at one of these conferences and say, “hey, you know what? Remember when we were talking about that, that one time? Well, I’ve run into that and I can’t quite figure out how to do this. What do you think?” To me, that is invaluable, to have somebody else that has that knowledge that you can bounce ideas off of.
Carlos: I agree, and in a completely not-unrelated note, you’re going to be joining us on the SQL Trail this year.
Angela: I am.
Carlos: Yes, you’re coming up, and this is one of the things we might be talking about, or it can be on the list, and so I think that’s a great segue, if you will, into what we’re trying to bring to the SQL Trail. It’ll be nice to have you, and folks can get to know you a little bit better and all of the experience that you have to offer. That will be fun. We’re looking forward to that.
Angela: Yeah, me too.
Carlos: So, our last question for you today is if you could have one superhero power what would it be and why do you want it?
Angela: Oh, that is so easy. I would want to be able to fly.
Carlos: Fly, tatada!
Angela: Fly! As a kid, I wanted to be an astronaut, desperately. I so wanted to do that and when I got into high school, I had the grades, I could have done whatever I needed to. Unfortunately, I had terrible, terrible eyesight, so there was no way I was ever going to be able to be a pilot, let alone an astronaut. So, if I could have a superhero power, that would definitely be flying, because to me, there’s nothing quite like being that high in the air, looking down and just seeing that it really is just a big, wide world out there and there’s so much. That is definitely it.
Carlos: Well, awesome. Angela, thanks so much for being on the program with us today.
Angela: Absolutely! Thank you for inviting me. This was a great time.
Carlos: Okay, compañeros, that’s going to do it for today’s episode. As always, thanks to Angela, again, for coming on the show. Again, I think this is one of those, I won’t say easy, obviously if you’re taking over an environment, but a lot of times where this can come into play is when you’re trying to build onto something that’s already pre-built. One of the classics, I think, is the nvarchar versus varchar debate. Well, if you already have an existing structure and they’re already using one of those types, don’t shoot yourself in the foot by changing types because you think that they’re wrong. Particularly when you’re going to be joining data frequently. You’re going to be taking a hit there, and in certain cases, depending on the amount of data and CPU, again, it will vary from instance to instance, but don’t do yourself a disservice by choosing a different data type just because your data type is “better”. If you’re going to be doing those comparisons between those columns, that is something that I think you want to keep in mind as you’re building out your pieces there. So as always, compañeros, we’re interested in getting your feedback. You can reach out to us on social media. We enjoy hearing what it is that you think we should be talking about, and we’ll see you on the SQL Trail.