I saw Kevin Feasel’s SQLSaturday session entitled Apply Yourself and thought it was pretty funny title. While not close friends, I have known Kevin for a while and knew he had good stuff, so I attended the session. There, he shared this quote in his session.
“If you don’t understand the APPLY operator, your skills are somewhere around the 50th percentile at best.” — Adam Machanic.
I knew immediately I wanted to have him on the show. He introduced some pretty cool concepts. This episode will introduce some concepts, but you will have to do a bit of leg work if you want to start ‘Applying’ this into your code–I’m sorry, I couldn’t resist.
Show Notes
Kevin’s Session Notes
Kevin on Twitter
SQL Cruise
Carlos L Chacon: This is the “SQL Data Partners” podcast. My name is Carlos L. Chacon, your host and this is episode 13. Today we’re going to talk about the APPLY operator. Something that I probably am not using enough, I happen to be in Pittsburgh listening to Kevin Feasel’s talk about the APPLY operator. He kind of made me a believer.
When I heard it I knew that I wanted to have him on the podcast. We’re going to go over that, talk a little bit about some areas in which you might be able to apply it, how you can use it better and then of course some of the benefits.
If we get something wrong, you want to fact check us, you want to hear something else about it. You can hit us up on Twitter. I’m @carloslchacon or you can email me at [email protected].
All the show notes, there’s going to be lots of examples. We’re going to be talking a little bit in the abstract. I don’t want to say the abstract, but obviously to see some of the specifics, you’re going to want to see some of the code.
We’re not going to be able to do that over the podcast, but all of that will be available. Kevin’s actually made his scripts available online and that will be available in the show notes at sqldatapartners.com/podcast.
Again, if you like what you hear we invite you to give us a rating. We’d love to get some feedback. If there’s topics for future episode, we’d like to hear those as well.
As always, Companeros, welcome to the show.
Children: SQL Data Partners.[music]
Carlos: Today, we have Kevin Feasel with us. Kevin, are you still going by database administrator?
Kevin Feasel: I am officially a database engineer.
Carlos: Engineer. I knew you had introduced yourself as something else.
Kevin: Yes, I tend to blur the lines a little bit. Basically, this is the first company that I worked for where there really is a difference between a database engineer and a database administrator.Historically, I’ve loved the hybrid roles where I’m doing some administration backups, restorations, I’m working with snapshots, I’m working with the administrative side, but also writing code and optimizing tables, optimizing queries, that kind of fun stuff on the development side.
Now, I work for a company in Durham, North Carolina called ChannelAdvisor, where we actually have a split. We have a whole administrative team. I’m on the database engineering team, it’s all development work.
The negative side is I don’t get to play with the production. The positive side is I don’t get calls at 3:00 AM, so it’s a fair trade-off.
Carlos: [laughs] Now, being in Durham, does that make you a Duke fan or…?
Kevin: No, I…
Carlos: Did they make you swear allegiance on your…?[laughter]
Kevin: No, although I did love Christian Laettner’s shot. I’m sorry Kentucky fans, you all hate me right now.
Carlos: [laughs] You had to bring that up. OK, Carolina fans either, don’t leave us just yet. I’m sure Kevin has some very nice things to say about Chapel Hill. [laughs]Today we want to talk about the APPLY operator. Now, the APPLY operator, fair warning, it’s not a basic topic. There’s just a couple of things, some vernacular terms we’re going to throw around, that you should be familiar with. If these are completely foreign to you, then the APPLY operator conversation may not be suited for you just yet.
The first one, of course, they’re very commonly…a lot of the examples that you used, Kevin, were in dynamic management functions, dynamic management views. These are the views that Microsoft has made available for administrators, to be able to see under the hood, what’s going on. The other, table-value functions. We create functions that will return logic to us.
Kevin: A results set.
Carlos: A results set to us. We also talk a little bit about common table expressions, which, a good definition…Jeff Johnson would kill me. I’m going to say, a subquery that starts with a whiff. [laughs]
Kevin: That’s a way of doing it. I like to think of it as pulling the subquery out to the top of the full query.
Carlos: There you go. Then, I’m just going to say, subqueries. Where you’re joining to another select statement. Subqueries. Each of these are some scenarios that we use, or talk about, or differentiate, a little bit, with the APPLY operator. Hopefully, that’s at least somewhat familiar to you.We’re talking about the APPLY operator. In fact, the name of your session was “The APPLY operator.”
Kevin: “Apply Yourself,” yes.
Carlos: Oh, “Apply Yourself.”
Kevin: I had to sneak in the pun somewhere.
Carlos: [laughs] When we talk about the APPLY operator, we’re ultimately talking about two keywords that we can use in our queries. The first is the OUTER APPLY, which you mentioned was much like a LEFT OUTER JOIN, and then the CROSS APPLY.
Kevin: Which is semantically similar to, but not equivalent to, an INNER JOIN.
Carlos: There you go. Ultimately, to review why we would want to be using APPLY operators. Why even approach the subject?
Kevin: Sure. I’ll start by saying that the APPLY operator’s been around since SQL Server 2005. The reason that it was introduced was to allow us to join two functions. Functions had been around in SQL server since at least 2000. I believe they were introduced in 2000 but I’ve never used a version before 2000.
Carlos: We won’t hold that against you.[laughter]
Kevin: I saw a SQL server 7 version in an environment I worked in once but I didn’t have to touch it, so that was good.
Carlos: Lucky you.
Kevin: Yeah. The APPLY operator was introduced in SQL server 2005. The reason that it was introduced was to give us the ability to take a data set and iterate over the data set and perform some type of function. The simplest used case was dynamic management functions.A great example I like to give is we have a dynamic management view called sys.dm_exec_connections. That shows all of your open connections. All of the sessions that are currently open on your instance. That dynamic management view gives us some interesting details, like when the session started, how many reads, how many writes.
It also gives us this thing called the most recent SQL plan handle. Which is a binary blob that acts like a pointer. The binary blob itself isn’t very useful to us, but we can then take that pointer and pass it in as the parameter to a dynamic management function. A DMF called dm exec SQL text.
That will return to us the actual query being run, or that has recently been run whose pointer is that pointer we passed in as the parameter. We can do that for an entire result set. If we have 100 active sessions, we can see all the queries that are being run at present against our instance.
Carlos: Super helpful if we’re looking at execution plan details and the meta data, if you will, about those planes.
Kevin: There is another dynamic management function which will show you the execution plans themselves. Yes, absolutely.
Carlos: I guess the classic scenario where you would use an APPLY operator is in the connecting of those two for the view of the function?
Kevin: Yes. Or a table and a function. Generally, a dynamic management function is involved in there somewhere. You can also use user defined, custom, table valued functions. I found that the performance ramifications of using them, even just very simple table valued functions, there’s a significant performance overhead to using a function as opposed to in lining that SQL code.
Carlos: Some of the reasons why we would want to use an APPLY operator. In what scenarios would we want to do that? One of those we just talked about, right? Joining table valued functions. With an inner join we can’t pass a parameter but with a CROSS APPLY we have the option to do that.
Kevin: Exactly. If you’re back in SQL 2000 world, you probably learned at some point that you could do an inner join to a function and as long as you have all your parameters have defaults, you could use an inner join. But as soon as you needed to pass in a parameter, inner join won’t work anymore. It’ll give you an error. That’s where the APPLY operator starts to become important.
Carlos: Another scenario is in pre-aggregated data?
Kevin: Yes. The example that I give actually comes from a Jason Straight blog post. I’m glad that he put that out. It was part of a T-SQL Tuesday on the APPLY operator. The link is in my code demos because I wanted to give him credit for it.The general idea is I want to figure out, say from the AdventureWorks data set because everybody’s got AdventureWorks, how many times has a water bottle been purchased? There’s an easy way to write this query where you can select from the sales detail table. I can join a couple of other different tables to get some important information, but then I need to get my count.
My count of records in these sales order detail table, because that will tell me how many times somebody has bought a water bottle, but if I want to get any other detail like the product sub-category name, or the product name or even if you’re looking at say the month of purchase, I now have to group the resource set.
Group by class, it’s fairly standard, very straight forward, it’s got my known aggregated attributes, and that works. That works fine. There is no problem with that. However, there is a chance that we’re not writing the query in a way that will allow it to perform most optimally. In this scenario, the query returns back something on the order of 187,520 rows. I wish I didn’t remember these things.
[laughter]
Kevin: I’ve blocked out all the important things from my mind, but I know these numbers exactly. I’ll show an execution plan that sees, oh, I have a nested loop join of one row, because I only have one product, my water bottle, but it shows up in the sales order detail table a 187,520 times, so my nested loop join basically on the left hand side.I’m iterating through once, but on the right hand side I’m burning through at a 187,000 times, then I have a 187,520 rows going through the stream to a stream aggregate which smashes all that back down to one row, and returns that one row to the user.
By contrast, if we can find a way aggregating those 187,520 rows earlier on, we can join one row to one row, because I know that my water bottle. I’ve got one water bottle, and when I aggregate that product up, I should only have one row.
Join one row to one row, and I can end up getting better performance because it’s less effort of the CPU, it’s not less IO, we still need to hit those same rows, we still need to pull out that same information, but there is less for the CPU to do.
Carlos: Yes. That’s [inaudible 13:31] interesting, so we have basically the performance gain by using the operator and ultimately you talked about the optimizer being able to…that’s another term, right?The “optimizer”, that thing that helps translate your query into what the database is going to do, so the optimizer will see that and be able to make some logic assumptions about what your query is going to do in order to give that back.
Kevin: Exactly. It can understand that, “Oh, I know there is only one row up here, and I know I can aggregate this information earlier on, so I join one row to one row,” so it’s exactly what’s happening. You can use the Apply Operator to do this, you can also use other techniques.You could take a sub-query, and inside the sub-query you’re doing a count operation in there, and the optimizer can see that and say, “Oh, wait. That looks like I should pre-aggregate the data here, and then join one row to one row.”
Ideally, the optimizer will take any logically equivalent version of this query including the first one that I talked about, and say, “You know if I pre-aggregate this data, I’m going to get better performance.”
Unfortunately it’s not always perfect, it has to find a solution in a very quick amount of time, and it finds one that’s good, and says, “OK, this one will work,” because we get the same results back and in many cases it is still going to perform well enough, but sometimes you need to [inaudible 15:11] out that extra bit of power.
Carlos: There you go. Using the other APPLY operator will give it the hint that it needs to help, just get a little help if you will.
Kevin: Yes, exactly.
Carlos: Now, interestingly enough for it so any time we talk about performance, whether it’s data professionals or, “Hey, you know we want something that will help perform much faster.” I guess we need a throw up of a flag of caution that the APPLY operator in this case is not necessarily always going to be the fastest scenario.
Kevin: Absolutely not. Basically, the APPLY operator is going to try to force you to do a nested loop join. Not a hundred percent of the time, but 99 percent of the time. Because…
Carlos: You can almost kind of expect that.
Kevin: Yes, because you go back to what the APPLY operator does. For each record on the left hand side on your table, we’re going to take row and perform a function. We are going to perform some type of operation and get some outputs.That lands itself very nicely to a nested loop join, where with a nested loop, we’re taking each row and then we’re looping through the right hand side, it’s just that normally we’re going to have one record come back from the right hand side, or relatively few records come back from the right hand side.
That can be beneficial when you’re looking at say I want to get a relatively small percentage of a very large table. By contrast, the other join at operators, a hash match or a nested merge join, they may involve pulling all of the table into memory, and when all they need a tiny percentage of that table, they’re generally not going to perform as well.
Carlos: A little bit more overhead to get what you need, and so you actually mentioned an interesting statistic one that I was not familiar with, and so your discussion was about 20 percent or more of the records returned, so skinning that table. Basically bringing in a lot more of that table, you’re seeing results that are a little bit faster there. It’s just like the merge join or even the hash join, would can get you there.
Kevin: Yes. The [inaudible 17:38] that I’ve seen is that, if you have less than half percent of your table being retrieved, you generally want to use an index seek operation. If you have more than 20 percent of the table being retrieved, you want to use an index scan.If you’re somewhere between, either one could possibly work, it really depends on your data size, there is no hard cut off point for, “I need a scan here or a scan there,” but half percent, 20 percent, those are pretty good rules of thumb.
Carlos: This is, if we’re just stepping back even for a moment, this is always one of those where trying different ways about peeling that banana, if you will, or skinning that rabbit, you may have to try a couple of different ways because data, your situation will be a little bit different, your mileage may vary, and so you’ll need to understand that, play around with it, and use some of…show statistics and show your time options, so that you can actually see what the queries are doing and…
Kevin: Yes. You definitely want to check execution plans, you want to see what those statistics look like, because even you’re using the APPLY operator, you might end up getting a larger number of IO operations, because I’m using a nested loop join, so for a table with say generally just a few million rows, it will be about 3 read operations to grab a row, if you’re using an index seek.It’ll take three read operations to get to the information that I need from that row. If I’m pulling back 20,000 rows, I’m looking at upwards of 60,000 read operations, may be fewer because if data is clustered together, it can do a brain scan, but that’s not something that I want to get into too much here.
If we’re looking at the 60,000 reads, a single scan of a section of a table probably is going to be fewer than 60,000 operations. However, it may take longer. It may take up more memory. It may push out pages that other queries need. It may make your other queries slower.
Carlos: I guess, as always, we can take this only an isolation. Then you look at the overall system and go from there.
Kevin: Exactly.
Carlos: Another scenario, very quickly let’s talk about the example that you used — I want to see the last five orders or some information about the last five orders from a customer that they APPLY operators as an option to use here as well.
Kevin: Yes, this is where the operator really starts to shine. If I want to get the top “N” records from some table, I can very easily write a CROSS APPLY or an OUTER APPLY and inside that APPLY operation I can put in a derived function or an ad hoc function. I am going to use the terms interchangeably.Now inside the derived function, I will have my operation to grab the top rows. Let’s say that I have a set of customers, and I want to get their latest orders. Customers are on a customer table, orders are in an order table. I want to get the last five orders for each active customer.
My query is going to look at from customers, and then I can CROSS APPLY. Inside the CROSS APPLY, I write my derived function that says, “Select top five” and whatever order information I need from orders where customer ID is equal to the customer table’s customer ID.
That will allow me to very quickly write a query. If I only have three orders for a customer, I’ll grab the three. If I only have two orders, I’ll grab the two. If I have seven orders, I’ll grab the five latest. It’s a nice easy way, very easy to understand the query, very simple syntax. That will give me back exactly the rows that I need.
By contrast, if I tried to grab them all together, I can’t use the top operator because the top five of customers joint or orders that’s only going to give me five orders total.
Carlos: Right.
Kevin: Alternatively, I could use something like a common table expression or a subquery and inside there grab records for each customer using something called a window function. Do you want to get into window functions here?
Carlos: No, we will just say basically that’s a way of ordering that. That sounds like maybe another option not to talk about…
Kevin: That’s right.
Carlos: [laughs] …another topic to talk about at another time.
Kevin: Another day, yes. We can grab the five latest orders using some other method. Syntactically speaking, code, beauty speaking, I actually prefer the APPLY operator just a little bit, because I think it’s a little easier to read.
Carlos: Your biases are coming out a little bit there Kevin [laughs] .
Kevin: Yeah.
Carlos: Speaking of readability that leads us into one of the other sections that you suggest there. That is on simplifying calculations. Particularly the idea, some of the codes that you’d shown again, the code examples will be sqldatapartners.com/podcast.The link will be there to show you some of these examples with some information about comparing and contrasting the different ways. One of those that made that process a little simpler to read was in calculations section.
Kevin: I’ve worked for companies which have not always had the best data design, have not always had the clearest written code. It happens. I’ve written some horrible looking code.
Carlos: Yeah, that’s right.
Kevin: At one point I was doing work on Medicaid data. I had subqueries inside subqueries inside subqueries inside subqueries to perform the calculations. My select clause was three quarters of a page long. I was really proud of myself, because look at all this code that I’d written. Then they told me to change something at it. I said, “Oh, boy! That’s not good.”
Carlos: Reusability factor went down a little bit.
Kevin: Yes, exactly and part of the reason in my select clause was so big was I would have little sections of code that were repeated. I’m building a mathematical calculation. I need to take A plus B and at that becomes something, and then I need to take that and multiplied by C and then I need to do divided by D and then I need to perform some other operations on this stuff.I need to take that quantity into two separate things to give me two different metrics and some copying, and pasting all that code over and over and over. One of the fantastic uses of the APPLY operator — my favorite use of it — is that you can really simplify calculations. A derived function does not need a “from clause.”
Let’s say that I need to perform some operation. I need to take A plus B. In a CROSS APPLY, I can say select A plus B as C. Now I can query this newly derived attribute and it is now called C, so I can select C. Let’s say that I need to multiply C times D and divide that quantify by E. I can now have another cross supply operation that says, “OK, select C times D divided by E as F.”
I can keep going on and chaining things. Now I’ll show in my select query instead of some really complex code, it will be more like here’s A here’s B, C, D, E, F. Really simple select statement, really easy to maintain and even easy to follow through. It’s sort of like showing your work when you’re doing high school math. It’s algebra at that point.
Carlos: I think as you went through what helped me, you think about that a little bit more as if I was going to be doing those steppingstones. I think traditionally I’d be doing temporary tables. A plus B, stick it on a table. Then do some other operations. Stick that to another table. You can have these groups or sections that you can logically step through.The APPLY operator, the CROSS APPLY operator gives you those sections to look at as well and you can say, “Oh, OK, here’s kind of what I’m doing here, here’s what I’m doing here, here’s what I’m doing there” — visual just makes it a lot easier.
Kevin: Yeah. Here’s the beautiful part. I’m glad that you mentioned temporary tables because with the temporary tables, I’m writing out to tempDB. I am writing at least a memory, probably to disc. Then I’m reading from disc. I’m writing disc. I’m reading from disc.While with the APPLY operation, you’re not writing to disc, you’re not creating new tables, you’re not creating new structures in memory. What happens is there’s a little scalar calculation that you can see in the execution plan, which if you’re already using some function, like a sum or counter something or you’re using a left or right or substring or any function like that, “Hey you’re already seeing that compute scalar.”
There is no real overhead to using the APPLY operator to simplify your calculations in that way. It’s all done in line. The Optimizer takes that and says, “Oh, this query that you’ve written that is human readable is exactly the same as this ugly looking query when you get down to it, so I’m going to give you the same execution plan.”
Carlos: There’s a couple of options. There are a few more that are around there. We’ll put those in show notes and make those available for you to take a peek. Of course, they’re in your coding options as at Companeros. I hope you’ll check those out. Kevin, I think it’s been a great discussion about the APPLY operator.I know I learned some things and I think those listening will as well, to use Kevin’s pun to APPLY yourself a little bit better. Before we let you go we do have a couple of other questions that we want to ask you.
Kevin: Sure.
Carlos: Ultimately one of the things we want to do is create value. We want to talk about some things, help folks be better about in their positions. We also like to talk about what tools you’re using. Tools that help us to perform a little bit faster. What’s your favorite SQL tool? There could be a paid tool, free tool, but why you like it and how you use it?
Kevin: I’ve two of them, but I’m sure there’s somebody has already talked about “SQL Centric Plan Explore” before.
Carlos: It’s come up once or twice.
Kevin: Yeah. I’ll let that one slide. The other tool that I really enjoy is by a company called “Devart.” It’s called “dbForge SQL complete.” The reason that I like this tool, I actually ended up paying for it out of my own pocket. The reason that I like the tool is because it will auto-format your code.Now there are a number of other tools that will do auto-formatting of code, but my current employer, we have some very strict coding standards. We’ve some very specific coding guidelines and I would try out the other tools. They would fail for various reasons.
Sometimes, for example, one of the things that we have is that all store procedure parameters must be fully left aligned, but parameters in other circumstances should be indented and some of the tools would say, they’ve all got to be left aligned. Some of them will say they’ve all got to be indented.
This one actually allows me to get down to the level of detail that allows me to have the output look pretty much exactly what I want. It’s not perfect but it gets 95 percent of the way. More importantly, when we’re building up Dynamic SQL, I’m looking at the outputs and it can be just a mess of gibberish.
All I do control K, control D and now it’s at least human readable or closer to human readable, depending on who wrote the code.
Carlos: [laughs] Those formatting options can be very nice, very handy. We have one more question, but before we ask you that we want to talk a little bit about another opportunities that folks have to learn about SQL Server.If you’d been listening to the show, you know we talk about SQL Cruise — sqlcruise.com — you take the best of the SQL Server training. You put that on a cruise ship and you’ve got SQL Cruise done by Tim and Amy Ford. They put together a great training experience. You have four or five top-notch instructors there on the boat. Now, Kevin are you a cruiser?
Kevin: I have not. I need to do it at some time.
Carlos: There you go. I have an offer there for you and for anyone else who wants to go, Tim’s offered the listeners of the podcast, a $100 off if you want to go. If you go to sqldatapartners.com/sqlcruise, you can follow the instructions there about how to get your $100 off and Tim would be very excited to have you on the cruise.Who knows, maybe I’ll see you on board there. Kevin again, thanks for all this good information. We do have one last question for you and that is — if you could have one super hero power, what would it be and why would you want it?
Kevin: I would want “phase walking.”
Carlos: Phase walking? We’ll have to explain that to me.
Kevin: Yes, it’s OK. My favorite X-men character was Kitty Pryde, “Shadowcat.” She had the ability to walk through objects to be able to go up and down, left and right, walk through walls, go through floors. I’ve always loved that concept. I think that would definitely be my power.
Carlos: Why would you want that power?
Kevin: First of all, I need to go over to that other room. I’m lazy. [laughs] I don’t want to go to the door. I’ll just walk right through it.
Carlos: OK, very good.
Kevin: Second, I’m driving. Someone’s going to hit me. I hope that doesn’t happen but, if I could phase walk, if I can get my car to be able to just go through something, I don’t have to call the insurance companies.
Carlos: It can make traffic a much more easier to bear.
Kevin: Lots scarier for everyone else, but yeah [laughs] .
Carlos: Very good. We’re Companeros. I hope you appreciated the conversation, got some good value out of it. Again, if there’s something that we said, that you’ve questions about, you can hit us up on Twitter @CarlosLChacon. Kevin you’re also on Twitter?
Kevin: I am, it is @feaselkl.
Carlos: We do appreciate you Kevin for you taking your time to be with us.
Kevin: Glad to be on here.
Carlos: Companeros, I will see you on the SQL Trail.[music]
Children: SQL Data Partners.