DAX is the acronym for Data Analysis Expressions, and it is the formula language used throughout Power BI. In this episode we talk about DAX concepts and how they are simple and straightforward, but can be confusing for those coming from a relational database background. DAX uses some unique programming concepts and patterns which can make it difficult to fully use and understand. We talk about some experiences getting started with this language and ideas on how you can get started.
“There’s a couple different aha moments. I think two of the biggest ones is one, DAX and the VertiPaq engine [behind it] is a columnar storage engine, not a row storage engine, and then the other piece is these evaluation contexts that basically affect what your expression can see.”
“I really recommend buying a book or watching my course, because if you try to learn DAX just adhoc, like you might SQL, you’re just going to keep banging your head against walls as you go along, so it’s simple, but not easy.”
“There’s two ways of encapsulating business logic in DAX and they look really, really similar, but they’re very different in how they function.”
“If you can understand measures and you can understand evaluation contexts, that’s going to take you most of the way.”
Listen to Learn
00:38 Intro to the team & topic
01:38 DAX is simple but not easy – don’t try to figure it out without help
04:39 Two ways of encapsulating business logic in DAX
07:36 Are we talking language or architecture?
08:32 Not using the formula bar is a pitfall for Power Query, what about DAX?
11:01 Some of the dividing lines in when and how to use DAX
14:31 DAX is a columnar engine and what comes because of that
17:37 A good approach to DAX if you’re coming from a SQL background
20:27 Who is going to be interested in starting to learn and use DAX?
21:56 What DAX can do for the database administrator, specifically
24:14 Closing Thoughts
Carlos: Compañeros! This is the SQL Data Partners Podcast. I am Carlos L Chacon, your host, and this is Episode 164. I am joined today by Eugene Meidinger.
Carlos: And Kevin Feasel.
Carlos: Taking some time out of his calendar before he heads off to the MVP Summit, so thanks, Kevin.
Kevin: Yeah, they might even let me in the doors, this year.
Carlos: Well, that’s always good. Okay, so today’s episode is we’re going to dispense with some of the openings this episode, compañeros, and we’re going to jump into it. Our topic is DAX. I know in our previous episode, I believe it was 162, we talked about Power Query vs DAX, and so now we’re going to focus in a little bit more on DAX, what it is and some thoughts that Eugene has about it. Our show notes for today’s page is going to be at sqldatapartners.com/powerbi or at sqldatapartners.com/164 for the episode notes. Yeah, so we’ve had this conversation, we’ve talked a little bit about what it is. Yeah, so I guess how are we going to start today? Do we need to go back over it again?
Eugene: No, we can kind of talk about what DAX is good for and what makes it different. I’ll say that the infamous Italians, Marco Russo and Alberto Ferrari, if I’m remembering how they described it correctly, they say that DAX is simple but not easy, and I think that’s an accurate description. Whereas, it’s a very concise, terse sort of formula language, but actually understanding it the first time is very difficult.
Carlos: So does that mean that you have like an aha moment and then all of a sudden the whole language just opens up to you?
Eugene: A little bit, yeah. I’ve got a course on this on Pluralsight and I’ve given some presentations and I draw out this learning curve, and so the first couple pieces are deceptively simple. So you do stuff with like basic scalar functions like concatenate or adding a date or something like that.
Carlos: Sure, it’s just like working in Excel almost, right?
Eugene: Exactly. Or you do something with calculated columns, which is kind of like– think of it like a persisted computed column in SQL. You’re literally extending the table with a formula, but it’s actually taking up space. So when you’re working with those things, it’s pretty intuitive. If you’ve worked with Excel, you think, “okay, this makes sense. I’ve got this,” and then the learning curve just shoots way up, and you run into this wall once you start dealing with measures and manipulating filters. So yeah, there’s a couple different aha moments. I think two of the biggest ones is one, DAX and the engine behind it, the VertiPaq engine is a columnar storage engine, not a row storage engine, which has dramatic effects on how you work with things and how you reference the data. And then the other piece is these evaluation contexts that basically affect what your expression can see, so to speak. And most of the bread and butter when you’re working with DAX is manipulating those evaluation contexts to say, “oh no, no, no, I know that you said that we’re looking at 2019 data, but I want to do a baseline analysis, and so I want to be able to undo that filter to get all of the sales for forever, and compare 2019 to that average,” or something like that. So those are two of the big aha moments, and I really recommend buying a book or watching my course, because if you try to learn DAX just adhoc, like you might SQL, you’re just going to keep banging your head against walls as you go along, so it’s simple, but not easy.
Kevin: Yeah, you should probably get Eugene’s Pluralsight course instead of buying a book, because books are for nerds.
Eugene: It’s true.
Carlos: Wait a second, here.
Eugene: Dead truth.
Carlos: As I look at the stack of books on my desk.
Eugene: Well, I wouldn’t want to be associated with anyone writing a book.
Kevin: You have a stack of books, but not a stack of Pluralsight courses, so obviously you should see Eugene’s Pluralsight course.
Carlos: There you go.
Eugene: Right, yeah, support the Eugene Meidinger fund. My wife likes it whenever the mortgage gets paid.
Carlos: Okay, so do you want to take us through a specific example?
Eugene: Yeah, sure. So there’s two ways of encapsulating business logic in DAX and they look really, really similar, but they’re very different in how they function, so that’s one of the first things that you run into that’s confusing. So like I said, there’s calculated columns, which are really simple, but honestly, you should use them as sparingly as possible, because if you’re adding columns or doing basic kind of data massaging, ideally that should be happening in Power Query. Especially because calculated columns are compressed last and so you can get worse compression, worse performance, potentially. Whereas if you’re doing it in Power Query, then it’s just part of the table. But either way, a calculated column, you’re extending a table. A simple example, let’s say that you wanted to say “okay, what’s my extended price for product? What’s my line item total?” Well, what you’re just going to do is you’re going to take quantity and multiply it by price, and so the question is, do you want to store that in your original data or do you want to calculate that later? That’s a good example where you might have a calculated column or something like that. So that’s one way of encoding business logic and a lot of times business logic isn’t as clean cut. Like you go to accounting and say, “okay, I need to know my gross sales.” And they say, “well, you just multiply quantity by price. Well, unless there was a discount. Well, unless the moon’s waxing gibbous on the third Thursday in Africa, and then we have to take into account BAT tax.” Right?
Eugene: And so, the other way that you encapsulate business logic, the whole point of DAX is to encapsulate business logic so someone can just drag gross sales and they don’t have to think about it is using measures, and measures are a very different animal, because they’re looking at all of the data at once. And so, whereas a calculated column is literally attached to a table, it literally extends the table and it takes up more space, a measure is computed at run-time, based on whatever filters have been applied by the user and by your DAX code, and so they can live anywhere. So you can have a sales table, and you want to say just something simple like total sales, whatever, normally you think, “okay, I’m going to put that measure on the sales table.” But something that we’re doing right now with a customer is we’re using these measure groups or measure tables where we make a dummy table that has one column and no rows of data, and then we just add a bunch of measures to it. And even though the columns that those measures are referencing are on a different table, it doesn’t matter, because taking they’re taking this bird’s eye view of the data all at once. So yeah, the first thing is understanding, “okay calculated columns look intuitive, I’m used to them from Excel, but I really want to start working with measures and thinking about the entire set of data.” It’s kind of like when you first start learning SQL, having to learn to think in set-based kind of theory as opposed to like building out cursors or something like that, if that makes sense. So that’s kind of the first aha sort of thing.
Carlos: Sure, what’s interesting is that, I mean we talk about it as a language, but then you’re also applying, well, I feel like architecture type questions to, “okay, well, how do I want to put this data together?” And then you know, and I guess you DAX it, right?
Eugene: Yeah, so something that I, when I’m talking about Power Query or DAX or something like that, I think of it like a scalpel or a corkscrew. It’s a domain-specific language for a very specific purpose. And the thing is, if you have a scalpel, you can open a bottle of wine with it, right? That’s not what it’s made for, and I would say like half the pain of working with DAX isn’t from the language itself, it’s from mis-architecture. It’s from not understanding how you’re supposed to be using that corkscrew and you’re trying to butter a piece of toast with a corkscrew and it just doesn’t quite work out. But that’s a valid point that it’s hard to talk about the language without also talking about the shape that it wants the data to be in.
Carlos: Right, interesting. I’m curious, in Power Query, one of the pitfalls that I was learning about in our user group was folks not using the formula bar or activating that. Does that work the same with DAX?
Eugene: It’s very different. Power Query is very much designed for Excel users to just ramp their way on. It started as an Excel add-in, although DAX did as well. But Power Query’s super heavy on the GUI, and then you can dive in and look at the code underneath. With DAX, there’s no graphical way to build things out. Now, with Power BI, there’s Quick Measures, where you can say like, did this with a customer recently, “okay, we want a list of all the territories that are referenced in these sales accounts” or something like that. And you can use the Quick Measures tool to say, “okay, give me this formula that someone else has already built out.” So there’s a little bit of that, but if you’re using DAX, you’re almost certainly going to be writing code, and it’s designed for people who are comfortable with Excel. I was last night talking about it at user group and one of the things is you’re going to have to unlearn things either way, but if you’re starting from Excel, you have to unlearn fewer things. The most difficult path to DAX, and the Italians say this as well, is if you’re coming from SQL. Because you’ve got more mental shifts and more paradigm shifts and it’s just harder. So if you’re coming from the Excel world and you’re used to writing formulas in there, DAX isn’t a huge leap.
Carlos: So the bricks are stacked against us, compañeros, it sounds like.
Kevin: But you can do it. You can do it.
Carlos: You can do it, yeah.
Carlos: You can do it. You want to get in here, Kevin? Your thoughts on getting started?
Kevin: Yes. Yes, I would like to get– so, I do agree with Eugene that it is a mind-shift. I think it’s a little bit easier if you’ve ever tried to learn MDX and failed. Don’t worry, you’re definitely in a large group of people. There’s a lot of people who have tried to learn MDX and failed, including me, repeatedly. DAX has a little bit less of an initial learning curve than MDX, so it’s got that going for it. I have to admit that I’m not that proficient with DAX. I’ll pretend to be, sometimes, but when it comes down to it, yeah, there’s a lot that you can do that, in ways to optimize DAX that I’m not that personally familiar with.
Carlos: Sure. I’m curious, so we know that Eugene’s all in, here. But I guess where’s the dividing line sometimes here? I say that because ultimately, Power BI was designed or the thought was, like we said, it’s kind of coming from Excel, so it’s quote, unquote self-serviced BI. A lot of the folks listening to our program, have come from the ETL, database architecture or they’ve been programming, and so how do you define the dividing line between, “okay, well, I’m going to create dimensions and measures in a table,” versus, “hey, I think I’m going to put this in Power BI and apply DAX.” Does that make sense?
Eugene: Yeah, that makes sense. So, for a lot of stuff, you can get by just fine with implicit measures. So you take sales and you just drag it into a chart and you tell it, “okay, I want a sum instead of average.” So for a lot of stuff, you can get really, really far with that. Some places where you may need to start looking at DAX and where it’s simple and where it’s difficult, I think whenever you’re trying to have any kind of more complex logic beyond the grain of whatever your fact table is, you’re going to need to start looking at DAX. So if you’re able to say, “okay, I just want to sum up total sales or total quantity” or something like that and “I can have all of the calculations at say my invoice level or my invoice line-item level,” yeah, you’re fine. But the moment that you need to start thinking at a higher grain, so maybe at like customer or product family or something like that and you need to start taking into account some of those things, that’s one area where you really need to consider DAX. Another area is the moment you want to start using time-based logic. That’s really, really common, and thankfully they make it pretty easy if you’ve got a normal fiscal calendar. I say that because I’m dealing with a customer where they’re on a 455 kind of schedule, but not a normal one, and so the number of invoicing days in each period changes year by year. But let’s say you’re on a normal calendar, you’re dealing with regular months or something like that, it’s pretty easy to get going. So any time you want to deal with time-based kind of logic, you’re probably going to be looking at DAX. And then any time you’re doing way more complicated filter kind of stuff, or you want to mess around with ‘what if’ parameters. So this same customer, something that we’re doing is, if we just want to have measures for month-to-date, quarter-to-date, year-to-date, and they had a normal calendar, it would be super easy to do. You don’t need a lot of DAX knowledge to do that. But one, they have a goofy calendar, but two, they want a dropdown that says, “okay, I want to change this measure to be month-to-date, quarter-to-date or year-to-date, depending on what I select in the dropdown.” Now you need to start reading a book to be able to do that. Now you really have to start messing around with the logic. So, if it’s just something where it’s basic aggregations, and you can do it at the grain of whatever your fact table is, you don’t need DAX. If you want to start messing around with filters, especially for time-based logic, you need to learn a little DAX, but it’s not too difficult. If you want to start doing stuff with parameters or custom date tables or some part-to-whole comparisons, like “I want to know how well my costs out of transportations, same customer, what is that of total sales? Two percent of our total sales is going out the door in shipping costs.” Now you need to start learning some slightly more complicated DAX to accomplish that. So those are some of the dividing lines in my mind.
Eugene: So, one other aha moment that I want to talk about it is I mentioned the fact that it’s a columnar engine. And so, there’s some things that come from that. One is that DAX is really, really good at filtering and aggregations, because it stores the data as columns. So one of the things that’s common in analytical databases or analytical reporting is that you need all of the data for a column. So let’s say we’re looking at sales. Well, I want to be able to see trending of every year, what was my total sales for say, Latin America, or something like that? So I filter on Latin America and then I’m aggregating the entire column. And the problem with something like SQL Server, it’s a row-store database that’s storing everything in these 8KB pages. And so if I just want to sum all of sales for Latin America, unless I have a secondary index on region and sales, if I don’t have that index, I have to read the entire table, and if I’ve got 10 million invoices, I’ve read everything. So that kind of analysis stuff is not performant; that’s why they added column-store indexes and that sort of thing. With DAX, you’re storing everything in these columns and a lot of analyses is just dealing with a handful of columns. You know, I said sales, cost as a percentage of sales. Well, now we need to know sub-charge type or description, we need cost, we need sales, and we need date. And that’s about it, so it can grab those columns, and even if we have 50 other columns, it’s able to take just those. In addition to that, it’s running in memory and it’s compressing everything because it stores the column. So whenever you’re dealing with analytical databases, commonly you’re going to be denormalizing the data, you’re going to flatten it out, and one of the downsides of denormalizing is you have more repeated values. An example is, we talked about Latin America, well normally in a transactional system, you just have maybe a region table and you have a key and you just have a reference to that. Whereas in an analytical database, you’ve probably merged and denormalized that region information into your invoice table, your fact table, and so now, you’ve got your region repeated 10 million times, potentially. You may have Latin America show up a million times. Well, if you can keep it to as few columns as possible, what the DAX engine can do is it’ll sort the data that’s going to give the best compression and then I’ll use something called Run Length Encoding. So now, if you have Latin America repeated a million times in a row, instead of storing a million Latin Americas, it’ll store Latin America comma one million or something like that.
Carlos: Oh, gotcha.
Eugene: There’s other things that it does.
Carlos: Yeah, so kind of doing those aggregates there for you there, right?
Eugene: Yeah, so it’s not doing pre-aggregations like with multidimensional, but it’s storing things in a way that’s very efficient. So I’ve seen routinely 5 to 10x compression on the data if you minimize the columns and they’re ones with repeated values. If you’re not careful and you include primary keys or GUIDs, you can actually compress to more data. So you want to avoid unique columns or anything like that. But if you, again, architect it right, you can get 5 to 10x compression, easily, on your data. And so that’s another kind of mental shift for people.
Kevin: So, Eugene, you’ve mentioned some of the difficulties, you mentioned that S-shaped learning curve, although I don’t think you explicitly called it that, here.
Eugene: No, I wasn’t smart enough for that.
Kevin: But if I am coming at this from a SQL background, you know, I’m listening to this and saying, “oh boy, this is going to be difficult.” I guess you’re telling me there’s some benefit in it, but what would you say is a good approach to take if you are coming at this from a SQL background and are still, nevertheless interested in becoming proficient at DAX?
Eugene: Sure, yeah, so maybe a situation where you think that the DBA job is going to get automated and you hear BI is hot and trending and you need to start learning DAX. That’s a good question. So I would say again, I wholeheartedly recommend either buying a book or watching my course or something like that, because it’s hard to learn adhoc. But either way, what I would say is understand how filtering and manipulating filters works and start with just some very basic stuff. So for example, make a measure that gives you the sum of sales. You can even use dummy data or AdventureWorks or something, but you can do this in Excel. It’s available as Power Pivot. You can play around in Excel. But say, “okay, I want all sales” and you just type in “sum(sales)” and it looks a lot like Excel. And then you say, “okay, I want all sales for black products.” And so now you say “calculate(sum of sales, color = black)” and then you want to say, “okay now, I want to undo any filters the user has applied.” So you say “calculate(sum of sales, all of color)”. Or something like that and work your way up with just playing around with the filters, so add them, remove them, read about calculate and some of these other functions that work with that sort of stuff. Another area that I think is a good way to learn is read about how to implement time intelligence from scratch. So there’s a bunch of functions, convenience functions, that are built in, but I find it really helpful to understand, “okay, how would I implement month-to-date by hand?” because once you start getting a handle on that, it’s a lot easier to grasp what’s going on. I mean, my advice is to play with it, but especially just start working with some of the simple functions of calculate, all, filter, and understand what’s going on with row contexts and filter contexts, because I think those contexts, those evaluation contexts that determines what the expression can see, I kind of put that at that peak of that S curve, right before it starts to flatten out. So if you can understand measures and you can understand evaluation contexts, that’s going to take you most of the way.
Kevin: Despite everything you’ve heard in this episode, still go learn DAX.
Carlos: Yeah, so I guess maybe a poll, so obviously I haven’t gotten into DAX all that much. Kevin and Eugene both have. So from your coworkers or others in the community or whatnot, what– I don’t know if percentage is the right word. You mentioned being at the user group last night. So how many of those people do you think are going to be good candidates to learn DAX? Does that make sense?
Eugene: Yeah, I’m going to say 50/50, because it was the Columbus SQL Server user group and there was at least one guy who had never really heard about it before, didn’t know where it was used, he was more of a traditional DBA role, that sort of thing. I think that what’s going to happen is very few people are going to look at DAX and say, “ah, this solves my problems.” Most of them are going to say, “I’m using tool X and so now I need to learn DAX.” So if you’re using Analysis Services already, if you’re in tabular mode, you’re using DAX. If you’re using Power BI and you want to do any kind of modeling, you’re using DAX. And if you’re in Excel and you’re tired of doing VLOOKUPs all the time to join together your different tables, you’re probably going to want to learn DAX. So I think that it’s a language that’s tool-driven, as opposed to value add-driven. I think it’s going to be rare for people to go, “oh, this solves my problems.” It’s more likely that they’re going to say, “oh, Power BI solves my problems, and now I need to data model.” Or “Excel solves my problems and I’m sick of VLOOKUPs.” So I think that’s much bigger. It’s hard to look at a person and say, “oh, this would be a good tool for you.”
Kevin: And jumping off of that for the database administrator crowd in particular, think of it this way, you’re not necessarily going to build a Power BI model and a dashboards for end customers, but you still have that kind of modeling requirement internally, probably. You know, you’re having reports that you’ve built, you’ve got things that you collect and finding good ways to visualize that to bring to the surface useful information that you’re going to be able to do to help improve your job. Well, that’s the type of thing that you can do within Power BI and to get the most out of that as Eugene mentions, DAX is going to be a useful tool to know, even if it’s not going to be your bread and butter. Even if it’s something that you’re going to have to Google every time you need to remember the exact syntax for it. Just the knowledge of what you can do with DAX, versus having to come up with a rather complicated SQL query to try to simulate that result. It’s going to be useful.
Eugene: Yeah, a perfect example is that trying to get XML data into SQL Server so you can manipulate it is a giant pain. Trying to get Excel data that’s written by hand by someone that has blank rows in between and columns that need filled in and all that stuff is a pain, but it’s easy to do with Power Query. And so an area where if you’re a SQL person may be beneficial is if you’re dealing with certain flat files that you don’t want to write up a whole SSIS process just to get the data into SQL, just so you can do some basic analysis. Another thing, like Kevin said, is, I think something we’re seeing more is that we’ve got DBATools of varieties that are adding on Power BI dashboard. So if I remember correctly, the dbachecks program has a Power BI dashboard, some of Brent Ozar’s stuff has a Power BI dashboard.
Kevin: Right, or Open Query Store.
Eugene: Yeah, exactly. And so, I think what you’re going to find is a lot of these open source tools that are made for dbas need a way to analyze that data and Power BI’s an easy way in and if you want to start enhancing that a little bit, you’re going to need to know some of these things.
Carlos: There you go. Obviously having a problem to solve, then kind of opens the door for other things, there, and so using it for your own stuff first probably makes a lot of sense. Awesome. Okay, well I think that’s going to do it for today’s episode, compañeros. As always, we’re very interested in hearing what you have to say. You can reach out to us on social media. Let us know what you’re thinking. Eugene?
Eugene: You can find me at sqlgene.com for my blog and you can follow me on Twitter at sqlgene.
Kevin: You can find me wherever good burritos are sold.
Eugene: Factually true.
Carlos: Yes, especially on Saturdays. And compañeros, you can reach out to me on LinkedIn. I am at Carlos L Chacon and we’ll see you on the SQL Trail.