Carlos: Compañeros! Welcome to another episode of the SQL Data Partners Podcast. I am Carlos L Chacon. This is Episode 197. I am joined today by Kevin Feasel.
Carlos: And Eugene Meidinger.
Carlos: And compañeros, we all have to learn. Some lessons must be learned more than once and to my eternal shame, this is actually the second time we are attempting to record Episode 197, as the first recording somehow vanished before I was able to get it to cloud storage.
Kevin: I’m going to try to make the exact same jokes at the exact same timepoints, regardless of what people have been saying up to that point. So, if you just hear some weird discordant comment from me, that’s the reason why, honest.
Eugene: I appreciate your eidetic memory, the fact that you can pull that off, that’ll be beautiful.
Carlos: Yeah, so we’ll see what our editing crew can do, there. Okay, so our topic today is Power Query in Power BI. I don’t know if we’re getting close to the end, but one of the features we’ve touched on, but not necessarily dived into, and so that’s what we want to talk about with Power Query. And the good news is that some of the dumb questions I asked last time won’t have to be edited, because I’ll know not to ask them.
Kevin: Also, since we kind of are redoing this, there is a new exam around analyzing data with Power BI that’s in beta.
Eugene: Oh, nice.
Kevin: That just came out, I think it was yesterday. It’s in beta, it won’t be out for a little bit, yet, but it’s DA100.
Eugene: Oh, very interesting.
Kevin: So, take all the stuff you learn about Power Query today and you’re going to be able to apply it to this exam and get like 5%.
Carlos: There you go, yes, so if you decide to take that model, let us know if you’re under or over Kevin’s percentage.
Kevin: Well, that’s marginal 5%. Hopefully you get higher than 5% on the exam.
Eugene: Right? I think you get that for putting your name in; it’s like the SATs.
Carlos: Okay, so we do have a couple of shout-outs, compañeros, and it’s fair to say that I hope that everyone is safe, right? We live in a different world than we did a couple months ago, and so we are thinking about you. So, for those who are showing a little love to the podcast, we appreciate it. We appreciate those who are– if you can get out, right? I know some of our urban areas are having a tough time social distancing. There’s only so much space and a lot of people. But hopefully you’re safe. So, shout-outs to Tony Hoechbauer, Idris Arslanian, Max Araújo.
Kevin: To be fair, Carlos did just about as poorly the first time around on those names, too.
Carlos: Yes, so as always, our customary “we’re sorry.”
Carlos: We’re sorry if we butchered your name. We still love you and we hope you still love us. And our show notes for today’s episode will be at sqldatapartners.com/powerbi or sqldatapartners.com/197. Okay, so when we think about Power Query and where it fits in the model, if we’re used to– or at least my experience with Power BI, you open it up, you click Get Source, and then you want to start visualizing, you may very well have skipped over the very topic that we want to talk about today. So, Eugene, tell us about– oh my gosh–
Eugene: Power Query?
Carlos: Tell us about Power Query.
Eugene: It’s two words. You can do this, Carlos. I believe in you.
Carlos: For some reason I was thinking like there’s so many Powers, I’m like, “could it really– Power Quer– are we really talking about Power Query in Power BI?”
Eugene: The nomenclature isn’t the best, I admit. For a while, they had reverted the naming to Get and Transform Data, which is more intuitive, but very hard to Google. Actually, still in Excel, it’s just Get and Transform Data, and for a while in Power BI, they changed it just to be the Query Editor, and they finally put the Power back in, which I’m very happy about.
Carlos: I’ve got the power!
Eugene: Mm-hm, sure. So, first, something whenever I talk about this stuff that I like to say is that if your data lives in a structured SQL format, if you have a data warehouse, if you’re lucky enough to have a data warehouse and you don’t consider OneDrive to be a data warehouse, you may not need to use Power Query in a sufficient capacity. You could probably get away with just doing, load the data and hitting Load instead of Transform at that UI that you were talking about, but Power Query happens kind of at the beginning. It is the data cleansing, data prep stage or what people like to colloquially refer to as data wrangling, I say colloquially, because it’s a very informal way to describe it, but Microsoft has started doing that. Because, at least to my knowledge, if you use Azure Data Factory you can now use Power Query in it and it’s referred to, instead of the regular Data Mapping step, it’s the Data Wrangling step.
Kevin: Wrangling data flow, yeah.
Eugene: Yep. So, I just imagine a rodeo-style thing where your data’s this big bull and you’re trying to lasso it down and flip it over.
Carlos: Get your cowboy hats out.
Eugene: Yeah. Yeah, so that’s where it kind of fits in. it’s not where you’re putting in the business logic that makes your business unique, generally speaking. But instead, it’s more the kind of clean-up stuff, like making a date table or replacing missing values or unpivoting data or whatever that you’re going to need to do for any kind of data you’re working with, no matter what your business is.
Carlos: So, I think a lot of this is the approach, the whole self-service thing. So, I have data, I want to be able to format it, so that it is then nice and prepared for my visualizations. So, this is part of the pieces that come from Excel, so are our Excel gurus going to be happy with their experience in Power Query?
Eugene: Well, so there’s a couple things with that. The official litmus test is it was designed for people to get daily value out of the Excel formula bar, so it’s certainly targeted for them. One of the things that’s nice is that it’s very graphical user interface heavy. So, I like to say it follows Pareto’s law or like the 80/20 rule, and so 80% of what you want to do, there’s a button for it. You know, if you want to split columns, pivot data, heck, in Power BI, there’s a Create Column from Example, so literally you just type in what you think it should be for multiple rows, and if it’s something straightforward, like making a full name or some date manipulation, it’ll figure it out pretty well. The other 20% is a nightmare, because now you have to move from the GUI to this M code that you’ve never worked with before, that looks like a bunch of punctuation thrown all over the place. Pardon my very, very angry 6# chihuahua. He does not like M code and neither do I.
Kevin: Would have been better in Haskell.
Eugene: Yeah, yeah. So, you have that, but then also, from my experience, it’s very much optimized for the kind of person who uses VLOOKUPs in Excel unironically. So, the people that they don’t have SQL tooling, SQL experience, or they know just enough SQL to be dangerous and so they’re forced to do a lot of data manipulation, data joining in Excel. And so, if you’re that person, Power Query is a God-send, it’s awesome. So, definitely targeted at the self-service market, definitely targeted at Excel users or business users in general. They’ve made the tooling really approachable.
Carlos: Now, so this idea of getting daily value out of the Excel formula bar, so again, this idea I can basically automate these steps over and over, is that fair?
Eugene: Yeah, I mean, I like to joke that anything that you can pay an intern minimum wage to do in Excel, you can automate. And I say that just because this isn’t SSIS. It’s an ETL tool in a similar manner, but a lot of the stuff that you can do in M, or specifically through the GUI with Power Query, it’s very basic stuff. But, if you’re doing that manually today, if you’re doing any data preparation manually today, Power Query can probably automate that, and that’s awesome. Then again, that’s why I say people who use VLOOKUP unironically, they’re doing a lot of manual data prep on a regular basis, so it is really good for automating simple data manipulation.
Carlos: So, talking about the 80/20 rule, so 80% of it is in the GUI, you’ll be set up there. There’s that other 20%, so in terms of I want to roll up my sleeve and take a peek at M, resources that might be available for us to check out?
Eugene: Well, I mean, whenever you’re using the GUI, it auto-generates the code, so you can go to the Power Query Editor and click Advanced Query Editor and you can see the code and see what’s going on and play around that way. There’s some good books out there. I think one of the most recent ones is by Gil Raviv. So, there’s some good books out there, but be aware that a lot of those books do take into account that the core audience are Excel folks. So, if you’re coming from the SQL world, you might be reading this like, “why do I need to know how to read .csv files? What is this, the 80’s?” Or, “what is this, the 2000’s with data lakes?” You know, time is a flat circle.
Kevin: I swear I’ve heard this conversation before, officer.
Eugene: Yeah. Yeah, so it may still be a bit confusing. I think, I believe Matt Mason was the product manager for Power Query for a while, and he has some really good M deep-dive videos on YouTube that I would definitely check out. Also, during Free April with Pluralsight, if you’re listening actively, you can watch my M courses on Pluralsight and help pay my mortgage, so that would definitely be appreciated.
Carlos: There you go. Yeah, free in the month of April, so hopefully, let’s see, by the time this goes out, which I think will go out on the 15th.
Eugene: You’ll still have like a week.
Carlos: Yeah, it’s two full weeks.
Eugene: Two weeks, our most avid viewers.
Carlos: That’s right.
Eugene: Mala will take advantage of it. Everyone else will probably–
Kevin: She might already have.
Eugene: She might already have.
Kevin: I let her know. Also, if you’re looking for M resources, just type “M” into Google. You’ll just find it immediately.
Carlos: Yeah, the very searchable language.
Eugene: Yeah, that’s why I was so mad when they changed it to Get and Transform Data. It’s like, “what am I supposed to do with this?” It’s worth making the distinction that M is the data prep language, the actual code, just in the same way that you might write T-SQL, but Power Query is the entirety of the infrastructure, and so a lot of the documentation refers to Power Query, because again, it’s a lot easier to understand and think about. But technically speaking, you can run Python and R in Power Query. You can run those as steps. Now, the integration there is really clunky behind the scenes. So, I think David Eldersveld, who, I believe still works at Blue Granite, had written a post or two on this, but literally, like it’s writing your data on the .csv files and then kicking it over to the Python engine or the R engine to do the work and then taking it back, so again, everything’s just .csv files, it’s just flat files everywhere. But, you know, M is technically not the only language you can use in Power Query, but most of the time it’s what you’re going to be using.
Carlos: There you go. And so, I know we talked about this previously, so that is also very similar. So, in terms of your own personal use of M, and having to break it open, what percentage of the time are you having to crack the knuckles and get away from the GUI and start editing M code?
Eugene: So, I’d say 90% of the– so, how do I put this? We’re going to mix percentages. So, it works 60% of the time 100% of the time, kind of thing. So, I would say that maybe like 10-20 of my work in Power Query involves actually looking at the code, and easily 4/5 of that, so 80%, is just because that’s faster for certain things. It’s very rare that I need to do something complex with hand-crafted, artisanal M code. Usually it’s more that there may be something that’s not an option. Like if you do– I forget. I think if you change data types or if you rename columns or you do some of these things, there’s not that little gear icon by the step and so if you want to make more modifications, you either have to redo the steps and hope it combines, or you just go in and change the M code. So, a lot of times, whenever I’m renaming a step or something like that, I’ll be going in and making sure that everything looks right or I’m renaming a column or whatever. So, usually it’s just that for me, there’s certain things that are just faster if I’m just in the code. Anything that I need to change multiple times throughout, that’s common, but yeah, most of the time I’m able to get away with working with just the GUI.
Carlos: Very good. So, you didn’t have too many jokes, Kevin.
Kevin: Not this time. Nor did I derail us on F# as many times as I expected.
Eugene: Yeah, I was waiting at the 6 minute mark, because I think we’d made some sort of F# joke cause I compared it to that.
Kevin: Oh, it wasn’t a joke.
Carlos: They should have used F# instead of M?
Kevin: Yes, actually, it turns out there’s a Power BI idea about it, except that the person used the wrong language. They’re like, “replace DAX with F#”, and I’m thinking, “that’s not the one you replace.”
Eugene: No, no.
Kevin: It’s gotten 15 votes in 5 years, people. It’s a wave of popularity.
Eugene: Actually, this is totally a tangent, but there was some request on the user voice for Power BI and I saw the picture on Twitter, and I’d misread it as ‘interesting’ as opposed to ‘in-testing’, and I’m like, “wow, that’s a very nice way to say, ‘go pound sand.’” Like, “oh, this is an interesting ticket. Not gonna do it, but it’s interesting.”
Carlos: Oh, in-testing? That’s funny.
Kevin: Really weird when they get the intestines tickets. Those are the bad tickets. In-testing, interesting–
Eugene: Power BI internals.
Carlos: They’re trying to keep it PC, apparently.
Kevin: Was it a small intestine or large intestine query?
Eugene: Yeah, is this the duodenum or jejunum? I need to know. But anyway, we digress.
Carlos: Just a tad. Okay, so there are some options out there. Hopefully if you have to roll your own M code, we’d be interested in hearing from you. Let us know under what circumstances you had to, what was the use case, there? We’ll point you to a couple of books. Now, again, you mentioned somebody else that we don’t have on the list.
Eugene: Okay, so now it’s my turn to butcher names, especially just based on memory. So, Chris Webb has a book, but I don’t think he’s updated it since 2014, which is a shame, because he’s a great writer. I want to say Miguel, I’m going to say Escobar, but I feel like it’s something similar and I’m just an idiot, but the book’s called like “M is for (Data) Monkey”. I enjoyed that book quite a bit and I heard a rumor it’s on its second edition, but I haven’t gone to look. So those are both good books, but again, both very heavily on the Excel focus. And then Gil Raviv’s is much newer and looks very comprehensive from what I’ve seen. Those are my three book recommendations, and then honestly, just Google Matt Mason and I guess ‘M’, just a bunch of ‘M’s or Power Query. He’s got some really good stuff. Like some of the stuff, like the thing where I mentioned the litmus test is Excel formula bar usage, that’s from like 2014 TechEd Australia, but it’s still an amazing session that really helps you understand a lot of the guts of the M language. Because, I guess before we wrap up, it’s worth saying that one of the things that I think is fascinating about the M language, that really is kind of rare to see unless you’re dealing with a functional language like F# or something, is they’ve applied a lot of techniques that make your code more performant right out of the box, without you having to do anything. So, the first one is that it has limited side-effects, so what that means, it’s not like you’re getting a headache from running M code, necessarily. But what that means is that it’s limited in how much it touches the outside world, and so–
Kevin: If query runs for more than 4 hours, you should consult a doctor.
Eugene: Yeah, exactly. So, you may think, “oh well, that makes sense.” But one, a lot of stuff you do with the database is just state manipulation, so a lot of times you’re dealing with state and messing with the outside world and that stuff, but I remember at my last job, I was trying to set up unit tests in a Visual Basic 6 ERP application, which you can imagine why I started working for myself. And I’m like, “I’m going to put in unit tests. Well, what can I test that– okay, we have to read from a database, fine, but it doesn’t really affect the outside world, much?” I’m like, “oh, the tax code. You know, there’s a function, and if you put in city, state, zip, it makes the database call, gives you the tax rate, how hard can this be?” Well, in order to do that unit testing, I had to pull in the email library. And I’m like, “why, why do I need an email library to figure out what the tax rate is?” Well, apparently the code was designed by this one company. It was just a home-spun sort of thing, that if you had sent in a call without a state, like Pennsylvania, it assumed that the data was bad and would email the administrator. And so literally, yeah, no, I kid you not. So yeah, that’s what side-effects look like. Whenever you’re dealing with M, there’s very few side-effects, and so what that then means is that it can reorder steps in a way that makes sense, but then it can also do lazy evaluation. And, you know, it sounds like a moral judgement, but all that means is that it will not run code until it knows it needs to. And one of my favorite demos to do with that is that you can make a stored procedure that takes 10 minutes. And the way that I usually do that is I actually make it just run 10 seconds for demos and I use the Wait For command and then I have it return a select value. So, you can have a really slow stored procedure and then do a bunch of manipulation to the results, and then for the final output, just output 2 instead, so you end up not using any of it. Power Query will never run that code. It will never run that stored procedure and so even though you have a stored procedure that takes 10 minutes and you’re doing all this stuff; it’ll never touch it. And that’s not the case, to my knowledge, with SSIS, which is much more iterative. It’ll call the data, it’ll go through, it’ll do all this work and then if later you decided, “no I didn’t need that column” or “no, I didn’t need that information,” well, it’s too late.
Carlos: That’s right. “Cause you told me to, Mr. Sergeant.”
Eugene: Yeah, well, yeah, as Andy Leonard likes to say, the computer does exactly what he told it to, which is the problem. And then finally, there’s something called query folding, which is very similar to say, predicate push-down or something like that. But not only is Power Query lazy in what data it pulls and what work it does on its side, but if it can say, “you know what? I’m going to push this down to the query,” like a filter or a join or whatever, it’ll push that down. And so, you can take advantage of indexes and all of that stuff on the SQL Server. It’s actually so intelligent that you can join data from an Excel file and a SQL database, and it’ll read the Excel file first and push that into the SQL query, which is pretty amazing. So yeah, there’s a lot going on underneath to make your M code run a lot faster, and so I do like Power Query and M quite a bit for that reason.
Carlos: Very cool.
Carlos: Okay, compañeros, I think that’s going to do it for today’s episode. And only because our image is going to be Voltron, I feel like we should reference, again, one of the casualties of having to record this again is that–
Kevin: We had so much great material; it’s all gone.
Carlos: That all of these pieces come together and form our 80’s hero, Voltron. So, if you see that on our promotional material, that’s where all that comes from.
Eugene: I feel like that’s gotta be the best part of Amanda’s job is just searching Shutterstock for images for the promos.
Carlos: Yes, except for when the idea well runs a little dry and she’s like, “what do you want me to do for this?” And I’m like, “uhhhh.” Okay, and as always, compañeros, if you want to reach out to us, we are still available on social media. Connect with us. Eugene?
Eugene: You can find me @sqlgene on Twitter or Eugene Meidinger on LinkedIn.
Carlos: Mr. Kevin?
Kevin: You can find me on random Zoom videos.
Eugene: He’s the one who’s been Zoom-bombing all over the place.
Kevin: I am wearing clothes, though. Don’t worry, I’m wearing clothes.
Eugene: We can only tell about the upper half right now, so I can only confirm half of that statement.
Kevin: You can really only confirm about like upper chest.
Eugene: Yeah, exactly.
Carlos: It’s a cut-off.
Eugene: Crop top.
Carlos: And you can reach out to me on LinkedIn. I am at Carlos L Chacon. Again, compañeros, we do appreciate it. Oh, we do, and you’ll forgive me the names, we have had a couple of requests for back to basic DBA. I guess too much Power stuff.
Eugene: You’ve deviated from your core audience and the people are angry.
Carlos: Like, “what the heck is this?”
Eugene: I haven’t touched a proper database in a year now, so I’m useless, so best of luck with that, Carlos.
Kevin: Oh, I’m great at making stuff up, I’ll be good.
Eugene: Yeah, that’s true.
Carlos: There we go. SQL Server 2000? I mean, how much can have changed since then, right?
Eugene: Um, I think triggers. They either added like server triggers or table triggers, I forget, but that, at least, has changed since 2000.
Carlos: Yes, oh, you had to go there. Man, triggers. I have been dealing with a couple of triggers now that I’d like to be like–
Eugene: Sorry, it’s just the weird stuff that sticks to your mind.
Carlos: Yes, just stay away, people. Stay away from the triggers.
Kevin: Except when you need them. But that will be another day.
Carlos: Another day. We will live to record another episode. Thanks again for tuning in, compañeros, and one day I hope to see you, again, on the SQL Trail.