So what happens when you want to use data from two different sources, manipulate it (transform it) and then present the data in Power BI — and you don’t know how to use SSIS? Well, it turns out you can use Power BI dataflows. In this episode we explore the topic of dataflows with our guest, Matthew Roche, and talk about the scenarios and types of folks dataflows was intended for.
“With dataflows, Power BI takes that Power Query experience, both the editor and the engine, it puts those into the Power BI service so that any user can build a Power Query and land the results into a dataflow entity where it can immediately be reused by multiple users in multiple work spaces.”
“Having an opportunity for that business user to do more of the upfront work than they could previously do, and to have a solution that is more self-supporting and more architecturally correct, that’s one of the things that dataflows deliver.”
“What dataflows will allow is for the individual entities that are being produced by a business user to be shared more easily and consistently and in a more manageable way with other business users across the organization.”
“Twenty years ago, one of my favorite questions to ask at cocktail parties was, “if you were stranded on a desert island and you could only have one programming language for the rest of your life, what would that language be?” My answer was always Transact SQL and their answer was always not to invite me back to their other parties.”
Listen to Learn
00:00 Matthew loves to bake and give it away for others to enjoy
01:24 Intro to the guest
01:58 Compañero Shout-Outs
04:26 What Have I Learned
08:06 The big explanation of why dataflows exist and who they’re supposed to be helping
13:37 Do you still need to have some IT chops?
18:24 Why self-service needs with business users happen and how dataflows helps
25:58 Dataflows empowers business users and makes the hand-off with IT stronger
27:35 What tech-savvy Nancy from finance should do to get started with dataflows
29:28 Eugene’s thoughts on why dataflows were created and who they’re good for
31:56 Dataflows and native SQL queries
35:39 Query folding is very powerful
38:11 SQL Family Questions
43:07 Closing Thoughts
About Matthew Roche
Matthew Roche is a Senior Program Manager in Microsoft’s Cloud + Enterprise Group, where he focuses on business intelligence, enterprise information management, metadata, and data warehousing.
Matthew is currently a member of the Power BI CAT team, where he was saddened to learn that there were not actually any cats. Previously he has worked on Azure Data Catalog, SQL Server Integration Services, Master Data Services, and Data Quality Services. Before joining Microsoft, he worked as an architect, consultant, and trainer focusing on ETL, data warehousing, and business intelligence.
When he’s not enabling the world to get more value from its data, Matthew enjoys reading, baking, heavy metal, and competitive longsword combat.
“Happy Rock” for What Have I Learned by https://www.bensound.com
Music for SQL Server in the News by Mansardian
Matthew: I found myself coming home and making bread every day. Because if you’re making bread by hand, a lot of it is just, ah, but this was back before I did martial arts, so it was a chance for me to burn some of those calories, work through some of the frustrations of the day. But it also meant that I had all of this bread, so I would end up taking what I had baked into my clients. It’s like, “hey, I’m just going to leave this in the break room” and they’re like, “ooh, fresh baked stuff!” and I realized that I could make my customers, my consultant clients both happy and more likely to die young at the same time, as a side effect of my kitchen therapy, and it sort of stuck around. So yeah, bonus. We’re not recording yet, right?
Carlos: Compañeros, welcome to another edition of the SQL Data Partners Podcast. My name is Carlos L Chacon, your host, and today we’re very excited to have Matthew Roche with us on the program. Welcome Matt.
Matthew: Thank you very much, it’s a pleasure to be here.
Carlos: And of course, we are joined, as always by Angela Henry.
Carlos: And Eugene Meidinger.
Carlos: So our topic, and one of the reasons we wanted to have Matthew on the program today, was to talk about Power BI dataflows. You know, compañeros, we are starting to have a little more emphasis on the Power BI space, or that environment, and so we’re happy to have Matthew and chat with him today. Before we do that, a couple of shout-outs. First, I want to give a shout-out to Eric Klein from Raleigh NC, Richard McNeil, Hasan Savran, Alison Baker, and Joe Basil. And Matthew actually had a couple of shout-outs he wanted to give.
Matthew: Yeah, I basically wanted to shout-out to the entire Power BI CAT team. This is the team that I work on and have for the last year or so. Amazing group of super technical folks that inspire me every day and seem to teach me something new every day, as well, which is not something I expected this late in my career.
Carlos: So admittedly, I am not as familiar with the Power BI CAT team, but when I saw CAT team, I admittedly thought of the SQL CAT team, and although he’s no longer a member, shout-out to former member Jimmy May, who I’m sure is hopefully skiing somewhere in the mountains at this time of year.
Matthew: Jimmy and that initial generation of the SQL CAT later Azure CAT team made such a huge difference on my career 10, 15 years ago, when I was working on much more technical things and doing a lot more SQL Server work. So best wishes to Jimmy, definitely.
Carlos: Yeah, that’s right.
Eugene: Speaking of the Power BI CAT team, congrats on your latest addition. I think I saw Phil Seamark was joining the team.
Matthew: We’ve announced that, finally. Phil has been in the pipeline for what seems like a couple months, now, and now that the paperwork is final, we’re super, super excited and I actually think that this week is his official first week. I’m not 100% certain, but he’ll be joining shortly, and I think he is going to be staying in New Zealand, so he’s not going to be relocating, but we’ve got a lot of folks around the world, so that’s not a big difference from the way that we usually work.
Angela: Very exciting.
Eugene: Yeah, we’ll see if he still makes video games inside of Power BI.
Matthew: I think that’s going to be his responsibility. Everybody has their own specialty.
Eugene: It’s good to have a core focus.
Carlos: So is Phil the one who did the soccer, like the World Cup standings? Is that what you mean by video games, or is this–
Eugene: No, no, no, he literally recently published one of those 1990’s, walk around in a very flat 3D maze kind of games in Power BI.
Matthew: Like Castle Wolfenstein without the bad guys.
Angela: Exactly, yep.
Eugene: Yeah, it’s horrifying and amazing, because I know he must be doing some serious abuse to the DAX engine to be able to do any of that.
Matthew: It’s all magic to me.
Carlos: Awesome, okay. And so for our section What Have I Learned, and I feel like we just have to jump right to Matthew, here. He has something much funnier than what the rest of us have been doing.
Matthew: I think the thing that I wished that I had learned, or what I have learned I wish that I had learned earlier is that modern people fight with swords. I’ve learned in 2014 that there’s this thing called Historical European martial arts where people use actual blunt steel weapons and decent protective gear, and they have full contact sword fights. It’s even more awesome than it sounds, but I wish I could say I learned it earlier, because when you pick up a combat sport in your late 40’s, it’s exactly as hard as it sounds or maybe even harder. And there are people that I compete with that they’ve been doing it for 20 years and they’re still in their 20’s, which means that the odds are stacked against me.
Carlos: So you’re saying on the rankings, you’re a little lower on the ranking scale? Is that what you’re–
Matthew: I am definitely more rank than I would want to be.
Eugene: I have to ask about that. I do a little bit of martial arts, and in my very limited experience with the one time going to a martial arts tournament, all of my form just went to heck the moment someone was actually trying to punch me in the face and it wasn’t 10% pull your punches kind of sparring. I’d have to imagine that when there’s swords involved, even if you have protective equipment, the first time, your form just goes out the window.
Matthew: You are absolutely correct. I think that the Western martial arts have more of an emphasis on sparring in competition than at least what I have seen of karate and other traditional Eastern martial arts. So, if your first tournament is the first time that you’ve had a full contact sparring experience, that’s certainly to be expected. But if your club has a fight night every month with people coming in from clubs across the region and it’s something that you’ve done from the time that you were first comfortable to do it, the adrenaline at an actual tournament where you might get a medal or you might get a broken arm, it certainly ramps up the adrenaline and the stress, but it’s not quite as bad as if it’s actually your first time.
Angela: Yeah, I would imagine you probably needed a new pair of underwear after that first time, too, cause somebody coming at me with a sword.
Matthew: I love the competitive environment, so the most significant thing I think is being able, and this may be a segue back into data, but the most significant thing is knowing that your sparring partner has your safety as a top concern, as well. Unlike watching movies or TV shows, where there’s a good guy and there’s a bad guy and there’s all of this drama about it, the community that I have found is very supportive and as a rule is very healthy where even though you’re doing your best to win, no one, with very, very few exceptions that we won’t go into, no one wants to win at the expense of their partner’s or their opponent’s injury. So, even though bruises are common, I’ve broke my hand, I’ve broken someone else’s ribs, it’s never an intentional thing, and generally the person dealing an injury feels worse than the person receiving it. Because when you have a broken bone, invariably you get good pain meds, so you’re feeling pretty good.
Carlos: So, there were a couple of things I was thinking we might have to solve by hand to hand combat, but now it sounds like that’s off the table
Matthew: Oh no, we can definitely keep that on the agenda.
Carlos: Awesome. Okay, the show notes for today’s episode, compañeros, is going to be sqldatapartners.com/powerbi or sqldatapartners.com/159. And so with that, let’s go ahead and get into that discussion about Power BI dataflows. Now, Angela and Eugene were kind of ribbing me a little bit because originally when we planned this, I said, “look, I know nothing about Power BI dataflows. You guys will need to help me, here.” But then in doing a little bit of research, I’m like, “this was only released in November of 2018!” So, I don’t feel that bad anymore about not knowing so much about dataflows.
Angela: Yeah, but it’s Power BI and they release stuff like every hour, so come on! Relatively, that’s like way old news.
Carlos: Yes, so for the lay person, those of us who are a little behind the times, light years, light days behind, take me through, Matthew, we’ll give you first shot, here, what a Power BI dataflow is and what am I comparing this to when I think about, maybe even other products that I might have used in the SQL Server data space?
Matthew: Yep, so that’s a great way to think about it. Let’s think about SQL Server Integration Services as our baseline. So those ETL tools, or as I like to refer to them, data mise en place tools, so you get the data from your sources A, B, and C, you apply all sorts of transformations before they go into destinations X, Y, and Z and in between, you’re doing things like standardization and cleansing and consolidation, as well as getting the data into the appropriate shape and location for whatever its future downstream uses are going to be. This data preparation or ETL or however you want to think about it, these capabilities have always been necessary for BI and analysis work. But it’s always been something that in most contexts, an analyst or business user or power user, the type of person that would be comfortable working in Power BI Desktop or Excel, they would need to reach out to IT for assistance. And at least somebody listening to this is saying, “wait a minute, what about Power Query? We’ve had ETL built into Power BI Desktop since day zero. What about Power Query? This is obviously a tool that fits this need.” Well, if we think about Power Query traditionally, Power Query has been all about defining the definition of the tables in your analytics Tabular model. The thing that you deploy up into the service so that you run inside of Power BI Desktop. Which means that even though there have been self-service data preparation or self-service ETL tools in Power BI for years, their scope and their context has limited their value. So if you have, let’s say hypothetically, a data dimension or other standard conformed dimension you need to use in multiple Tabular models in Power BI, if you’re just using Power Query inside of Power BI Desktop, you would need to have the M script that defines that dimension or whatever your reusable asset is, you’d need to have that in every single PBIX file. They would need to be maintained and versioned manually because without the opportunity to have the output of that query stored in a common location, a data warehouse, data mart or data lake, the logic would then need to be duplicated in all of these different models. So, with that as the background, these mature ETL tools being difficult to use and optimized for data engineers and developers, and the self-service data preparation tools being made available to business users, but not targeting the same scenarios, dataflows in Power BI bridge those two worlds. So, with dataflows, Power BI takes that Power Query experience, both the editor, which is very user friendly, and the engine, that same set of connectors and transformations, and the underlying M Power Query function language. It takes those, and it puts those into the Power BI service so that any user can build a Power Query and land the results of that query into a dataflow entity where that dataflow entity can immediately be reused by multiple consumers, multiple users in multiple work spaces. And through integration with Azure Data Lake storage and two, if the Power BI administrator has configured it to do this, it can actually put the data that has been produced by a user-created query, it can put that into a common format in the data lake so that it can then be reused by services and processes and users outside of Power BI as well. So yeah, it’s a big deal.
Carlos: Yeah, okay. Now, okay, so I have to use the Socrates method now. Normally I would say the knuckle-dragging Neanderthal speak here. You mentioned self-service BI, we’re going to get some SSIS-type flexibility in, but then the next breath you’re talking about M language and data lakes and whatnot. Now, granted, I’m not the Power BI guru.
Matthew: You’ve never heard of Power BI before, have you? Let’s be honest, now.
Angela: He doesn’t even know how to spell it. He had to ask me.
Carlos: But I feel like the people that we’ve interacted with, that want to get into Power BI, they’ve already gone to sleep, basically. I feel like there’s still a chasm, so while we’re getting closer, it seems like maybe you don’t quote, unquote need IT, but you still have to have some IT chops to be able to navigate some of this. Is that a fair–
Matthew: Let me respond in two different ways. One is that from a capabilities perspective, having the ability to use a tool like Power BI Desktop, which is essentially designed to let someone who doesn’t have quote, unquote IT chops, they’re the finance analyst or the sales manager, they’re someone that needs to solve a business problem with data. They may not know the difference between a table and a view or Hive and SQL, but they know intimately how the data relates to the business and how it’s used in core business processes, so think about that person. They can now define a table, so I’m putting table in quotes, here, they can define this reusable entity in Power BI that previously they would have needed to ask someone else to do. And they can do it using the same set of familiar tools or familiar-to-them tools that they’ve been using for other problems for years. And at some point, there’s always going to need to be this grow-up story, this operationalization story when it goes from being the one analyst that is using the data, to the analyst team that’s using the data, to the whole company that now needs to use the data. There will need to be some sort of handoff, or it’s likely there will need to be some sort of handoff, so the IP can provide better support and maybe better scale or better integration. But having an opportunity for that business user to do more of the upfront work than they could previously do, and to have a solution that is more self-supporting and more architecturally complete or architecturally correct, that’s one of the things that dataflows deliver. So moving away from the Power BI context, this is a great way to think about this, if you’re looking at a BI solution, like traditional Enterprise BI where you’ve got all of your OLTP systems, you’ve got SSIS, you’ve got SSAS, you’ve got SSRS, so this old-school, hardcore, all SQL Server. What would you do as a consultant if you walked in and found that all of your SSRS reports were going directly against the transactional systems and there was no data warehouse and there was no analytics model? Do you recognize the challenges that that approach would introduce from a consistency and concurrency perspective?
Carlos: Sure, no question.
Matthew: So, without dataflows, the self-service approach reintroduces a lot of those common and well-known challenges, because those Power Query queries that are part of a Power BI Tabular model, that Power BI dataset, if there isn’t an existing data warehouse or data mart that’s been created by IT, the business user needs to go back to those source systems. They need to go against the SQL database or the REST API or wherever the data that they need is, and that means that even though they’re able to very quickly solve the problems that they need to solve, the thing that they produce, it may not be ready for primetime. And with dataflows, they can basically use that same interactive data exploration and transformation capabilities that they’ve had in Power Query, but instead of putting the data directly into their analytics model, they’re landing it into what is functionally or logically a data warehouse. So just as in that traditional old-school BI application, the inclusion of a data warehouse before you get to SSAS and your reports, it allows for different refresh schedules. It allows for different permissions or different ways to pull in data from the different source systems so that it fits into their load window and fits into the performance characteristics of those systems. We can have those same things, but in a self-service manner. And if you were to think about dataflows as being an SSIS-like tool but optimized for both in the experience and the implementation, optimized for less technical business users and analysts, it’s an awesome way to start thinking about it.
Carlos: Sure, so then to bring out Eugene and Angela into this a little bit, this idea of self-service, making things easier for my users. If I am responsible for reporting, does that mean I just let my users loose on dataflows? Is this kind of a stopgap? Is this kind of a “hey, why don’t you take your first crack at it with dataflows and then let me take a look at it and then I’ll work it into the system”? I mean, obviously Matthew presented a scenario where there was no infrastructure and that’s kind of, I want to use the word extreme, but that’s one scenario, is probably a better word. But for listeners of this podcast, and they’re starting to use Power BI, how are they going to start using dataflows?
Angela: I have a question, here. So, I’m Nancy from finance, and somebody says, “hey, there’s these cool new things called dataflows and they have all of the data that you need, and you can get to them in Power BI Desktop. So yeah, that’s how you do it, you just go to get data sources and select Power BI dataflows,” but there’s nothing there, so where are my dataflows at? How do they get created?
Matthew: So Nancy, you zigged when I was expecting the conversation to zag. Let me come back to Carlos’s question and let me respond with a question. We’ll come back to yours, Nancy. So, Carlos, you mentioned in the set-up for your question that imagine you’re a SQL Server guy, you’re like a data professional guy and you’re starting to use Power BI. Why are you doing that? Why would you use Power BI instead of using these powerful, fully capable SQL Server tools that you’ve been using for decades?
Carlos: Oh, that’s easy. The C-level suite wants a dashboard. They’re not sure what they want, but they want to visual some of that data and they’ve heard about Power BI and they’d like us to create a dashboard in Power BI.
Matthew: So essentially, you’re getting a different set of requirements for your business users and you need to respond to those requirements through a new tool, because the existing tools don’t have the full set of capabilities that you need. Is that a decent way to paraphrase that?
Carlos: Yeah, fair enough.
Matthew: So, one of the most common scenarios that I see aligning with that, and in my role on the CAT team, I have the privilege of working with some of our top customers, like pretty much Fortune 50 or Fortune 100 customer that you can think of, and the way that I often see that, that can’t do it with the traditional tool scenario playing out, a lot of it comes down to I need, as an eye, the data team, the BI team, I need to do more than I am staffed or capable of doing. And when the business comes to me with requests, when I can’t keep up, because they’re asking me to do 6 months’ worth of work every 3 months, that means that some of this needs to get dropped when I can’t keep up with the business requests, the business is going to do whatever they need to do. Whether that is getting text file dumps and working with it in Excel or going out to the web and finding the latest tool du jour to start applying to those problems. Where the most common motivation is the organization as a whole, both IT and the business, that they need to have more insights from more data, but there aren’t enough resources. It just doesn’t make sense to hire twice as many data professionals on that side of things. So this is, regardless of the tool that you choose, this is the most common motivation for self-service business intelligence at scale. So, tools like Power BI and it’s not just Power BI, think about whatever competitors you want to, tools like Power BI are designed to make it so that a non-technical person can fill in the gaps that IT has left. And very frequently, in an Enterprise self-service or managed self-service context, this means that the central BI team has put together the data warehouse, the data lake, the Analysis Services cubes or Tabular models, all of these central golden data sources, and they have provided some sort of training and onboarding so that people that have Power BI licenses, they know where to go to get the right data for finance, for sales, for HR, for manufacturing, for supply chain, whatever is relevant to their roles, they have both the permission and the information about where to go find the data. As they’re going to execute, using their self-service BI tools, whenever possible, they will be working with their inputs are the output from the IT team. So they’ll get 90% or 80% or whatever it is, they’ll get the majority of their data from the existing data warehouse or Analysis Services cubes. And Power BI allows them to close that gap, so if 90% of what they need is in the data warehouse, they get the other 10% from some other system, Power BI gives them the tools to mash them up and put them together without needing IT involvement, and when they’re done, they have a solution that fulfills their requirements. They get the insights they need, they can finish their report, mission accomplished. But the mission isn’t always accomplished, because at some point, the business user that has created what is essentially a full BI end-to-end application, they’ll want to hand this over, in a lot of situations, back to IT and to say “hey, go do stuff with this.” Or they would want to let someone else from their team or from a neighboring team, a sibling team, to use that self-service work that they’ve done as a starting point for something similar or something related. And this is really where a lot of the challenges around self-service BI start to be felt by a central IT team. Because if you’ve got 100 business users, each one of which is creating applications and each one of which is reproducing the same set of logic, the same sort of transformation or calculation logic, eventually they’re going to get out of sync and eventually someone is going to come back to you, because you’re the data team, you’re the BI team, someone is going to come to you and say, “why do these numbers on this report not match these numbers on this report?” And the answer, of course, is because Nancy this one and Bob did this one, so we’ve got different business users introducing problems that IT has solved, but this is outside of that managed IT scope. So this is sort of the journey that a lot of self-service, managed self-service BI organizations go through, where the self-service BI tools are absolutely necessary to allow business to solve its problems, it’s data-centric problems, in a reasonable time frame, but it makes it difficult for IT to manage. So if we then switch the context or switch the conversational focus away from self-service BI in general to self-service data preparation and dataflows in particular, what dataflows will allow is for the individual tables or entities that are being produced by a business user to be shared more easily and consistently and in a more manageable way with other business users and other stake holders across the organization.
Carlos: Which I think can then be really handy, because then those users, they know what they want, rather than in other models, it’s like, “well, hey, I think I need this report that does XYZ”, so you go and you build an XYZ report and you throw it back to them and they’re like, “oh, no, no, no. Y is okay, but what I really need is S, T and Y.” And so there’s this back and forth and back and forth and so at least now, they’re getting that framework or the groundwork to be able to say, “ah ha, now I have A, T and Y. And this is what I need, and now I need somebody to kind of manage that and help me support it,” make this a little more mature, if you will.
Matthew: Yeah, and I love that real-world spin that you’ve put on it, as well, because when the business takes written requirements to IT, 99% of the time, in my personal experience, either the requirements are too vague, or IT just doesn’t understand the business enough and weeks or months later, the thing that IT delivers is not what the business needed. With Power BI and dataflows, business can build it themselves and when they then take the application that they’ve built, and they give it to IT to say either operationalize and support this as is or incorporate these things that I’ve done into the IT managed model in data warehouse. Their handoff, their functional requirements, as it were, is actually the functional application that they know solves the business problems, because the business built it to solve those problems, so it’s a much stronger handoff. And for Nancy, or for Angela, for your question, if you are a business user and you’ve heard about this particular feature and you go looking for it and it’s not there, I think that a lot of this would really make me question, “who did you hear about it from?” And also, “what would you want to do with it?” That savvy technical business user who wants the latest and greatest and coolest new features, that’s a little bit of a unicorn in my experience, so Nancy certainly exists, but she’s not super common. But the thing that I would say in that context is, if Nancy has permissions, which as a Power BI pro user, she does by default, she can actually go into any workspace in the Power BI service and define those reusable dataflows entities on her own, and then any user that has permissions to access that workspace, they can read the data from the dataflows that Nancy has created, which, again, can fill in the gaps for what IT has delivered, but still provide a building block for more reuse.
Angela: Yeah, so Nancy, even though her profession is finance, she is very tech savvy, and she listens to this really cool podcast called SQL Data Partners.
Matthew: Oh, that Nancy.
Angela: And that’s where she heard about it.
Matthew: So if Nancy does go into Power BI Desktop and chooses Power BI dataflows from the data menu and there is nothing in there, what she should do is go to any app workspace, it doesn’t need premium, but it can’t be your My Workspace. It can’t be your personal workspace. But go into any app workspace in the service, and there should be a new dataflows option right at the top, alongside data sets and reports and dashboards and she can create one from right inside the service.
Angela: Very cool. I’ll let Nancy know.
Eugene: It’s interesting, because I think with dataflows and Power Query in general, it’s an exercise in empathy or you’re going to fail to understand what the point of it is. And I mean that seriously, because the first time I heard about Power BI, I was like, “this is dumb”, because so much of it was reinventing the wheel and I had perfectly good wheels. I had SSIS, and TSQL.
Matthew: And the new wheels didn’t have the cool rims on them.
Eugene: Yeah, exactly. And it probably took me a year before I had my equivalent persona, I usually say Chris from accounting, so Nancy or Chris or whoever, of like, “oh, this isn’t for me.” IT people and BI people are going to end up using these tools, but when I thought about Power Query, it’s like, “why do I need this? Most of my data’s in SQL.” And then you run into someone who’s living in flat file heck, where everything’s just these data dumps, there’s no data types, there’s no defined schema, and they have to do a report for the CFO every week or something like that, or end of month or whatever. And I think dataflows are the same way, where if you’re just looking at it from an IT perspective, you’re going to go, “well, this is like SSIS but it has less features.” Whereas, if you realize that Microsoft, I think, they decided, “okay, instead of trying to make IT tools and solutions more business user friendly, let’s go extreme. Let’s go all the way to the other end and we’re going to make something that’s as useable as possible for Chris or Nancy or whomever in accounting or finance, and then we’re going to work our way back to the mature capability models of their other solutions.” And so, I think that’s a lot of what they’ve been doing is they said, “we’re going to start here and work our way back to that grow-up story.” And so they added Power BI Premium so you can deploy to larger groups. They added app workspaces and the ability to actually have QA in staging instead of just, “oh, I hit publish and now I’ve updated production.” So I feel like what’s been going on so much is they’ve decided, “okay, how do I get this person who loves Excel and is great with vlookups and knows just enough SQL to be dangerous, how do I get them onboard with this technology? How do I build a technology they’ll use willingly and perhaps even happily? And then how do I work it back so we’re not just angering entire IT divisions with these weeds of self-service solutions that suddenly they have to maintain as if they were garden solutions.”
Carlos: Yeah, yeah, it is very interesting, how that’s come about, and I think to your point about that integration or two things coming together, one of the things I was taking a look at and that has been rolled out with the dataflows is actually SQL native queries. I think that’s probably a step up from what we were talking about, but those people who can wield themselves with some SELECT statements, now appear to have the ability to be able to go and start pulling some of that data and again, kind of helping that transition from “hey, IT, now you have to support this”, well, at least I have something a little bit more familiar to me to onboard there.
Matthew: Yeah, so my personal take, and I’d like to point out that 20 years ago, one of my favorite questions to ask at cocktail parties was, “if you were stranded on a desert island and you could only have one programming language for the rest of your life, what would that language be?” My answer was always Transact SQL and their answer was always not to invite me back to their other parties. So I love SQL, it is beautiful, it is elegant, it is like Zen in programming language form. But honestly, I think that the support for native SQL queries in dataflows is less significant than it might seem to someone who lives in SQL every day today. So every Power Query query, which includes dataflows, the queries that define dataflows entities, every query is for all intents and purposes, a view. So it is the definition of the query that gets the data and transforms it to define a rectangle of reusable data and a Transact SQL view can only get data from SQL Server. A Power Query quote, unquote view, the query that you can build in Power Query can get its data from 100 different sources: SQL, Oracle, big data, web data, whatever. But because the Power Query engine, the underlying connectivity stack, it has a feature known as query folding, it is intelligent enough to say, “oh, so you’re doing this sorting, this grouping, this filtering, all of these things in your query.” If you’re getting the data from your flat files in flat file heck, it will have to do those transformations in memory as part of the execution process, because the data source doesn’t support any native instore operations. But if you’re going against SQL Server or Oracle or Teradata or another more capable back end, it will automatically pass through anything that it can to optimize that query, to take a full advantage of the original source. So what we’ve announced in the last couple of weeks, or released in the last couple weeks, with these native SQL queries in dataflows is essentially a way to say, “rather than letting this intelligent engine, not unlike the query optimizer in SQL Server do the best that it can to push the right SQL query back to the underlying database.”
Angela: Oh, that’d be basic.
Matthew: We’re instead saying, “you provide the query and the query that you put in will define the starting point for the downstream steps inside the Power Query query. So, it’s great if you already have a query that you want to reuse. I would definitely not recommend people looking at this as the preferred starting point, because it is sort of a bridge between the two technologies, rather than the best part of either world.
Carlos: Fair enough, fair enough. And we’ve talked about, on this program before, I mean, even last week’s episode, the whole idea of data virtualization and just the numerous sources of data that are out there, you’re right, you want to give your users the ability to interact with as much as they can, and so that makes a lot of sense to me.
Eugene: I agree that query folding is really powerful, because literally there’s been times that I’ve tried to break query folding and failed. And I thought, “okay, clearly if I do this GROUP BY thing, then it’s not going to fold into the query” and it does. And if you slap Profiler against a demo server or Extended Events or whatever, you can see there’s a lot of stuff that it’s able to push down, but the moment that you’re explicitly telling it what query to use and then later on top of that, you decide, “okay, I want to filter and I just want black products” for example, it’s no longer doing that essentially predicate pushdown. That’s the problem, is if you’re taking the SQL query verbatim, great, fine, awesome. But if, especially which I would imagine would happen with dataflows, you’re making modifications later or the user’s making modifications, then that’s not getting pushed down, and actually a perfect example, and I think I got this from, I want to say Chris Webb, I forget. Query folding is so powerful that it can pull information from a flat file and push it into the SQL query.
Carlos: Oh wow.
Eugene: This is one of the coolest demos. If you take an Excel file and it has, say, just a single cell and the word ‘black’ and then you join it against your SQL data and you say, “alright, I just want products that fit this JOIN,” it’ll actually take that word ‘black’ and put it into the SQL query that’s going to the SQL Server, and so it’s that powerful. And the only reason I know about that is because there’s this whole thing about privacy levels and stuff where maybe if that word ‘black’ was instead someone’s social security number, you wouldn’t want query folding to be that powerful and accidentally exfiltrate that data. But it’s amazing, some of the stuff that can do and some of the performance gains that you can get if you go through the GUI as opposed to writing out all of the TSQL.
Matthew: I would like to say that based on all of the directions this conversation has gone, people may have more questions than they came in with. So I would recommend both our dataflows documentation, as well as my personal blog, which has a lot of dataflows content, so it’s ssbipolar.com.
Carlos: Very good. yeah, so I agree, you can go in lots of different directions. We were taking at it from the IT more perspective. I think one of the pieces there to keep in mind is that bridge. It’s there to help bridge the gap to allow your users to go faster and even if you, the executioner, not the executioner, the worker bee, the IT worker bee.
Matthew: I like executioner better.
Eugene: He’s a fan of heavy metal.
Angela: I’m sure Matthew’s got an axe he can let you borrow.
Carlos: That’s right. Yeah, all that medieval sword fighting. Okay, awesome, shall we do SQL Family, then?
Matthew: Sounds wonderful.
Carlos: All-time favorite movie?
Matthew: How to Get Ahead in Advertising. It’s a late 80’s British comedy, very, very weird, but if you watch it, you will understand me a lot better than you do already.
Angela: Okay, alright. What is the city or place you most want to visit?
Matthew: My favorite city in the world is Stockholm, Sweden, so wonderful people, wonderful food, just beautiful city with lots of islands. I’m fortunate that I get to travel a lot, which means that most of the places that I want to go, I have already gone, but Stockholm is the place that I love to go back to the most.
Carlos: Very cool. Admittedly, still on my places to visit list. Talking about food, a food that reminds you of your childhood?
Matthew: My childhood had awful, awful food, so nothing good. All of my food memories are recent and wonderful.
Angela: Was your grandmother British, too?
Matthew: I was the child of hippies, so I had a lot of whole grains and lentils growing up. It’s a long, painful story.
Angela: Alright. So how did you first get started in SQL Server?
Matthew: I actually got started as a SQL Server trainer. Back in the 90’s, I was part of the MCSE or MCT Microsoft Certified Trainer first generation or maybe second generation and my training center needed me to teach SQL Server 6.0 and 6.5 classes, so I sat a class, I got certified, and I just fell in love with it. It was the first technology that really, it touched this visceral, emotional part of me, something about working with data, shaping the data, etcetera, etcetera, it really hit a sweet spot for me and it’s been all downhill since there.
Carlos: Now, in that time working with SQL Server, if there’s one thing you could change, and we can toss in Power BI here, any of the SQL Server family of products. If you could change one thing about it, what would it be?
Matthew: I don’t think about the technology when I answer this. To me, solutions are much less interesting than problems. The thing that I think that we need to solve, we, being the data industry in general, is having more data literate users and having a community of practice or having best practices for building communities that practice inside large organizations. It’s hard to do things at scale, and it’s hard to keep up with the pace of change that today’s world has forced upon us. There needs to be better patterns for us to apply to solve the types of problems that we were discussing today, regardless of the technology.
Carlos: Very good.
Angela: Yeah, I would definitely agree with that. We keep reinventing the wheel.
Matthew: Let’s make it round this time.
Angela: Yes, that’s a good plan. Alright, so what is the best piece of career advice you have received?
Matthew: Sometimes you need to let other people fail. It’s really hard to see somebody trying to do something and know that they’re not going to get it right and know that it’s important, it’s always, oh you want to step up, you want to give them a hand. Sometimes the way that you can enable someone else’s success it to let them fail at something important, because that actually shows the leaders, the people that could give them more resources, that there’s not enough dedicated to it now, and it’s hard, and I haven’t learned it, I don’t do it all the time, but it’s probably the best advice I’ve got.
Carlos: Yeah, I thought you were going to go a slightly different method there and just in the thought that failure brings more learning opportunities. But you kind of made it a little more holistic, a team approach type idea, so that’s interesting. Matthew, it’s been great having you on the program today. Our last question for you, if you could have one superhero power what would it be and why do you want it?
Matthew: I don’t honestly have an answer to that one. Everything that I can think of is either trite or immature. I honestly think that the powers that we should have as people are the powers that we earn for ourselves. And if there’s one thing that working with data and working with swords has taught me is that you need to practice, you need to study, you need to continually drive yourself to get better and any super power, even though this is obviously the hypothetical question, any power that just shows up, you’re never going to appreciate it, you’re never going to be able to get the most from it. So, maybe wisdom. That sounds like a super power, right?
Angela: For some people it is.
Carlos: That’s right, all of a sudden, you’re going all philosophical on us there, Matthew, at the end. We’ve gone full circle here, in this conversation.
Matthew: Hopefully we can keep things light as we move forward.
Carlos: Well, Matthew, thank you so much for joining us. It’s been a pleasure talking with you.
Matthew: It’s been a real pleasure for me as well. Thanks again, and I can’t wait to hear what everyone thinks about Power BI dataflows.
Carlos: We just had this whirlwind conversation with Matthew. We’re looking for some feedback, compañeros. We’d like to know what your questions about dataflows might be.
Eugene: It definitely felt like we were doing a lot of zigging and zagging all over the place, and that kind of reflects the real-world impression that people are getting with dataflows. There’s a lot of people that are saying, “oh, well this is supposed to replace the data warehouse, but it can’t” and it’s like, no, it’s not really to replace the data warehouse. And people are like, “oh, is this supposed to be another version of SSAS” and it’s like, no, not really. And it’s funny, a lot of Matthew’s blog posts are more about what dataflows isn’t than what it is.
Carlos: Well, and I think compañeros, I mean you did get that, right? I feel like that idea that it is baby steps, if you will. Allowing the business to be able to move forward with asking the data questions so that you, we, the IT folks don’t have to support it, I think is a pretty good use case, or it sounds like it’s a pretty good use case for what dataflows is, and then what it isn’t, because you’re giving that scenario for who and when it should be used.
Eugene: It sounds good. I think that makes a lot of sense, and I think it’s one of those technologies that takes a couple run throughs to figure out what it’s actually for. I feel like it’s the platypus of the BI world. You’re not sure if it’s a duck or a mammal or what.
Carlos: And it is brand new. We’re talking about November of 2018, so we’re less than three months in.
Eugene: Right, right.
Carlos: So, yeah. It’s a crazy world, when you thought about it seems like from a data platform perspective, I’m not saying SQL Server not changing, because that’s not true. We heard about things at Ignite and whatnot, but man, Power BI, blink and it’s changed.
Angela: I guess what I would say is I really like how Matthew did kind of say what it isn’t. That was kind of my whole point of asking the whole Nancy from finance question is so I could kind of pull that out and if, you know, because people are confused. They’re not really sure, especially a lot of people in IT, because they see, “oh, this is awesome, this is great” and then you’re like, “oh, but you know what, I have all of these other tools that do that, so why would I even bother?”
Carlos: Right, right. Interesting, okay, well, great conversation. Okay, compañeros, that’s going to do it for today’s episode. Thanks again to Matthew Roche for joining us talking about dataflows. Again, we are interested in your feedback. You can reach out to us at sqldatapartners.com/powerbi, get show notes for today’s episode. And you can reach out to us on social media. To reach Eugene, you can get him at sqlgene, Angela is sqlswimmer and compañeros, I prefer you contact me on LinkedIn. I am at CarlosLChacon, and we’ll see you on the SQL Trail.