For only 10 bucks a month you can start visualizing your data and finally get the dashboard your executives have been waiting for!! What’s that? You need to refresh the data–perhaps even multiple times a day? Oh, well it’s still $10 a month, but you are also going to install a gateway. What’s that you say? Well, in this episode we begin our discussion on Power BI with one of the first components you will need outside of the Power BI Desktop for your users to start interacting with updated data. We discuss what it is, some of the requirements, and our experiences setting one up.
Power BI is so easy to use, you would think that installing/configuring/using a Data Gateway would be easy. It is, if you answer some questions before you get started.
Are you mixing your data sources?
Are you using more than one type of data in your dataset? For example, are you using Excel spreadsheets and SQL Server? Or CSV files and Oracle? If so, you will be in for a big surprise. Currently, only homogeneous datasets are supported in the Data Gateway. If you have heterogeneous datasets (Excel and SQL Server), you will need to use the Data Gateway in Personal Mode. Now, I say currently, but Microsoft is constantly adding new features and improving upon existing features so this may change in the future (no guarantee though) so keep checking.
Where to install your Data Gateway?
Because the Data Gateway acts kind of like an operator, just answering calls and passing them through to the requested data source, you need make sure that you install the Data Gateway as close to your data source as possible to reduce network latency. You don’t want to have your data going through multiple hops to get back to the Data Gateway. Oh, and don’t plan on installing your Data Gateway on a domain controller, because it’s not supported and won’t work.
Are you going through a firewall?
I know this seems like a silly question, of course you’re going through a firewall to get to on-premises data, because you are all security conscious data people. More than likely your firewall will be fairly locked down, which means you will need to open some ports. Luckily they are all outbound ports, so your security administrator shouldn’t give you too much grief in opening them up. In addition to the ports, you will need to whitelist some IP addresses. But keep in mind that Microsoft is constantly adding new data centers and your list of IP addresses can change (the list is updated weekly). This means you will most likely need to automate the retrieval of the IP addresses from Microsoft and apply them to your firewall.
Do you need to use HTTPS instead of direct TCP?
If you are going to be using HTTPS, there’s good news and bad news. Bad news is that you will most likely encounter slower performance. The good news is that the gateway will be strictly using FQDN instead of IP addresses. So instead of having to keep up with all the constantly changing IP addresses for your firewall whitelist, you can simple supply the list of FQDNs to your firewall administrator and no automation will be necessary.
Are you okay with using a local account for the Gateway Service?
When you install the Data Gateway, by default it is installed under a local account that is created by the install process. This account will be granted “log on a service” permissions, so if you’ve got System Administrators that don’t like having lots of local accounts, then you can use a domain account instead.
What account should you use to register the data gateway in Power BI?
While I know that most users are super excited to start accessing on-premises data, you must resist the temptation to register your Data Gateway under a specific user’s account in the Power BI service. What happens if that person goes on vacation or worse yet, leaves the company? You are better off creating a domain account that will act as a service account for registering the Data Gateway. Just make sure that the service account is set up to receive email as Microsoft will send a verification email when the account is first registered with Power BI.
Answering these questions can save you a lot of time down the road. This is not all inclusive list of questions that you need to ask prior to installing/configuring/using a Data Gateway, but they should get you off to a good start.
“Data Gateway is definitely a very good resource for being able to pull data from different types of data sources.”
“With the Data Gateway, it’s basically just looking to see what’s your data source, do we have the credentials, what reports are mapped to it, and it just seems to work.”
“The ideal scenario in any circumstance is that the Gateway is as close to the data as possible without impacting the server, so unless you’re running a tiny workload, you probably don’t want to run the Gateway right on the SQL Server.”
Listen to Learn
02:07 Compañero Shout-Outs
03:34 What Have I Learned
06:40 SQL Server in the News
15:06 Intro to the topic
17:04 Installation…it’s easy, until it’s difficult
18:37 There are two different types of Data Gateways
21:22 Compare and contrast Data Gateway with SSIS and its multiple services
24:50 Data Gateway needs quite a bit of space
27:12 Two possible scenarios where you might have more than one Gateway
28:59 There are three ways to implement row level security
30:48 What if you’re not accessing Microsoft data sources?
33:14 Query folding in Power Query might be a challenge
34:21 Closing Thoughts and how to connect with the panelists
Music for SQL Server in the News by Mansardian
Carlos: Compañeros, welcome to another edition of the SQL Data Partners Podcast. This is episode 158 and I am your host, Carlos L Chacon. I am joined today by my fellow co-hosts, Angela Henry.
Carlos: Eugene Meidinger.
Carlos: And last, but certainly not least, Mr. Kevin Feasel.
Carlos: Mr. Kevin sounds like he’s a little under the weather today, so he might be a little slow on the take, but we’ll see if we can increase that voice inflection as we get going, here.
Kevin: The only question is, what’s my excuse for the rest of the year?
Carlos: Yeah. At least today, we’re providing you with some cover, anyway. Okay, so compañeros, one of the things that we have been doing at SQL Data Partners leading up to 2019 is focusing a bit more on Power BI, and obviously we’ve had Eugene on and he likes to talk about Power BI and he’s very knowledgeable about it, as is Angela. I thought, “okay, well, why don’t we start focusing or having some discussions on Power BI?” I know lots of you chiming in on LinkedIn, lots of the analysts and whatnot are trying to use Power BI and so we thought we would create some episodes specifically around that. Obviously, a lot of the things we’re going to talk about are going to be tied into SQL Server and that ecosystem, so the Microsoft Stack, so we’re not leaving that, but we are going to be focusing a little bit more on Power BI. And so ultimately today’s topic is going to be on Gateways, so we’re going to be setting some foundation, and then we’ll get into some of those pieces.
But before we do that, I have a couple of shout-outs. First to Victoria Parks, Tara Mason, Judy Dang, Joey Sheppard, Mariano Kovo, from Argentina, sending me some alfajores los Havanna. I enjoyed them very much. The only bad thing is that there are 6 in a box and there are 7 of us and so they went very, very fast.
Angela: Was somebody stabbed with a fork?
Carlos: Well, it was very close, but luckily, I just gave everybody spoons. I learned that lesson from previous attempts. Jules of SQL connected with us on Twitter, and Tim Maas, so thanks everybody, there. Other shout-outs, we were actually just in Nashville, well, most of us. Three out of the four of us were just in Nashville and Angela, do you want to give a shout-out to the folks in Nashville?
Angela: Absolutely, we did our first ever all-day session and we had some great people in there, including Meagan, Bryce, and Tracy. Those are all folks that used to work with Bill and they came out and took our class and it was a great time and they were wonderful.
Carlos: Yeah. Yeah, one of those brought their own data and we were able to utilize that a little bit and start asking them some questions about that data and then their eyes got real big and they’re like, “oh wow”.
Angela: Oh yeah, they were like kids in a candy store. They were like, “oh wow, this is so cool.”
Angela: Makes it all worth it.
Carlos: Okay, so in the What Have I Learned section of the podcast today, I’m going to say Oracle resources are hard to find. Some of you compañeros who might be listening, you maybe have tried to hire people, or you know, thinking, “well, SQL Server resources are sometimes hard to find”, and I suppose that’s true. Although a quick email to [email protected] might help you in that regard, just a little FYI. But I found it very interesting, I had not been working with Oracle for some time. At one time, Oracle was in my, well, I don’t know if in my wheelhouse is the right word, but I feel like I’d be–
Angela: It was adjacent, next to your wheelhouse?
Carlos: It’s adjacent, that’s right.
Angela: It was leaning up against it?
Carlos: That’s right. So I learned how to be a DBA in an Oracle environment from an Oracle DBA, but I have kind of divorced myself from that community, if you will, not that there’s that much of a community, and getting back into it was a lot more painful than I had imagined. So that was my experience, trying to get back in and luckily, I don’t know that we’re going to have to get back into it anymore, but we’ll see. Now Eugene had something.
Eugene: Oh yeah, so I started listening to The Phoenix Project, which is this book that’s basically like devops told in a story kind of style. And I was expecting it to be a bit cliched and rote, I’ve read The 5 Dysfunctions of a Team by Pat Lencioni and it was a good book, but it felt kind of contrived. Whereas, I just started listening to The Phoenix Project and I’m just getting war flashbacks from my last job, because this guy gets an involuntary promotion and the very first thing he has to deal with is “oh, there’s a problem with this upstream data system for payroll” and if he doesn’t get it fixed that day, people are going to be a day late. And I’m just thinking like, this time clock system that I supported that was in Access written by the VP and like, oh no.
Angela: Oh my gosh! Oh my gosh!
Eugene: Yeah, so I was expecting it to be kind of contrived and inauthentic and like the first two chapters I’m hooked because I’m just like, “oh, this is so painful. I’ve been here.” So I’m excited to see where the rest of the book goes.
Carlos: Yes, that is interesting. There are a couple of those. There was another fellow whose name I can’t remember off the top of my head and I was trying to look here. He wrote a book in a novel-type format, trying to help people learn Python of all things, and talked about going out to space and it was basically putting this problem in a story form, but then kind of having you walk through the process of solving it with them. They interviewed him on the .NET Rocks! podcast, and I thought that was kind of interesting. It didn’t interest me enough to try and pick up the book and buy it, and go, but if I wanted to learn Python, I would definitely, maybe start there.
Eugene: Yeah, it’s got my mom’s stamp of approval, which is impressive. So my new work commute is dropping my mom off at this adult daycare/Medicare kind of place and so we listened to it on the way and she was like, “this is really good”. I don’t know how much she understood of it, but it kept her engaged, so it’s got the mom’s stamp of approval.
Carlos: Hey, there you go. Okay, so time for a little SQL Server in the News. We’ve been a little behind, here, haven’t got as many of these out as I would have liked, and all of a sudden, 2019 is here and they announced at Ignite, and I then I think we’ve followed up at Summit, talking about these big data clusters. And admittedly, I haven’t gotten into them. Once again, it’s all in preview, you have to sign up for it and stuff, but these clusters seem to be all the rage. And the idea is, and I’m just going to read here from the documentation, is that this concept of big data clusters is going to “allow you to deploy scalable clusters of clusters of SQL Server, Spark, (which is all the hotness right now) and HDFS containers on Kubernetes.” Now I did just see another presentation on the Azure Kubernetes service and pretty impressive. I’ve been dealing with containers at a SQL Server level, and more for the test environment pieces, but now it looks like they are ready to start trying to deploy this into the production realms, and so it’ll be quite interesting to see what happens here. And then what caught my eye, is that while it’s not even available yet, they’re already talking about how to automate machine learning using these big data clusters and Spark. We’ve seen this creep of, well, I’ll use the word creep, maybe others would finally opening the floodgates, if you will.
Angela: Oozing? Possibly oozing?
Carlos: Oozing, yeah, of languages. We started with R and then Python, and now we’re going to get Spark and who knows what else and–
Kevin: And Java.
Carlos: Java, that’s right. So there’s XML and JSON in there, and it’s just one big happy family, these days, and so I’m interested to see where this goes. And ultimately, any time I hear about Spark or Analytics, obviously Kevin, I’m thinking about you. And I’m curious to know, lots of things going on, but is this something that you think people are asking for? What’s your thoughts here, on where they’re going with this?
Kevin: So this is, from what I’m seeing, a play toward data virtualization, which has gotten pretty popular with a number of platforms that are available, where I don’t necessarily care where the data is, we’ve given up on the idea of everything needs to go into one location or everything needs to run on one type of platform, and instead, are saying, “well, okay, there are business cases where it makes sense to run using something else.” Maybe we’ve got this historical ERP system in Oracle that we’re just not moving over, because it’s already there, the money’s spent. We’ve got some systems running SQL Server, there are things that are running, say, CosmosDB or MongoDB because we don’t really care about the data. See, even on the podcast I get to make fun of Mongo, it’s great. So, different platforms, different types of data, stored different ways, a lot of data in a data lake that’s hosted on HDFS or Azure Data Lake services or whatever. So being able to take that data, combine it all together and work with it, saying “yeah, I do need that ERP data from Oracle, but I don’t necessarily want to write PL/SQL. I need that data from SQL Server, maybe I’m more familiar with TSQL, so using what we saw in 2016 with Polybase connecting to Azure Blob Storage into Hadoop. What we’re seeing in 2019 with Polybase, “hey, I can connect to Oracle, Spark, Teradata, MongoDB”, and so on. This is part of a move towards saying, “SQL Server is your data virtualization platform.” This is where you’re going to think of that as your developer space and where the data actually lives, they’re all hidden behind external tables, or it’s all hidden as part of an interconnection under the covers with Spark, so you don’t have to think so much about, “well, now I have to write this code. I have to write a Scala function so that I can package it as a JAR file and drop it on my Spark cluster so I can execute this code, and then pull it over into here, and now I have to write DSON to pull this data out of Mongo and maybe have some wrapper in some other language so that I can connect those two together.” Instead of having to do that, you just think of it as, “well, I’ll write this one language here, I’ll write this TSQL query that goes and treats all of the data as though it were just tables living in SQL Server.”
Carlos: This is interesting. and it kind of almost gets into the concept that we’re going to be talking about today with Gateways, right? But the idea of having yeah, that single source, but then under the cover, so many ingredients, but a single connection, in a sense. An ultimate destination.
Kevin: That’s a strange analogy, but it works.
Eugene: I just wonder about what’s the performance when you’re dealing with data virtualization, cause if you’re doing batch stuff, okay maybe it doesn’t matter so much, but if it’s doing all this work to pull data from other places, I really do wonder how the performance works for that.
Carlos: Well, I think to Kevin’s point, or his example was he did mention the old ERP that’s not coming over, so I would imagine that in the beginning, let’s just say that your cut over was 2018, 2019, yeah, the year over year stuff might be a little bit troublesome, but come February, how much of that 2018 data are you really going to be pulling? It’s going to be less and less as time goes on.
Eugene: That makes sense.
Carlos: I mean, that doesn’t mean that somebody won’t still complain about it, but–
Kevin: Yes, performance is always a tricky bit with data virtualization, because there are a lot of factors on it, too, like how much money are you willing to spend? How good is your network connection? It’s generally not going to be as fast as native querying until maybe you get to a large enough scale where yeah, dumping this off into a Spark cluster and having that large Spark farm do the processing for me and bringing it back in, that may end up being faster. But if you’re talking about, “I’m going to go query Oracle and join data to DB2 within SQL Server,” you’re not going to get much in the way of predicate pushdown. Whereas, if all of that data lived in the same location on one of those systems, then you could. It’s similar to the problem we’ve had with linked servers for so long. Data’s gotta move somewhere and yes, the time that’s spent moving data is going to be a factor.
Eugene: I can see a lot of value in this, I just have a feeling that we’re at the peak of the hype-cycle and a lot of people are about to head into that trial of disillusionment and then we’ll kind of stabilize. But I feel a lot of the early adopter for this big data thing are going to be in a world of hurt, but we’ll see.
Kevin: Yeah, as far as Hadoop goes, we’re already seeing that slide down into the trough. I think we’re in the trough slowly starting to get out. The Cloudera/Hortonworks merger is an interesting data point in that.
Eugene: Oh, I didn’t hear about that.
Kevin: Oh yeah, so Cloudera and Hortonworks are merging. They’re going to be Cloudera, so they picked one of the names, but they’re coming out with the Cloudera data platform, so they kind of picked the other company’s nomenclature.
Eugene: Oh, interesting.
Angela: So we’re not going to get anymore elephants at the conferences? That’s what I’m hearing.
Kevin: No idea on that, but hopefully there will be elephants afterwards. They may not be green. They may be blue or yellow. But the merger is official and over the next about three years, we’re going to see them wind down the independent distributions and spin up their own central distribution that has the basics of Hadoop, plus Cloudera’s Data Science set, plus Hortonworks DataFlow, which is things like NiFi, so relatively easy integration processing for people who like point and click stuff.
Eugene: Very cool.
Carlos: Yeah, that’s interesting. So lots of news on the horizon, then. We’ll have to see what happens there. Okay, so on to our topic for today, which is going to be Power BI Gateways, and the links for today’s episode is going to be at sqldatapartners.com/powerbi or at sqldatapartners.com/158. Now, I suppose, knuckle-dragging Neanderthal that I am, I can start the questions, this way. For all of the hype, to use Kevin’s word, that we’ve gotten about Power BI, and oh, it’s only 10 bucks a month, ultimately what that means is, I can build the report, I can package the report and the data, send it up to the Power BI service, and that’s 10 bucks a month. But then it doesn’t take very long for one, I want to refresh my data, or two, I have too much data to push up or three, I have other sources that I want to access, or I need it to be real time, or any number of reasons when all of a sudden, that simple process no longer works. Well, I say simple, and all of a sudden you need a bit more infrastructure. And ultimately that’s where we get into Gateways. Am I wrong or am I right?
Eugene: No, that sounds good. I would say instead of calling yourself a knuckle-dragging Neanderthal, Kevin would probably say you’re filling the role of Socrates, where you say that all you know is nothing, and through the Socratic method of asking questions, truth emerges.
Carlos: Eugene, you are a scholar and a gentleman, my friend. There you go.
Kevin: I’ll allow this, because beforehand, I did joke about being the Greek Chorus on this one, so I can’t be Socrates. Alternatively, Carlos could be Bill and Ted and I can be So-crates.
Eugene: Either of those works.
Carlos: There we go.
Eugene: Yeah, so at a certain point you do need a little bit of infrastructure. It is easy to say, “okay, it’s just an opex cost of 10 dollars a month per user” and all of that kind of stuff. Now, in my experience, getting the Data Gateway set up isn’t a lot of work. I mean, it’s pretty much a straightforward installer and one of the nice things, too, is generally speaking, you don’t have to hassle any of the network guys, because it only needs outbound access.
Carlos: Well, that’s interesting you say that, because one of the issues that we’ve run into is they have so many different addresses and so there’s two options. One, you can add, it’s not quite that many, but it’s like 50 IP addresses that could change at any time, that you then have to somehow keep up with, or allow *@powerbi.com or whatever.
Eugene: Right, when there’s a number of ports, too, that they use, for sure. So I guess with the issue that you ran into, were you having to open up whitelist outbound access, or what?
Angela: Yeah, yeah, it was a super restricted network where they have to know exactly where that data is going.
Eugene: So to minimize the risk of data exfiltration, where someone’s kind of just like sending out to Dropbox all the HR stuff.
Angela: Exactly, exactly.
Eugene: No, I mean, it’s fair, but yeah, no, in that case, it is difficult, because like you said, there’s a lot of IP addresses, there’s a lot of ports, because I know from memory, they use 443 for SSL and then there’s like 6 different ports for the service bus piece, and I think there’s a couple others, so it’s very chatty, for sure.
Carlos: Yeah, so it’s easy, until it’s difficult.
Eugene: Yeah, and it’s interesting, too, because they make it a little bit too easy, because there’s two different types of Data Gateways. They’ve merged it a little bit, but there’s the regular Enterprise one, and then there’s Personal mode, which used to be called the Personal Data Gateway. the Personal Data Gateway, it’s weird, because it’s designed for Susan in accounting, she just made this report, she wants to deploy it, and she wants to be able to just refresh easily, and it runs on her machine, and it runs basically as her. And so, if you’re worried about data exfiltration, now you have all of these potential Personal Data Gateways, and you have no great way of auditing. Theoretically, you can do it through the Unified Logging Service or whatever with Office 365 and all that, but now you need to get knee-deep in some PowerShell or just do a lot of work. And so from–
Kevin: That’s not guaranteed.
Eugene: The government’s standpoint, it’s a little bit of a nightmare, but one of the things that confuses me is you were talking about R with machine learning and all of that kind of stuff. Well, right now, the only way that you can refresh any R scripts is with the Personal Data Gateway, so it’s weird.
Carlos: Oh, interesting.
Eugene: Yeah, I know. I have a blog post referencing it, because it tried to play around with it and ran into an error. So, it’s weird that they’re like, “yeah, this is just for testing and QA and don’t run this in production” and you try to install it, it’ll warn you, “hey, this isn’t for production.” But then if you want to do anything with R and you want to refresh it, you have to use the Personal Data Gateway. So yeah, it can get a bit messy, for sure.
Carlos: Well, and this is one of the great things and the challenges with our current environment is that it is changing, there’s lots of different things, so in fairness to Microsoft, I mean, we are talking about a couple of different options. So there is the Power BI service that we kind of mentioned. Lots of people don’t want to push that up, or at least they say they don’t want to until they’re like, “well, you know your email’s up there, right?” They’re like, “oh yeah, well, let’s just push it.” But yeah, so they want to host things internally so that we have to integrate from Reporting Services, and then there’s imbedded, so there’s a couple of different flavors. They’re trying to make all of these pieces work and then of course, there’s data virtualization, they have to kind of figure out and play with or think about in the future. Yeah, I mean, it’s amazing what they’ve been able to do, because of the complexity of this kind of shifting landscape.
Eugene: Right, it is definitely a very good resource for being able to pull data from different types of data sources, like I’ve been impressed with how easy it is to pull from this SQL database or these flat files wherever, .csv or some other type of database, so I will say, talking about data virtualization, they’ve done a good job of being able to pull data from tons of different types of sources or even online stuff like OneDrive.
Carlos: Oh, sure.
Eugene: They make that fairly manageable, which is nice.
Carlos: Maybe that kind of sets up to the next piece there, is okay, so now part of the function of the Gateway is to connect to all of those different data sources. Let’s talk about that experience. You mentioned OneDrive or SQL Server. If we were going to compare it to something, I’m thinking like SSIS, where I have multiple services, so maybe walk me through what that looks like or compare and contrast.
Eugene: Sure. It’s interesting, because whenever I compare SSIS to Power Query, I say that SSIS is this guy in this starch-pressed suit that wants everything to be in line and all of that kind of stuff, whereas Power Query is more in a Hawaiian shirt and just laid back and doesn’t care and doesn’t do any compile time checks for anything. SSIS, if your data type changes the slightest, it’ll yell at you real loud, whereas Power Query literally, it won’t check anything until run-time. It’s pretty laid back about stuff. And so whenever you’re dealing with the gateway, once you have it installed, it’s pretty easy to add data sources. You just go into powerbi.com, you find the gateway, you add a data source under the gateway, so you might say the name of the SQL Server, and then you just add the credentials that it’s going to be accessing as and in my experience, that’s mainly all there is to it, and you pick what type of data source you want to talk to and then usually the reports just kind of work. I think sometimes you may have to specify which Data Gateway you want it to pull from, but I’ve found it to be a pretty easy experience overall, thankfully, compared to my limited experience with SSIS, which is like, if you don’t touch anything, it won’t break.
Carlos: Just move your Excel file location.
Eugene: Well, again, I haven’t done a ton with SSIS, but at my last job, we had to output data from SQL Server to .csv files to upload to HP to say, “here’s what we sold”. Should be a really simple process, literally just a SQL query and a file output.
Eugene: That should be it. And what would happen is I would have some columns and I’d SQL query, they were hard coded. It’s just the name of our company and okay, the company name changed, so I need to change that one column. It’s not pulling from anything, it’s literally like in quotes, here’s the company name. I’d change it and it would say “Lineage ID 419 is no longer valid. Remap your columns.” And it should not be that hard, and I’m sure they have a good reason for all of that, but it was just such a hassle. Whereas with the Data Gateway stuff, it’s basically just looking to see, okay, what’s your data source, do we have the credentials, what reports are mapped to it? And it just seems to work, so I haven’t run into a lot of hassles with getting that stuff set up, which is nice.
Carlos: What about you, Angela? I know you’ve had a bit more experience with SSIS. Thoughts there?
Angela: Yeah, so the Gateway, it’s awesome. It lets you access on-premises data, it’s perfect. All you have to do is install it and it runs and you can talk to your data.
Angela: Well, not exactly. There are some caveats in there. One of which, kind of going back to Carlos’s point about Oracle resources. We had a client with Oracle and if you need to access on-premises Oracle, you’re going to have to install the Oracle drivers on your Gateway box.
Eugene: Oh, yeah, that’s a fair point.
Angela: And not just Oracle, but anything that’s essentially not SQL Server or flat file, you’re going to have to install drivers on the Gateway so that it can communicate to that data source. I think that’s a piece that is very commonly overlooked.
Eugene: Yeah, and related to that is figuring out where to actually put the darn thing.
Eugene: Because not only does it have to be able to talk to whatever your data sources is, and you may have a complicated network topology, but then, it’s kind of funny. You look at the actual program and you’re like, “well, this doesn’t seem very heavy weight.” And then you look at what they say for suggested resources, and you go, “why do I need a VM with 8GB of RAM just to run a data shuttle?”
Angela: And 16 cores? What?
Eugene: Yeah, I know.
Angela: It’s not 16 cores, it’s something else. I was just throwing that out there because that’s what it feels like, because you’re thinking, “all it is, is moving data back and forth, who cares?” But no, it’s a pretty beefy server that you need, comparatively.
Carlos: So to that point, they are asking a little bit of you in terms of what they’re expecting for the Gateway? I guess you’ve gotten into some reasons. It would not be unusual, compared to some of the other SQL Server environments that we’re looking at, but particularly if you’re coming more from the analyst role, or you’re not familiar, or even in Azure, you started dabbling in that and you’re thinking about DTUs and some of the smaller systems. All of a sudden, the Gateway can kind of seem like, “wow, this is all of a sudden big.”
Eugene: Right. You may not always have the resources you need for your regular workload. It was interesting. I think hopefully someone remembers, I think Brent Ozar had released some information from his Constant Care stuff, it may have been a different survey, but how much RAM people had, and it was on average like a 10-to-1 ratio compared to data size. So if you had 100GB database, you had 10 gigs of RAM or something like that. But there definitely are people that are lucky they’re able to get 8 gigs of RAM to run their SQL Server, and so having to ask for another 8 for this Data Gateway can be a big ask for those poor souls.
Angela: Oh yeah, that’s huge. I had one system guy who said, “yeah, SQL Server, the minimum requirement is only 512 megs, so you don’t really need anything more than that, and I just wanted to stab him in the eye with a fork.
Eugene: I understand. I really do. You can run it on a gig, it’s not going to run well.
Carlos: Sure. Hence, the term “minimum”, right? We need to get this thing off the ground, but we need to land it again.
Eugene: Well, it’s funny, even Windows Server, you can’t install with half a gig of RAM. You have to have 800 MB to install it and then you can bring it back down to half a gigabyte.
Carlos: Yeah, that’s funny. Now you mentioned something in terms of connection. You said which Gateway? Now that almost made it sound like in some cases you would want more than one Gateway. Did I miss something, there, or?
Eugene: No, no. So, that’s more of a technical detail. Technically speaking, you can have multiple Data Gateways and you can assign different credentials to them. In the real world, I’ve yet to run into a scenario where that made sense. But I can see two possible scenarios where that could occur. One is maybe you want some sort of scenario where you’re making a distinction between dev and QA and production and that sort of thing. So I could see something like that where maybe you want to have more limited access or different credentials for that, or if you’ve got some sort of network topology where it’s not all connected. So maybe you have different sites for manufacturing plants or something like that and you have multiple Gateways. So that’s a technical possibility, but I’ve just never run into a scenario where it makes sense to have multiple Gateways.
Carlos: Gotcha, and in that second scenario, you’re saying the Gateways would exist in different, let’s just call them buildings.
Eugene: Geographies, yeah.
Carlos: Geographies, but they’re not talking to each other.
Eugene: Yeah, the ideal scenario in any circumstance is that the Gateway is as close to the data as possible without impacting the server. So unless you’re running a tiny workload, you probably don’t want to run the Gateway right on the SQL Server. I mean, if you’ve got a 5GB database, you’re probably fine. But in a lot of cases, you want it to be reasonably close so that you don’t have a lot of latency in accessing the data. You don’t want to have to travel all the way from Vermont to Richmond, Virginia just to access the query. So I could see a situation where you have multiple ones for different geographies or something like that, but again, I’ve yet to run into a practical situation where that was necessary.
Angela: So that’s interesting that you say that, because you have to provide credentials for that and that’s what everybody’s going to be using, right?
Angela: So, then how does row level security come in, when you want to access on-premises data?
Eugene: Right. That’s a good question. Yeah, running into that with a customer and so it’s interesting, because for the SQL Server row level security, it’s just not an option, directly. So there’s three different kinds of ways that row level security could be implemented. One is just the SQL Server feature, I think they added in 2016 where it basically looks at who you are and says, “okay, you’re only allowed access to these rows.” And that doesn’t work because what happens with the Data Gateway is it just runs the same service account, no matter what, that you specified just as “oh, I’m actually Kevin, or I’m actually Angela or something.”
Carlos: So I think of this very much like a linked server in SQL Server.
Eugene: Yeah, exactly.
Carlos: So whoever has access to server 1, I create links for him to server 2. Whatever credentials you give them, everybody on server 1 gets to now have that access on server 2.
Eugene: Right, which can be a big problem. The two ways that you can get around that today is you can use Analysis Services so that it’s able to actually pass through and say “yes, I am actually Angela”, even though it’s just pretending. Now to do that, it needs admin access to that Analysis Services. And then the other option is you implement the row level security in powerbi.com in the online service. But the challenge there is hopefully you’ve got well managed Office 365 groups, because now you have to basically reimplement that whole security layer. Or you need to have a table that says, “okay, these people can access these geographies or these customers” or whatever. So yeah, if you’re using SQL Server row level security, you lose that robustness right out the gate with using a Data Gateway.
Angela: Yeah, and then what about those folks that aren’t accessing Microsoft data sources? What if you’ve got to talk to an AS400? What if you’ve got to talk to an Oracle? How does that work?
Carlos: Well, first, let’s have a moment of silence.
Eugene: That’s funny.
Kevin: In fond memory, the AS400. Rest in peace.
Angela: They are still alive and well.
Kevin: They’re undead, they’re not alive.
Eugene: Yeah, I don’t think there’s a good solution for that right now. I’d imagine that either you’d have to be working with data that’s it’s okay for everyone to see, or you’re just going to have to do it in the service, like with some of the other options.
Angela: Like create the table that’s got of the information in there, essentially duplicating all of your security. In yet another place and then maintain it.
Eugene: And hopefully it’s in a tabular format, right? Because some places, their logic isn’t something where you can just join to the security table. It may be a place where okay, this person can see these general leger accounts, but this person can see all of them or this person can see a certain subset. Some of that security is complex and conditional and not just, “okay you’re allowed to see these customers, let’s join to this table that’s a bridge table, essentially.” So yeah, it can definitely be pretty painful. Although I’m really interested in the customers that are invested in Oracle and have decided, “okay, we’re not going to go with Oracle’s business intelligence solution.” So it’s always interesting to hear why they decide, “okay, we’re going to try to make Power BI work” if they’re not knee-deep in the Microsoft Stack.
Kevin: So, I’m going to trash everybody today.
Eugene: Yeah, please.
Kevin: OBIEE isn’t very good.
Eugene: I’ve heard little bits of that, but I haven’t ever worked with it.
Kevin: Yeah, it’s better off not using Oracle BI Enterprise Edition, yeah.
Carlos: Yeah, so there could also be the cost implications. I mean, I think admittedly, one of the very attractive features of Power BI is the integration with Office 365 and you give your reporting users the ability to have that single interface. They log into email, then all of a sudden, I can get to the reports. I don’t have to have ala a SharePoint layer or some kind of presentation layer in addition. It all kind of works seamlessly, if you will and I think that’s a big plus for a lot of people.
Eugene: Yeah, I think, too, something that’s going to be a challenge, and maybe not directly related to Data Gateways but still relevant is query folding in Power Query. Power Query, when you’re dealing with SQL Server, is pretty smart in the sense that you’ve written this ETL and it’s going to be run by the Data Gateway and all of that kind of stuff, and if you have a filter, it’s smart enough to push that into the source query. So you say, “okay, I only want stuff from 2019”, it’s smart enough to push that into query and improve your performance. With Oracle and other systems, it can do a little bit of that, but you’ve got a risk that there’s not that essentially predicate push-down going on. So yeah, once you leave the Microsoft pond, working with Power BI does start to get a lot more painful.
Angela: So your data anywhere, any way, becomes a whole lot more complicated.
Eugene: Well, yeah, it’s very easy to have it any way when you control the whole pipeline. It’s very easy for Microsoft to make the integration with SQL Server spotless, but yeah, once you want a color for your Ford T Model to be anything other than black, it starts being a little bit dicey.
Angela: You’ve got to get out the spray paint cans, right?
Eugene: Yeah, pretty much.
Carlos: Okay, well, thanks again guys, for the conversation. I think, very interesting and a good look at getting started with Gateways. Our music for SQL Server in the News is by Mansardian, used under Creative Commons. We’re always interested in hearing from you, compañeros, about what you think we should be talking about, and you can reach out to us on social media. You can connect with us in a variety of ways.
Angela: So, you can connect with me, that’s Angela, and I am on LinkedIn at AngelaHenryDBA or you can get me on Twitter at sqlswimmer.
Eugene: Yeah, you can talk to me on Twitter at sqlgene, and you can also read my blog at sqlgene.com.
Kevin: I’m all over the internet. Just search for my name. you might find my ICQ number as well. If you do, please let me know, because I forget what it is.
Carlos: Yeah, and if it’s Greek, just skip that part. And I am CarlosLChacon on Twitter. That’s going to do it for today’s episode, compañeros, thanks again, and we’ll see you on the SQL Trail.