Do you want to be a data scientist? Do you even know what data science is? In this episode, we tackle the idea of being a data scientist and discuss the criteria and what it may take to get the job description of data scientist. So that not everyone feels left out in the cold, our guest, Brad Llewellyn, talks about how this field has developed out of data analysis and how we may have much more in common than we have differences between us.
Brad’s video “What is a Data Scientist and How Do I Become One?”
David Eldersveld’s blog post for R and Python in PowerBI
“I want to say the cloud, in its massive computing powers, is really what’s brought [data science] to the forefront.”
“I don’t like to think of BI and data science as two different things. I like to just refer to both of them as analytics and they’re just different tools for the same job.”
“Pick up one of those easy to use tools that really helps you learn how data works, and then learn the fundamentals of data, and then you can transition into data science, because if it’s 80% ETL and data cleansing, then you need to know ETL and data cleansing to do data science.”
Listen to Learn
01:33 Compañero Shout-Outs
02:38 What Have I Learned
03:57 SQL Server in the News
07:07 Intro to the guest and topic
08:50 No one was talking about data science five years ago
10:45 Neural networks, expanded computing power, a combination
13:25 Low or no code machine learning
14:30 Are business intelligence and data science cousins?
17:06 Data scientist, data science technician, data science analyst, data science engineer, data engineer…differences?
21:38 An organization is ready for a data science project when…
25:40 Recommendations for those who would like to get into data science
27:16 Brad’s opinion on current low code/no code products
30:24 Prognostications on the next three years in data science
31:28 SQL Family Questions
38:58 Closing Thoughts
About Brad Llewellyn
Brad is a Senior Data Science Consultant at Syntelli Solutions in Charlotte, NC. Brad helps individuals and organizations unravel the Data Science world in order to figure out how it can work for them. He has an M.S. in Statistics from the University of South Carolina, MCSE Certification in Data Management and Analytics, MCSE Certification in Cloud Platform and Infrastructure and various MCSA Certifications in Business Intelligence and Advanced Analytics. Brad is an active blogger at breaking-bi.blogspot.com. He is also an organizer for the Charlotte BI Group, a local PASS chapter in Charlotte, NC. You can connect with him on LinkedIn at https://www.linkedin.com/in/bradllewellyn and on Twitter @BreakingBI.
Music for SQL Server in the News by Mansardian
Carlos: Compañeros! Welcome back to another edition of the SQL Data Partners Podcast. I am Carlos Chacon, and I am joined here with Angela Henry.
Carlos: And today’s episode, Episode 155, our topic is Data Science and our guest is Brad Llewellyn. Brad is a Data Science Consultant with Syntelli Solutions. He hails from Charlotte, North Carolina, so we’re looking forward to chatting with Brad today. Kind of an interesting discussion, I think part of what we’re trying to define in our conversation with Brad is what is data science, how can we learn the lingo and then obviously some steps that would be needed in order to get into the data science realm, if that’s something that we’re interested in. So yeah, I think Brad brings some interesting ideas to the table, and of course, we have Kevin and Eugene later on, in the conversation as well.
But first, we do have a couple of Compañero Shout-outs and we’re always grateful, compañeros, when you’re talking to us on social media. A few names to throw out today, James Dandridge, Nadir Doctor, Hailegziabher Dechassa. We actually met at SQLSaturday DC, and Hail, I apologize for not asking you how to pronounce your name, but thanks for connecting with me and it was nice chatting with you, there in DC. We’ve got Linda Groszyk, who will actually be a guest on our next episode. We’re talking to her about some interesting things. Connecting via LinkedIn, Syed Islam wants to hear more about PowerBI and Data Flows, so this should be an interesting topic, I think for both you and Eugene. I’m sure that will be something he’ll want to talk more about. I want to give a shout-out to Chris Albert up in Connecticut. Karel Tavernier thought we had some pretty concrete tips for developers, so thanks, Karel for passing that information along.
In our little segment here, What Have I Learned, previously we were kind of calling this Tips and Tricks and so we’re kind of morphing this, or this is kind of growing into some of the things that maybe we’re working on. And so we had an interesting experience here, recently, where we had a customer install SQL Server 2017 and of course, now with 2017 they’ve separated SSRS from the install of SQL Server. They were moving forward, but they didn’t have their license yet and so when they went to apply their license, they applied it on SQL Server, but did not apply it on Reporting Services, because it seemed like it would be one and the same. And in the past, it would have been, but now in 2017, they’re separate and so they ran into some issues and that didn’t play very well. You know, it ended up all working out, but yeah, not quite as straightforward as you would have thought.
So, if you’re using 2017 with an evaluation version and you need to get your licenses in, when you go to actually implement those licenses, make sure you implement them on both Reporting Services and, of course, the database engine.
Angela: And be sure to do it before they expire.
Carlos: Yes, that being the key piece.
Carlos: Yeah. Okay, and so now, time for a little SQL Server in the News. This news actually comes to us from Vicky Harp and admittedly, this is a little bit dated at this point, but we haven’t talked about it on the program and so we thought we’d bring it up. That is the, well, at first what I thought was an announcement, but is a rebranding of SQL Operations Studio, which is now Azure Data Studio for SQL Server. It seems like those names are getting longer and longer.
Angela: They are, they are.
Carlos: Yeah, now one of the things that I thought was interesting is that Vicky says, “the research has shown that users spend an order of magnitude more time working on query editing than any other task within SQL Server.” Which I guess makes sense, because it just takes a lot of time to do. Sometimes there’s no magic button on how to write these queries. I mean, yeah, there’s scripts and things like that, but inevitably, when we get in there and have to start making tweaks, you have to know the data, you have to know the column names and data types, all that kind of jazz and that just takes time. So ultimately, Azure Data Studio is Operations Studio with some enhancements made to it, and it is the cross-functional tool. While they don’t go into a whole lot of detail, they do say that all of the management features of SQL Server Management Studio will be made available in Azure Data Studio, and the two products are going to integrate smoothly. And at some point, we’ll find out whether Management Studio just gets integrated into the Azure Data Studio or not, but it’ll be interesting to see. Now admittedly I haven’t played with it on the web, but Vicky posts here a picture from what looks like the Azure portal and so Angela, have you used that or am I just looking at that wrong?
Angela: No, so what she, actually it’s not in the portal at all. This is actually the desktop tool, Azure Data Studio and they do more offer the ability to, more monitoring-type stuff so that you can see performance and what’s going on. I mean, the biggest thing is that it’s for Linux and Mac OS users, so there’s quite a disparity between the two products being Azure Data Studio and SQL Server. So you’re still going to spend a majority of your time if you’re an administrator using Management Studio, but if you want to check out performance and how things are running in your servers, Azure Data Studio is a better tool for that. It’s got a lot more bells and whistles built in for monitoring and kind of quick visualization things to see how things are going.
Carlos: Sure. Yeah, from a development perspective, those two things are where Azure Data Studio is trying to make its stand. So yeah, it’ll be interesting to see how this continues to develop and of course as the tooling affects us, what features will be available in one versus the other. And so yeah, it’ll be interesting to see what happens here.
Okay, so the show notes for today’s episode will be at sqldatapartners.com/datascience or at sqldatapartners.com/155.
Okay, Brad, welcome to the program.
Brad: Hi, thanks for having me.
Carlos: And we are joined today, as well, by Kevin Feasel.
Carlos: And Eugene Meidinger.
Carlos: It’s good to have you gentlemen along with me, today. So Brad, ultimately our topic today is Data Science and this is one of those very trendy buzzwords we hear a lot about. It makes a lot of our listeners, or at least some of our more traditional listeners, if you will, quake in their boots about how things are going to go and how the world’s going to change, and then the machines are going to take over everything and like what are we going to do in the future? We’ve seen it in movies, science fiction, and now we’re starting to see some of it in reality. Let’s just kick it off from the very basics. When you think about data science, give us the Brad version of what data science means.
Brad: That’s a great question. It’s hard to succinctly explain, but the way I look at it–
Carlos: In preparation for this, I actually went to the Microsoft Academy, and I’m like, “oh, well, let me see what they say.” And they basically said that thing, “oh, it’s hard to explain” and they never actually explain it. Boom. So here we go.
Brad: I democratize it a bit more and like to think of it as making decisions with data and at the more extreme end of it, allowing computers to make decisions for you, with data. So, I know that doesn’t distinguish it a whole lot from BI, but I don’t think there’s a nice clear-cut distinction between the two, anyway.
Carlos: Well, there you go. This actually transitions very well into Eugene’s question.
Eugene: I guess the question I have is what do you think was kind of the tipping point for data science becoming this big thing? Because statistics have been around for centuries, millennia, whatever, but it really feels like we weren’t talking about this 5 years ago. So, I’m curious if it’s just that companies are finally starting to adopt it or computers just got faster, or someone in marketing said, “oh, if we put science in front of it, people will actually be excited.” I’ve heard the joke that if you have to put ‘science’ after it, it’s not a real science, so like political science, library science, all that kind of stuff. I’m going to offend like a quarter of our audience. But what do you think was kind of the tipping point, because I feel like five years ago, no one was talking about this?
Kevin: Wait, wait, wait. Annoying and/or making angry half the audience is my job, not yours.
Eugene: Yeah, I’m sorry. That’s right.
Carlos: Oh boy.
Brad: That’s a tough question. I’ve got a somewhat interesting perspective, because that five years ago you’re talking about is exactly when I was just getting my first job.
Brad: So coming from a statistics background, where I’ve seen a lot of R and SAS and around like the biostats fields where that’s really popular and been popular for decades. And then come to the data field and everybody was doing SQL Server development and BI out of Excel and Analysis Services was just getting popular, too, so it did feel like this distinction. And now, for some reason it’s collapsed, and I want to say it’s got to be the cloud. It kind of has to be the cloud, just because that’s the one thing that popped up around that same time, and in my experience, statistics was kind of relegated to this, take a small sample and see what you can do on your laptop type of thing. And then the cloud kind of exploded that into, “oh, we can take our billion-row fact table and actually do data science on it.”
Brad: And so, from what I’ve seen, as soon as business has the opportunity to do something, they’re going to try to do it. So, I want to say the cloud, in its massive computing powers is really what’s brought that to the forefront.
Eugene: That makes sense.
Carlos: That’s interesting.
Kevin: I see a couple other factors around the expansion of computing power, like the ability to make neural networks practical. Prior to about 2012, 2013, neural networks were an interesting idea that you really didn’t do because you didn’t have the computing power necessary for a large enough network.
Carlos: Now you’re going to have to explain to the knuckle-dragging Neanderthals of us on the podcast. What’s a neural network, Kevin?
Kevin: Do you really want to go there?
Eugene: Hey, wait, I’ll give the lay explanation.
Carlos: Here we go.
Eugene: I’m going to butcher it, and Kevin’s going to bite his tongue and start bleeding. So, a neural network is a mathematical model designed to be inspired by how brains work, even though it’s really very different. But you basically have a bunch of nodes that take these inputs and have weights on the outputs that they put out. And if you have enough of these nodes and you have enough hidden layers in between everything, then you can do some impressive deep learning, or you can recognize, say, symbols or faces or even map information. It’s a mathematical model of sorts.
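For listeners who would rather see Eugene's description than hear it, here is a minimal sketch of the idea: nodes that take inputs, apply weights, and pass results through one hidden layer to an output. The weights, the layer sizes, and the choice of a sigmoid activation are all made up for illustration; nobody on the show specified them, and a real network would learn its weights from data rather than have them typed in.

```python
import math

def sigmoid(x):
    # Squash any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each node computes a weighted sum of its inputs plus a bias,
    # then applies the activation function.
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    # Inputs -> hidden layer -> output layer.
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

# Hypothetical, hand-picked weights: 2 inputs, 2 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.0, -0.1]
out_w = [[1.2, -0.7]]
out_b = [0.05]

prediction = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(prediction)  # a single number between 0 and 1
```

Training, which is where the backpropagation mentioned later in the conversation comes in, is the process of adjusting those weights so the output matches known answers; it is left out of this sketch.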
Carlos: So the way that they put out like guess your age stuff. You put up your photo and it says, “oh, I think you’re however” like that kind of decision making?
Kevin: Sometimes that would probably use a convolutional neural network, because that would involve image recognition. For the purposes of today’s discussion, neural networks are magical black boxes where you put in inputs, you get outputs and you have no clue how you got the output from the inputs.
Eugene: It’s not even a joke.
Kevin: That is like the top level of neural networks. You can dig deeper and understand how these things work. You can understand how the weights work between nodes, how back propagation works.
Carlos: That sounds like another episode.
Kevin: But you don’t want that here.
Carlos: That’s right, okay.
Kevin: So, I think that like I said, neural networks picking up, that was an impetus, but I do agree with Brad that it was definitely an expansion of computing power.
Carlos: Right. The combination of those things, because ultimately, and again, I guess I think about R in the sense that it also made some of that statistics more approachable from the IT side. We’ve talked about it even on this program, so Eugene and Kevin, I’m sure, have both said, “hey, you’ve got to have some statistics in there,” but it’s not like it used to be. You don’t necessarily need the degree, anymore, in statistics, just enough to get by. And I think you know, that maybe that’s to your point of the neural network, but all of those things kind of coming together, I’m sure, created an interesting environment for some of this stuff to happen.
Brad: One of the other things I’ve seen that seems to get a little less air time is the rise of low code or no code machine learning tools, which has definitely taken a bite out of the industry that wasn’t possible five years ago. Because now it’s accessible to people who don’t have that deep statistics knowledge, so it does expand the audience a little bit, similar to the way that BI does for business analysts.
Eugene: I’ll be interested to see where that goes, because I’ve played around with Azure Machine Learning, and they definitely make it very much a low code experience, but at the same time, you see a box that you can drag onto the canvas that says Random Forest, and then you go, “what is this?” Whereas, if you understand at least some of the different types of models and things, then it’s not so intimidating. At least for Azure Machine Learning, they make it very approachable, but you still have to at least have a ballpark idea of what you’re dragging around.
Brad: True. You should try something like Data Robot or the new Microsoft AI model builder, where you literally don’t have to know anything.
Eugene: Oh, nice.
Brad: You just (?) in a data set and tell it what you want to predict and click the Go button and it gives you an answer.
Eugene: Oh, that’s awesome. Very cool.
Carlos: So, you mentioned at the beginning how data science was, maybe not an offshoot, but we had business intelligence before, and then now, data science has come into fruition. Are they cousins or are they getting any closer? I still feel like there’s organizations looking for that unicorn data scientist that can do everything, that’s not necessarily coming from the business intelligence group?
Brad: That’s a really interesting question, and honestly it brings up two separate points. I don’t like to think of BI and data science as two different things. I like to just refer to both of them as analytics and they’re just different tools for the same job. We’re trying to use data to solve a problem, whether solving that problem is just “hey, give me a little bit of information and let me make a complex decision” or the decision is simple enough that I can teach a computer to do it. Either one of those is kind of the same problem. The other side of that is talking about why data science resources aren’t coming from the BI teams, and personally, from the number of businesses I’ve had the fortune of interacting with, I think that’s a problem, honestly. There are a lot of data science teams that are being shut down at a lot of organizations because they’re very expensive and not providing a ton of value. And I’d argue most of that is because they’re taking pure statistics developers and making them team leads without looking back at the data industry and saying, “hey, the BI team’s been doing this for 20 years. They’ve been helping us make decisions with data for 20 years, why are we not inviting them to the table?” One of the things I say is the leader of your data science practice probably shouldn’t, at least for the next couple of years, be somebody who comes from a pure statistics background. It should be somebody who has good knowledge, but also a lot of experience in just traditional data and how to make that work in production.
Carlos: Interesting. So, not to put you on the spot here, Kevin, but you kind of fit that mold a little bit, right?
Brad: Aren’t we all just self-fulfilling prophecies? We all look at the world so that we’re special? Unfortunately, that’s human nature, so yes, and I try to not make that cloud my judgement, but we’re all human.
Kevin: Yeah, my background is a little bit closer to a mix, because I do have a pretty strong stats background prior to data work, but I think that’s an interesting point, where it’s not necessarily the knowledge of statistics. I think that if you’re leading the team, you probably want to be the domain expert. You really want to know what’s available and be able to give people who are specialists in an area enough insight that they’re able to draw conclusions. Which kind of leads me into a question for you, Brad. What are some important skills for getting into data science if you don’t necessarily have to have that stats background to be a lead, but you say, “I want to be a data scientist. I may not necessarily want to lead a team, I just want to be doing something really cool.” What kinds of skills will get you there?
Brad: That’s probably the most common question I ever hear, and it’s still a great one. The one question I’d ask back to you is what do you call a data scientist?
Kevin: Well, actually, my answer for what a data scientist is, is a data analyst working in San Francisco.
Eugene: That’s true. So I think that it’s probably important to make a distinction. I don’t have an answer to this question, but I get confused on it all the time. And I think part of the reason I get confused is, the analogy I like to think of is, my wife works as a pharmacy technician and they go, “oh, she’s a pharmacist” and I say, “no, no, no. She didn’t spend $70,000 to get a PhD.” She walked into Walmart Pharmacy and said, “hey, can I have a job?” What would probably fall under a true data scientist is someone who has a strong mathematical background with stats and that sort of thing. They have a strong understanding of data and some of the technology around that, and then they have a very strong domain expertise. That’s kind of your pharmacist level and that’s the person that’s making $150,000 a year. But I think that a lot of what us regular folks are trying to do is more like being a data scientist technician. Just like my wife, she understands a lot about medicine, she’s able to fill out the bottles and things, but she’s legally not allowed to give out medical advice. So, I suspect that there’s kind of this bimodal distribution with these two humps and you have a lot of us being kind of data scientist technicians. And then you have these people who actually have their PhDs and have spent years understanding statistics and they’re way over at the other end of the spectrum.
Brad: That’s a really interesting point. I actually had a boss I worked with. He unfortunately left the organization a few months ago to pursue other paths, but he described it as there are two main types of data scientists. Data science analysts, who are what we normally think of as “traditional data scientists” or like you said, “true data scientists”. They’re the ones that have that really deep statistical background, the really strong business knowledge and turn data into insights. Then there are data science engineers. People who have a fundamental understanding of what the analysts do. May not be able to do it as deeply or as strongly as they can, but know how to take what the data science analysts create and then integrate it into the data ecosystem. So you think of it like the data science analysts are the people that understand the deep learning and the neural networks and how to build all those predictive models. And the data science engineers are your BI developers and your big data developers and your DBAs and the ones who make it work in production.
Kevin: As a notational follow-up, and I promise that if you have a different answer, I will explain mine. When you say data science engineer, is that different from data engineers?
Brad: I have no idea.
Kevin: Okay. So, my split is that data engineering is effectively what people have been doing for 20 years in the BI space. It’s ETL, it’s data cleaning, it’s data processing. The technologies are a little bit different because, oh, I’m using Spark and I’m using Kafka instead of using Integration Services and Service Broker, or I’m using ELT instead of ETL, but the general concepts are still the same. I’m a data plumber. I take data and move it from one place to another place. I make it fit through these pipes. I get it into a place where people like it and I try not to mix the septic pipe with the fresh water pipe.
Brad: That sounds pretty similar to my explanation, too, it’s just a nomenclature difference, because people want to put the word ‘science’ in their title.
Eugene: Right. Well, it’s interesting, I think a lot about internet and ISPs and the whole last mile problem, where it’s really, really efficient for them to be able to get backbone networks all over the place, but it’s that last mile to your Uncle Rick’s shack in the middle of the woods that’s a challenge and why he can’t get internet. And the way I kind of make the split a lot of times is data engineers have to deal with the majority of that data pipeline, from start to finish, and then the data scientists deal with that last mile or if you’re thinking of like a hose, that last inch of the data.
Carlos: That’s interesting. Yeah, particularly if you start thinking about trying to get subsets of data and whatnot. I like that analogy.
Kevin: Yeah, so Brad, you mentioned that organizations, some of them are pulling back because they spent a bunch of money on data science teams, didn’t get what they expected and said, “all right, we’re going to start cutting.” To me, that sounds like an organizational maturity problem. A, do you agree?
Brad: Yes, absolutely.
Kevin: Okay, B, good, because since you agree, I can do my follow-up question. What are some tells that an organization is ready for a data science project? More than I’m going to do some data engineering, I’m going to do some data cleanup and maybe some light analysis or reporting, which organizations have probably been doing for a while, but to take that next step into a real data science project?
Carlos: So you’re saying that because the CEO said so, it doesn’t qualify?
Kevin: I enjoy getting sacks of money as much as the next person, but it doesn’t necessarily mean the project’s going to succeed.
Brad: No, that’s a great question, and honestly, it comes down to a problem that’s been solved for decades now, and the problem is adoption. One of the questions I always like to ask organizations whenever I get brought in and they say, “okay, I want to predict what my sales are going to be next month.” My question is, “do you know what your sales are this month? Do you know who you’re selling to?” Because if you can’t answer those basic questions of what are you doing right now, then telling us to predict what you’re going to do tomorrow, there’s not going to be any trust in those numbers. If executives don’t trust the numbers, they’re not going to use them, and if they don’t use them, we might as well have not done the job at all.
Kevin: Very true. That kind of ties in with the idea of hey, you’ve got this descriptive analysis, which some companies have gotten pretty decent at, and then you can’t really do predictive until you understand descriptive. And you can’t do prescriptive analysis, where a machine is recommending you perform certain business actions, until you get the predictive part down. There’s a maturity model involved.
Brad: I agree 100%, and that’s exactly the same way I see it, too.
Kevin: And I think there’s also an issue where you’ve got to have some of the data available. You have to actually have the information. Like being able to say, “well, what are my drivers for future sales?” We can build models, but unless we have good data and good features, the best models are going to give you bad results. At best they’ll be slightly helpful while explaining a small percentage of variance. At worst, they will be actively harmful, attribute effects to unrelated factors, and lead you down the wrong path.
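To put a rough number on Kevin's point about features and "percentage of variance," here is a small sketch that fits the same simple model twice: once with a feature that actually drives sales and once with an unrelated one. Everything in it is an assumption for illustration: the sales figures, the feature names `ad_spend` and `shoe_size`, and the use of R-squared from a one-variable least-squares line as the measure of variance explained.

```python
def fit_line(xs, ys):
    # Ordinary least squares for a single feature: y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys):
    # Share of the variance in ys that the fitted line explains.
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up monthly data.
sales = [10, 12, 14, 16, 18, 20]
ad_spend = [1, 2, 3, 4, 5, 6]   # hypothetical feature that drives sales
shoe_size = [5, 1, 4, 2, 6, 3]  # hypothetical feature unrelated to sales

r2_good = r_squared(ad_spend, sales)
r2_noise = r_squared(shoe_size, sales)
print(f"ad_spend explains {r2_good:.1%} of the variance")
print(f"shoe_size explains {r2_noise:.1%} of the variance")
```

The model code is identical in both runs; only the feature changes, which is the sense in which a good model cannot rescue bad inputs.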
Brad: One of the things I always like to think of, and everybody here is some flavor of a BI developer at heart and I like to think of BI as having one job. BI runs around in circles so that we can eventually figure out what data we have and what it looks like. That’s 90% of the data science problem is figuring out what your data is, where it lives and what it looks like. So, if we can solve that as part of our BI pipeline, the data science pipeline becomes really, really lean and efficient, and that’s the way I like to recommend that people go about those types of problems.
Eugene: That’s the thing that kind of shocks me is you look at surveys of data scientists and again, somebody slapped the word science on stats to make it sound like a really sexy job. And you realize that 80%, 90% of the work is still just data cleaning, ETL, that sort of thing. It’s like being this rock star that maybe you’re playing in front of this big audience once a month and then the rest of the month you’re just practicing in your garage or something like that. It surprises me, consistently, just how much grunt work is involved with doing data science.
Carlos: Yeah, you know, you’ve got to, practicing would be– tangent slightly, looking at taking cello lessons. I mean, that’s one of the things that the lady was like most of the time you’re going to have the cello you’re going to be by yourself, practicing, alone. You’re only going to perform a very small percentage of the time. And so yeah, it still takes a lot of work to get to the point where you can have it prepared for concert time, if you will.
Kevin: Yeah, on the topic of practice, do you have any recommendations for ways for people who are interested in the data science portion to get some of this practice?
Brad: Sure, sure. I’ve actually got a really long, hour long YouTube video on exactly that question. “What is a data scientist and how do I become one?” if anybody wants to look into that.
Carlos: There you go, we’ll make sure that that YouTube video ends up on the show notes page for today’s episode.
Brad: Sure, sure. So, it really depends on where you’re coming from. The recommendation I make is anybody who’s coming fresh. If you’re straight out of school or you’re doing a career transition from a non-technical field, I always say start with traditional data. Pick up something like, and I’m going to apologize for this, PowerBI, maybe?
Eugene: I’ve been restraining myself this whole podcast.
Carlos: You said the word.
Brad: Pick up one of those easy to use tools that really helps you learn how data works, and then learn the fundamentals of data, and then you can transition into data science, because like you said, if it’s 80% ETL and data cleansing, then you need to know ETL and data cleansing to do data science. And then once you understand the fundamentals of data, learning how a neural network works isn’t all that complicated, once you have the fundamentals down.
Eugene: Well, and like you said, the trend is that they’re adding in low code, no code ways to use machine learning and artificial intelligence. They just announced that for PowerBI, that they’re going to be having integrated options for artificial intelligence, and that they’ll also have integrated options for Azure Machine Learning. So, I suspect that they’re actively working, you know, not just at Microsoft, but at other companies, too, to make that on-ramp smoother and smoother.
Kevin: On that no code or low code topic, what’s your opinion of the state of current products in that space, Brad?
Brad: Oh man, they’re not as mature as they should be, and honestly, I blame industry for that one. I don’t blame the developers. There are a lot of organizations, I know Microsoft created Azure Machine Learning Studio a few years ago, and I know there’s a few other products out there that the competitors are creating, too. The issue is that despite the fact that if you put it in the hands of a skilled BI developer, they could solve 75% of your business problems with it, organizations just refuse to use those, because we have data science teams being led by R developers who say, “I must use R to solve this problem.” So, I think those tools aren’t getting nearly enough investment. I actually asked this exact question to the Azure Machine Learning team at Microsoft when we were at PASS Summit last week, and to my heart’s joy, they said Azure Machine Learning Studio is not being deprecated and there may be investments in the future. But for the time being, it’s still where it was a few years ago. I’m crossing my fingers.
Eugene: Yeah, does it still look like the old Azure portal?
Brad: Yes, it still looks like the classic Azure portal.
Eugene: Oh, my goodness.
Carlos: Oh yikes. Yeah, slap a PowerBI interface on that thing, man, come on.
Eugene: It’s funny that Microsoft is taking the two kind of traditional data science languages, R and Python, and just shoving them anywhere that they can. So you’ve got it in SQL Server, you can include R steps in PowerBI or even R Visualizations and I believe they just recently announced that R Analytics are coming to the Azure SQL Database. So, it definitely seems like they’re going, “oh, you can bring a data scientist to water, but you can’t make him drink unless you have R available and then, you know.” That seems to be their strategy right now.
Carlos: So stepping back and I don’t work for Microsoft, obviously, but I think they want to be taken seriously as an analytics environment and so R kind of won that battle, at least in the beginning. And so now they’re like, “okay, well, you know, let’s put it everywhere, because we want it to be baked in to everything,” So that way, end to end, if you will, data science or analytics is possible.
Eugene: So, this may be a bit of a tangent, but I think it would be good for the show notes, so David Eldersveld has a post about improving performance with R and Python in PowerBI and the stuff that the engineers did to integrate the two is just goofy as heck. They’re connecting over like ADO.NET, so it can be like 10 times slower using R that way, but then also it’s not directly integrated. It’s literally there’s a system folder that PowerBI’s making and it’s just writing everything as .csv files. So literally, it’ll take the data from Power Query, dump it into a .csv file and then have R process it and dump it back out to a .csv file. You have to be careful what you’re doing because it can get really slow. I was gobsmacked when I found out that they’re just writing everything to disk.
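The hand-off Eugene describes can be pictured with a small sketch. This is not PowerBI's actual implementation. It is a toy illustration in Python of the general pattern of a host writing a table to a .csv file, an external script processing that file, and the host reading a second .csv file back. The function names, file names, and the `id`, `amount`, and `doubled` columns are all made up for the example.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, rows, fieldnames):
    """Write a list of dicts to disk as a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV file back into a list of dicts (every value is a string)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def external_script(in_path, out_path):
    """Hypothetical stand-in for the R/Python step: it only sees files on disk."""
    rows = read_csv(in_path)
    for row in rows:
        # Values arrive as text, so numbers have to be parsed again.
        row["doubled"] = str(float(row["amount"]) * 2)
    write_csv(out_path, rows, ["id", "amount", "doubled"])


def round_trip(rows):
    """Host side: dump the table, run the script, load the result."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "input.csv"
        out_path = Path(tmp) / "output.csv"
        write_csv(in_path, rows, ["id", "amount"])
        external_script(in_path, out_path)
        return read_csv(out_path)


result = round_trip([{"id": 1, "amount": 2.5}, {"id": 2, "amount": 4.0}])
print(result)
```

Every row is written to disk and parsed twice, and the typed values come back as plain strings, which hints at why a round trip like this can get slow on large tables.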
Carlos: Well, there you go. I’m sure they’ll eventually, you know, do that in a different way, but yeah, sometimes those initial tools can be a little clunky. Okay, you’ve mentioned some of the tools in the machine learning, that hopefully there’ll be some investment, so prognosticate for us. We’ll go three years out, because that’s probably way too far, anyway. What’s the next three years look like for those of us who are working with this stuff?
Brad: So, what I would really like to see, and I get a strong feeling that it’s going in this direction, is more integration of the automated machine learning tools. The ability to seamlessly productionalize R and Python code has gotten pretty mature at this point. There are a lot of environments you can just drag your serialized models in and voila, it works. You can’t get much better than that, other than just adding power. So, the next step is definitely going to be these tools like DataRobot or Driverless AI. Somebody’s going to directly integrate that into their database system and it’s going to matter. It’s going to allow you to just, through SQL syntax, call autoML from your SQL statement and voila, a prediction pops out. The first company that does that is going to have some serious power.
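For anyone unsure what a "serialized model" means here, the idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular product works: the model is a hand-rolled straight-line fit, and JSON stands in for whatever format a real tool would use. The names `fit_line` and `predict` are invented for the example.

```python
import json


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Score one value using only the stored parameters."""
    return model["slope"] * x + model["intercept"]


# "Training" side: fit once, then serialize the fitted parameters.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
payload = json.dumps(model)

# "Production" side: load the payload and score new data without refitting.
loaded = json.loads(payload)
prediction = predict(loaded, 10)
print(prediction)
```

The point is the split: training happens once, and the environment that serves predictions only needs the saved payload and a scoring function.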
Carlos: Wow. Interesting. Should we do SQL Family?
Brad: I’m down.
Carlos: Okay, Brad. What’s your all-time favorite movie?
Brad: I think you meant to ask what is the best movie of all time? And the answer is obviously Shawshank Redemption.
Eugene: Them’s fightin’ words. That is a good movie.
Kevin: I actually did at PASS Summit, my talk on Spark, I used the MovieLens dataset, which admittedly is a little old at this point, but Shawshank Redemption is definitely one of the highest rated films of all time.
Eugene: Yeah, it’s at the top of IMDb.
Carlos: Really? Interesting.
Kevin: And, it was filmed in Ohio, as I recall.
Carlos: Yes, the mother state.
Kevin: Mansfield, Ohio, the home of great movies.
Carlos: Now, was Field of Dreams, that was in Iowa, I think. It took place in Iowa, I don’t know where it was filmed, though. It doesn’t matter. Okay, I digress. City or place you most want to visit?
Brad: So, my wife and I are actually really big foodies and the one thing that we really want to do at some point is go to Europe and just start taking the train around Europe and eating in every country.
Eugene: Oh, nice.
Brad: Getting nice native food from all the different countries.
Carlos: Oh man, very nice, yes. It takes me back going to France, walking down the Champs-Élysées, I mean, that’s kind of a long road anyway, but then it didn’t help that I had to stop at every single pastry shop and try something. So yeah, good times. Okay, so speaking of food, a food that reminds you of your childhood?
Brad: That’s actually an interesting question. My parents almost never cooked when I was a kid. I ate out at restaurants almost every night of the week for my entire childhood.
Eugene: That’s got to be expensive.
Carlos: Yeah, wow, it’s good to be Brad.
Brad: No, well, we were eating at the Mexican restaurant for $5 every night, so I’m not sure that’s the– but one of the only things that I ever saw my dad make, because my dad never cooked. I didn’t even know that men could cook. It wasn’t a thing. The one thing he could cook was spaghetti, and that really kind of opened my eyes to hey, maybe guys can cook and I’m really glad, because I love cooking now. I cook all the time.
Carlos: There you go. Interesting, very cool. It’s a nice skill to have, cooking. Tell us about when you first got started with SQL Server.
Brad: I’m sure most people have these grand stories about amazing things how they got started and mine is less so. I got a master’s degree in statistics because I was like, “yay, I love math and I hate proving things, so I just want to give people answers and not make me show my work.” That’s what statistics is.
Eugene: So neural networks all the way? Here’s this black box.
Brad: And I got out of school and this is like 2012 at this point, so the job market’s not great and I just absolutely, positively could not find a job in statistics. So, the first job I was able to get was with a consulting firm that was going to pay me to develop Tabular reports, and my first real job was developing Tabular reports as a consultant, and that really branched my way into traditional data and SQL Server. I look back very fondly on that because it set my career on the path it’s on now, and I’m super thankful for that.
Carlos: Sure, yeah, isn’t that strange?
Eugene: I think it happens to a lot of people. There’s that whole phrase about accidental DBA, and I know, for me personally, I could spell SQL and suddenly I was the DBA for a small company. So, I think it happens a lot more than people might expect.
Carlos: Now, in that time, and I guess I’m interested, we’ve been talking a lot about the analytics side of things, but if you could change one thing about SQL Server, and we’ll expand this, potentially to the tools you’ve mentioned how it needs updating on the machine learning stuff, what would it be? One thing you want to change?
Brad: One of the things that I’ve been really impressed with about Microsoft lately is how much effort they’ve put into integrating external systems with the R and Python integration. And now they just announced Java for 2019 and they have PolyBase, too, which is the start of a virtualization environment. As we were talking about before with data science, I’m really waiting for Microsoft to just keep implementing all of these external tools and data sources and applications into SQL Server to create this true Enterprise hub. You could create this place where instead of an Enterprise architecture being 75 different resources, it should all focus around one environment, so everybody goes to one place, and then that environment manages where you need to go elsewhere.
Carlos: Oh, okay, now this is an interesting question, but do you feel that Cosmos might be that future place?
Brad: It certainly could be. I mean, Cosmos has the advantage over SQL Server of being multimodal, and one of the really interesting things that they’ve done is, I think it’s if you have your CosmosDB set up under the graph API you can still create with SQL. I think that’s the relationship, or it goes the other way or something.
Carlos: Yeah, I know they’re doing all kinds of crazy things over there.
Brad: Yeah, being able to put literally any type of data in your database and then query it with any different language you need, and being able to integrate automated machine learning tools and being able to integrate reporting tools, being able to integrate semantic layers, all of that onto a singular source that you can query in any language? That would revolutionize data, for sure.
Carlos: Yeah, I’m sure there’s a migration piece of it, but I kind of feel like, as I read the stars, if you will, it kind of feels like Cosmos is that future place. Now again, there could be a conversion piece, but if they’re like, “hey, we can accept your data as is”, well, then that makes it a lot easier. Anyway, yeah, that’s going to be interesting to see how that shakes out. Okay, the best piece of career advice you’ve received?
Brad: Nobody’s ever said it to me using exactly these words, but the one piece of career advice that I’ve learned over the years is if you don’t invest in yourself, nobody else will, either. One thing I always tell people is if you’re working your 9-5 and going home and not doing anything else, then your career is going to stagnate. I fully understand if you want to invest as much time as possible into your family or into sports or into your craft or whatever, because those are awesome things, too. But if you want your career to be a strong point of your life, you’ve got to put in more than just the 9-5. Investing in yourself through training and getting out and presenting and speaking to the community and stuff like that are all great ways to do that.
Carlos: Right, luckily the compañeros on this podcast are already ahead of the game in that whether they’re walking their dog or driving to work or wherever they might be, they’re trying to engage in conversation that they may not be able to have elsewhere.
Kevin: Googling videos on neural networks to learn how they really work.
Carlos: Or that. And then coming back and be like, “yeah, we like the black box approach better.” Okay, our last question for you today, Brad, if you could have one superhero power what would it be and why do you want it?
Brad: Oh man, I would imagine most people say things here like flying and super strength and all of those traditional superhero powers that you’d see in the movies or the comic books. But honestly, if I think about it, the one thing that would solve all of my problems in life, if I had the power, was the ability to instantly empathize and cooperate with anyone I need to, just instantly, would literally solve every problem in my life.
Carlos: So, that kind of sounds like a I need to control everyone kind of a power.
Brad: Well, I suppose we could go all, I don’t know if you guys watch the shows, but Kilgrave from Jessica Jones had that power, and it was awful. He hated it.
Carlos: Interesting. Well, awesome. Brad, thanks for being on the program today.
Brad: Thanks for having me, it was a great time.
Carlos: Kevin and Eugene, as always, gentlemen, thanks for joining us.
Eugene: Quite welcome.
Kevin: You’re welcome.
Carlos: I’d like to thank Brad again for coming on and chatting with us. Again, kind of an interesting conversation. We always enjoy getting other folks’ take. One thing that I know where I was separated from the rest of the pack was this idea of the neural network, so not just a science fiction term, but so thanks Kevin and Eugene, for helping me set this straight. That is going to do it for today’s episode, compañeros. Thanks again for tuning in, whether you’re driving your car or walking your dog or whatever you happen to be doing, thanks for taking us along. If you’d like to reach out to us, we’re always interested in getting some of your feedback on social media.