There has been a remarkable shift introduced in the new features of SQL Server–programming languages have been embedded into the database. This has led to a few “What are you thinking” questions from myself and others. It also opens new opportunities for those working with SQL Server and in this episode, I chat with Andy Roberts and Chris Hyde and discuss the new features of R and Python–why they are included, how data teams are changing, and what this means for the rest of us. As someone who does not and has not ever considered themselves a programmer, I don’t try to tackle any of the technical challenges of the language. We stay safe on the side of ideas, process, with a sprinkle of installation and setup.
One of the most compelling ideas from this conversation is the democratizing of data. Sure, this is not a new concept; however, now with a programming language in the database it will force a thinking realignment for those that traditionally called themselves gatekeepers. Where CLR couldn’t quite do the trick, I think the introduction of these languages is going to require increased collaboration with teams and force administrators to up their game as they tackle challenges of data distribution and data consumption.
I am interested to see what lies ahead and how consumers will use these new features. We already have some insights into R, and while I won’t call it a smashing success, it is certainly useful to those who know how to take advantage of it, and those numbers appear to be growing. With Python, I think we are increasing the breadth of those who can take advantage of analytics in the database, which I think only bodes well for those who enjoy working with SQL Server.
What about you? Is your team trying to implement R or Python? What new skills have you had to learn because of these changes? Hit me up on social media or in the comments below.
“I think that one of the main beneficiaries that doesn’t get a lot of press is actually pushing the predictions out into an edge server that’s not your main data science server.”
“Getting people into SQL Server and then executing R and Python code, which is kind of the language of data science, allows people to solve some of these problems in a little bit more team oriented way.”
“It becomes more and more important to be able to work within the team system, so you need to know enough to be able to talk to the data scientist on your team.”
“Analytics is an increasing part of the [business intelligence] pie.”
Listen to Learn
02:11 Compañero Shout-Outs
04:00 Tips & Tricks
05:11 SQL Server in the News
07:41 Intro to the guests and topic
10:18 R is a data-focused language
13:48 Who is the biggest beneficiary of these features
17:22 Who is responsible for creating the model
20:44 Did you start out on business intelligence
22:38 Domain knowledge is going to separate technology folks
24:15 You need at least a base-level knowledge of statistics
27:24 Where should all of this data reside
30:28 Are more new features coming
31:35 Analytics are an increasing part of the pie
34:44 Technical components of R and Python
36:00 Are the Python and R files sitting outside the database
37:57 Install packages properly so everything is picked up
39:12 It’s easy to find resources to learn data science
40:51 SQL Family Questions
49:46 Closing Thoughts
Microsoft’s AI Courses: Aischool.microsoft.com
About Andy Roberts
Andy Roberts is currently a Data Platform Specialist for Microsoft in the Northeast district and has spent 12 years as a consultant for Microsoft Consulting Services. As a consultant, Andy assisted customers in implementing mission-critical OLTP and DW/BI solutions on SQL Server. Andy wore many hats as a consultant, including application developer, solution architect, mentor to development teams, mentor to DBAs, BI lead, SCRUM Master, and the guy-that-knows-a-bunch-of-stuff-about Microsoft.
About Chris Hyde
Chris Hyde is an independent SQL Server BI and DBA consultant based in Albuquerque, NM, and is the leader of the Albuquerque PASS local user group. He is a part of the Friends of Redgate program and was recently named in the Idera ACE class of 2018. He loves loud music and cricket, but not usually at the same time.
Music for SQL Server in the News by Mansardian
Carlos: Compañeros! Welcome aboard the SQL Trail once again. It is good to be with you. From wherever you’re joining us, thanks for tagging along, whether you’re on your way to work, trying to drown out some other noise, who knows, walking the dog, wherever you might be, welcome. It’s good to be with you. So today, our topic is Developing in the Database. Ultimately, we’re talking at a high level about R and Python. And so these are, I don’t know that they’re necessarily introductions, either. We’re not going to get real deep into the technical pieces. One of the thoughts that I had was, as these technologies have been made available is that they are impacting me as a DBA. There’s more that I have to know, and I think one of the things we’ll get into in the conversation is that they’re starting to tear down some of these walls and they’re forcing me, as a data guardian, if you will, to interact more with other members of the team. And so my thought here was I wanted to see how these technologies are changing that role a little bit, but then also, how they’re being used, under what circumstances it might make most sense for adoption, things that I might need to consider as my team starts to adopt them. So that’s the framework, that’s the idea, or the way that we’re coming about these topics today. We’re happy to have Andy Roberts, who is with Microsoft and Chris Hyde, who’s with Hydrate Consulting, with us today.
But before we get into more of that, we do have a couple of compañero shout-outs I want to get to. My friend James Rhoat hit me up. We were chatting about various different things and he’s actually trying to get into a bit more of the BI analytic side and we were talking about a couple things and he mentioned he’s working on a project, and he’s calculated that he has an identity column, and it’s an integer and it’s going to approach the limit here in a little while, so fun times for James. I’m sure many of you compañeros can identify with that experience, and then having to deal with it. So hopefully we’ll get an update from James and let us know how that went. Shout-out to Eric Wentz, reaching out from LinkedIn, saying he’s learned a few things by listening to the show. Now compañeros, I’m not sure which I’m more shocked to hear, one, that he actually learned something from the program, or two, that he admitted it. No, just kidding, Eric. Thanks for tuning in, glad that you reached out and thanks for saying hello. Also want to give a shout-out to Rob Soto. Rob has been hit hard. He’s in Puerto Rico and we’re still thinking about you guys, those who tune in, and wishing you the best. I do not have any family there. My family’s from Costa Rica, which many people confuse with Puerto Rico, but I actually don’t have any relatives there, but I do keep in contact with many of the Latin members in the area and some of them are really struggling. The hurricane, for us who weren’t affected, was six months ago, but they’re still dealing with some of that, and unfortunately some of the businesses are closing because it’s just taking so long to get back to normal and the tourism industry, for example, has been hit so hard. So for those of you in Puerto Rico, we are thinking about you and we wish you guys the best.
So, I know, compañeros, we’ve been talking about Tips and Tricks and we were going to put this together and I am just a bit behind on this, so I’m trying to get things together. I have not been doing very good at this and I don’t know that I recognized the amount of effort, potentially, that it would take to put this together each week. Now, granted, I’m trying to add some video into that, so that’s complicating my life a little bit, but thank you for sending those in. We are collecting them, still, but I will try to be more diligent, but probably like in every other episode-type idea is probably what we’re going to be on track with for the Tips and Tricks.
So as far as the SQL Trail Conference, the only announcement that I have for this week is that I am making invitations to several speakers to come and join us. These are going to be our subject matter experts in a number of different fields. Of course we’re going to be using them to help us build an agenda, but ultimately, they have a wide variety of knowledge, and we’re going to try to tap into that in the way that we select our sessions once the conference actually begins. So more details will hopefully come next week once we get the dates sorted out, as we’re doing our site visits this week.
Okay, so now it’s time for a little SQL Server in the News! So Ginger Grant, who was our guest all the way back in episode 52, she was Tweeting this week, I happened to catch it and she pointed me to Microsoft’s AI School, or the Artificial Intelligence. And this reminded me a little bit of a story I heard back when the 64-bit architecture was just coming around. And there were vendors, like Intel and whatnot, the hardware manufacturers were ready. They were ready to move forward with this and obviously if you were on the cutting edge at the time, you needed the bigger memory allocations and all the things that came with a 64-bit architecture. But it just wasn’t moving very fast, and I remember one of the vendors mentioning that they were just kind of waiting for Microsoft to get into the training game, because they were the ones who were going to try to, basically, help everyone get to the next level. And I kind of wonder if we’re in that same vein or that same idea with the artificial intelligence. So admittedly, I didn’t realize that this was available, but you think about it with all the edX courses and whatnot that they’ve made available, it shouldn’t be so surprising that they have some trainings geared specifically to artificial intelligence. And from the website, it says “Dive in and learn how to start building intelligence into your solutions with the Microsoft AI platform, including pre-trained AI services like Cognitive Services and Bot Framework, as well as deep learning tools like Azure Machine Learning, Visual Studio Code Tools for AI, and Cognitive Toolkit.” So all of that is at aischool, so the letters a, i, school, dot Microsoft dot com. And so, yeah, I took a peek there. So admittedly, that’s not something that I’m jumping into just yet, but I know lots of you compañeros are, and I think that’s something we’ll continue to explore. And we actually touch on it in this episode.
Well, not so much the artificial intelligence, per se, but we talk about the influence of these other technologies and how they’re affecting us. And so, like I mentioned, Ginger was a guest all the way back in episode 52. That was when R first came out, and we did get a little more into the heavy lifting, if you will, or the technical components of R there. So that might be an episode to check out if you’re interested in learning more about R.
So our guests today, Chris Hyde is a consultant out of Albuquerque, New Mexico. Hydrate Consulting is his organization, and we’ve actually been trying to get Chris on an episode for quite some time and so we’re glad we were finally able to do it. For whatever reason we just couldn’t pull the trigger, but that’s all changed now, and we’re glad to have Chris on the program. In fact, this episode was recorded after the one that we did on SQLSat events, and we just happened to release them at different times, for a number of reasons. Our other guest, Andy Roberts, works for Microsoft, helping customers onboard to the various cloud technologies, and he’s been with Microsoft for 20 years. So he’s had quite the ride and I am thrilled that Andy has agreed to chat with me and talk a little bit more about Python and R.
The links for our show notes for today’s episode will be sqldatapartners.com/python or sqldatapartners.com/127. And so let’s go ahead and jump into the conversation with Chris and Andy.
Carlos: Well, awesome, guys. Welcome to the program. It’s good to have you.
Chris: Oh, thanks for having us.
Andy: Thanks for having me.
Carlos: So we have Andy from the Northeast and then Chris from the—
Chris: Southwest, yeah.
Carlos: So welcome on the program today. And today we’re going to be talking about R and Python. And I want to set the stage a little bit here. Doing a little bit of digging for this episode, I go to the R Project website, and R was introduced into SQL Server in SQL Server 2016, and the description from the website says “R is a free software environment for statistical computing and graphics.” Okay, so maybe that makes sense. You know, we’re hearing a lot about analytics. So, I guess statistical computing, that kind of makes sense. But then in 2017, we know that we got Python Integration. And I go to the Python website and it says under application use, “Web and Internet Development, Scientific and Numeric.” Okay, well maybe there’s some tie-in to R there. “Education, Desktop GUIs, Software Development, Business Applications” and it actually mentions ERPs that have been built with Python. And I think a lot of us from that description might be scratching our heads and being like, “what? What’s Microsoft trying to pull here?” So I guess, let me start with the question, why are we having these integrations into SQL Server?
Andy: I think that you hit the nail on the head with R, right. I mean, so R is a language that’s focused on data, more or less. I mean, if you think about any machine learning or predictive analytics, statistics, you know, all of that is you’re crunching numbers, crunching data. And so it’s a very domain-specific language, got a lot of acclaim, a lot of use, a lot of growth in the past few years. And if you look at the data science communities, it seems that there’s kind of a battleground between R and Python. And I think, if you look at the general use of Python, you’re right, you can do a lot of stuff with Python. Don’t try and write a desktop application and run it inside of SQL Server. But if you look at certain packages within Python that are commonly used for similar applications as R, machine learning, data science, statistics, predictive analytics, those are the types of functionality that we’re really trying to expose in SQL Server. And the rationale for putting it in the engine is the way that these languages and these environments work today is you actually have to move, oftentimes, lots of data out of a relational database into your client memory space, do your analytics and then put the results back. So we’re trying to give you the ability to do as much of that work on the server where the data is as possible. So it’s in that bring the compute to the data type of mindset.
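Show-notes aside: Andy's "bring the compute to the data" point can be sketched in a few lines of Python. This is only an illustration, not SQL Server's R/Python integration: it uses the standard-library `sqlite3` module as a stand-in database, and the `sales` table and its columns are made up for the example. The first approach pulls every row into client memory before doing any math; the second asks the database to do the work and returns only the small result.

```python
import sqlite3

# Stand-in database (assumption: sqlite3 in memory, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 20.0)],
)

# Approach 1: move the data to the compute.
# Every row crosses the wire, then the client aggregates.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# Approach 2: move the compute to the data.
# The database aggregates; only one row per region comes back.
server_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(f"rows pulled to the client: {len(rows)}")
print(f"rows returned by the server-side aggregate: {len(server_totals)}")
print(client_totals == server_totals)
```

With three rows the difference is trivial, but the same shape applies when the table holds billions of rows and the "aggregate" is a model-training step.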
Chris: One thing is that I don’t really think of it as Python integrated into SQL Server. I think about it almost as Anaconda integrated. So that’s the particular distribution of Python put out by Continuum that’s just focused on analytics, data science, and all the good stuff, from my perspective.
Carlos: So Anaconda’s like a library within Python, or– I’m not familiar with that.
Chris: Think of it as a specific distribution in the way that, say, Red Hat is a distribution of Linux. There’s about a hundred different packages that come with it. Most of them, all the good ones that you’ll really want for your data science projects, yeah, they’re in there. Of course you can add any others that are out there and it really is a full Python engine, but the focus is the distribution for analytics and data science.
Carlos: Yeah, so obviously analytics is the hot word. And I also feel like there’s been a move to unstructured data. And I think that with those two, so, an increase in unstructured data, obviously we’re seeing XML, whatever you might think of it, has been around in SQL Server at least. And then of course they’re introducing JSON and I’m sure that’s going to continue to grow and I think a lot of people are hopeful it will do better than what the XML did. But then we also have, I think, some limits on what the SQL language can give us. And I kind of feel like those two things in combination with just the demand for more analytics has opened this place and allowed these languages to come in. So who, and I guess Andy you kind of mentioned taking the data back to the compute, who then, is the biggest beneficiary of these features, or who would be most likely to take advantage of them?
Andy: Well, I think there’s two sides to it. One is when you think about the data science process and dealing with data, there’s general exploration of data. I don’t know that the integration that we have right now in SQL Server is the best for just general exploration of data. But when you’re talking about training a large model, say a linear regression or just a Bayes classifier, on a large amount of data, running that in a parallelized environment that’s close to the data can be very beneficial. And then if you train that model using certain algorithms, in 2017 we released a function called PREDICT, which does real-time prediction. So where I think that one of the main beneficiaries that doesn’t get a lot of press is actually pushing the predictions out into maybe an edge server that’s not your main data science server. So now I can actually make predictions on an edge server that’s near a bunch of web servers. So as a user is on a site, I can make recommendations, or maybe I’m doing some classification based on their clickstream: what type of customer, what type of offer I want to make to this person. I can do that on a smaller server that’s not housing all of my large amounts of data that I’m doing the model training on.
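Show-notes aside: the "train in one place, predict somewhere else" pattern Andy describes can be sketched with nothing but the Python standard library. This is not how SQL Server's PREDICT function is implemented. It is a minimal illustration that assumes a one-variable linear regression and uses JSON as a made-up serialization format. The function names (`train_linear_model`, `serialize_model`, `score`) are hypothetical.

```python
import json


def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}


def serialize_model(model):
    """Turn the trained model into a small blob that can be shipped around."""
    return json.dumps(model)


def score(serialized_model, x):
    """Score one new observation using only the shipped blob."""
    model = json.loads(serialized_model)
    return model["slope"] * x + model["intercept"]


# "Training server": sits next to the large historical data set.
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [3.0, 5.0, 7.0, 9.0]
blob = serialize_model(train_linear_model(history_x, history_y))

# "Edge server": holds only the blob, never the training data.
prediction = score(blob, 10.0)
print(prediction)
```

The point of the design is that the edge side needs only the serialized model, which is tiny compared with the data it was trained on.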
Carlos: Yeah, so that’s an interesting thought. You mentioned, well, maybe it’s not the best for just kind of looking around in your data, and it sounds like there are, at least for now, maybe some specific parameters under which you want to try to introduce some of these things. So Chris, let me ask you, when your manager comes in and says “hey, let’s start doing Python”, what are some of the questions that we should be asking before we just go ahead and do the installation?
Chris: Well, I would like to meet that particular manager. Cause really I’m not sure that’s happening. It’s certainly not happening with any of my clients. I think in the past you’ve seen the data science, the predictive people, that team is often separated from IT, that they’re an end user. So with the integration, we can bring all of that stuff partnered with the outside group but bring it back within IT so that the data can stay in one place. So I think a lot of the time it’s been somebody like me helping the manager, helping the organization see that you can do all this stuff in one place. We don’t have to separate it out and lose visibility and lose the data lineage. We can do it all in one place, as long as at least from my perspective, I’m never going to be a full-blown data scientist at this point in my career. But I know enough to be able to work with the people who are full-blown data scientists, to bring their thoughts and their processes into a model that we can use within our data ecosystem.
Carlos: It’s going to take a village to put this together in a sense and Andy even talked about some of the models. So then is that still going to be the case for a lot of folks, or the data science folks, or the, we’ll call them the reporting folks, going to have to come and create that model and then bring it back to us? Or do you see more of the, again, I guess I look at it as a SQL Server administrator. I’ve been the one to gate-keep, is a common word, some of the things that get into the database. Am I now going to be taking on those kinds of responsibilities or how do you feel like those handshakes are going to go?
Andy: That’s a good question. If you look at historically how these processes work, a lot of times it was, we’re going to off-load a whole bunch of data from our OLTP system or our data warehouse, bunch of historic data out to a data science environment where a group of special people that you never really interact with do a bunch of stuff. And then we push these magic equations, essentially, back into a team of C++ developers to code that up and push it into production. And I think what we’re trying to do here is, as Chris mentioned, is really trying to keep all that in one place. Now, I think the other side of this is the dynamics have changed a little bit. So a lot of what we’re trying to do at Microsoft in general is, I hate the term, but what we use a lot is, democratize the process. Enable people that don’t necessarily have a PhD in statistics or have been studying this for 30 years to jump into the process and play a part. And so if you look through how we approach these problems, a lot of what we talk about is the team data science process. The way I feel that this came about is, if you look at when I talk to a customer, a lot of times it’s how do I find a data scientist? How do I find this one person that not only understands my business, how my business operates, the intricacies of my specific company, my industry, all of my data, my source systems, the problem I’m trying to solve, and all the math and equations behind how to solve that problem and give me an answer tomorrow?
Carlos: They tend to think of them as unicorns, almost, right?
Andy: Right! That person does not exist. And if they do exist, they’re getting paid a lot of money already in a financial services or healthcare company, or some big consulting firm. So what I think with the team data science process, we’re really trying to say, “here are the parts of the puzzle to solve one of these problems. And you need to start with the business problem and here are the different people that can provide input during this process”, and then give the other tools to solve that. And again, that’s kind of a long-winded answer, but when you start looking at SQL Server as an entry point into that, a lot of people understand SQL Server, that’s where a lot of people’s data is already housed. So getting people into SQL Server and then executing R and Python code, which is kind of the language of data science, allows people to solve some of these problems in a little bit more team oriented way.
Carlos: Now, so Chris, looking at your personal experience. So I know you from the SQL Family community, if you will. But would you say that you came more from the business intelligence background?
Chris: That’s where I’ve been for the last quite a few number of years. I came in through the DBA door, I think a lot of people did that. And the DBA door often leads naturally to reporting, because in many places the report writer is the DBA. And then that kind of became as I started learning more about business intelligence just beyond just SSRS, the things that are available and I started playing with that and bringing that to the company I was working for at the time and I brought it to the next company and then all of a sudden I guess I’m a BI guy, now. I’m quite happy with that, by the way.
Carlos: So, what’s been your experience in diving into some of that and being part of the process, because I think, like Andy said, they’re looking for this very specific person. They want them to understand this business, but that person doesn’t exist. It takes a lot to know about the data and to be able to put a lot of these concepts into place. So I guess, how have you straddled the two ideas of here’s what I know, here’s what’s coming, and then here’s what I can do?
Chris: Well, I think there’s also a third area that I’ve needed to straddle, which is the domain knowledge, the knowledge about the business. What problems are they having that they don’t know how to solve or they don’t even think can be solved?
Carlos: Yeah, that’s interesting. I know we’ve talked about this on this program before, but this idea of the new valuation or what’s going to separate technology folks is going to be that domain knowledge. People are looking, not just for technology people anymore, but they want people who have experience in healthcare or in manufacturing or whatever that is. And I can only imagine that that is very similar in this situation. Like just learning Python or R is only the first step, then there’s the application after that.
Chris: Yeah, I think that’s even maybe the second step. The first step is more learning the kind of algorithms that are out there. Learning statistical distributions, where they can be used, learning things like what is k-means clustering, what’s an application for that. And once you know those kind of things and you can hear a problem and you can think “okay, maybe that particular algorithm is something that I could use to build a model to help them solve that”. And then for me, it’s “how do I actually implement that now?” But without having some of the base level, and I mean for me it’s really no more than a Statistics 101 or 201 level. But at least having some idea that that piece is there and available and something that you can apply, and then from there figure out how to actually meld that algorithm with your data.
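Show-notes aside: for anyone wondering what k-means clustering actually does, here is a minimal one-dimensional sketch in plain Python. It is a teaching toy under stated assumptions, not a production implementation: the data are made-up order totals, the starting centers are spread across the sorted values (a simplification chosen for repeatability; real libraries use smarter initialization), and `kmeans_1d` is a hypothetical name.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters by repeatedly assigning each value
    to its nearest center and moving each center to its cluster's mean."""
    ordered = sorted(values)
    if k == 1:
        centers = [sum(ordered) / len(ordered)]
    else:
        # Assumed initialization: evenly spaced picks from the sorted data.
        centers = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical order totals: a group of small orders and a group of large ones.
order_totals = [12, 15, 14, 210, 205, 198, 13, 202]
centers, clusters = kmeans_1d(order_totals, k=2)
print(centers)
print(clusters)
```

One application in the spirit of the conversation: segmenting customers into "small basket" and "large basket" groups without deciding the cutoff by hand.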
Andy: And on that point, I think that I said earlier you don’t necessarily need a PhD in statistics, but you do need to understand at just a base level, even, how statistics works and how numbers work.
Carlos: Standard deviation as just an example, pulling up my statistics from college.
Andy: Even just number one, to gut-check your results. You can’t just, you know, if you don’t understand a little bit about how the model type you’re working with works, then you’re not necessarily going to understand if your results are correct. So one of the things we can really help you do is just come to a bad conclusion a lot faster using these tools. So understanding some of that upfront avoids that problem. Now I don’t think you need to understand all the matrix arithmetic that’s going on behind the scenes as you’re doing back propagation in a neural network, but understanding what the inputs look like and their distributions and what the output looks like and is that kind of what you’re expecting, I think is very valuable to test that.
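Show-notes aside: the "gut-check your results" idea, together with Carlos's standard deviation example, can be sketched with Python's built-in `statistics` module. This is one simple way to do such a check, not a method described in the episode. The numbers are made up, and the helper names (`summarize`, `flag_outliers`) and the three-standard-deviation threshold are assumptions for illustration.

```python
import statistics


def summarize(values):
    """Basic shape of a column: range, center, and spread."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def flag_outliers(values, reference, z_limit=3.0):
    """Return the values that sit more than z_limit standard deviations
    away from the mean of the reference data."""
    mean = statistics.mean(reference)
    stdev = statistics.stdev(reference)
    return [v for v in values if abs(v - mean) > z_limit * stdev]


# Hypothetical historical outcomes the model was trained on.
historical = [10, 12, 11, 13, 9, 10, 12, 11]
# Hypothetical model output: two plausible predictions and one that is not.
predictions = [11, 12, 250]

print(summarize(historical))
suspicious = flag_outliers(predictions, historical)
print(suspicious)
```

A flagged value is not automatically wrong, but it is the kind of result worth questioning before anyone acts on it.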
Carlos: That’s coming from the person who is having to write that or put those things together. I am curious, as I put my DBA hat on, I’m still doing a lot of that for customers. What kinds of things should I be concerned about? I feel like I’m a little nervous, because it’s one thing for an administrator to have to learn SQL to be able to read stored procedures or things like that, but now I’m getting into all these different things. I may have to go back and brush up on my statistics a little bit to get in there. Should I be concerned?
Chris: I don’t think you have anything to be concerned about, but just the world of data is getting larger and larger and larger, and certainly the opportunity for specialization is not going away. Teams are getting larger and larger. Now it becomes more and more important to be able to work within the team system, so you need to know enough to be able to talk to the data scientist on your team. But that’s not necessarily any different than today, when in a lot of cases you would have to know enough about the BI stack to talk to your BI developers. As a DBA, if that’s your bread and butter, you’re probably not deeply involved in the internals of Analysis Services.
Andy: You have to understand the queries that Analysis Services is executing to process either real-time results or to build an in-memory model and understand how to make those work quickly and efficiently, but not necessarily the MDX and the DAX expressions that are being used to hit that model and why that model exists, even.
Carlos: So then going back to something you had mentioned about all this data and then where it should reside? So are we starting to see people chunk that up and put it in different places? And so like, “you know what, instead of giving you the whole enchilada, or however many terabytes, here, why don’t you take last quarter’s data. I’m going to export that, stick it in somewhere and then let you kind of play around with that. And then once you feel like you have a model, we can apply it to a larger set”, something like that. Is that a common approach or scenario or I guess the question really is, how are we starting to develop these models and how much data do you need to do that?
Chris: How much time do we have this morning?
Andy: I went to a four-hour talk a couple years ago that was just titled “Do we need bigger data or better algorithms?” and it was really all about this. It was how much of a sample size do we need to even start doing the exploration to decide what type of analysis we’re going to perform and whatnot. Or do we care about that, do we just let the training loose on the entire data set that might be a petabyte of data? What are the pros and cons of these? One of the quotes that I took out of this four-hour session, and probably the only thing I really remember, was just talking about sampling and how we kind of get scared of sampling because if you don’t understand how to sample correctly, there can be biases and there can be bad data and all sorts of issues when you take that sample and you expand your model to the larger data set. But if you think about it, we sample all the time. When you go to the doctor and they draw your blood, they’re taking a sample of your blood. If they had to test all of your blood, that would be a bad thing. You know, we sample the air when we breathe to detect what’s around us. If we had to actually breathe in every air molecule to determine whether someone was cooking bacon downstairs, it would become very much less appetizing. So there’s definitely a part where it’s more of an art than a science. Depending on the type of analysis you’re doing, the backgrounds of the people performing that analysis, and the type of data that you have, the answer’s going to vary.
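Show-notes aside: Andy's warning about sampling bias is easy to see in a small simulation. The sketch below uses only the Python standard library and entirely made-up data: a hypothetical table of 12,000 rows whose values drift upward month by month. It compares a convenience sample (the first 1,000 rows) with a simple random sample of the same size. The fixed seed is there only to make the run repeatable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical table: 12 months of rows whose values drift upward over the year.
population = [
    month * 10 + rng.random() for month in range(1, 13) for _ in range(1000)
]

first_rows = population[:1000]              # convenience sample: one month only
random_rows = rng.sample(population, 1000)  # simple random sample, same size

true_mean = statistics.mean(population)
slice_error = abs(statistics.mean(first_rows) - true_mean)
random_error = abs(statistics.mean(random_rows) - true_mean)

print(f"population mean:            {true_mean:.1f}")
print(f"error using the first rows: {slice_error:.1f}")
print(f"error using random sample:  {random_error:.1f}")
```

Both samples are the same size. The difference is that the first chunk sees only one month, so anything trained or measured on it inherits that month's quirks, which is the same risk Carlos's "just take last quarter's data" scenario carries.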
Carlos: So would you recommend a start small first approach and then increase as needed? Or I guess it will always just depend?
Andy: I mean, as a DBA, especially if I didn’t know that person writing the queries against my data, I would definitely. But as with Chris, I’m definitely not a full-blown data scientist, so I don’t fully understand the implications all the time, making those recommendations.
Carlos: So, I know that one of the things that Microsoft has mentioned, they used to talk a lot about feature comparisons with Oracle. We’re starting to see a bit more MySQL comparisons, again because of those analytics features and the open-source world, they want to make sure that they have ways to offer services there. We got R in 2016. 2017 we got Python. Are we done for a little while, or do you think more is coming?
Andy: I think that when we released R, we definitely made a lot of noise about the fact that hey, we wrote this as an open language extensibility module, or whatever the exact word is that we use, and I think the hint was always Python’s coming close on its heels. I haven’t seen any other hints that are being talked about in large press and I haven’t seen any other languages that are really being used at the same level for this type of work. Yeah, that’s not to say that we wouldn’t do that, just nothing I can comment on.
Carlos: Now with the advent of this, and I guess I should say while again, from an analytics perspective, I think the data science realm, there are lots of specific use cases under which these tools are handy, but I think if we step back, most of the business intelligence or that reporting, it’s still done in the SQL language. These analytics are really just a smaller subset of the larger business intelligence pie.
Chris: I think this is an increasing part of the pie. If you think of analytics as a whole, you start off with descriptive analytics. That’s the traditional BI, that’s the who, the what, the where. But just having that data, we’re collecting that and compiling the historical data in order to be able to make a change, in order to be able to run our businesses more effectively. And that’s then where the predictive piece comes in. Now I still, in a lot of companies, especially the smaller companies, they’re still struggling to get to that full descriptive intelligence. I mean, that’s probably not a popular thing to say, because we think of the BI stack and SQL Server not having too many major changes recently. So we think every company should be fully up with their data warehouse and all their ETL running in and all the data being cleaned appropriately, but a lot of companies are just not there, yet, and that’s really the base that we need to move into the predictive analytic space, is having good, clean data that we’re confident about. And we’re just not there in a lot of instances. Now, the larger companies that have invested a lot of time and money on this, yeah, they’re there. As the smaller companies continue to get to where we think everybody should be, that’s when really the opportunity for using predictive analytics reaches them, and that’s when they can start investing more on that path.
Andy: Yeah, I would completely agree with that. And I think if you look at other areas, too, where you start to see, for example, R surface up. You start seeing R surface up in Power BI in a couple different places. Obviously in SQL Server. It’s kind of almost becoming ubiquitous in things that touch data at Microsoft, and I’m guessing that Python is not far behind that in coming into various products. So you start to see a little bit of a blend of people using the right tool for the right problem at that point in time. But again, that’s once they’re there, once they’re able to get data in a format where it’s able to be reported on in a consistent, repeatable way.
Carlos: Okay, so I’d be remiss if we didn’t at least get into a little bit of the technical components. So, from R, at least on the installation, it’s just like “hey, get my analytic services, and I can do a stand-alone or I can have that in with SQL Server when it gets installed”. I admit I haven’t tried it with Python in 2017, but as far as getting that set up and then supporting it, what are kind of the major pieces there?
Chris: Yeah, that’s the main starting point, the base installation is pretty simple. One of the nice features about the Anaconda distribution is the package management is very clean and very easy to use, so if you need to include some other 3rd party packages in your environment, it’s very easy to do with that. So, when you’re calling the sp_execute_external_script procedure, that calls the Launchpad service, which instantiates your instance of either Python or R and initiates the Binary Exchange Language data transmission between SQL Server and whichever data science engine you’re running. That architecture is fundamentally the same for both products.
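As a rough illustration of the call Chris describes, the Python sketch below only builds the T-SQL text for one `sp_execute_external_script` call. It does not connect to a server. The helper names `tsql_literal` and `build_external_script_call`, and the table `dbo.Sales` with its `Amount` column, are made up for this example. The parameters `@language`, `@script`, and `@input_data_1` are the procedure's own.

```python
def tsql_literal(text: str) -> str:
    """Quote text as a T-SQL Unicode string literal (single quotes doubled)."""
    return "N'" + text.replace("'", "''") + "'"


def build_external_script_call(language: str, script: str, input_query: str) -> str:
    """Return the T-SQL text for one sp_execute_external_script call."""
    return (
        "EXEC sp_execute_external_script\n"
        f"    @language = {tsql_literal(language)},\n"
        f"    @script = {tsql_literal(script)},\n"
        f"    @input_data_1 = {tsql_literal(input_query)};"
    )


# The script body runs in the external Python process, not inside SQL Server.
# By default the rows from @input_data_1 arrive as a data frame named
# InputDataSet, and whatever is assigned to OutputDataSet is returned.
script_body = "OutputDataSet = InputDataSet.head(10)"

# dbo.Sales and Amount are hypothetical names used only for illustration.
statement = build_external_script_call(
    "Python", script_body, "SELECT Amount FROM dbo.Sales"
)
print(statement)
```

Running the printed statement on a server with the feature installed and external scripts enabled is what would trigger the Launchpad hand-off described above. Swapping `"Python"` for `"R"`, with an R script body, keeps the same overall shape.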
Carlos: Okay, so knuckle-dragging Neanderthal that I am, I guess I’ve never really thought to go through that piece. So I tend to think of images, so we never want to store images on the database, we’d store them in a file server and then the application would call them. Are those files, those Python or R files, sitting outside the database in a folder and they’re going to get called from that location, or am I over-simplifying that?
Andy: Yeah, your R script could be stored in files in a column somewhere, or it could just be in-lined into your stored procedure that you call. But I think the key that Chris was getting at is that the R or Python process is a separate process. It doesn’t run in the SQL Server memory space. And it’s still resource-governed and still managed and still secured appropriately through the different steps that he outlined, but the script itself is seen within SQL Server and then executes externally.
Carlos: Is there another way to call it?
Andy: The other way to call it is, if you’re in an R client or a Python interpreter, you can actually set SQL Server as your remote executor. So instead of saying hey, execute in the local version of R or Python that I have on my workstation, when I say EXECUTE, send this code to SQL Server and have SQL Server then call the R process or Python process on that server and take advantage of those resources and the parallelism, etcetera.
Carlos: Right, okay. I feel like if I look at an execution plan, I’m going to get like it’s a remote procedure. I’m not going to see all the details ever, just it’s like “hey, I’m going outside and I’m doing something and coming back”.
Carlos: Well, awesome. So those are really the questions that I had. Do you guys have anything else you want to touch on or things that you think we should be discussing?
Andy: The one thing I would say from the installation and operations side is that if you’re, say, a .NET developer coming into this space, one thing that may throw you for a loop, if you’re used to dealing with the GAC and a shared environment for everything, is that both R and Python can have multiple environments defined on a server. And a lot of developers take advantage of this in their own work, but the way that SQL Server works is, even if you have R already installed on your server, it will install its own environment inside of the SQL Server installation, essentially. And so when you install packages, whether it’s R packages or Python packages, you need to make sure you install them in that library that SQL Server sees. If you just install it on the base R environment, SQL Server won’t–
Carlos: It’s not going to pick it up.
Andy: It won’t pick it up, right. And that way, the DBA can kind of also lock down what is running on that server and maybe limit things that are communicating with the network, if that’s not a desirable thing that they want to do. But it gives a little more isolation, a little more control for that for SQL Server.
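One way to see the separate-environment behavior Andy describes is to ask a Python interpreter where it lives and which packages it can find. The sketch below uses only the standard library. The function names `describe_environment` and `is_package_visible` are made up for this example, and `pandas` is just a sample package to look for.

```python
import importlib.util
import sys


def describe_environment() -> dict:
    """Report which interpreter is running and where it looks for packages."""
    return {
        "executable": sys.executable,   # path of the running interpreter
        "prefix": sys.prefix,           # root of the active environment
        "search_path": list(sys.path),  # directories searched on import
    }


def is_package_visible(name: str) -> bool:
    """True if this interpreter can locate a top-level package called `name`."""
    return importlib.util.find_spec(name) is not None


env = describe_environment()
print("interpreter:", env["executable"])
print("environment:", env["prefix"])
for package in ("json", "pandas"):
    state = "visible" if is_package_visible(package) else "NOT visible"
    print(f"{package}: {state}")
```

Run from a workstation prompt, this describes the workstation's Python. Run as the script body of an external script call, it would presumably describe the environment that SQL Server installed for itself, so a package installed in one can show as not visible in the other.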
Chris: So, one thing I wanted to talk about briefly, kind of on a separate path is, we’ve covered a lot of things that may sound like complete Greek to the traditional DBA, but today, there are a lot of resources available to get started in this data science world, or add some of these data science-y techniques to what you’re doing. It used to be that some of these things were kind of the domain of college statistics courses, so it’d be quite expensive, but now you can go to something like Coursera or edx.org or YouTube, even, and find a lot of this information to get started in understanding the statistical methods and then from there, there’s so much information out there about applying the statistical methods. So, there is not the huge barrier to entry into this area that there once was, going back again to Andy’s point about democratization.
Carlos: Right, okay, so very good. So you know compañeros that when I start picking it up that they have dumbed that down, so that even the knuckle-dragging podcast host can get it. Well, awesome, thanks guys again for coming on. Can we go ahead and do SQL Family?
Chris: Oh, of course.
Carlos: So Andy, let’s get started with you, how did you first get started with SQL Server?
Andy: So when I first started programming, I was working for a small consulting company. It was actually me, my dad and two other guys and we had a project, I was doing mostly FoxPro work at the time, if anyone remembers the good old FoxPro. And the customer wanted to use SQL Server for something and we were bidding on the project, so my dad handed me two books, I think it was a Sams book and a Wrox book, and like 6 o’clock at night he plopped them on my desk and said, “Andy, I need you to be a SQL expert tomorrow”. So me, pretty much fresh out of college, I crammed for that exam. I came in the next day and at least was conversant in most of the SQL stuff they threw at me. And the rest is history.
Carlos: Wow, there you go. Now that’s probably the quickest turn-around I’ve ever heard. That’s pretty impressive. Chris?
Chris: So my secret origin story is that I was a machinist for a number of years after I left college without having completed it, shall we say? But one place that I was working, I was working the lathes, and I developed a really terrible allergy to the solvent that they were working on the parts, and my hands just cratered up like the surface of Mars. I mean, I looked like Wade Wilson in Deadpool, just down to the hands, there. And so I had to get out of that and so they moved me into the QA arena and I was on third shift at the time working nights, and there wasn’t a whole lot of actual checking to do, and I started kind of teaching myself Access, the dreaded A word, to bring some of the data that we were collecting together and get some reporting on it. And at some point after doing that for a number of years in the QA arena, I thought, “well, if I’m going to do databases, I need to do them professionally”. And I found an opportunity that involved learning Oracle, but then the next opportunity after that, I got to take those Oracle skills and transfer them to SQL Server and I’ve just stuck with SQL Server ever since.
Andy: I was going to say, my brief stint in Oracle was around that same period of time. We were deciding where to go as a company to partner with, Microsoft path or Oracle path, and I’d already known how to install SQL Server, a pretty quick install. And I started downloading the how-to’s to install Oracle 7 something on Linux and I printed out the 70 pages, and I said, “no”.
Carlos: Forget about it, where’s my ten-foot pole?
Andy: I’m sure it’s gotten easier since then, but at the time, it was not my cup of tea.
Carlos: If you could change one thing about SQL Server, what would it be?
Chris: I really can’t think of anything off the top of my head. I mean I’m pretty pleased with where it’s coming from, where it’s going. And if I run into a little issue that I can’t deal with, I just hop on Twitter and SQL Help, hashtag it and I get the answer. Or I Google for some of the great blogs out there, so yeah, I’m really in a good place with SQL Server right now.
Carlos: Andy has a job for you at the end of this interview. Andy, what about you?
Andy: I’m kind of in the same boat. There’s a couple things, one of them was always running on operating systems other than Windows because in my job I talk to a lot of customers that are Linux shops, and we did that, so that was always my big ask. The other was around some of the scalability limits and trying to get parity between SQL Standard and Enterprise for some of the db features, and a lot of those have really come a long way.
Carlos: That’s right, and you got that in service pack one.
Andy: I’m sure I have something if I really dug in it, but I don’t have any like big pressing things off the top of my head right now. You know, here’s one. Active transactional databases.
Carlos: Transactional databases, yep.
Andy: Preferably in geo-dispersed areas, that would be awesome. We kind of have Cosmos DB for that.
Carlos: Yeah, I was going to say, the team will direct you to Cosmos.
Andy: Exactly, exactly. But if you could get that to run on premises at somebody’s data center, in SQL Server, that would be magnificent.
Carlos: Oh yeah, that would be huge. Then I’d be a little nervous about actually not having a job anymore, ’cause people would just keep spinning up instances instead of fixing problems. They’d just be like “eh, I’ll just spin it up”.
Chris: That’s no different than the classic throw hardware at it, but you still have a job even after that.
Carlos: Well, that’s true, but I feel like at some point, there’s $100,000 Dell servers or whatever, they eventually will blink. Okay, Andy, what’s the best piece of career advice you’ve received?
Andy: I think it was “Take the job at Microsoft”.
Carlos: There you go. Now how long have you been there?
Andy: Almost 20 years, so a little bit over 18 years, and it was, well, a really long story, but my father and I actually joined Microsoft on the same day in 1998 and we both almost passed and then a bunch of people we knew at Microsoft were like, “are you kidding?” And yeah, so we took the job and it’s been great. It’s been a great career so far.
Carlos: Yeah, there you go. I was trying to think, ’98, that’s back, I guess it was Windows 95 and Windows 98, the whole Start thing.
Andy: Yeah, SQL 7 had launched the week before I joined Microsoft. At the time we’d come into the office every day and download, there was like a group of four or five of us that downloaded the daily build of Windows. We couldn’t do that on our home networks back then, it was a dial-up. So we’d go into the office, some person would download and they would do a canary install and then they would send a text message out to everybody else and say “all right, come in and install”, we’d kind of rotate through.
Carlos: Yeah, man, how things have changed.
Chris: So I’m sure I’m not the first person to say this on the podcast, but the advice to start speaking and to start sharing my little corner of knowledge with people. That’s made a huge improvement in my career over the last 5 years, just from connecting with so many different people, making friendships, making professional connections. It just changed the trajectory completely.
Carlos: Very cool. Gentlemen, our last question for you today, if you could have one superhero power, what would it be and why do you want it?
Chris: Well, I kind of like all the superheroes that have some form of immortality, because there are so many books out there that I haven’t read, so many movies that I haven’t seen, so much music to listen to, and to really grok that, it’s going to take several hundred lifetimes. So if I could have a little bit of that, that would be good.
Carlos: There you go. So that does change your retirement plans, I think though, as well. There is that.
Chris: Well, compound interest is your friend when you live forever.
Carlos: That’s true, that’s true. Andy?
Andy: Yeah, I think another way to be able to cram all of that existence in, is if I had some sort of just machine empathy. Some machine mind control so I didn’t have to spend, I could just say “go, do that” and then I could free up time to spend with my kids and my family and watch movies, etcetera.
Carlos: Well, awesome. Well, gentlemen, thanks again for being on the program today. I’ve thoroughly enjoyed it.
Chris: Me, too. Thank you for having me.
Andy: It was great. Fun time. Thank you.
Carlos: Yes, and as we continue this, we may reach back out and see where we are next year with some of these developments. It’s an exciting time to be involved with some of these things.
Carlos: So my take-aways for today’s episode are one that we’re going to have to sharpen our skills, not only with technology, but even going back to some of these math concepts. And so it will take a broad approach to be able to utilize some of these new features and use them effectively. The other big take-away for me was the idea of consensus building or team-building. Again, we tend to use the word from a DBA perspective as the guardian, as the steward, gatekeeper, and I think that with the introduction of services or technology like R and Python, we’re starting to see a democratization of the data, which we talked about from a team’s perspective. But I also think this affects us as those DBAs, and we need to be part of that democratization, if that’s the right word, for our users. Yes, those services are in SQL Server, but others are going to start taking advantage of it. And I don’t see a single person, if you’re the DBA in your organization, or even a group of DBAs being solely responsible for that. It’s going to have to be shared, and I think that some of these walls are going to start coming down. Lots of people request that we talk about “what am I going to be doing in the future?” And I think that maybe not so much the what, but the culture of our roles is going to change, because we are going to have to start working with other teams. We’re not going to be able to put up those walls and say, “no, you get to come in, you get to come out” or whatnot. We’re going to have to work within the context and frameworks of these other teams. And we’re going to be sharing tool sets, and we’re going to be sharing processes, and I think we’re going to be adopting the ways that other teams have worked. And so I think this is going to be one of the biggest changes, as we look to the future. So I am interested, compañeros, to see how you’re using Python, and I’d love for you to reach out, or R, even. Reach out, let us know how that training process is going. 
Ultimately, we’re very interested in sharing good courses, resources, things that you have found useful. I’d be very interested in knowing all of those things.
I think that’s going to do it for today’s episode. Our music for SQL Server in the News is by Mansardian, used under Creative Commons. Of course, you can connect with us on social media. I like to connect with people on LinkedIn. I am @carloslchacon. If you connect and you’re connecting because of this episode, please do let me know what your experience with Python or R has been, or what questions or thoughts you might have around it, the impact that it will have on you, and I’d be interested to strike up a conversation about that. We’d like to thank Andy and Chris, again, for being on the program. We do appreciate it, and compañeros, we’ll see you on the SQL Trail.