Does your organization use a data warehouse? Maybe you have a cube but find it hard to maintain? This week on the SQL Trail, Steve and I chat with Jason Horner about building a good data architecture. When talking about a data warehouse architecture, or even just a data architecture, people often think about building the data warehouse and specifying the server hardware, building the ETL, things like that–they tend to lose sight of the larger picture of data architecture. Jason gives us his “four pillars of data warehouse architecture” and how each of them impacts what you are able to do from a reporting prospective.
Listen to Learn…
- Jason’s four pillars of data warehouse architecture
- Why most warehouse are organic, rather than planned
- How building a data warehouse is like building (or rebuilding) a house
- Important recovery factors for a data warehouse
- Why Jason thinks you should start your data dictionary in Excel
- Why Jason says your data lake can be an intermediary stop for your data warehouse
- The value of using source control for your data schema
Episode 65 Quote:
“Whether it’s actual doing the dimensional modeling and making sure that you have a good dimensional model, that’s the number one thing that helps with data warehouse performance.” – Jason Horner
About Jason Horner
Jason is a consultant for Pragmatic Works. He specializws in data warehousing, data platform, geo-spacial power-show. Jason is also a Microsoft Certified Master and Microsfot MVP. Follow him on Twitter at @JasonHorner. Read more about him here on his about.me page.
Follow @JasonHorner on Twitter
[Free Training] Data Warehouse Implementation with SQL Server Jumpstart
[Article] Building a Data Warehouse with SQL Server
[Article] SQL Server Data Warehouse Cheat Sheet
Transcription: Building a Great Data Architecture
*Untranscribed introductory portion*
Carlos Chacon: Jason, welcome to the program.
Jason Horner: Que paso Companeros.
Carlos Chacon: Yes, it’s good to have you on the show. We’ve kind of been talking about this for a little while, and it’s nice to have you here. Our topic today is kind of the idea of what would make a good data architecture. I think most of the folks listening, they have data responsibilities, whether they be database administrators, or even the reporting, whether it’s some transactional system that they have. Then kind of taking that, and migrating it into a reporting environment, or even eventually into something as big as a data warehouse.
You have some interesting thoughts on maybe ways that organizations could improve that process, and make it better so that they’re not kicking themselves later down the road with some of their poor choices, as they build out this infrastructure, this architecture rather.
Jason Horner: Yeah, absolutely Carlos. I see, worked as a full time employee a lot of places. I’ve been a consultant for about two years now. I’ve seen a lot of different environments, and you really hit off on a good point. A lot of these data warehouses, they’re built organically right. It’s starts off there’s somebody wants some kind of a number. Maybe it’s a top line number for sales, or some such thing. Usually it’s the application developers or somebody that are tasked with building up some reports. Then you guys both now as production DBAs, you start screaming, bringing pitchforks to their desk, “what are these report inquiries doing on this OLTP system, they’re slowing everything down. We’ve got long blocking and blocking chains.” These kind of problems. Usually what people will do is they’ll say, okay well we’ll use transactional replication. We’ll move the data over into a second copy, and we’ll let you do whatever you want there. Then ultimately that kicks the can down the road a little bit further, right? They get some mileage out of that, but then the queries don’t perform, because the analytical questions get deeper, the business demands more. What I like to …
Carlos Chacon: The data itself just gets bigger as well.
Jason Horner: Yeah, absolutely right. You can’t use some of those same OLTP management techniques on that data, right? You have to remodel it and build it. I really like to think of it, like think about a house. If you’ve ever went shopping for a house, you’ll see a lot of houses, and maybe they don’t look to pretty on the inside, but as long as the bones of the house are good. As long as the inner two by fours and framing of the house is good, you can work with that. You can remodel it and you can build it. That’s really how I see setting up a framework for data architecture.
Steve Stedman: Okay so given that you’ve got that data architecture that’s built out for your OLTP process, and you’ve got the good bones, like you said there. What kind of things do you look at to figure out how do you get that into a data warehouse environment?
Jason Horner: My model is kind of, I use four pillars right. It really boils down to a solid platform, information management, data integration, and application life cycle management. Those are really the components that I look at and try to figure out good strategies along all those accesses.
Steve Stedman: Okay, so when you refer to the platform, where are you going with that?
Jason Horner: The big thing with the platform is obviously as you know, I know backups and recovery and corruption are near and dear to your heart. It’s really defining and RTO and an RPO. So recovery time objective, how long does it take you to recover, and a recovery point objective. To what point can you recover to? I think a lot of people don’t think about that for the data warehouse. It’s not as critical, some would say, because a lot of times you can recreate that data from the source system, or maybe you’re getting text files or other feeds, but some systems the data warehouse becomes a de facto system record.
Steve Stedman: That’s something I’ve seen as well, where we just say we can rebuild it from the source data, but the often times over time, you lose the source data, or it gets purged so that data warehouse is the only place that you might have some of it.
Jason Horner: Yeah, absolutely, and think about how long it would take to, in just 10 years worth of daily files. It could be a long haul.
Steve Stedman: It could be close to 10 years.
Jason Horner: Yeah, exactly. Other things to consider is high availability. Do you need to use failover clustering is always on appropriate. There’s a lot of other technologies there. Maybe even the answer is you don’t need high availability. Maybe you want to do a hybrid scenario, where you have always on, into the cloud. That’s another possibility there. Then the last piece I would say is, oh go ahead Carlos.
Carlos Chacon: I was going to say, you said something there so availability groups in the data warehouse. Now I’ve seen it, a client is working with, they have it, but it seemed like, I never thought that was a good idea. Ultimately, I guess my thinking there is that while yes you have the highly available solution, is the data warehouse the right place for that, and all of the benefits that you lose I’ll say, by going to a transactional log replication type structure. It seems like when I load my data, doing table locks, bulk inserts, and reducing the amount of logging that gets generated, to make that go faster, yes that makes sense. I’m curious to know, how have you seen the data warehouse in an availability group scenario?
Jason Horner: It’s going to be a certain profile of customer that’s going to do something like that, because like you said, it really impacts how fast you can go on some of these minimal logging scenarios, and what you can do with index maintenance. Some of the pains there of being a fully logged model. What I would say is if you think about a customer that’s integrated that data warehouse, and that information from there, into more of an operational side, where it’s actually feeding an application. Maybe it’s providing near real time analytic intelligence back to the application to make a decision on something, then they can’t afford for that data warehouse to go down.
Maybe it’s appropriate for a small section of the warehouse, maybe you take a slice of that data, and expose it via that. Then if you need something that’s more longer term, let’s say 5 years and back, or some similar time frame like that. Maybe that goes into a different architecture. A lot of companies, the data warehouse has really become the lifeblood, so you can’t have it go down. Maybe a failover cluster is a way around that, just to enable patching scenarios and upgrades. There’s certainly other more lightweight technologies that might be appropriate, that aren’t necessarily the new hotness.
Carlos Chacon: Sure I was not suggesting that an availability option wasn’t a good idea. I was just kind of interested more in the availability groups because it seems like in the data warehouse I’m seeing interest there as well, and I kind of feel like maybe this is not, your use case may vary
Jason Horner: Yeah, absolutely. It’s really set, and that’s why I think it’s important that people actually think about this. Often times it involves bringing in other people, not just the data warehouse team, but relying on your DBAs, relying on your infrastructure team, so on and so forth.
Carlos Chacon: Okay, so now as we take a look at that, we’ve kind of made some of those decisions, now we’re going to start actually building the nuts and bolts to make either the ETL process for that recording to start to take form. For information management
Jason Horner: Yeah, absolutely. Information management is really key. I spend a lot of time talking to people at SQL Saturdays and other conferences about components of that. Whether it’s actual doing the dimensional modeling and making sure that you have a good dimensional model, that’s the number one thing that helps with data warehouse performance. If you’ve just taken and done a lift and shift, and moved an OLTP scheme over and tried to build a data warehouse on top of that, it’s likely not going to work, as the data approaches larger sizes.
I’m talking about 500 Gigs, Terabytes, those kind of size ranges. It’ll start breaking down. It’ll work pretty well for smaller sizes, given appropriate hardware, but it’ll really start to become painful. Now that being said there’s a lot of exciting technologies in 2016. Things like the operational analytics, which combines column store technologies with memory OLTP, which I think is really going to be a good enabler for some of those scenarios, to kind of bridge that gap.
There’s other things too. I think one of the key things, and you’ll see a lot of stuff in 2016 around master data services. Getting master data management, and then really empowering the business users to manage their reference data, manage their hierarchies, and other pieces of data that actually enrich that data model. When I say enrich, I’m saying data that’s not stored in the OLTP system, but that might be hidden in an analyst’s spreadsheet, or sticky notes in their cube for example.
Hand in hand with that, go things like data quality. Making sure that the results are [inaudible] and accurate. Then having a data dictionary so people understand how to use the model, and what the columns mean. Information management is a lot more than that. There’s some stewardship, and there’s some governance as well, but those are the key things that you really have to start thinking about.
Steve Stedman: Okay, so when you mention the data dictionary, and having that available. What kind of tools do you use, or how do you go about doing that, so that it can be consumed by the people trying to use the data?
Jason Horner: It’s a great question right. There aren’t a lot of things. I mean obviously there’s third party vendors that have come up with solutions to build data dictionaries. Often times customers or clients or places that I’ve been, they don’t have the money or they don’t want to fund it, so I often recommend just starting with Excel, or even using a SharePoint list to define some of these core attributes. It even goes beyond just a data dictionary. It could be something like a source to target mapping, which kind of specifies how do columns from the source system get transformed and then ultimately placed in the data warehouse.
Starting with something simple. Excel spreadsheet, or maybe even using SQL server. SQL server has a lot of system table views that we can use to fill in some of that basic meta data. Then we can use extended properties, or we can use modeling tools to basically bridge that gap and bring that stuff together. I think it’s a really important thing to have, is a data dictionary, because so many people will be interacting with that data warehouse over time.
Steve Stedman: Sure, and it really sounds like use whatever you have access to use there. That will give you something that you can work with, and then build on.
Jason Horner: Yeah, absolutely Steven. The other thing too, that’s important to remember is start small, and then as your home grown solution or your Excel solution becomes painful, maybe then look at buying a tool. A lot of times people put the cart before the horse, and they go out and they spend the money on the tooling initially, and that doesn’t solve the problem. It just gives you now two problems that you have to deal with. Your original problem, and then figuring out how to correctly implement this tool.
Steve Stedman: Yeah, good point, good point.
Carlos Chacon: That’s an interesting thought, because when you said Excel, my first thought was oh my gosh, that’s never going to get kept up, so any changes happen, it’s just going to get lost. I think it is interesting that you’re talking about, basically the institutional process. Like “hey guys, let’s get buy in, let’s make this thing happen”. Kind of starting small because even there I could also see people saying “well let’s just say I start with a single dimension”.
Again it’s a piece of the overall warehouse, they’re going to say “well it’s not complete, therefore I’m not going to use it.” That’s so frustrating when you’re trying to make changes, but ultimately I think you’re right. Being able to then give that as a data team member, I want to give something to the analyst, or to the other reporting people, so they’re not coming to me with questions about where these things are, or making bad assumptions about where the data is.
Jason Horner: Absolutely Carlos and there’s a lot of interesting things that you can do there. I’ve seen places where they have reporting services as their primary reporting tool, and in each of the columns, they’ll turn that into a hot link that points back to either a Wikipage, or even another report that actually surfaces the data dictionary that way, or they use SharePoint and they integrate SharePoint so that they give the user a soft landing spot to start exploring these reports. I think Power Bis is doing some of that. Especially some of the stuff with the natural language query. Where you can build these more robust semantic models, and allow the user to find and enable more self servicing areas.
Carlos Chacon: Yeah, very cool.
Steve Stedman: Okay so Jason the third pillar that you mentioned was the data integration pillar.
Jason Horner: Yes absolutely, and this is something that I think is key to get right, because what I often see is what I call data sprawl, or information sprawl. Where people just start using replication willy nilly, they don’t think about it, and data just ends up in six different warehouse. That’s not really what you want right? You want to think about what are my source systems, what systems to I have to go after, how am I going to consume that data?
Is it going to be a flat file dump, is it going to be transactional replication, is it going to be a direct SQL connection to that, am I going to do extracts? You’ve got to think about my delta detection, my incremental load process. I also have to think about I’m going to want to bring that data into a staging area, and often times you want a persistent staging area. Are you going to use something like Azure Datalake, are you going to use HDinsight or some other base technology just to store this data, because it’s going to be a lot of data coming in and the warehouse may only need a small subset of that initially.
The operational system may archive data out or it may get changed over time, and you just need to start accumulating it. That’s really the core of the datalake pattern. It’s just dump the data somewhere, before you know what you want to do with it, and then you always have the ability to go back and carve out the pieces that you need, to put it in your warehouse. You can also enable explorational data warehouse or data [inaudible 00:26:55] scenarios, where maybe somebody needs to just go in there and get a subset of the data that’s too big for the data warehouse, and that hasn’t been formally modeled, but they need to run a model on it, or an experiment on it, and pull out that data.
Those technologies are really great for enabling that without taking on the full cost of building a third normal form staging layer, as suggested in architectures that you see, or even Ken will, at the end of the day says we build this staging layer in SQL. That’s really expensive right?
Steve Stedman: Oh yeah, definitely.
Carlos Chacon: That’s a very interesting concept that you bring out Jason, is that idea that my data lake can be an intermediary stop for my data warehouse. I think traditionally, some of the feature sets that are coming out around that, because I’ve always thought about it in the “streaming analytics” quote unquote process, meaning I am a website like weather.com, getting hundreds of hits. I have all of this information that I may not want to keep but I want to get a real time feel for what’s trending for example, and I want to make a decision based on that rather than just information that’s going to stay with me for a year in report.
I think that is actually an interesting idea to potentially explore, is that if you were to go that route, you’d then give your analysts a place to go and play around, and kind of get used to some of those new tools that are coming out. You would reduce load on the transactional system, but it would also reduce the pressure on the data warehouse scheme, because the analyst will have access to the data and the stuff that actually needs to get reported on. Year over year, quarter- that kind of slice and dice information can then be processed as they determine what it is that they need.
Jason Horner: Yeah, absolutely Carlos, and that’s the thing, and now you’re not having people kind of building their little one off data warehouses. They’re going directly to where that’s stored. You don’t have multiple people hitting source systems, which is kind of an anti-pattern that I’ve seen. Where you may have three data ware houses. They each extract let’s just say for example customer data. They each extract customer data out of the different ERPs that you host, and maybe out of your CRM system. You’re doing kind of that once, move the data, and then letting people go in and get the data that they need.
Another thing that I see that’s kind of in line with that is people, not necessarily moving way from cubes, but going to a model where we’re going to prototype a quick solution in Excel. Using things like Power query, and power pivot data models, and these things to build something up, and get a top line number out to management and the business users and get some really quick feedback on that. Then iterate again. Then once it’s ready and the requirements are flushed out, and we’ve addressed all the data quality issues, then let IT take that over or the data warehouse team, and productize it. Integrate it into the data warehouse.
Steve Stedman: Yeah, and I think Excel and PowerBI, is a great way to go about doing that, because it let’s those analysts or business users have more of a direct access via that datalake option, without slamming your OLTP system.
Jason Horner: Absolutely, absolutely.
Carlos Chacon: Yes, very interesting
Steve Stedman: Then the fourth pillar we..
Carlos Chacon: We just have…
Steve Stedman: Go ahead Carlos.
Carlos Chacon: All right Steve, go ahead.
Steve Stedman: I was going to say, and the fourth pillar that we have there, is the application life cycle management that you mentioned. Do you want to touch on that a little bit?
Jason Horner: Absolutely Steve. This is something that I’m really passionate about, and it’s an area where the tools have kind of let us down. Historically if you think about working with reporting services, analysis services, SSIS, the source control integration, the ability to do things like continuous integration, those things haven’t been, there hasn’t been a great story there. It’s getting better, it’s getting better all the time certainly, but the ability to use version control. A lot of times I’ve seen environments where their version control is the production database, and they’ll back up and restore cubes, they’ll back up and restore databases. One of the fundamental areas where I see people fall down, and it boggles my mind because it’s so easy to solve, is managing that relational schema for the data warehouse. Even if it’s for an OLTP system, there’s tools out there.
Redgate has SQL compare, they have their SQL source control tool. Even SSDT is a solution I like just because it’s free, for the most part. You can start building these database projects. It can help you support building out a dev, UAT, and production environment. Where you can target builds of the database out. Then initially DAKPACs right, they had a really bad rep. These data tier applications. They didn’t support a lot of features, they were painful to work with, but now they’ve become more of a viable option.
There’s still some problems there, especially working with larger tables, but it’s definitely worthwhile to investigate these things and get those schemas under source control. I think that’s really key. You can’t rely on the production database backup copy to be the golden record for your schema.
Steve Stedman: No, I’ve seen how that can be dangerous. Okay.
Jason Horner: Right.
Steve Stedman: How, let me start over on that, so then how is all of this impacted, or is not impacted, by the newer cloud technologies out there?
Jason Horner: Well I think what it’s done is it’s allowed people to go faster and to choose what things that they want to set up internally, and you don’t have to deal with the IT management side of that, and what things that they’ll let somebody else manage for them. A classic example of this is the source control. You have VSO online, which is basically a cloud hosted version of TFS, Team Foundation Server. You get version control, you get project management, you get all these good things, and support for agile templates, and Scrum, and whatever process you want to follow.
You get builds and all these things, and it’s really great because a lot of times, I remember back in 2006, to set up TFS required 4 servers. You had to sacrifice a few goats, you had to do it on a full moon, and getting these things to work, just before you could make your first check in. That’s ridiculous, I just want to click a few buttons, and start checking in code, right? I don’t want to manage that, I don’t want to worry that oh my gosh, the TFS database is suspect. What do I do? Now I’ve got to go hire Steve to come in and fix the corruption. As much as I’d like to drive some work Steve’s way, I don’t want that headache. I don’t want to slow down my development process.
Steve Stedman: On that, as much as I love working on corruption, I don’t wish it upon anyone.
Jason Horner: Exactly. Some other things to think about is okay, maybe I’m getting data from the web anyway. Maybe it’s, I’ve had customers that get real time feeds from sensors out in the field, so Scada data, from various sources. Well if it’s already coming from the web, why not just, your first step is dump it into HDinsight or dump it into Azure Datalake, and then once it’s there, you can enable some basic analysis from that standpoint, or you can then move it into a hybrid scenario where you’re bringing it on prem, into a data warehouse, that lives on a server that you manage. That’s maybe a little bit beefier to handle very specific reporting use cases, or you could just put the whole thing in the cloud.
The huge thing that came out last year at Ignite, was Azure DW. Not only does it give you an MPP solution, which scales very well, but it gives you a cloud based solution that you don’t have to manage. Then there’s not an initial capital outlay to get going on this. You spin it up, see if it works, if it does great, and you can manage that cost more effectively. If you don’t need it running 24/7, 365, you can scale it up while you need it, and then instantly shut it down and only pay for the storage.
That’s a huge win compared to something like APS, where you’ve got to figure out how many racks do I need, I’ve got to get data center space, there’s very specific cabling requirements. APS is a great solution, don’t get me wrong, but a lot of organizations don’t have the IT expertise in place, or they don’t have the bandwidth to take on yet another thing in the data center.
Steve Stedman: Yes, well having worked recently with the Azure data warehouse, I’d completely agree with what you’re saying there. I mean it’s so easy to get started and get going with it. Compared to other more historic options that we’ve seen in the past, definitely.
Carlos Chacon: Yes. I think it answers that question for the small businesses, because I was saying well we’ve thrown out some of these technologies, and datalake, and I think it could be very easy for a smaller organization to say, well we don’t have all that stuff. I’m just going to stick with what I know. I think you answered that question Jason, is in that the cloud technology makes you, try it for a day, a week. Yes there’ll be some cost incurred, but you’re talking about a couple hundred dollars maybe, maybe.
To see if that’s something that will work for you, and you’ve gone through the process, and you’ve gotten access to, I guess we should define a couple of things. We’ve got Azure data warehouse, that’s Azure DW, it’s an MPP, with massively parallel processing. That basically is big iron, if you will, to be able to do things very very fast. One of the things we didn’t really talk about, that even the first platform to set that up, is how long am I going to have to be able to process all of that, and being able to do that in an MPP environment is going to be faster, or you’ll have more resources available to you, should you need to beef that up for your processor.
Jason Horner:Yeah, absolutely. The pattern I see that people implement all the time is okay, we’ve got data coming in, we’re going to scale up to process this data. We’re using instead of an extract transformation or an ELT pattern, you’re usually using an ELT extract load and transform. You’re using things like CTAS statements, so create table as select, and you’re just kind of doing a tSQL based pipeline there, but you’re scaling up while you need it. Then maybe you dial it back because the reporting’s been offloaded to say a farm of analysis services servers.
That’s what I like about the cloud. I can come up with these great architectures, and not have to wait for IT to catch up with me. I don’t have to sit there and say, here’s this proof of concept where I’ve got a farm of AS servers, that are fronted by some reporting services servers, and I’ve got a SQL DW. I don’t have to wait for IT to open up all these firewalls, and order boxes, and get them provisioned. Then of course the big thing is we’re seeing more hyper convergence, we’re seeing more virtualization, and so what’s the number one problem with virtualization today. It’s over provisioning. If I use a cloud based platform like Azure, they’re managing that for me, so I don’t have to rely on an IT system admin whose knowledge of virtualization is 5 years in the past.
They don’t understand that you can’t over commit memory, and that IOPs aren’t free. You can’t just build all these virtual resources, there has to be physical resources behind it to support the load.
Carlos Chacon: Very interesting, yes, so lots to consider there.
Steve Stedman: I guess before we wrap things up then, oh go ahead Carlos.
Carlos Chacon: I’m sorry, you mentioned some of the tools that we’re a little bit behind. There is one tool, we happen to be recording this during the Ignite conference, and one of tools, one of the announcements that had been made. Maybe not hand in hand with this, but it’s going to be a tool that, it will be leveraged, as you start looking at reporting, Jason do you want to talk about that for a little bit.
Jason Horner: Oh, yeah sure, so one of the big announcements, something I’m really I’m really bullish on. Big inter-geo spacial analysis, and I think, a lot of data warehouses. They have location data, but they’re not leveraging geo-spacial, and so one of the things that Power BI announced at Ignite is this partnership with ArcGIS to provide some backing for maps. ArcGIS is a platform by ESRI or E-S-R-I, depending on what your accent is. They’re a known leader, first mover in the GIS industry. Geographic Information Systems.
They’ve done a lot of work with things like spacial statistics, and really showing how to analyze things beyond just plotting points on a map, or color coding maps. Doing these Porous puffs. You’re seeing a lot more of capability to build compelling dashboards that integrate maps and tabular data, you’re seeing more ability to kind of do mashups, so it looks like there’s capability there, where they can give you demographic information from ESRI’s ArcGIS platform. They have all sorts of things on zip codes, and what census information is there.
They’ve integrated that, they’ve built all that, and so you can layer your data in with that. It really shows that they’re starting to give some love back to some mapping. We know back in reporting services 2008R2 they introduced Bing maps component, and it was great for what it did, but it definitely had some limitations. There’s been other kind of stabs at that. Powerview had some mapping capabilities, extremely limited, but really great product. It used to code named GeoFlow, and is now I think Powermaps. It did 3D mapping right, so if you think about your Google Earth model, and being able to overlay your data on that, and create a video, that you can maybe embed in a PowerPoint presentation. That’s a really compelling thing for businesses.
It looks like Microsoft is continuing to kind of build on that quiver of mapping technologies. I’m excited to see that, and see where this goes. It looks like it’s in preview right now, so I don’t think they’ve completely finalized what they’re going to be offering. It’s definitely something I’m going to be following pretty closely.
Steve Stedman: Well that certainly sounds like an exciting new move there that I’d like to find out more about. As it becomes available. Shall we do SQL family?
Carlos Chacon: Very good.
Steve Stedman: The first SQL family question is on how do you stay up to date with technology? With things changing as fast as they change today?
Jason Horner: Do any of you guys watch Agents of Shield?
Steve Stedman: Oh yeah.
Jason Horner: Oh yeah, so do you guys remember the big quote from Season 2, Discovery requires experimentation? That’s true, that’s something that I’ve really started to live by, is I used to be a big blog reader. I would read every blog that was out there, and twice on Sundays. I would read a lot of books, I would reread books, but then I wouldn’t actually go and experiment with it, and so you wouldn’t find these little gotchas. I still read blogs, and try to keep up with things, but not as religiously as I used to.
What I do is, when I need to get up to speed on a technology, I pull it down and I start experimenting with it. Seeing where the limitations are, trying to come up with scenarios that might break it, or how things behave. That’s really the big thing. Microsoft Virtual Academy is a huge resource for me. I love taking those classes, there’s so much good content up there. It gets you kind of jump started, and then I’ve got an Azure subscription, so I spin up the Amazon new technology, I can build out environments. It’s great for testing some of the high availability technologies. Great for even testing old stuff.
Things like log chipping. Where you don’t necessarily don’t want to install that, or you can’t install that all on your machine, and you don’t want to run VMs locally, because you don’t have the resources or whatever. Azure’s really my hot rod. I do all my development in there, I build things up, I tear them down as I need to. It just helps. You’ve got to experiment, you’ve got to get your hands on this technology. It’s the only way to become proficient.
Carlos Chacon: You’ve been talking about a gamut of tools, but if there’s one thing you could change about SQL server. We’ll allow you to open it up, to kind of a SQL server environment, as for some of the other tools they go along with that. What would you change?
Jason Horner: There’s so many things. The thing that’s really encouraging me, we’re starting to see the SQL Dev team really taking things from Connect, and from some of the other community trouble boards that people have set up, and really starting fixing these longstanding Connect issues. That’s what I’m very encouraged by, but where I think we really need some love is the SSMS, or the management studio project and solutions structure. It’s very outdated. The source control integration doesn’t work very well anymore. I always, even as a DBA, I want to have some way to manage my scripts.
Putting them in a project to me makes a lot of sense, and being able to have folders, and a solution. The way they currently handle connections across multiple projects is a real pain. It’s not ideal for sharing it across multiple developers because things get hard coded in there. It’s just an outdated model. I hope they can kind of give that some love going forward.
Steve Stedman: Yes, I hope they do to.
Carlos Chacon: That’s interesting. I agree with you on the projects, because that source control integration, if you’re used to using … if you’re using TFS, kind of switching to the visual studio while they’re making them more similar every day, is kind of a pain. You’d like to be able to work with your SQL server management studio and integrate with TFS and do that all with that one interface.
Jason Horner: Yes, absolutely, and I think we’re starting to see more and more of that. It always surprises me when, especially working with Azure DW, and APS a lot for different clients, you can’t use management studio, or you couldn’t as of some of the CU updates. Basically you lose all that and you have to work in SSDT, or Visual Studio basically. Visual Studio Shell. Seeing that they’re bringing the telesense over, and it’s getting more parity with the Management Studio query editing, and query execution, where it’s really lagging though is it doesn’t really have a lot of the DBA capabilities built in.
I would hope that at some point we see a very lightweight SSMS kind of environment, where it loads very quickly and it just has the core things that the DBA needs. A lot of the kind of old crusty people are, bring back query analyzer, and all that. I agree with that. Some of these tools are very heavy weight, and they take a long time to load. You start working with them for a while and all of a sudden you’re using 500 gigs of memory, and you’ve got to restart them. I just hope that they can get that source control integration back as well, because that’s something that’s sorely missing.
A lot of DBAs aren’t good with those practices. It’s because the tooling isn’t there. It’s so painful to have to then shell out to a command line and work with GIT, or Subversion, or whatever TFS, any of these versions control providers.
Steve Stedman: Okay, great. Jason what is the best piece of career advice that you’ve ever received?
Jason Horner: I’ve probably got a lot over the years. I think the one thing that’s enabled me to be so successful is really to learn the jobs of the people around me. I don’t mean that so you can tell them what to do, because that’s what a lot of people would try to do. For example, if I go into a client and I have to do some DBA work, well I’m going to have to interact with their infrastructure team, their SAN administrator, maybe their network administrator, and the developers, so I need to be able to talk them on terms that they understand. I needed to be able to talk to the SAN administrator about LUNS and about IOPS, and about all these kind of storage technologies. HBAs, so on and so forth.
I need to be able to quickly develop a rapport with them, and the best way to do that is to really understand pieces of that. Now that doesn’t mean that I’m going to go be the SAN administrator anywhere, or I even have that capability. It just means I can talk intelligently with them. Even things like the project managers. Understanding a little bit about Agile, or if you’re stuck with Waterfall, those kind of management processes, and how they work, how they operate. It just helps you.
It gives you a better rapport with people, it builds your credibility. In doing that, people will trust you. People will often say as maybe an interview question or just a general question. What’s the first thing you build when you start on a new project. People will say well, you’ve got to build the database, or I build the ETL process. Well the first thing I build is trust. I do that by being honest and straightforward with people, and speaking in languages that they can understand.
Steve Stedman: Wow, it sounds like very good advice there.
Jason Horner: Thank you.
Carlos Chacon: Our last question for you tonight Jason, if you could have one superhero power, what would it be and why do you want it?
Jason Horner: Oh man there’s just so many to choose from. I like the last X-men movie where you saw Quicksilver and how he went into the mansion and he had the super-speed, but it was just really cool how they kind of did that. I’ve been a big fan of a lot of superheros over time. Gambit’s ability to take and charge a card and make it explode, I probably could have used that at a few client sites occasionally. I don’t know, it’s a hard question to answer. I believe we all have a super power within us, and that’s the ability to become a better communicator. Ultimately that’s what enables IT. Is to be able to communicate with people, to be able to translate business requirements into technical solutions, and be able to work with other people on your team.
Carlos Chacon: Very good, awesome. Well Jason thanks for being with us tonight.
Jason Horner: Yeah, thank you guys so much. This has been great, it’s been really good talking with both of you guys, and I know it’s a bit of a journey to get this scheduled, but I’m glad we did it.
Steve Stedman: Yes, definitely. Thanks for being here. I appreciate it.
Jason Horner: Yeah, and Steve thank you so much for all you’ve done. You’re community corruption challenges, have really inspired a lot of people to spend some time on that. I think it’s something that not a lot of DBAs run into. To give them these kind of scenarios and have them work through it, and sometimes collaboratively, really increased a lot of the IQ around corruption repair, and perhaps even backup and restore best practices. Kudos to you for all you’ve done there.
Steve Stedman: Well I appreciate that. Thank you.