SQL Server 2016 has several new features, and this episode explores the Stretch Database feature. One of the unique attributes of this feature is that it uses both the on-premises instance and the Azure platform.
I chat with Anthony van Gemert, one of the product managers for Stretch, about how the feature works and under what scenarios organizations might use Stretch technology.
Are you planning to use Stretch Database? Let me know via Twitter. I am @CarlosLChacon.
Transcription: Stretch Database
Carlos L Chacon: This is a SQL Data Partners podcast. My name is Carlos L Chacon, your host, and this is episode 25. Today, we’re talking about Stretch Database, a new feature in SQL Server 2016. I have with me the product manager for the Stretch Database, Anthony Van Gemert.
Now, the Stretch Database feature is unique in that it is the only feature I'm aware of that uses both the on-premises instance and Azure technology in combination. So we're going to be talking with Anthony about how that works.
I think it may be a niche solution, but for organizations that have large storage requirements and want to try out some of the cloud storage options, this might be something they could take advantage of fairly easily. The integration actually looks pretty cool.
As this is the first episode of the new year, Happy New Year to you, if you haven’t heard that a million times already. I hope you’ve made some good resolutions this year which includes listening to a few new podcasts. If you’re new, we welcome you, my new compañeros, to the program and I hope you enjoy the content we share here.
As always, I’m interested in getting your feedback, on iTunes or on Stitcher. You can leave your comments also at sqldatapartners.com/podcast. It’s always good to have you compañeros. If there’s something you want to hear, you can reach out to me on Twitter. I am @carloslchacon, I’d love to hear from you and compañeros, welcome to the program.
Children: SQL data partners. [music]
Carlos: Sir Anthony, welcome to the program.

Anthony Van Gemert: Thanks for inviting me.
Carlos: Thanks for taking a little bit of time. I know there's lots going on there on the Microsoft campus, and a windstorm to boot today, so thanks for chatting with us a little bit.

Ultimately, today we do want to talk about Stretch: what Stretch Database is and some scenarios around it. Help me understand what Stretch Database is. Why would I want to consider it?
Anthony: Well, ultimately, the basic overview of Stretch is this: Stretch is a feature of SQL Server 2016, and what Stretch allows you to do is selectively migrate your existing historical data, your cold data, from your on-premises storage solution. It could be a SQL Server or your SAN network, and we allow you to selectively migrate this data to the cloud. It's transparent: your applications, your queries, and your tools don't need to be aware of it.

We don't impact your operations at all. It's secure. We secure the data during transit, and of course you have all the additional encryption, such as Always Encrypted, which we fully support as well. The idea is that once you've moved this data to the cloud, it's always online, always available, and it allows you to free up existing disk space.

This cold data, this data that you used to have — your archive data, the historical data, the data that you didn't really use very often but important data that you'd like to access when you need it — we take this off of your network and we store it in the cloud, so obviously it reduces the space you need.

It reduces your costs on maintaining your existing on-premises data. It essentially reduces cost all around as far as managing storage and actually maintaining your data.
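As a rough sketch of what Anthony describes, enabling Stretch happens at the instance and database level before any table is migrated. The database, server, and credential names below are hypothetical, and the exact syntax was still subject to change between the CTP builds discussed here and RTM:

```sql
-- Enable the instance-level option that allows Stretch at all
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;

-- Enable Stretch for a database, pointing it at an Azure server
-- (names are illustrative; the SSMS wizard does this under the covers)
ALTER DATABASE SalesDB
    SET REMOTE_DATA_ARCHIVE = ON
        (SERVER = N'mystretchserver.database.windows.net',
         CREDENTIAL = [AzureStretchCred]);
```

After this, individual tables can be selected for migration, as the conversation below covers.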
Carlos: I think one of the common scenarios is this: I have some large historical data. Maybe this is sales information, archive data I just want to keep around, that there are users who will query from time to time, but it has a large footprint in my environment. Maybe it takes up a large chunk of the database. I think the idea here is that I'm going to take one of those tables, I'm going to create a SQL Azure database, and I'm going to push that over there to Azure.

I'm going to get the benefits that you talked about, reduce storage costs, and avoid some of the administrative pains there.
Anthony: That’s essentially it. The call data that we talk about is you’re actually right. This is large data. This is sales records. It could be that something patient that may have a medical history. It could be public record if you sort of local government things like that.This is data that is building up in the network. We all know that the data grows and grows. Your IT budget is always shrinking. You can always guarantee that but your data, you can also guarantee this is just going to continue to grow and grow.
We give you the ability. We build in that functionalities that allows you to take this data, identify this data, and push it to the cloud. The beauty of it, you don’t need to…You don’t know have multiple locations to gather this data. This data is still accessed from the local source for local instants.
Your data your user still access it through using same applications, hitting the same source service. Nothing changes. Your policies remain the same. Everything is managed locally.
We just simply give you the ability to go from that local server to the cloud, fetch the data bring it back.
Carlos: That is super impressive, in that my application doesn't have to change. I'm going to be able to migrate that. We're calling it Stretch Database, but I'm really migrating table by table. I'm going to specify this migration.
Anthony: Actually, table by table. You can migrate your table, and we're also going to give you the ability to migrate just individual rows of the table. Yes, you can migrate an entire historical table if that's how your system is set up.

If, for example, you have data that ages out after a certain period, we give you the ability to migrate just those rows that you care about, so that you can actually keep your hot and your cold data in the exact same table.
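A sketch of the two shapes Anthony describes — stretching a whole table, or only rows matching a predicate. The table name, function name, and cutoff date are hypothetical, and the row-level option was not yet in the CTP build discussed here, so this is only the rough shape such syntax takes:

```sql
-- Stretch an entire table to Azure (hypothetical table name)
ALTER TABLE dbo.OrderHistory
    SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));

-- Or stretch only qualifying rows, using an inline table-valued
-- function as the eligibility predicate: rows for which the function
-- returns a row are migrated, the rest stay local
CREATE FUNCTION dbo.fn_stretch_old_orders (@OrderDate datetime2)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS is_eligible
       WHERE @OrderDate < '2014-01-01';
GO

ALTER TABLE dbo.OrderHistory
    SET (REMOTE_DATA_ARCHIVE = ON (
        FILTER_PREDICATE = dbo.fn_stretch_old_orders(OrderDate),
        MIGRATION_STATE = OUTBOUND));
```

This is what keeps hot and cold data in the exact same table: the predicate, not the schema, decides where each row lives.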
Carlos: It’s a word of thanks that they will notice [inaudible 06:34] that these things continue to evolve and change. You talked about specifying that row by row data. It’s not currently in the CTP released. Will it be there in the final release?
Anthony: It will definitely be there by RTM. Obviously, we will update our users as we release our CTP versions. It will definitely be available by RTM next year.
Carlos: OK. Yes, very cool. That will give a lot of control, that ability for the users to be able to specify what they want to move over. You talked about individual tables. Ultimately, I am choosing the tables, then the data within that table that I want to move over to Azure.
Anthony: Correct. On the choice of that data, we do provide you with tools that help guide you on which data we would consider Stretch Ready — the data that you don't access very often. Ultimately the decision is yours. If you've decided this is cold data and it's Stretch Ready, you make that decision. You make the decision to migrate it to Azure.
Carlos: It will be interesting to see the 2016 Upgrade Advisor. Lots of neat little features baked in there, lots of Azure integration. I think this is one of the tools that folks will be able to use to understand if their data is Stretch Ready.
Anthony: Absolutely. What we're building with the Stretch Advisor is a set of rules built into it that we'll run against your database, and we'll make a recommendation based on the data that you run through the tool. We'll make a recommendation on the specific tables and give guidance on those tables. If there is a table that is Stretch Ready, obviously we'll identify that.

For the tables that we consider medium Stretch Ready, we'll provide guidance on why that's the case. Obviously, again, the decision is up to you as to whether you want to migrate that or not.

With each subsequent release, we'll be updating those rules as we support more and more functionality in Stretch.
Carlos: Now we’re chatting here in November of 2015. RTM, obviously it’s SQL Server 2016. We know we are still few months out. Things are subject to change obviously. Do you have any guidelines or any thoughts on what some of those Stretch Ready requirements would be at this point?
Anthony: A Stretch Ready table would be a table that you don't access very often — a table that doesn't see a lot of traffic: not a lot of inserts, deletes, or updates. Data that is essentially very infrequently accessed.

It's the type of data that auditors would be looking at — historical data where, essentially, if an auditor comes knocking, this is the data you'd be running queries against to audit your business.

If this is data that your day-to-day applications are hitting on a regular basis — update, insert, delete — then that would not constitute a Stretch Ready set of data.
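The Stretch Advisor applies its own rules, but one rough, do-it-yourself way to spot the kind of low-traffic tables Anthony describes is the index usage DMV. This is only an approximation — the stats reset on each restart, and this is not how the Advisor itself works:

```sql
-- Tables with few writes and few reads since the last restart are
-- candidates worth feeding to the Stretch Advisor for a real answer
SELECT OBJECT_NAME(s.object_id) AS table_name,
       SUM(s.user_seeks + s.user_scans + s.user_lookups) AS reads,
       SUM(s.user_updates) AS writes
FROM sys.dm_db_index_usage_stats AS s
WHERE s.database_id = DB_ID()
GROUP BY s.object_id
ORDER BY writes ASC, reads ASC;
```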
Carlos: You mentioned auditors. Occasionally, they're going to come and select that — that's hopefully not more than once a year. If this data was occasionally used, even on a quarterly basis — for example, some reports that needed to hit it for whatever reason — would that still be candidate eligible?
Anthony: Definitely, that would be candidate eligible. If this is historical data that needs to be accessed as part of your day-to-day business — historical data like old customer debts, or you're following up on a customer's history — that would be a perfect candidate. You would access this data as part of your day-to-day operations. You would access it from the same applications.

You would be making the same updates, deletes, inserts, whatever into your hot data. At the same time, you would be reading data from these cold tables, and it would be seamless.
Carlos: Let’s go into that and talk a little bit more about the changes that my database would experience one such Stretch. I’ve decided to take some of that, identify that eligible data. I pushed it there into the archive.Some of the things that are going to change is that, now the size of my on-premises database – I guess, I shouldn’t use that word on-premises, because the database is still one and the same. It’s still connected. I can only access it through my on-premise tools. I’m thinking of, if I take a backup of that database…
Anthony: Yes. Right.
Carlos: …that archived data that is now in Azure is no longer part of that backup, correct?
Anthony: Correct. This is how it reduces your maintenance costs significantly. You've gone from backing up and restoring a potentially massive database.

Now all you really need to care about is the data that you have on your source database, which is considered to be your hot data. This is your active, day-to-day data. You only need to back up, restore, and manage that data, which reduces your perf tuning and index building on that data.

The Stretch instance on the Azure side will take care of all the backups, restores, re-indexing, and performance tuning in Azure. You don't have to worry about that. It significantly reduces the amount of effort to manage that same database.
Carlos: The other change that will happen, should you want to look at your execution plans once the data is over there, is that if a query is going to access that data, you'll see a remote query operator there, because that data is now in Azure and it's the database engine that's reaching out and grabbing it?
Anthony: When you’re clearing your data, obviously you’re clearing it from the local source. The feature is coming but we don’t have the feature to allow you to differentiate between, just give me the remote stuff or just give me a local stuff.There is a feature that we’re building in. That’s a functionality we’re building in. At the moment you send your query to the source database.
We package up that query and we send it to Azure. Whatever remote query is done in Azure, we do all the compute there and just send you back the results. The idea is that again, it has minimal resource reduction on your local server when running queries and maintaining a database.
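This transparency is the key point: a query against a stretched table needs no special syntax. A sketch, using a hypothetical stretched table:

```sql
-- Ordinary T-SQL against a Stretch-enabled table; nothing changes
-- in the query text. If the plan has to touch migrated rows, it
-- shows a "Remote Query" operator, with the compute done in Azure
-- and only the results sent back.
SELECT OrderID, OrderDate, TotalDue
FROM dbo.OrderHistory              -- hypothetical stretched table
WHERE OrderDate < '2013-01-01'     -- rows that have migrated to Azure
ORDER BY OrderDate;
```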
Carlos: All right. You talked a little about what Stretch Database is, under what scenarios we would use it, some of the benefits. But it might be helpful to talk about what Stretch Database is not. So it's not a replication tool?
Anthony: It’s not a replication tool, no.
Carlos: …or a DR tool?
Carlos: Right. In this instance, the data still exists in only one place? It just happens to be in the cloud, and all of the processes — all of the things you'd have to do to ensure consistency and availability for that data — you still have to implement?
Anthony: Correct. The tables themselves still exist on your source server. Essentially, we take a mirror of the tables and your indexes and create a mirror in Azure. We’re literally just moving the raw data to Azure. This is not for replication or anything like that. Effectively, it’s a large database but just significantly easier to manage.
Carlos: Sure. Some of that might be helpful as we talk about the migration strategy for that data. Again, things are changing all the time, but there are two migration options. One is the trickle migration, where you specify where the data will go and then Stretch Database takes care of actually moving those rows slowly, in small chunks at a time, versus a, "Yes, this table is ready to go over, how about it?"
Anthony: Yes. In the current CTP 3 functionality, this is the standard behavior: we have a set trickle rate for getting your data to Azure. We have a set number of rows that get moved at a time, and it's very small. The idea is that once you set the migration going, it shouldn't impact your local performance. It shouldn't impact your operations at all. We try to make it as small as possible so it has as minimal an impact as possible.

The idea is that as soon as you activate and enable Stretch and begin migrating data, it literally just trickles. Large databases could take a significant amount of time. Obviously, there are cases in countries where the telecom systems aren't as robust as, for example, here in Redmond.

What we're looking at is functionality that allows you to determine what that packet size is and schedule your migrations during a particular time when there is significantly more bandwidth available and it's potentially cheaper, so that you could create larger package sizes and migrate those during a specific period.
Carlos: Sure. So during the overnight hours, potentially, I can crank that up a little bit more than one might in the daytime.
Anthony: Exactly. The idea is, this isn't a cloud archiving solution where you create a massive big package and then push that to the cloud. This is specifically designed not to impact your day-to-day operations. This should run in the background, and it should be utterly transparent to you and your users.
Carlos: Now that I’ve got that data over, we did talk a little about backups. I backup the database locally. I’m not going to read all that. I have lets just say a billion rows, that table is being Stretched, those billion rows no longer exist locally. I back it up. I have a backup minus a billion rows. On the Azure side, I’m giving the redundancy, the reliability of having the multiple copies and things like that.
Carlos: In the event that I needed to restore that database to, let’s say, a test system. What’s that process going to look like?
Anthony: Very simply, you would restore your local backup to a different system. Part of that backup is the metadata that contains the connection information to your Azure instance. When you restore that backup to your local instance, you would send a command that says, create a copy of that cloud data. We create a copy at the same time, and you can initiate a connection between the new instance that you've just deployed and that copy of the data. It doesn't impact your production data.

You can test against that particular copy. We will support snapshots by RTM — we don't currently support them — the idea being that you'll be able to deploy multiple systems for testing using snapshots as opposed to creating a copy. A copy obviously is just more space.

We don't have that just yet, but you can, if you wanted to, set up a parallel system for testing or development. Restore the database to a different instance, create a copy at the same time, then do the handshake. Connect the two and your system will be up and running.

Obviously, you can test against that. When everything is done — we've done the reconciliation, because what we will do is a reconciliation between the data that's local and the data that's in Azure — you've confirmed that everything's as you'd expect, everything's reconciled, and then obviously you could connect to your production.
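A sketch of the "handshake" step Anthony describes, reconnecting a restored database to its remote data. The stored procedure and credential name here reflect the shape this took in the product's documentation and may have differed in the CTP builds being discussed:

```sql
-- After restoring the local backup on a test instance,
-- re-establish the link to the remote Azure data.
-- @with_copy = 1 requests a copy of the remote data, so the
-- production database's remote data is left untouched.
EXEC sys.sp_rda_reauthorize_db
    @credential = N'AzureStretchCred',   -- hypothetical credential name
    @with_copy = 1;
```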
Carlos: That reconciliation, that will take place all in the background?
Carlos: That’s going to be pretty nice. I had a thought as you were describing that process. What would happen if I were to issue an update command against my archive data? In the test environment I’m doing something silly, and an update statement gets loose. What would happen?
Anthony: Essentially, if you did that — and specifically on the current CTP 3 release — we wouldn't allow it. You wouldn't be able to actually make that update to the Stretched table.
Carlos: That’s probably pretty smart. Going into that is, again in the case of the backup, is what would you do then if you wanted to take a backup? How would you get that information back to before it was updated? Lots of implications there.
Anthony: You would lose that moment in time.
Carlos: Keeping it frozen or solid helps me make sure that it’s consistent whenever I want to get it back. What kind of testing would you recommend for a company interested in looking at migration?
Anthony: Basically to see if Stretch is the right product for them.
Carlos: Right. It would be more like testing the waters, like, "Hey, we want to try this out, but we don't have copies of our terabyte systems lying around."
Anthony: Essentially, all you need is a large set of data that you don't touch very often — a large set of what you would call your historical data, something that you're not updating or changing regularly. That would be a perfect test bed. Essentially, install SQL Server 2016 — Stretch comes as part of SQL Server 2016 — and test it on that table.

If you're on MSDN, we've just released a new version of the AdventureWorks database to support SQL Server 2016. Within the actual package that you download from the MSDN download site, you'll find a set of sample scripts that have been created specifically for Stretch Database.

We've created a new table in AdventureWorks called Order Tracking. It allows you to insert new rows into that particular table. That table is set up for Stretch — it's what we'd call a fantastic candidate for Stretch. It allows you to test all the functionality of Stretch.

Once you've enabled Stretch on that particular table, we have a sample script that's been added to that download package that gives you step-by-step instructions on how to test the Stretch functionality against that Stretched database.

You've got your pre-Stretch testing — testing performance pre-Stretch — and then stretching the database. You can run multiple queries against the local and the remote data, and you can get a feel for the performance on that particular table.
Carlos: That gives you an option: it's set up, you can kick the tires there, get a feel for the process, and then, once you feel comfortable, begin moving that over.
Anthony: If you're seriously thinking of trying out Stretch, what we would recommend is running the Stretch Advisor over a production database. That will give you a very accurate idea of how Stretch-ready your actual production data is.
Carlos: We know that the Stretch implementation is baked into SQL Server Management Studio now, in the 2016 version. Are there going to be add-ons to SQL Server Management Studio 2012 or 2014 to allow us to Stretch?

Let me clarify. I'm assuming that I'm running a SQL Server 2016 database — that's why I'm able to Stretch. Let's say for whatever reason I haven't updated my local client just yet. Am I still going to be able to manage those components from 2012 or 2014?
Anthony: With the current implementation, no, you wouldn’t be able to do that. This is specific to 2016. That may be different closer to RTM. I don’t have visibility of that just yet. Currently you’d only be expected to be able to use the functionality in 2016.
Carlos: We’ve taken the time, we’ve kicked the tires, we’ve used the AdventureWorks database, we think we’re ready to go. We’ve moved our data over into Azure, now our database is Stretched. A couple of things, which probably should have gone into testing. Is there best ways to estimate cost, of what it’s going to cost me to have my data over there?
Anthony: There will be a fixed pricing model — I don't have it yet; that will come out closer to RTM — but there will obviously be a compute core cost for setting up your Stretch instance. In the Upgrade Advisor we will be building in a local storage cost estimate.

When you run your Upgrade Advisor, we analyze the databases that you identify, and you plug in how much it's going to cost you per month per gig for your local storage. We give you a total of the amount of data that you have in those databases. You can then plug that into the Azure pricing estimator, and that will give you an idea of what your storage costs will be in Azure.

As we get closer to RTM, we will have a clearer pricing model where you can actually calculate both the compute and storage costs together, but today it's only the storage costs. We can help you get the storage costs, but the actual compute costs, unfortunately, I can't give you just yet, because I don't have the pricing model.
Carlos: Sure. We are still in CTP, so things are obviously subject to change. Once we've migrated the data over there, let's just say that, for whatever reason, our time has expired. In the case of some government regulation, I have to keep my data for seven years. A year has passed; I no longer need what is now the eighth year of data.

Do I go and get that back? How am I going to move that data back on-premises?
Anthony: You will have the ability to un-migrate data. Using the tools, you'd be able to un-migrate that data. You'd also be able, using the normal Azure functionality, to actually purge that data from Azure.

You can still do that — you can use the Azure functionality for that. If required, you can un-migrate that data: bring it back from Azure to your local system and delete it from there if you want.
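The un-migrate path Anthony mentions can be sketched as reversing the migration direction for a table. The table name is hypothetical, and again this reflects the shape the feature eventually took rather than the CTP build being discussed:

```sql
-- Reverse the flow: INBOUND copies the migrated rows back from
-- Azure into the local table (instead of OUTBOUND, which migrates
-- them out). Once the data is back, it can be deleted locally or
-- Stretch can be disabled for the table.
ALTER TABLE dbo.OrderHistory
    SET (REMOTE_DATA_ARCHIVE (MIGRATION_STATE = INBOUND));
```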
Carlos: Probably the best practice is going to be to use the Azure tools to purge it from there. Wrapping up this discussion a little bit: we've migrated our data — are there any other implications? Some of the things that I either can or can't use in my system now that I have Stretch enabled — Availability Groups, for example?
Anthony: Availability Groups will be supported. Log shipping, yes, that will be supported. We don't currently support replication, but we're investigating it. It's in the discussion, but we don't currently support it. As we march towards RTM, I may be able to give you a further update on replication. As far as partitioning, yeah, we support partitioning, no problem.

Encryption, absolutely — Always Encrypted included. Whatever you've encrypted on your source remains encrypted, obviously. Fully supported there. The big thing that a lot of our users have been talking about is temporal tables. We will fully support that as well.
Carlos: Actually, help me understand that a little bit. When you say temporal tables, what does that mean?
Anthony: With these historical tables, you might have data of a certain age — this historical data — and you could essentially flag this data to automatically migrate at a certain period, a certain time. It's completely hands-off: you define the parameters and it just takes care of it for you.

The data itself will still be accessible — you access it the same way — but we will have moved this historical data to Azure.
Carlos: It’s almost like a sliding window type thing. Once my data is X age, I’m going to move it over. You mentioned partitioning. One of the features that will be available in RTM which we talked about, will be the ability to define which rows, or based on a certain age, something like that. Say, “Yes, go ahead and move those over.”If I have a currently partitioned table, will that table be able to be stretched?
Anthony: Yes, it will be. That will be fully supported.
Carlos: That will be very nice. I think actually probably more likely is that there will be tables that got a little too big, a little too fast, maybe? That weren’t partitioned. [laughs]
Anthony: We have customers who have — I think the largest one was 1.3 trillion rows, obviously partitioned. I believe the majority of that is cold data. That is a perfect candidate for Stretch.
Carlos: Very good. Well, compañeros, there you have it — a little bit of information about Stretch Database. As we mentioned, as things get a little bit closer to RTM, we'll bring Anthony back in, have a little chat, get some updates on what the team has been doing, and make sure that you have the latest information.

Anthony, before we let you go, we want to switch gears here just a little bit. Now we've gotten to the SQL Family part of the program. I'd like to talk a little bit more about what makes you tick, the things that you enjoy. One of the things we enjoy talking about is different tools.

What tool do you enjoy using? It could be a free tool or a paid tool. How do you use it, and why do you like it?
Anthony: I suppose in a SQL sense, it would definitely be my Upgrade Advisor tool. But I love woodwork, so my favorite tool is actually a table saw. [laughs]
Carlos: Very nice.
Anthony: That’s how I relax.
Carlos: What’s your current woodworking project?
Anthony: Nothing at the moment because it’s too wet at the moment. The woodworking season has passed.
Carlos: If you could have one superhero power, what would it be and why would you want it?
Anthony: Breathing underwater.
Carlos: Breathing underwater, Aquaman. OK, very good. Are you a big scuba diver?
Anthony: I do like scuba diving, yes. The idea is, there's so much unexplored planet underwater. Imagine being able to go wherever you want and explore that — the rest of the world that other people haven't been to yet.
Carlos: Right, all that buried treasure down there. [laughter]
Anthony: There’s that, of course. There’s that as well. You’ve got to fund this somehow.
Carlos: That’s true. All that time you’ll be taking there. Anthony, thank you so much for being on the show again, I do appreciate it.
Anthony: Thanks a lot, Carlos. It was a pleasure being here.
Carlos: Compañeros, you can check out sqldatapartners.com/podcast for the show notes today. We will have more information available as things get a little closer to RTM. We'll see you on the SQL trail. [background music]
Children: SQL data partners.