Episode 167: Azure SQL Data Sync

Episode 167: Azure SQL Data Sync

Episode 167: Azure SQL Data Sync 560 420 Carlos L Chacon

Azure SQL Data Sync has been generally available for almost a year now. Have you heard of it? Are you using it? Microsoft’s page tell us that “Data Sync is useful in cases where data needs to be kept up-to-date across several Azure SQL databases or SQL Server databases.” When you want to learn about a new product or feature, who better to ask than the Senior Project Manager in charge of it? Today we chat about Azure SQL Data Sync with Xiaochen Wu from Microsoft. He explains how it works, why you might want to use it, it’s current limitations and what’s in the pipeline in the way of improvements. It’s an exciting time to be in technology and a great conversation!

Episode Quotes

“Azure SQL Data Sync allows you to synchronize your data between Azure SQL Database and any other SQL endpoint.”

“Data Sync will allow you to choose which table and which column in the table you want to sync. So basically, you don’t have to sync the whole database, you can only sync the data which needs to be synchronized among all of those databases.”

“Data Sync [is] a managed service. Once it’s set up, it should just work. When it stops working, you can call Microsoft and we will look at it or we’ll fix it.”

Listen to Learn

00:40     Intro to guest and topic
01:40     Compañero Shout-Outs
03:35     SQL Server in the News
05:30     SQL Trail Announcement
06:19     What Azure SQL Data Sync is and what it can do
09:09     Why you would choose this feature over replication
10:51     Doing bi-directional syncing over multiple databases with sync groups
13:13     What happens if it’s a ‘member wins’ but two members have different values?
15:18     How Azure Data Sync can help you understand performance
17:30     Microsoft is working on giving periodical updates on your sync
18:42     Knowing what you want to sync will help you determine your cost
21:04     What syncing distributed databases looks like from an architectural perspective
23:40     Comparison to CosmosDB
25:34     What does the Sync Agent do?
29:05     Some limitations with Azure SQL Data Sync – database requirements
31:02     The initial process to getting started with Azure SQL Data Sync
32:35     What happens with schema changes?
34:15     Thank you to Xiaochen and Microsoft for being willing and available to help us
34:50     SQL Family Questions
40:12     Closing Thoughts

About Xiaochen Wu

Xiaochen Wu is a Senior Program Manager on SQL Team at Microsoft. He has been working with SQL Server and Azure SQL Databases for more than 10 years. Xiaochen is now working on backup and data integration for SQL Server and Azure SQL Databases.

Credits

Music for SQL Server in the News by Mansardian

*Untranscribed Introduction*

 Carlos:             Compañeros, welcome to another edition of the SQL Data Partners Podcast. My name is Carlos L Chacon, your host, and I’m excited today to be joined by Xiaochen Wu, who is the Program Manager at Microsoft responsible for bringing us– well, responsible and his team, for bringing us Azure SQL Data Sync, which is our topic today. So Xiaochen, welcome to the program.

Xiaochen:       Thank you, Carlos and hi everyone. Also, I’m the Program Manager from the Azure SQL Database team, so I’m looking at some features including backup restore and other data integration features, including Azure SQL Data Sync. So, I’m glad today I have the opportunity here to talk about Data Sync with Kevin, Carlos and other people here.

Carlos:             Awesome, yeah, so we do want to welcome Eugene.

Eugene:           Howdy.

Carlos:             And Kevin is here, as well.

Kevin:              Ahoy.

Carlos:             Angela is neck-deep in a project we’re actually working on, so we’re going to excuse her for today’s episode. Okay, so before we get into the conversation, I do have a couple of shout-outs I want to give. So, first of all to Stephanie Bruno, for putting together the Power Platform Conference, helping put that together in DC. I’ve got Lumina Gatbonton, Jonathon Lipke, Brett Burnam, Swati Goel, DeNisha Malone, who is over there on the West Coast, now. Naga Perni, giving us some love. He’s giving us a little bit of love on Episode 81, the Index episode with Randolph West, so thanks, Naga, for giving us the shout-out. Doug Purnell. We missed Doug. We had an event here in Richmond here just recently and we didn’t see Doug. Doug was over supporting his son, with some fundraising, so good for you, Doug, for staying home and taking care of your family. Giving a shout-out to David Stokes and the rest of the folks that we saw at SQLSaturday RVA. You know, Kevin didn’t make it up to RVA, I noticed.

Kevin:              No.

Carlos:             It was a big pull, I think.

Kevin:              I ended up going to Cincinnati.

Carlos:             You’re forgiven. Okay, Eugene, did you want to do a little–

Eugene:           Well, yeah, two things with that. I attended Angela’s pre-con and want to shout-out to her because I’m going to steal all of her pre-con ideas about how to teach about Power BI. And then Carlos, I didn’t know if you know, but Stephanie Bruno actually lives up here in Pittsburgh. She’s the co-leader for the Pittsburgh Power BI User Group.

Carlos:             Yes. Yes, so you are very lucky to have her. Very cool. They did a great presentation on the Elizabeth Glaser Foundation that she works for and some of the cool things they’ve able to do with Power BI and how that’s been able to help, ultimately, their AIDS, I shouldn’t say research, but their cause with AIDS and treatment. So, yeah, very cool.

Eugene:           Very cool.

Carlos:             Okay, so time for a little SQL Server in the News, or maybe I should say lack thereof. We are recording this after the first day of April, and again, I’m going to blame it on the fact that I was at the Power Platform Conference up in DC, I’m going through social media, and Yvonne, who’s one of the other Program Managers for SQL Server, posts something about CosmosDB support in SQL Server 2019, which I fell hook, line and sinker for. Until I was trying to– I’m like, “oh yeah, I need to remember that. I need to put that in the SQL Server in the News.” And I go back and I click on the article, and I’m like, “huh”, I’m trying to read more about it and then I’m like, “ta da, April Fool’s”.

Kevin:              Yeah, when you realize the date it’s posted.

Carlos:             That’s right.

Kevin:              So I saw that and the first thing I thought was, “well, shoot, that’s just Polybase.”

Carlos:             Yeah. Yes, that was funny. So I’m even more embarrassed, because I was chatting with some other Microsoft folks about it and they were like, “oh, woah, that’s interesting.” And so anyway. So some other news, we’re going to give a shout-out to SQLSaturday Raleigh.

Kevin:              Yeah, so SQLSaturday Raleigh is coming up April 27th. It is the best, the premier event for data platform on April 27th in the city of Raleigh in the building that we’re going to be at. I make no claims about outside buildings, there may be better events that day. But sign up SQLSaturday.com, click on Raleigh, sign up for Raleigh, be at Raleigh. There will be people there, more people than just me. It won’t be just me, I promise.

Eugene:           Fact checked, true, 100% true.

Kevin:              You’ll get to see Angela. You’ll get to see basically the entire SQL Data Partners group in action.

Carlos:             That’s almost true, that’s right. Several of us will be down there, that’s right. Okay, and so I have one other announcement I want to go ahead and make and that is the SQL Trail dates have been announced. So, we’re going to have SQL Trail on October 16th through the 18th and our website should be up by the time this episode goes live. We’re going to be making a couple of other changes this year, but one of the big changes that I’ll give is that the Friday event is what we’re going to call it will be much bigger than in past years, and we’re looking to incorporate, and I’ll say maybe be a bit more traditional in our approach to conferences on Friday. So we’ll have more details there, but because I know October is a very busy conference time, for those of you who are looking for something to do and can’t travel to far places, October 16th through the 18th will be in Richmond, Virginia. Okay, so let’s go ahead and get into the conversation with Azure SQL Data Sync. The show notes for today’s episode will be at sqldatapartners.com/ads, which hopefully there’s not a lot of other abbreviations out there for that, or sqldatapartners.com/167.

Kevin:              The Azure Data Studio episode’s going to be very angry.

Carlos:             Yeah, yes, that’s true.

Eugene:           That’s funny. Actually, related to that, I finally understood why Azure Data Studio is a thing, because Erin Nelson showed me how to work with SQL, Power Shell and Docker all in the same tool and it finally clicked for me why they made a weird SSMS clone.

Carlos:             Oh, interesting.

Eugene:           Yeah, so if you’re into devops or scripting or that kind of stuff, I see the value, for sure.

Carlos:             There you go. Okay, so Xiaochen, let’s go ahead and jump into Azure Data Sync. So I guess first we should go ahead and let’s define it, and then I know a lot of the questions that we get are about what it’s not, or maybe some comparisons to other things. But first of all, give us the 10,000ft overview of what is Azure SQL Data Sync and why would people be interested in using it?

Xiaochen:       Sure, yeah, definitely. So basically, Azure SQL Data Sync is a feature we have in Azure SQL Database. It allows you to synchronize your data between Azure SQL Database and any other SQL endpoint. So it could be another Azure SQL Database, it could be a SQL Server running in your on-premise data center. It also can be a SQL Server running other cloud vendor’s environments, for example, AWS RDS, and it enables multiple scenarios. So first scenario it enables is very important, which is a hybrid scenario. You can have your SQL Server deployed on-prem and have your SQL workload deployed to Azure SQL Database and then running your workloads and run your services in Azure. And you can sync your data between those two databases and Data Sync allows you to do one-directional or bi-directional sync. So when it’s doing a bi-directional sync, you can do rewrite workload on both endpoints, which means you can have active-active model between two databases. And another important scenario Data Sync enables is that you can deploy your applications globally very easily using Data Sync service and also, I believe we have about 70 Azure data centers around the world, so you can deploy applications close enough to your end user and use Data Sync to sync the data which your application is using and sharing, end line.

Carlos:             Wow, yeah. So you’ve got, even then, that idea of being able to get data a little bit closer. We’ve been trying to do that for a long, long time.

Xiaochen:       Yep.

Carlos:             So why would I choose this feature over something like replication?

Xiaochen:       Sure, yeah, the biggest difference between this and replication are, first off our replication is a one-way replication, if you’re talking about transactional replication. And if you’re talking about merge replication, it only works in the on-premise world, which means SQL Server. It doesn’t work in any cloud, any SQL Server running in cloud. It doesn’t work in Azure SQL Database, it doesn’t work in AWS RDS. So when you are trying to do a bi-directional synchronization between your Azure SQL Database and the SQL Server, basically using Data Sync is your only choice here, unless you want to build your own tool or your own service to do that, and also, Data Sync allows you to sync between any SQL endpoint. For example, transactional replication is a feature only available in SQL Server Standard and above editions, so Standard and Enterprise. So if you’re trying to sync data, for example, between Azure SQL Database and SQL Server Express, you can use SQL Data Sync, because Azure SQL Data Sync supports any version and any edition of SQL Server. Well, when I’m talking about any version, I’m talking about starting from SQL Server 2005. At this point we don’t support SQL Server 2000.

Carlos:             Sure.

Kevin:              It’s a little surprising you go back that far.

Xiaochen:       Well, there are still people using 2000, for sure.

Carlos:             Yes, unfortunately, but true. Okay, so this was announced for general availability in June of 2018 and so again, this is another tool available here for us. Now one of the cool features is that idea of this bi-directional sync. So then, tell me, when I have two databases, I update one, it’s going to update the other, I update the second, it’s going to update the first.

Xiaochen:       Yes.

Carlos:             You mentioned the number of data centers. Let’s just say for easy math, because I only have so many fingers here, let’s just say I have four databases, at what point, or what maybe considerations do I have to start thinking about if I’m actually going to be updating four databases and what that sync’s going to look like?

Xiaochen:       Sure, so Azure SQL Data Sync is using a hub/spoke model, which means, for example, if you have four databases, and you want to sync data among those databases, you have to choose one of your databases as a hub database. It can be any of your databases, as long as it is an Azure SQL Database. And all of the other databases, all the other three databases, we call it a member database. So, your member databases can be Azure SQL Database and it can also be a SQL Server running anywhere. So when you are setting up the Data Sync, you just go to the Azure portal and find your hub database and create a unit, which we would call a sync group and when you’re defining a sync group, you need to give us a credential to access all your endpoints, including the hub database and adding the member databases, which could be in a different region or could be on-premise SQL database, to the sync group. And after that, Data Sync will allow you to choose which table and then which column in the tables you want to sync. So basically, you don’t have to sync the whole database, you can only sync the data which needs to be synchronized among all of those databases. And when you’re doing a bi-directional sync, Data Sync will also help you to resolve the conflicts automatically, so you can define the conflict resolution policy and say it’s hub-way or member-way, and furthermore, you can also schedule the sync. So you can schedule to say, “I want to sync the data every 5 minutes, every one hour,” and we will do the syncing from the service, so you don’t have to go to the portal and click the button to sync the databases.

Carlos:             Now, a question on which one will win. In the scenario where we have the four databases, is that then a choice I’m making at the Azure Data Sync level, if you will, meaning that in all of the syncs the hub will win or all of the member databases will win or is that like as I add one, I get to choose how that works? Does that make sense?

Xiaochen:       Yeah, yeah. So if I understand correctly, the question is the conflict resolution policy, it’s a member level configuration or it’s a sync group level configuration.

Carlos:             Right, there we go.

Xiaochen:       This conflict resolution policy is a sync group level policy, so it applies to the whole sync group, every member in the sync group. The major reasoning is that if you set it on a member-level and then you set different directions on the different members, it could lead to an infinite loop.

Kevin:              So, let’s say I’ve got two member databases. Both of them update the same row during the same period. We then sync and my setting is that members override the hub? To make it more difficult, we update the hub, we update the two members, they’re three separate values. How does conflict resolution deal when we have a member wins, but two members have different values?

Xiaochen:       Yep. So in that case, you will get a random value from one of the members, depending on which member synced with the hub first.

Kevin:              So it’s first write wins instead of last write wins?

Xiaochen:       No, the last write wins.

Kevin:              Ah, gotcha, last write wins, okay.

Xiaochen:       Yeah.

Kevin:              Okay, I’ve dealt with merge replication conflict resolution terrors and I’m not a huge fan of last write wins, but it’s still better than merge replication conflict resolution terrors.

Xiaochen:       Yeah, I mean for merge replication, the good thing is that you can define your own conflict resolution policy. For Data Sync, we are working on something like that right now, but it’s not available yet.

Kevin:              Okay, so since I have the mental model of replication as the thing that seems closest that I can compare against in the on-prem world, one of the trickiest parts I’ve seen with replication, transactional or merge, is around diagnosing performance issues. We have the idea of a tracer record that I can add and get a little bit of information, but even that can be difficult to diagnose problems. What kind of tooling or what’s been surfaced in Azure Data Sync that could help me understand how performance is going and what might be slowing things up?

Xiaochen:       Actually, that is one of the biggest challenges when people use Data Sync on a large scale, because today, Azure SQL Data Sync, we are using triggers to track change instead of like transactional replication or merge application, it’s log-based trend tracking. So that may introduce some overhead on your workload, which is only as a source or destination database, because when you’re running inserts and updates, it will file the trigger and also when we are doing the sync, we have to join between the base table and also the tracking table to figure out what the changes are, and then we apply those changes to the destination. You can see the pattern here is quite standard, it’s do a join and then apply the changes. And if you are running SQL Server 2016 or above version or you are running Azure SQL Database, we recommend you use QDS, the Query Data Store, to track the performance of those queries. And it’s very easy to tell, because we have a different scheme for Data Sync’s stored procedure and Data Sync’s tracking tables, so you can tell which query is triggered by Data Sync and you can track the query performance and you can also track the resource usage using QDS. If you are using an earlier version of SQL Server, where QDS is not available, I guess you have to use SQL Profile or XEvents.

Kevin:              So, is there a way for me to see how far behind a sync may be? Like let’s say I pushed the button a few minutes ago, but the boss is coming in and saying, “hey, are we done yet? Are we done yet?”

Xiaochen:       Yeah, you can go to the Azure portal and there’s a refresh button to refresh the log. We will log when the synchronization is completed, and we’re also working on some improvements to make this more granular so we’ll have periodical updates, even when the sync is running. So because sometimes it’s very hard to predict the performance, especially, I would say 70% of our customers is using Data Sync to do this hybrid sync, which means between an on-premise SQL Server and Azure SQL Database. The performance of a hybrid sync can depend on so many factors: the network and how busy your source database is and how busy your destination database is. So we are thinking to give you some data, which you can do the prediction, but it’s very hard to ask it to give you something like a percentage to say it’s 20% done or something like that.

Carlos:             One of the questions I had, so you have to use Azure SQL Database as the hub so it’s involved there, but from a sizing perspective, ultimately it sounds like whatever you want to sync is then going to help you determine your cost, because that’s going to be the level of Azure SQL Database that you need.

Xiaochen:       Yes, that’s correct. So, talking about costs and except the hub database and a member database, we also need another database, which sometimes we call it a sync database and sometimes we just call it sync metadata database, to store all the metadata and also the log. We highly recommend our customers to use the dedicated database for that, because it’s very easy to mess it up. If you delete some table from the metadata database, there’s no way to recover it. And for that database, you can use a very small Azure DB, I would say a basic database should be enough, unless you have a large number of sync groups. And for the hub database, and you are correct, the performance overhead, I will say the extra resources you need to provision to support the sync between your hub database and the member database highly depend on how many tables and how many columns you are syncing and how many changes you are making on those tables, basically the change volumes.

Carlos:             Right. Okay, now, is there normally parity? You mentioned a lot of the scenarios you’re seeing is the hybrid approach where I have an on-premise database and an Azure SQL Database. Are those two pretty much the same generally, or there’s all kinds of different sizes?

Xiaochen:       I would say the performance overhead will be similar between the on-premise database and the Azure SQL Database, but it also depends on the sync direction. So, in general, the performance impact on the target database will be lower than the performance impact on the source database. I know it sounds weird, but that’s what happens, because on the source database, especially if you are making a lot of changes on your table, on the source database we’re going to join this base table and the tracking table and when you have a large table and you have a lot of change, this join can take a lot of resources and run for a considerable amount of time.

Carlos:             Sure. So we have more and more developers listening to the show and so one of the ideas, maybe from an architectural perspective is, now I have distributed databases in different places, and I’m assuming that I’m going to have distributed applications as well. From an architecture perspective, are– I don’t know if it’s one-to-one, if that makes sense, but the application that’s going to be pointing to that replicated or synced database, Azure SQL Database, and then in the case of the United States is where, because again, I’m in Richmond, Virginia. So, Richmond, Virginia’s where my database, my on-premise database lives, and where my application currently is now, but I have clients that I need to service in Europe and in Australia, so I’m going to create replicas there. Are these applications just going to those databases and then we’re just going to turn on Data Sync so that behind the scenes it ultimately takes care of all of that?

Xiaochen:       It depends on what your requirement is for this replication of your database. So if you only want a read-only endpoint for your database in another region or for a specific client, you have much more choices. You can use Data Sync, you can also use something like transactional replication to replicate, especially when your source database is on-prem SQL Server, or Azure SQL Database Managed Instance, you can use transactional replication to do a one-way replication to your destination database. So that will give you the minimum overhead on your database, because transactional replication is log-based instead of trigger-based. And also, Azure SQL Data Sync is optimized for bi-directional sync, so it has some overhead when you’re doing the one-way sync, compared to the transactional replication. But if you want to have a fully distributed application which works like a peer to peer in different regions for different clients and you want to synchronize some metadata or share some data between different deployments, then Azure SQL Data Sync will be the feature for you. And also because it is part of Azure SQL Database, it’s a managed service. So one of the challenges for our customers who use transactional replication is that you have to create this whole publisher/distributor/subscriber topology and then manage it by yourself. So for Data Sync, it’s a managed service. Once it’s set up, it should just work. When it stops working, you can call Microsoft and we will look at it or we’ll fix it.

Carlos:             Okay, interesting. Yeah, so that is a little different approach. They are taking some of the effort of the load off of there.

Xiaochen:       Yep.

Eugene:           So I have a question. We already compared this to replication. The other kind of comparison that comes to my mind is CosmosDB. And so, my understanding in CosmosDB is you get kind of this almost slider of what amount of transaction consistency you want. So you can choose all the way from strong consistency where everything has to be totally in sync, to eventual consistency and you have some options in between. If you are starting a Greenfield project, so brand new, you don’t have an existing requirement for a specific schema and you don’t necessarily need kind of your standard SQL schema, is there any reason that you might recommend going with Azure SQL Database plus Azure SQL Data Sync, compared to say, just starting fresh in CosmosDB?

Xiaochen:       It totally depends on your requirement of the application. So if it is the case that you just mentioned, I’m starting a new project which doesn’t have any requirement to use a relational database, because CosmosDB is not a traditional relational database, it’s more like a NoSQL Database, so it’s schema-less and if that works for you, and you want to have a global distributed application deployed, then CosmosDB will be a perfect choice for you. You don’t have to use SQL Database and go through other relational models and use Data Sync to sync data. CosmosDB, you can just configure the synchronization and your data will be automatically distributed globally. If you’re relying on some features or some characteristics in the relational database, or say you are relying on some feature which is only available in SQL Server, then you can deploy your application to SQLDB and use Data Sync to sync the data across regions or say, across deployments.

Eugene:           Gotcha, no, that makes sense.

Carlos:             Now when we interact with a database on-premise, one of the other pieces we need is a sync agent. Any thoughts from an overhead perspective or is that– it just is a service that connects back to the hub? I guess maybe talk a little bit more about what the sync agent does.

Xiaochen:       Sure. So the sync agent is basically a Windows Service. So you have to install it on a VM which has access to your on-premise database and also have access to the Azure SQL Database. When I’m talking about the Azure SQL Database, I mean both the sync metadata database and also the hub database. So first of all, you need to open the outbound 1433 port to allow the sync agent to talk to Azure SQL Database. And on the Azure SQL Database side, you need to configure the firewall rule to allow your sync agent to connect to the database, and of course, you need to give us a correct username and password. And today, the sync service is a 32 bits application, so you don’t have to install a very powerful 256 gig memory machine, because it can only use up to 4 gig of memory. We are preparing a new release of the sync agent, which will bring us a 64 bits release. And also, I believe we released a new version of sync agent back in January or February, which we added a very important support, which is a command line configuration and installation support, which means you can install and config your sync agent from command line instead of only from UI. So that also enabled a very important scenario is that you can run a second agent in a Docker container. For example, if you have a SQL Server running a Windows Docker container and you can install Sync Agent in the same container or a different container, which can connect to the SQL Server and use Data Sync to sync data between a SQL Server running a Docker container and Azure SQL Database. So that allows you to, for example, if you’re using Azure IoT Edge and you can use this to sync data between your Azure device and also the Azure environment.

Carlos:             Wow. So is there anything that we won’t be able to container-ize here?

Xiaochen:       Well, the limitation right now is that you can only use a Windows container for the sync agent. We don’t have a Linux version, yet.

Carlos:             Gotcha.

Kevin:              Is that on your roadmap?

Xiaochen:       Yeah, it’s definitely on the roadmap, but I don’t have a timeline for that.

Kevin:              That’s fair.

Carlos:             Sure. So speaking about timelines, I noticed that another area that’s not quite supported is the SQL Server Database Managed Instance. Is it the same idea, there?

Xiaochen:       Yeah, that’s definitely in our backlog as well, and that’s actually one of the highest priority work items we’re having right now. The reason we didn’t enable that by the time of GA is that we have transactional replication and that can meet, I would say 70% of the requirement of data replication in Managed Instance. So we’re saying, “okay, Data Sync can wait” and we can prioritize some other more important features re-enabling Managed Instance. But yes, it’s definitely coming.

Carlos:             Okay, very cool.

Kevin:              So, when you’re setting up Azure Data Sync, what kinds of database requirements do we need to have? I’m actually looking through the list, so I’m cheating, but for example, every table has got to have a primary key.

Xiaochen:       Yes.

Kevin:              Are there any other big things that might catch a lot of people that, “hey, I have to make this change to my database before I can use Azure Data Sync”?

Xiaochen:       Primary key is one of the limitations here, and another limitation we’re having here is that we don’t support different schemes. So basically, if you have a table with the same name and a different schema, that will not work with Data Sync.

Kevin:              Oh, so if you have for example, dbo.table1 and staging.table1?

Xiaochen:       Yes, that will not work with Data Sync because we cannot distinguish those tables between different schema. That’s also something we definitely need to fix in the next semester, I will say, the next release cycle. And there’s some data types we don’t support. I believe, only a few, and I don’t have the list.

Kevin:              That’s fair.

Xiaochen:       Yeah, I cannot read it, but you can definitely find those limitations from our online documentation. And another limitation we have is you cannot have a DATETIME TO as the primary key. I mean, you can, but that will run you into some issues, because we cannot support the precision of the DATETIME TO–

Kevin:              Past one second.

Xiaochen:       Data type as the primary key.

Kevin:              Yeah, so the notes say that it’s only supported to the second.

Xiaochen:       Yes.

Kevin:              So if I do have a DATETIME TO that is like a DATETIME TO 0, where I’ve only got the seconds, then it’s still okay.

Xiaochen:       Yes, DATETIME TO 0 is fine. DATETIME TO 7, I think, yeah, seven will be an issue, here. I think you can use up to, 1, 2, 3, 4.

Kevin:              Okay, okay.

Carlos:             So when we talk about, I guess, seeding or starting the initial process. Again, I guess I’ll use the example as I have an on-premise database, Richmond, Virginia. I now want to make this available in different nodes. Am I restoring that to Azure SQL Database and then starting the sync, or is the sync going to, again, to use an Availability Group work, ‘seed’ the database and move all the data up there?

Xiaochen:       So at this point you can only use Data Sync to do the initialization, so when you are doing the initialization we recommend you to keep the target table empty, so in that case, Data Sync will use a bulk insert to move the data from your source database to the target database. If there’s any data in your target table, even though they might be identical, Data Sync won’t know that, so the Data Sync will do a row by row insertion from your source to your destination. Basically, it will compare every single row and then that will run for a very long time. And we’re working on a feature right now which we can allow our customers to skip this initialization, so basically you can, for example if you have an on-premise SQL Server and you have a Managed Instance, you can just do a backup and restore and then set up the sync and skip the initialization because you know those two tables are identical. You don’t need to do a bulk insert to the initialization to your target table before you start the synchronization.

Eugene:           So, a related question, how does it handle schema changes? So I have a table, I add a column, then what happens?

Xiaochen:       So Data Sync will not sync schema changes. For example, if you are adding a new table, you need to add that table to the sync group manually. Data Sync will start the initialization on that specific table after you add that table. And if you are adding a column, you also need to add a column to the sync group manually and basically, any schema change you are doing, you need to update the sync group manually to affect those changes, unless it’s a data type change and the new data type is compatible with the old data type. So for example, if you are changing from BIGINT to INT, you don’t have to do the update, because it’s compatible. And we do have a solution here to use Data Sync to sync the schema change, so basically you can configure a DDL trigger and have a table to sync all the schema changes among all the databases. I believe we have a blog or something about our documentation about that. If you search Azure SQL Data Sync Schema Change, you will be able to find that.

Carlos:             Okay, other questions or other things we want to hit on before we wrap up?

Xiaochen:       I think we covered most of the commonly asked questions from the customers, and if you have any further questions, feel free to post a question on the Microsoft Forum. We have online documentation, we have troubleshooting guides and we also document all the limitations for Data Sync.

Carlos:             Yeah, so I guess I should say, on behalf of everyone, and I know it wasn’t your decision, but thanks for making yourself a bit more available. I mean, even coming on this podcast, having the Program Managers make posts, and be available to receive questions and answer them, it has been extremely helpful to all those trying to implement these new features or understand them better. Yes, so we do appreciate your time, there.

Xiaochen:       Sure, yeah, my honor.

Carlos:             So should we go ahead and do SQL Family?

Xiaochen:       Sounds good to me.

Carlos:             What is your favorite movie? All-time favorite movie?

Xiaochen:       It’s the Dark Knight.

Kevin:              That’s a good one.

Xiaochen:       The Batman, yeah.

Carlos:             Okay, so I guess the first one?

Kevin:              No, that’s the second one.

Xiaochen:       The Dark Knight is the second one.

Carlos:             Oh, the second one? Okay, good. City or place you most want to visit?

Xiaochen:       The Antarctic, the South Pole.

Carlos:             Very nice, man of my own heart, there.

Xiaochen:       Yeah.

Carlos:             Now what makes you want to go to the Antarctic?

Xiaochen:       It’s just far enough. It’s sounds very interesting and it’s going to be a long trip to get there, so I hope I can get a chance to visit sometime. I like to travel, so I travel to a lot of places, but that would be my ultimate target.

Carlos:             Okay, very nice. A food that reminds you of your childhood?

Xiaochen:       I would say it’s an ice cream bar, looks like a clown face, just like a kid’s face. It’s mixed of vanilla ice cream and chocolate ice cream, so the hat and the eyes and the mouth of the face is chocolate and the rest of the face is vanilla. It’s very cute and very tasty. I can’t find it anymore.

Carlos:             Yeah, I was going to say, is it still available? I don’t know that I’ve seen it.

Xiaochen:       Yeah, maybe you can still find it somewhere.

Carlos:             Okay, very good. So now you have to tell me, so where did you grow up?

Xiaochen:       Oh I was born in mid-China. If you are a big fan of martial arts, it’s very close to the Shaolin temple and I moved to Shanghai with my parents when I was four years old and then I lived in Shanghai for about 20 years. I went to college in Shanghai and I worked in Shanghai for 5 years after graduation and before I moved to Redmond.

Carlos:             Very nice, very cool. Now, how did you first get started working with SQL Server?

Xiaochen:       That was a school project in college and we were using SQL Server 2000. I don’t remember what exactly that project was, but it was some database project.

Carlos:             Okay, so you can go back and send your professor thank you notes for choosing SQL Server. No, I’m just kidding!

Xiaochen:       Definitely, but I believe at that time, we used Oracle more than SQL Server, but I really hate Oracle because the installation is so complex.

Carlos:             Sure, sure. Now, obviously you’re the Program Manager and you’re working to help improve things, but if there was one thing that you’d like to change about SQL Server, what would it be?

Xiaochen:       I won’t say it’s a change, but if there’s one feature I can add to SQL Server, I would say it will be the multimaster support. Because that would be a very powerful feature for SQL Server, I would say for any relational database. But it’s also a very big change for us, so that would be a very interesting project to work on as a Program Manager.

Carlos:             No, that’s right.

Kevin:              Yeah, even with CRDTs, multimaster is– it’s the Holy Grail, basically. I would love to see that.

Xiaochen:       Sure, definitely.

Carlos:             And ultimately, I mean, maybe to make sure that I’m understanding this right, that’s basically the rise of NoSQL, is to be able to push that out a little further?

Xiaochen:       Yep, so basically, I think the multimaster in SQL, I would say that’s a way to find a good balance between the relational data model and the NoSQL. And yeah, I mean, our customers definitely have the requirement, have the needs to have multimaster support in the relational database world and sometimes NoSQL just doesn’t work. For example, those mission-critical platforms, they cannot afford some data loss or some transaction level inconsistency, so they still have to use relational model, but multimaster will be a cool feature for them.

Carlos:             No, that’s right. Now what’s the best piece of career advice you’ve received?

Xiaochen:       I would say that best advice I received from my first M3, so basically my boss’s boss’s boss. So he said to me that “just don’t stay in your comfort zone. You need to go out of your comfort zone and always get challenged, so that’s how you grow.”

Carlos:             There you go, and that’s why you’re taking on multiple programs at Microsoft right now.

Xiaochen:       Yeah, exactly, yeah. Actually, I’ve been doing different roles in Microsoft. I started with Test Engineer and then converted to a Development Engineer and now I’m a Program Manager. So even though I spend my whole career, 10, 11 years at Microsoft, I try different roles, and different things in Microsoft.

Carlos:             Very cool. Xiaochen, our last question for you today, if you could have one superhero power, what would it be and why do you want it?

Xiaochen:       I don’t know if it counts as a power, but I want to have everything Ironman has.

Kevin:              The money, the suit.

Xiaochen:       Not really the money, but the thing I like is that he has a lot of geeky stuff to play with. The suit and all of the devices and all of that stuff.

Kevin:              The car.

Xiaochen:       It sounds interesting. Yeah, it sounds more interesting than a single superpower and yeah.

Carlos:             There you go, so very large R&D department at your disposal.

Xiaochen:       Yeah, yeah, exactly. Which I can decide what I want to do.

Carlos:             That’s right, you get to choose the backlog for once, right?

Xiaochen:       Exactly.

Carlos:             Okay, Xiaochen Wu, thank you so much for being on the program today and giving us that overview of Azure SQL Data Sync.

Xiaochen:       Yep, thank you very much having me here, Carlos, Eugene and Kevin. And thank you very much for everyone who’s listening to this podcast and again, thank you very much to choose SQL Server and to choose Azure SQL Database and Azure SQL Data Sync. We hope we can build a better product and we hope you’re happy with our product and our services.

Carlos:             Awesome. Now, if people want to connect with you on social media, how could they do it?

Xiaochen:       They can find me on LinkedIn and they can just send me a request. You just find Xiaochen Wu at Microsoft. I believe I’m the only Xiaochen Wu at Team Microsoft at this point. And you can also find me on Twitter at my Twitter account. That will be @allenwux.

Eugene:           Yeah, people can find me @sqlgene on Twitter, and they can go to sqlgene.com to check out my blog.

Kevin:              You can find me at the hub, resolving conflicts with all the spokes.

Carlos:             Thank you, compañeros, for tuning in today. We do appreciate it. That’s going to do it for our show. You can reach out to me on LinkedIn as well at Carlos L Chacon, and we’ll see you on the SQL Trail.

1 Comment

Leave a Reply

Back to top