Episode 85: Speedbumps Upgrading to 2016

Episode 85: Speedbumps Upgrading to 2016

Episode 85: Speedbumps Upgrading to 2016 560 420 Carlos L Chacon

Have you done your migration to SQL Server 2016 yet? Don’t worry—you aren’t alone. With all the glory of new features and improved performance, not all upgrades are equal. In this episode we talk with Javier Villegas who returns from episode 2 to talk about some of the issues he ran into on the upgrade to 2016. He stumbled on a process that normally runs for 7 minutes prior to the upgrade and then 26 hours after the upgrade. While it all worked out, he shares the issues he had and gives us a reminder that testing has not been deprecated.

 Episode Quote

Listen to Learn

  • Javier’s experience migrating to SQL Server 2016
  • Why using different production environments during migration is a good thing
  • Trace flags and cardinality estimators features and hurdles
  • Database settings and configurations in the new version

Javier on Twitter
Javier on LinkedIn
Javier’s Blog
Javier’s MVP Profile

About Javier Villegas

2016 upgradeJavier is a Microsoft MVP Data Platform and a member of the MSDN since July 15, 2008. He is also a PASS member since 2008. He works at Mediterranean Shipping Company as a DBA Manager. Javier has been dealing with SQL Server since version 6.5. He is from General Pacheco, Argentina. Besides being an SQL Server expert, Javier is a soccer fan cheering for his team the Velez Sarsfield.

Carlos: Javier, welcome to the program.

Javier: Thank you! Thank you, Carlos. Thank you, Steve for hosting me one more time.

Carlos: Yes, that’s right. Special thanks to you as being one of the first, the guinea pigs if you will. When I had this idea way back in May of 2015, maybe even before April, and went down to Buenos Aires and I said, “Hey, anybody want to record a podcast with me?” And Javier raised his hands so we’re glad we could have you back.

Javier: I remember that time, yeah.

Steve: And I know since that point in time we’ve down quite a bit to improve the process and how it all works to get the show out so we should have some good quality here. Also, didn’t you re release, re-mastered versions of some of those old episodes Carlos?

Carlos: We did. We’ve re-mastered them talking with Julien our editor and you can took a peek at them. Taking all the “uhms” and what not that I’m so consistently doing and helping balance some of that particularly in the beginning, the sound balance wasn’t all that great. Well so, Javier, our ultimate topic today is on migration to 2016. Some of your experiences with it and some of the road bumps that you kind of ran into. But first, before we kind of get into that, tell us a little bit of background about your organization and what you’re doing with SQL Server?

Javier: Ok. I am the DBA manager of a group of 4 DBAs at Mediterranean Shipping Company. What basically my team responsibilities are basically have the OLTP environment running fast, running always and running in good shape all around the world. Recently, we’re also introduce, we are also taking care of some analysis services in particular to tabular model for PowerBI which is something that we are beginning to so those are the tasks that I’m doing these days.

Carlos: Very nice, and yeah, probably from that Episode 2 I remember the story you told me that you had a data center in Chile. I don’t remember where you had to failover to but there was an earthquake there in the country and you ended up failing over. How many, there was like 2 minutes something like that you were able to failover, right?

Javier: Yeah, in a matter of a couple of minutes we made the failover from one location to another. Same in New Zealand for another earthquake and at that time we were using database mirroring. Now, the product evolved to availability groups always on and all the beauties that we know today.

Carlos: Sure. You must be quite the failover artist then to have done it again.

Javier: Yeah, you know, as every DBA we just try to avoid that to happen.

Carlos: Yeah, that’s right. To have to done it twice in two different countries, I don’t think there are too many people that can say that. 

Javier: I don’t know but the key is to once in a while do a failover test. Have a bunch of power users who help you on a Saturday morning so you can do the failover and recheck that the whole failover process works smoothly, that’s the ultimate goal. And you know, on every exercise that we do of this testing failover process we found stuff to improve basically the speed of doing this switch.      

Carlos: So ultimately our topic is on that conversion to 2016 and you made the change but it wasn’t always a smooth transition. You ran into a little bit of bumps. So why you tell us about that and some of the hurdles you had.

Javier: Yeah. Well, our main OLTP platform around the work was a mixture of SQL 2012 and 2014, right. And failover cluster instances with always on and all these things for higher availability and Disk I/O recovery. My company was part of the top program so we got involved with the bits very early in the process of the SQL 2006 release. So we were going back and forth with the practitioners with all these features that they planned to release. At some point, we have to validate our business core application with the releases that Microsoft was giving to us. We found some unexpected behavior. I mean just to give you a quick example, one of our most critical SQL shops for the business it runs every three hours and it generates some sort of a cache table to get some critical reports and the tables are used by other processes, right. As I said, that runs every three hours and normally takes 7 minutes to complete, right. When we were on this testing process suddenly those 7 minutes went into one hour.

Carlos: Wow, so what’s the problem there?

Javier: You know, the study gets even more choosy because I said, “Ok, maybe I need to do an updated statistics because this is a new environment. I just restore my databases.” So I took some time and I made an updated statistics for all the tables, and instead of one hour the end result was 26 hours. 

Steve: Wow. I’ve heard SQL Server 2016 just runs faster but apparently there are some edge cases that we need to work around. What did you end up doing with that?

Javier: Yeah. Well, you know, it was something that I spoke with the product managers at Microsoft and we raise the hand as soon as we found this because it’s unacceptable for our business, right. One day, or even more than one day just to get this table wants, this cache table wants so we send them all the call, we send them a copy of our database system. We started working together very thoroughly on this issue. Well, it turned out that our coding technique is basically using a lot of the table functions and we used table functions, not the inline table function, the regular one. Of course we don’t use scalar functions at all that’s a different discussion that we can have in another opportunity, but kind of forbidden. We don’t use it our own so we use table functions. And we use them like in a chain, like several layers of table function. One calling another and another maybe cross supply and joins with another table function, something like that, alright. The new cardinality estimator in 2016, it was just considering this like these calls to the table functions were like immediate so the estimation was unreal. Actually we’re not even doing an estimation. So the end result at that point was not to use the new cardinal estimator just turn on a trace flag to escape the CE. I mean, use all the other features for SQL 2016 that are great but temporary turn off the new CE.

Carlos: CE being cardinality estimator.

Javier: Yeah, definitely.

Carlos: So I guess the problem was then that the optimizer, right, the cardinality estimator was going in it and it was looking at those table functions because they were kind of nested if you will it didn’t pick the best plan. Or it was being too aggressive there perhaps.

Javier: Yeah. It just saying when the CE reached some certain level into these layers of table function it’s just considers them they return immediately and that was not the reality. That is why we switched from 7 minutes to one hour, right.

Steve: So then when you switched the trace flag to go back to the old cardinality estimator. Did it go back to that 7-minute runtime right away or did you have to do anything else?

Javier: Yeah, no, no, it’s just that. Also, another possibility is just keep the compatibility level for the user databases in a pen because actually this is not a new problem in 2016. This is something that is also happening in 2014. So you keep the compatibility level for your user databases in a 110. It’s like, you know, false entry to use the old or the legacy CE and we got back our 7 minutes.

Carlos: Ok. Yeah interesting and I knew they have made some changes there but I guess just like everything else you have to test it because you could be, and obviously they have use case potentially for that but that nested table functions kind of makes me “Woosh”. Get the heebeegeebees a little bit. You need to go through your code and test it in a new version.

Javier: I remember long time ago like in 2008 or something, me and my boss we were invited to the Microsoft campus to discuss about new releases of SQL etecetera. They kind of give us a presentation of table functions. Well we got in love with them and we decide to switch all our TSQL coding to that way of writing because as I said, we love it. It’s quite clean, clear and we just love it. We were just fine for years until we found this kind of hiccup. But you know the Microsoft is working quite hard on this case and others to make you through now. I know that in be next they are planning to do some other changes to the CE.

Carlos: Now are you using, are you doing anything with Linux or SQL Server in Linux? Or you guys are pure Windows?

Javier: In my production and Q&A environment we are using all Windows Server now switching to Windows Server 2016 because there are some particular features that we need. Which just to tell you in one sentence, we love the possibility of having Windows 2016 and SQL 2016 because we can do always on on machines that are not on the same Windows domain, right, that’s the main.

Carlos: Right, another feature.

Javier: Yeah. This is something that we were asking for years. Now we have it and now we are using it. But in terms of Linux, yes I have a bunch of labs using some. I believe that almost all the distributions Ubuntu, Red Hat. Now, I believe that, I have the version to run on SUSE Linux that is the ctp 1.2 Linux. That is where the CTP 1.2 that was released a few days ago. I’m validating stuff against Linux and actually I love it.

Carlos: Very cool.

Steve: With SQL Server 2016 one of the other big feature that was added then was the query store. Now, is that something that you’re making use of?

Javier: Wow, man. This is my favorite new feature in SQL 2016. This is something that you know, before. I mean, in terms of being a DBA and doing a troubleshooting on this case that you’d have something that for month. I report that always runs in let’s say 5 minutes, and suddenly one day it started running in 2-3 hours. And nobody make a change, nobody touched the code, etcetera. Those cases were extremely challenging to troubleshoot. But now with query store it’s just a simple click to have your previous or your optimum execution plan for that particular thing so, yes, I’m using it. As I said, this is my favorite feature in 2016, query store.

Steve: So for people who are new to 2016 and haven’t tried that out yet, any tips or recommendations that you would give or hints for people?

Javier: Yes. This is something that, as I said, I’m using it on my daily basic task and this is like the first version of query store, right, in SQL 2016 was the first time that we saw it. There are some things to improve like every feature. In this case the only thing that, I mean in the connect site I made this recommendation as many other SQL professional. Now query store is per database so you have to enable it, and maintain it, and queried per database. And you have all these performance information within the same database. The recommendations or the comments for Microsoft and I know that probably they should be working on this is just have the query store per instance, right, being able to store this performance information on a separate user database. This is something that I would love to see in future SQL versions.

Carlos: Right, to be able to get it out there.

Javier: Yeah, I mean beside that the current implementation is great because as I said, before it was like a blind task. And it used to take me long time to do this kind of troubleshooting when performance just change without any changes etcetera. Now it’s just, you know, go to the query store panel and I’ll just check the query plan for a particular and I’m forced to use the old one, right, it’s good.

Carlos: I know when they first, of course I was talking with Borko before it was actually released. 2016 was still kind of being baked if you will. They were working on some of the reporting and I’d admit that I haven’t got into it as much as maybe I should I have. But one of the, so the reporting and some of the charting that they were going to make available was still kind of coming together. Have they continued to make improvements and I know SP 1, or Service Pack 1, was the big focus was on the new features that being made available in the lower editions. Do they continue to make changes to that reporting, so that reporting features in the query store?

Javier: Well, yes. Honestly, I haven’t seen that very deeply. I also focusing on Service Pack 1 in other kinds of new reports like the one for always on. Those are new reports in where you can do better troubleshooting for example latency. Because you know, as I said, probably query store I believe that was the first feature that was completed before SQL 2015 RTM. That was already like long time ago. So even if it’s to be one it’s quite mature. Of course, as I said, I mean you give this to the DBAs and we ask for more, right. But in Service Pack 1 the new reporting for always on, I mean the dashboard for always on for availability groups is really improved. So those reports are now greater than, they provide you a better way of troubleshooting of always on. Before it was kind of a blind experience.

Carlos: Very cool, Borko will be very happy to hear that. PM manager we keep in touch from time to time just being, you know, from time to time. So he likes getting feedbacks, right, telling that people are enjoying the feature.

Javier: Yeah, yeah. No, I mean, Service Pack 1 was a hint. I mean something that I’m still trying to understand. It’s the first time that they bring down Enterprise features to other editions so, wow. They have these new features, the new DMBs for troubleshooting the improvement in always on is really great.

Steve: So another thing that, I know when we’re talking before the show that you mentioned was the new database settings or database setting that are scoped per database in 2016. What are you doing with those?

Javier: Yes. Well, database configuration. This is something that is one of the great features for DBAs because now you can set a bunch of configurations or settings that before you had just for the whole instance or per session. Now you can do it per database like for example specify the max DOP, the max degree of parallelism per database. This is something that is really powerful. You don’t have to use the max DOP hint or start playing with Resource Governor to specify a different max DOP. Also, there is another setting which is great, which is the legacy cardinality estimation. This is connected with the problem that I mentioned before. You can specify now per database to use the legacy cardinal estimator without specifying any trace flag or anything. You have your user database you will go to these CISCO database settings. You change the legacy cardinality estimator to “ON” and all the under that user database will use the old CE. Same for other setting like for example you can do a clear procedure cache per user database now which is something that is also good.

Steve: Ok.

Carlos: So those are all settings like inside the database properties if you will.

Javier: Yeah, from Management Studio you go to your user database. Right click “Properties” and you’re going to see them there. You have the default which is the ones that are configured for default, for all user database which of course are off just to try to don’t modify the behavior by default. Also, another one that you have which is also connected with the legacy cardinal estimation is the query optimizer hot fixes. You know that Microsoft with every service pack with every cumulative update they introduce changes to the cardinal estimator which by default they keep them disabled just to, you know, don’t change the behavior in something that is running within the same SQL version. Before this, or actually parallel you can turn on a trace flag and enable all those hot fixes for the CE. Now with database scope configuration you can enable those hot fixes in the CE per database which is great.

Steve: And so if you have an older database there or something that is not kind of going to work well with those changes you can then turn it off for that one and then turn it on for others and be in good shape.

Carlos: Yup, fine tuning.

Javier: Correct, yeah. Of course, I mean, all of these have to be carefully tested on, not in production, like that’s the advice. Don’t play with this in production but actually this is something that is easy to turn on and off.

Steve: Ok. Well, shall we move on to the SQL Family questions?

Javier: Yeah, I’m ready.

Steve: Ok, so how did you first get started using SQL Server?                       

Javier: Well, I mentioned this story in my previous episode, Episode #2. Like 20 years ago I started with the Microsoft courses with Windows Server 3.51 or NT4 and one of the elective courses that were there. I mean there were like 3 courses. One was Exchange, one was Active Directory, and the final one was SQL Server 6.5 at that time. So everybody had to choose the regular training plus one of these three. My first choice was Exchange and was full. There were no place for me so then I say, “Ok, Active Directory seems interesting.” It was also full. So they say, the SQL course has plenty of space so that was the one that was there for me. So I took and after a couple of days I say, “Oh men, this is so boring.” But then after the professional career changes and I got the opportunity to get focus on database engines at some point in one previous job that I had. It was like with multiple vendors in terms of databases and also operating system. I remember using with Sun and Oracle and Linux with MySQL and Windows with SQL. It was like a data center with all that zoo of environments. At some point I put my career into the Microsoft rail and I focus everything on Microsoft especially on SQL Server so that is how I got involved to that.

Steve: Do you ever wonder how different your career may have been had that Exchange class not been full?

Javier: I haven’t think about that too much.

Steve: Ok.

Javier: I actually, you know, I think that being a DBA is something that you have to love and you have to feel it because we all are kind of particular sort of people.

Steve: Yup.

Javier: So you have to love it and at some point long time ago I said, “Ok, this is what I love.” So let’s go ahead and try to do this and do it well.

Carlos: If you could change one thing about SQL Server would it be?

Javier: Wow. Well, I mentioned a couple of things regarding query store. But something that I will love to, I mean, first of all if you ask me this question like 6 months ago I will say multi platform. I have that thing already but now that I have, you know, this I will say make the failover process on availability group even easier in terms of being able to replicate to the secondary logins, the SQL and all the things that are not use of database in terms of settings, everything because right now you do an availability group failover for disaster recovery purposes. Even if it is a test or if it is real you have to do a lot of manual task to fix the logins you know. I have like in all the production environments that I have around the world I have like state-of-the-art Power Shell script that try to take care of synchronizing the logins plus the SQL plus the SSIS packages, etcetera. I would love to hit the failover process and just forget about it.

Carlos: Alright. What is the best piece of career advice that you’ve received?

Javier: Well, few years back I received this piece of advice which I always remember. I used it like on my everyday task since I’ve received it. And it is, “Try to be focused”. Just be focused. You know, it could be a bunch of things but being focused on what you’re doing and even if you’re, I mean, ultimately that’s the thing that will allow you to accomplish a task in a good way especially when you’re under pressure. So being focused on what you’re doing is a great piece of advice that I got.

Steve: Very good advice.

Javier: Yeah.

Carlos: Javier, our last question for you today. If you could have one superhero power what would it be and why do you want it?

Javier:  Wow, so this is something that I’ve been thinking a lot and probably it’s not going to be related or totally related to technical thing. And maybe it’s not a real power but this is something that I would really love to have. First of all have an additional arm and hand. And let me complete it, also, being able to, I mean, if you don’t have to sleep because of a reason you don’t feel it so. I mean, don’t have the necessity of sleep. This is not because I have to work at night, etcetera. These two things, the additional arm and hand plus not having the necessity to sleep is to being able to handle my two little boys. Yeah, one is 1-year old and the other one is almost five so that is why I need these superpowers.

Steve: Yes, having two sons myself. My wife and I having two sons definitely can understand that. Although, mine are a little bit older and it does get a little bit easier as they get older.  

Javier: Really. That is something good because now I know that if I wait it’s going to be easier.

Steve: Yup. But you’ll come across different things like the first time they dent your car.

Javier: Yeah. I’m far away from that.

Steve: Yes.                       

Carlos: Well, Javier, thank you so much for joining us again. Oh, sorry.

Javier: No, no, I just wanted to say that it’s not totally technically what I just said but it’s real.

Carlos: That’s right. Well, Javier, thank you so much for joining us on the program today.

Javier: It’s always a pleasure to speak with you, Carlos and Steve, so thank you for having me. And we’ll look forward to be again with you in the future.

Leave a Reply

Back to top