Episode 76 Availability Group Improvements

Availability groups provide some exciting features mixing high availability and disaster recovery; however, from a performance standpoint, there are a few drawbacks.  With the advances in SQL Server 2016, our guest, Jimmy May, set out to test these features and see whether he could get really high performance out of an availability group with two synchronous replicas.  In this episode he talks with us about some of his findings and some of the pains associated with getting there.

Episode Quote

“I’ve been a big fan of flipping the faster bit ever since even before I was a member of SQLCAT.”  Jimmy May

Listen to Learn

  • What improvements have been made to the log transport process
  • Changes to the log redo process
  • The -k startup parameter and what it does
  • Why hardware validation is still important
  • Jimmy was the guest of what celebrity from the 90s?

Jimmy on Twitter
Get the whitepaper with all the notes from Jimmy’s test

About Jimmy May

Jimmy May is a SQL Server technologist for SanDisk, a Western Digital brand.  His primary skill set is the tuning and optimization of SQL Server databases. He formerly worked at Microsoft on SQLCAT (the SQL Server Customer Advisory Team).  When he is not working on databases, you can find him skiing wherever there might be snow.

Carlos:             So Jimmy, welcome to the program.

Jimmy:            Hey guys, thanks for having me. I really appreciate it.

Carlos:             Yes, it’s nice of you to come out. Always interesting to talk to the former Microsoft folks. You’ve worked on the SQLCAT team, lots of experience under your belt. So thanks for coming and being willing to share a little bit of your time with us.

Jimmy:            Sure. Thanks!

Steve:              And I know, I’ve seen Jimmy present a couple of times and I’ve always had a great time learning along the way so I hopefully will have that same experience today.

Jimmy:            Crossing our fingers.

Carlos:             And one last thing just to butter this bread a little bit more. You are one of the nicest people, I think, in this SQL community, always willing to talk, get some feedback, or share your thoughts and opinions with those at conferences, SQL Saturdays, you know, what have you. And so if you ever get a chance to go and say hi to Jimmy, take advantage of it, because you won’t regret the time.


Jimmy:            I really appreciate that guys.


Carlos:             Ultimately what we want to talk about tonight is your experience putting together an Availability Group, trying to understand some of the pain points that customers are experiencing, then trying to figure out how to get around those pain points, and kind of putting the proof in the pudding, if you will, with some of the advancements in SQL Server 2016.   Yeah, so let’s go ahead and dive into that. Give us a quick recap of how you put that together and then we’ll dig into what you found.

Jimmy:            Ok, well, the goal, as you stated, was to put together an Availability Group architecture that was actionable. We weren’t just trying to set super high world-record numbers, something we could toss over the transom for an arbitrary customer to read and implement. And along the way, of course as you know, I work for a company that sells a lot of flash. And I’ve been a big fan of flipping the faster bit ever since even before I was a member of SQLCAT. So that was a big part of the picture, and there are some challenges even on flash: storage latency, things like that. And we have a handful of lessons learned. We proved out the really exciting log transport improvements; log transport was a bottleneck in Availability Groups in 2012 and 2014. No longer; we get that data across the wire in near real time. However, there are still some issues with the log redo at the high end of performance, and along the way we also implemented some other findings. The –k startup parameter is something that virtually any installation can take advantage of. We’ll go into it in more detail later, I hope, but it’s a startup parameter, fully supported, not well documented, to throttle checkpoints at a level that you designate, and that’s pretty exciting. There are a number of lessons learned.

Carlos:             So why don’t you take us through I guess what that setup looked like. You have a couple of boxes and ultimately you wanted to use Availability Groups and you wanted synchronous commits from the primary to both secondaries.

Jimmy:            Yeah, the setup itself was largely pretty vanilla, and we did that by design. The only “exotic” pieces were some of the hardware components to make sure we didn’t run into bottlenecks. But we had three 2U boxes – HP DL380 G9s. Our friends wouldn’t like us to call them commodity boxes, they’re super fantastic and amazingly performant, but they were off-the-shelf boxes. Two sockets; we used Broadwell processors. We actually started with Haswells, but during the course of the project we upgraded. 256 gigs of RAM, nothing crazy. And we had some pretty high network throughput; we didn’t come close to hitting the bandwidth that we had. For availability, we put in two [inaudible – 4:15] NICs, 40Gb apiece, and we teamed those using standard Windows teaming. Pretty simple to set up and configure, something with which I had not a whole lot of experience, but it turned out to be very easy; even a non-network geek could do it with ease. So we had a total aggregated bandwidth of 80Gb if we wanted to use it, and that would serve us well if we wanted to put together multiple AGs, Availability Groups. And what else can I tell you? Oh, the storage, of course, the star of the show. Flash doesn’t come only in the form a lot of us are used to seeing, the old Fusion-io PCIe cards, right? Flash comes in a 2½-inch standard small-form-factor flavor now. We started out with eight and ended up with ten 2½-inch disks of flash. Only 800GB apiece, relatively small; there’s a 1.6TB flavor coming. And in fact, our company makes 4TB 2½-inch disks. It’s amazing to hold a 2½-inch disk in your hand and know you’re holding 4TB. And very, very soon we’re coming out with 8TB disks. By the way, these were a total of ten 800GB disks.

Carlos:             Inserted right to the box?

Jimmy:            Right to the box. Yeah, that’s one of the amazing things about these “commodity” off-the-shelf servers. A variety of vendors, including HP, make these boxes with 24 slots. Not a special order. You say, “Hi, I want the box with 24 2½-inch slots in the front,” and you get it. So think about that: you put 4TB disks in there, you’ve got 96TB of raw storage. Put the new 8TB disks in there and you’ve got almost 200TB of raw storage in boxes about as big as two pizza boxes. It’s amazing. It’s really amazing, and you’ve got the new processors from Intel; it’s just amazing that you have the CPU to drive that capacity and that performance. Anyway, we used six 800GB disks for the data and four 800GB disks for the log.

Carlos:             And then it’s all yours. You’re not going to share those disks.

Jimmy:            That’s correct, it’s a non-shared storage. Oh my! All yours.

Steve:              I’m just listening to you describe that, and I’m thinking about configurations we have where we have shared storage or a bunch of virtual machines sharing the same hardware. The configuration you’ve got there is certainly going to be way more performant than what I typically see.

Jimmy:           No question about it. Well, wait until I start talking about the performance numbers. You know, again, I’m a guy who used to dedicate his career to flipping the faster bit. We got some latencies that were so low that I had to double-check the math. We will get into that later, hopefully.

Carlos:             Ok, so I guess, let’s go back and now kind of set the stage for some of the pain points you managed to resolve there. Ultimately, in our podcast we want to make sure that we level the playing field and try to explain some of these concepts. So with the Availability Group, you have a server. You’re setting it to synchronous commit, meaning you have a second server. The transaction has to write to the first server, be sent to the second, commit on the second, get a reply back, and then the transaction is complete.

Jimmy:            Exactly, except we have three replicas, not just two: one primary, two secondaries. We actually have to get this ack from two servers before we proceed.

Carlos:             Right, the ack meaning an acknowledgement?

Jimmy:           Yes, yes, before the data can be hardened or committed back on the primary log.
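
In T-SQL, the topology being described, one primary with two synchronous-commit secondaries, might be sketched roughly like this. This is a minimal sketch; the server, endpoint, availability group, and database names are hypothetical, not the ones from Jimmy’s tests:

```sql
-- Sketch: an AG with one primary and two synchronous-commit secondaries.
-- With synchronous commit, the secondaries must acknowledge ("ack")
-- hardening the log before the commit completes on the primary.
CREATE AVAILABILITY GROUP [AG1]
FOR DATABASE [SalesDB]
REPLICA ON
    N'SQL1' WITH (ENDPOINT_URL = N'TCP://SQL1:5022',
                  AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = AUTOMATIC),
    N'SQL2' WITH (ENDPOINT_URL = N'TCP://SQL2:5022',
                  AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = AUTOMATIC),
    N'SQL3' WITH (ENDPOINT_URL = N'TCP://SQL3:5022',
                  AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = MANUAL);
```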

Jimmy:            Well, the way it works in terms of AGs is that the log is flushed on the primary database. Let’s call this the main server, the primary server. The data is flushed to the Availability Group’s log. In this case we have a simple setup: one database, one AG. Ok, and the data in that log is what is shipped across the wire. It’s received by the secondary AG logs and then applied to the replica data files on that side.

Steve:             So when it’s applied on that side, that causes the write to the data file appropriately, and then that log gets saved as the local log file. Is that correct?

Jimmy:            Yeah, it’s the local log file. And on failover, for example, that log file becomes the primary log file. The recovery queue on the secondary is emptied, the AG does a little bit of internal manipulation to say, “I’m preparing myself to be the new primary,” and then what was the secondary becomes the primary, and the user transactions transpire there just like they did originally on the primary.


Steve:              So then you mentioned some enhancements in SQL Server 2016 around that process of getting the logs transported and then the logs redone on that end.


Jimmy:            Ahh! We were so pleasantly surprised. As far back as CTP1 we were seeing these enormous performance improvements in getting the data across the wire. Historically, and this is part of the hashtag #ItJustRunsFaster, a lot of amazing performance improvements have been implemented in SQL 2016. And again, this is Jimmy May, former Microsoft employee, former member of the product team, but third party now, ok, so I can say this with complete impartiality: huge fan of what the SQL Server product team has done. SQL Server 2012 is when AGs were introduced, and there was a bottleneck. It was tragic. AGs were pretty cool conceptually, you know, an enhancement of database [inaudible 11:09], but we were limited to 40 or 50MB/s across the wire. No matter what hardware you had, no matter how many CPUs, no matter how much memory, no matter how broad your network pipe was, there was a hard-coded bottleneck built into Availability Groups. It was a vestige of the database [inaudible 11:30] code, buried very deeply in the bowels of it. When that code was written, way back when, there was no hardware in the world that required the kind of throughput that you could get 4-5 years ago, and certainly not today. So in SQL Server 2012 and 2014 there was this bottleneck that you just could not work around, 40-50MB/s. In 2016, they unleashed the hounds. Again, with no special tuning, same server, same hardware, you just use a different version of SQL Server – 2014 vs. 2016. And I’ve got some great charts. In fact, I’ve got a whitepaper, you guys need to publish the link to it as part of the podcast, that has the documentation for all of this, and what I hope is a very lucid presentation. And the appendices are amongst the most valuable pieces of the whitepaper. You’ve got the usual “yada-yada” in the body of the paper.
But the appendices have some very interesting findings that a geek like me and like you guys would be interested in, including charts comparing head-to-head matchups of SQL Server 2014 log transport vs. 2016. Again, just changing the version of SQL Server on the same hardware, we went from that 40-50MB/s up to a quarter of a gigabyte per second. Boom! And that genuinely expands the scope of applications that are suitable: data warehousing, ETL, any kind of index maintenance, etcetera. It was easy to hit that bottleneck in 2012 and 2014. And the problem, let me back up a little bit, the problem with this bottleneck isn’t just that you’re bottlenecked and can only get 50MB/s across the wire. While that data is being generated by the primary application, it is queued up on the primary, waiting to get across the wire, waiting to get, you know, inserted into the log transport mechanism. And if something happens to that primary, you’ve lost data. I mean, it’s possible you can regenerate it depending on your application, but at a fundamental level that data is gone. And that’s not a formula for high availability. Whereas now, easily up to 250MB/s, and with a little tuning you can get more, is sent across the wire in real time via the log transport mechanism and hardened on the secondary logs in real time. It’s just amazing. So on failover, boom, the data is there. No data loss. So not only do we have better performance, you’ve got better HA and DR. Ok, so that’s the log transport mechanism. Do you want me to dive into another aspect of the AG log-related stuff?


Carlos:             Yeah, I think we are going to talk about the log redo?


Jimmy:            Yes, exactly. So we’ve just been talking about the log transport and the exciting performance improvements that 2016 gave us. Log redo is the other piece of the Availability Group log-related mechanisms that is required for the process to work properly. And the log redo is simply a continuous restore process. Your data that’s shoveled across the wire is hardened on the secondary log and then, via the redo process, applied to the constituent secondary replica database files. This is something you alluded to a little while ago, Steve. Nothing too exotic about it, just a continuous restore process. The challenge, though, is that at the high performance levels we were generating, the redo can’t keep up. Now the SQL 2016 team did a great job; they’ve turned it into a parallel process. That helps, but it still just can’t keep up. And so at the maximum kind of levels we were driving, the redo was falling behind. That’s not a formula for true high availability. Disaster recovery, yes, but if the recovery queue on the secondary is building up, when you do try to fail over, that recovery queue has got to empty before the server is actually available to accept transactions. Now the good news is that at this point there aren’t very many applications that require the kind of throughput we were throwing. You know, again, it’s pretty impressive: these 2U boxes, two sockets, two CPUs, a little bit of flash, a workload generator throwing 250MB/s across the wire. Now how many applications require that kind of performance? Not that many, frankly, today. Even among the apps I used to work with at SQLCAT, just a handful of them. So the great news is, as always, the SQL Server team is very receptive. They are aware of the issue and they are actively working on it. So unlike the log transport issue, which took them two versions of SQL Server to remediate, this issue, which again is hardly a problem for anybody in the world today, will likely be remediated in the relatively medium term, before it becomes a problem for the rest of the world, for your day-to-day application.


Steve:              So then just to recap, to make sure I understand that bit. If you’re pushing the data through on the log transport at up to that quarter of a gigabyte per second, then your log redo queue is going to fill up on the secondary side and get backlogged to the point that it may take a while to catch up when you fail over.


Jimmy:            Exactly. Exactly correct. Good summary, very pithy there. I wish I could be that pithy. Yeah, and so the log redo recovery queue just continues to grow at these high rates of performance. Most high-performance applications aren’t going to get even close to that kind of throughput, and you’re going to be ok. So this is largely a theoretical event. It’s really interesting, though. Again, I mentioned the appendices in this whitepaper we just published. The appendices cover this in detail. They actually show some of the internals, how you can interrogate what’s going on on the secondary replica, etcetera, to see for yourself.
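
The kind of interrogation Jimmy mentions can be done with the AlwaysOn DMVs. A minimal sketch, using the standard DMV and column names, run from the primary:

```sql
-- Watch the transport and redo queues per replica database.
-- log_send_queue_size: log waiting to be sent across the wire (KB).
-- redo_queue_size:     log hardened on the secondary but not yet redone (KB).
-- redo_rate:           rate at which redo is being applied (KB/sec).
SELECT ar.replica_server_name,
       drs.database_id,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,
       drs.redo_queue_size,
       drs.redo_rate
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id;
```

A steadily growing redo_queue_size on a secondary is exactly the failover-delay symptom described above.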


Steve:              And we’ll have to include the links to that in the podcasts notes.


Carlos:             That’s right which will be at sqldatapartners.com/jimmy.


Steve:              Alright, so on to another topic. I know you mentioned the –k startup flag. Can you maybe jump into that a little bit and what you found there?


Jimmy:            Ok, this is one of the most exciting takeaways of this work. I know during our podcast prep, when we were level setting, I asked you guys if you’d heard about it, and the truth is very few people have. I first heard about the –k startup parameter in SQLskills training, during Immersion Event IE1 with Paul, Kim, Jonathan, Erin, Tim, Glenn, etcetera, recently.


Carlos:             I tried to look it up on MSDN and it’s not listed in the database engine service startup options.


Jimmy:            It is fully supported, and it has been, believe it or not, since 2005. So we need to get that fixed; that would make a great Connect item. It is hard to find documentation. In fact, I remember first using it when I was involved, again as a third-party person, post Microsoft. I was invited by the CAT team to come in and assist with a lab about a year ago, and that’s where I first saw this implemented in real life, with one of those customers that does crazy cutting-edge stuff. My friend Arvin, one of the first MCMs back in 2008, was implementing it with this customer, and so I got the chance to see the behavior of –k in real life. And it’s amazing. It’s something that I’m pretty sure a lot of places that need cutting-edge performance are going to start adopting. And here is why. Even in conventional shops, you know, back when I was a consultant for Microsoft services 10 or more years ago, my old day job, big fast databases, checkpoints were a problem. But we haven’t said what –k is yet. What is –k? It throttles the checkpoint to a user-defined level. Many of us with any experience in I/O are familiar with the checkpoint spikes that occur: SQL Server flushes the dirty pages in the buffer pool to disk sufficiently often to keep database recovery on restart reasonable, and that’s independent of Availability Groups. Now, this –k is important not just for Availability Groups but for any standalone SQL Server instance that experiences challenges with checkpoints overwhelming the disk I/O subsystem. And it’s very, very common, and has been for years. These challenges can range anywhere from a spike when the dirty pages get flushed to disk, to a stalagmite, you know, a little bit of a pyramid, where the checkpoint basically never finishes. And so 5, 10, 20ms latencies for spinning media, or 1 or 2ms latencies for flash, can suddenly turn into 100ms, 500ms latencies over a long period of time. And it just hammers performance, plus you’ve got consistency problems: during the duration of the checkpoint the entire system is throttled. So in our case, for example, with the default behavior, without implementing –k, we were getting 99,000 transactions per second between checkpoints. During checkpoints we were only getting 49,000 transactions per second. So if you look at the chart in the whitepaper you’ll see what looks literally like a rollercoaster. Just high, vhoov, high, vhoov. So it’s quite a ride, but it’s not something that your users want or your system wants.
You implement –k by simply adding it to the startup parameters like any other, followed, with no space, by an integer representing the number of megabytes per second of checkpoint throughput you want SQL Server to provide, but no more. In our case, we did several tests, and the sweet spot was 750. –k750 says, “Hey, SQL Server, I don’t care how many dirty pages you have, never shove more than 750MB/s down at my disks.” And doing so turned the rollercoaster ride manifested by the default behavior we all know and hate into remarkably flat system performance. Checkpoint pages per second are throttled at 750MB/s, and because you define it, you do the testing, you find the sweet spot, suddenly CPU is basically flat, your transactions per second are basically flat, log flushes per second are basically flat and consistent. And the great news is that not only do you get consistent behavior, but the average is far higher than the average of the rollercoaster ride. And so, yeah, I’m telling you, I posted the paper to the MCM distribution list when it was published, and I had several comments back about it: “Where has this been all my life?”, that kind of stuff. I think you’ll start to see some adoption. And so again, one of the biggest takeaways of this Availability Group paper isn’t just the great things that we can do with a 2U box and a little bit of flash and the improvements in 2016, but also, for the first time I’ve seen it documented anywhere, the impact of the –k startup parameter.
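
-k is set as a startup parameter (for example, via SQL Server Configuration Manager), not from T-SQL, so there is no command to enable it online. One way to confirm it took effect after a restart is to read the registered startup arguments; the -k750 shown is the whitepaper’s sweet spot for this particular rig, not a universal recommendation:

```sql
-- List the instance's startup arguments. With -k configured you should see
-- an entry such as '-k750' alongside the usual -d, -e, and -l arguments.
SELECT registry_key, value_name, value_data
FROM sys.dm_server_registry
WHERE value_name LIKE N'SQLArg%';
```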


Carlos:             And when we start talking about startup flags, one of the things we want to make sure is that we at least understand the downside, or why it is not enabled by default; maybe there’s a good reason. And if I think about it, what you’re potentially going to increase, and again your mileage may vary depending on your sizes and what not, is that when that instance restarts you may have to go through more of those redo logs to get back. Is that fair?


Jimmy:            Ah, that’s a good guess. That’s a very good guess, but first let me be clear: this –k isn’t directly related to anything involving Availability Groups. It’s basically...


Carlos:             I guess I said redo log when I meant the log, right. So you’re going through the log and rolling forward or back.


Jimmy:            Right. So you could set a –k that is too low, and you basically get a sustained checkpoint that never actually finishes; that’s what we did in part of our testing, seriously. And you would run into exactly that situation: recovery on database restart, on server restart, will take far longer than you want. But it turns out that when I refer to the sweet spot, part of the sweet spot means not just limiting the value so that it’s below what the disk I/O subsystem can sustain in terms of how many dirty pages it can take per second; you also want the value sufficiently high that you don’t run into the situation where you restart the database and it takes forever to recover. So that is a potential downside, and because, you know, we didn’t talk about this beforehand, I’m impressed that you came up with that. But the reality is, with just a little bit of testing (we tested thoroughly and it took less than a day, half a day probably), we got a number such that recovery is normal, you can restart the database in a few seconds, yet we’re not overwhelming the disk I/O subsystem. You asked the question, why isn’t this on by default? That’s something I hadn’t thought of, but the answer came immediately to my mind. The SQL Server product team by its nature does things conservatively. Do no harm, etcetera. It’s like trace flags 1117 and 1118. How long have we wondered why those aren’t the default? Why don’t they create two TempDB files, or four, whatever, by default? Well, they were conservative, right? They finally realized it in 2016. The case was made: “We’re going to implement 1117 and 1118, at least in the context of TempDB, by default.” It’s a known bottleneck. It’s been the best practice for years for users to implement it. We’re going to save them the trouble and finally make it the default, even though it’s been the best practice for 3, 4, 6 versions, who knows.
I’d have to think really hard about how far back we have to go to discover when they were introduced. So I think it’s kind of like that. Especially as flash becomes more and more the predominant host for SQL Server data and log files, and with the crazy performance we’re able to provide with a couple of 2U servers, it’s a shame that people have to suffer needlessly with the default checkpoint behavior. So it won’t surprise me if in some subsequent version of SQL Server we see this implemented by default.
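
For reference, SQL Server 2016 did exactly that for tempdb, and it also exposed the old 1117/1118 behaviors as per-database settings. A sketch for a user database; the database name is hypothetical:

```sql
-- TF 1118 equivalent: allocate full extents instead of mixed extents.
ALTER DATABASE [SalesDB] SET MIXED_PAGE_ALLOCATION OFF;

-- TF 1117 equivalent: grow all files in the filegroup together.
ALTER DATABASE [SalesDB] MODIFY FILEGROUP [PRIMARY] AUTOGROW_ALL_FILES;
```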


Steve:              Though you would still need to have a way to adjust it to your specific environment, even if there was a default for it, I would assume.


Jimmy:            No question, no question. Or, no, maybe SQL Server can be smart enough. Maybe, because, you know, first do no harm; it would be easy to implement a default that in some cases could be harmful. You’re right there, Steve, absolutely. But SQL Server is getting smarter all the time, so who knows what they’ll think of next.


Steve:              Ok, so on that then, how about any other best practices that you learn while you’re going through this process of building this out and testing it?


Jimmy:            Oh man, I could go on. At least give me time for two, please, hopefully a bit more. The next two most important ones, I think: first, hardware validation, and again this is independent of AGs, independent of SQL Server 2016. It’s been a longtime best practice of mine as a geek parachuting into cities all over the planet for Microsoft Consulting Services. I evangelize, evangelize, evangelize: “Validate your hardware before you put it into production.” You know, if you flip that production bit on that big million-dollar SAN before validating it thoroughly and then realize you have a problem, you’re in trouble. Seriously, I promise, it’s not going to be fixed very easily, and no one is going to be happy about it. So the time to validate performance is before you put your hardware into production, whatever the hardware is, and especially storage. So the goal is, I would say, when you buy new hardware you need to hit two different thresholds. One, you need to hit the specs that are documented, the ones you can download from whoever your vendor is. The second is to hold the sales geeks’ feet to the fire for that vendor: make sure you’re able to hit the promises they made, and if you can’t hit them, get their people in there until you do. And so, circling back to the work that we did, we were struggling. We did comply with our own best practices; believe it or not, the cobbler’s kids have shoes in my shop. But I wasn’t able to hit the numbers I needed. I said, “What? Here are the specs. What’s going on here? I’m consistently 20% below the nominal specs.” And we struggled for a couple or three days until we realized we had the wrong firmware on the SSDs. We were really scratching our heads and had to call in some bigger geeks than I am. For the record, validating the hardware wasn’t on my plate. It wasn’t my job to do that.


Steve:              Someone else’s job but it’s your problem, right?


Jimmy:            Yeah, exactly, but the point is we discovered this not halfway into the hardcore testing, where I would have had to restart all the experiments. This was at the foundational part of the work; we were pretty early on, and we got it done. The takeaway from this is not merely, yes, validate your hardware, but also, and this is very important, this is flash-related: most of us don’t upgrade the firmware or drivers for our spinning media, ok? I certainly never have. Occasionally, when I’m asked, I say, “Yes, yes we do.” But flash, like most of the other electronics in our servers, requires updates, both the drivers and the firmware. So make sure you have the latest and greatest. Test and verify and all of that, of course, but get the latest and greatest stuff out there. And I know from experience working with SanDisk, now Western Digital, and formerly Fusion-io, that stuff can make a big difference. So two takeaways there. One other one I want to throw out, one of the takeaways I have: we have the luxury of working in an all-flash environment, using crazy parameters to do our backups and restores. We back up and restore multiple-terabyte databases routinely, as a matter of course, in a matter of minutes, and it’s pretty cool. Whereas in environments I’ve worked in in the past, you know, a multiple-terabyte database could literally take hours, overnight, multiple days. So it’s cool to be able to sling those bits around in a matter of minutes. We do this by modifying some longtime parameters that have been available to us in SQL Server for, well, gosh, a decade and a half, two decades: the number of devices, max transfer size, and buffer count. My historical default has been to use 8 devices for data files. Max transfer size: you can’t make it more than 4MB, and the default is 64KB. So at every request from a data file, the backup process will grab a 4MB chunk instead of a 64KB chunk.
And also buffer count, and this is how much memory the backup process is allowed to use. The default varies, as I understand it, based on the hardware configuration that the database is on. But to maximize backup performance, we multiply the number of logical cores times four. This is something that my old buddy and mentor Thomas Kejser taught me, and I’ve been doing it for years. Well, you may remember, circling back, the purpose of this whitepaper was to provide some actionable guidance for people to implement. And part of a real-life, real-world scenario is doing backups, wherein we chose to implement our log backups every five minutes. And we needed to find values for both database backups and log backups that wouldn’t impact the latency of the application. So instead of tuning for backup performance, we had to balance that with application performance, which is something I wasn’t used to because I haven’t been a production DBA for well over a decade now. And that was a lot of fun. So if you download the whitepaper you’ll find the parameters we used. I don’t need to go into the details here, but we used different parameters for the data files vs. the log files. Easy to implement; the point is we did some of the heavy lifting for you and provided a template that you can use in your own installations.
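
A sketch of what such a tuned backup might look like, using the knobs Jimmy mentions; the database name, file paths, and core count are illustrative, and the whitepaper’s actual values differ for data vs. log backups:

```sql
-- Striped backup across 8 devices with a 4MB transfer size.
-- BUFFERCOUNT here assumes 24 logical cores (24 x 4 = 96).
BACKUP DATABASE [SalesDB]
TO DISK = N'G:\backup\SalesDB_1.bak', DISK = N'G:\backup\SalesDB_2.bak',
   DISK = N'G:\backup\SalesDB_3.bak', DISK = N'G:\backup\SalesDB_4.bak',
   DISK = N'G:\backup\SalesDB_5.bak', DISK = N'G:\backup\SalesDB_6.bak',
   DISK = N'G:\backup\SalesDB_7.bak', DISK = N'G:\backup\SalesDB_8.bak'
WITH MAXTRANSFERSIZE = 4194304,  -- 4MB chunks instead of the 64KB default
     BUFFERCOUNT = 96,           -- logical cores x 4
     STATS = 10;
```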


Steve:              Yeah, that’s great. I would bet that probably 99% of the backups I see out there are just running with the defaults on those.


Jimmy:            And it’s tragic.

Carlos:             Yeah.

Steve:              Absolutely.

Jimmy:            I know tragic is probably overkill as a metaphor, but we are talking squandered resources. Okay. So there we go.

Carlos:             Well, Jimmy, thanks for coming on and chatting with us today. I know that you were willing to take the pains and kind of take one for the team, and then share with us so that we can learn from your experience.

Jimmy:            Are you kidding? This is the best job I’ve ever had in my life. I mean, SQLCAT was a lot of fun, but this job is just a blast, so I don’t mind. If these are the kind of hits I have to take, fine, keep them coming.

Carlos:             Before we let you go shall we do SQL family?

Jimmy:            Sure.

Steve:              Alright, so the first SQL Family question is on keeping up with technology. How do you go about keeping up with all the changes that are continuously happening, besides writing whitepapers?


Jimmy:            That's really a tough one. You know, because SQL Server is so broad and deep and, you know, the impostor syndrome, a lot of people have it. Gosh, I could go 20 minutes on this. I'm keenly aware of my deficits; they are broad and they are deep. Man, I just fake it until I make it. At this point of my career I can be candid about my deficits. And my buddy John Stewart, our shiny new MVP, he told me, I think it was John who told me, "If you stay one chapter ahead of your customer you're an expert." That's part of the secret, and the truth is I'm actually taking on new tech. I live an intentional life; seriously, I create a vision statement from month to month, quarter to quarter, year to year, and I craft my goals. And one of the reasons I'm very excited about this December is that I have a couple of really hardcore weeks off where, between ski days, I'm going to craft my vision statement. And that includes a hardcore plan, not just "I'm going to read PowerShell [inaudible — 37:18]"; I'm going to plot when I'm going to do those chapters. Michael Fall, another shiny new MVP, highly recommended that book, among other people. And also I've enrolled in, and have just started, the Microsoft Data Science professional curriculum. And my goal is to finish it by the end of December. So to answer your question: besides faking it until I make it, besides being candid, besides not buying into impostor syndrome, I live an intentional life and I plot out my goals. And one more thing, speaking of impostor syndrome: a dear, sweet, beautiful gem of a friend, Marilyn Grant, during a training session at Microsoft Consulting Services, said the following, and this is very important. It really helped me in my dark days, when I thought I wasn't smart enough to tie my shoes, surrounded by brilliant people within Microsoft and on the customer side. And it is this: you are as smart as they think you are, or you wouldn't be here.
And that's a very valuable aphorism for me. Okay, I hope that answered it; not as pithy as you guys, but I hope you enjoyed it.

Carlos:             So you've worked with SQL Server for a long time, Jimmy, and you did talk about that redo log, but if there is one thing you could change about SQL Server, what would it be?


Jimmy:            That would be it right now. The redo log, it's the big bottleneck. It's the thing right now that has the potential to keep SQL Server from actualizing its true capabilities. Now again, not a lot of people are hitting it right now, but it's limiting us. I can't invest time documenting AG solutions while we have this bottleneck. If that bottleneck were remediated tomorrow, we would be off to the races, working with various OEMs documenting one Availability Group solution after another.


Steve:              So what’s the best piece of career advice you’ve ever received?


Jimmy:            Man, you guys are hammering me. This is actually a question I've heard before, so I know the answer to this. When I was in grad school there was a professor, I don't know his name. We were doing our thing that day and, without solicitation, he gave me this advice. He said, "Specialize. Find a niche and dig deep." And that's paid off for me. You've heard some more things maybe from other people, but it's really paid off for me. Some of you may know my original "claim to fame" was that I published the stuff related to disk partition alignment. I evangelized it. Hey, you know, I didn't invent that, but I was, let's call it clever enough, smart enough, had my wits about me enough, that when I heard about it I was just gobsmacked and couldn't believe this existed. And you know, no one really knew about it. And so I searched for an opportunity to actually do the testing, seized that opportunity, validated it for myself, and we were off to the races. That is what brought me to the attention of the SQLCAT team, and the SQLCAT team has been a very valuable part of my success and put me in a position to be successful at SanDisk, Western Digital, for example with the Fusion I/O people. So specialize, find something and go for it. One other thing, I'm sorry, can I interject two other things? They just came to mind.

Steve:              Of course.

Jimmy:            Oh, thank you! Focus on the fundamentals. If you notice, when you go to your SQL Saturday sessions, etcetera, some of the sessions on fundamentals are some of the most well attended. And it's true, fundamentals are important. You can't get too much of them. And speaking of those sessions: community, community, community. Can't get too much community. You know, you guys referred to the SQL Family questions. This whole set of questions is based on SQL Family, this close-knit group of folks we hang out with. Pretty cool.


Carlos:             So you’ve mentioned a little bit about your background and some of the places that you’ve been. But give us the nickel tour how did you actually get started with SQL Server?


Jimmy:            Oh my gosh, I could get in trouble with this answer. Okay, you said nickel: I went to grad school, went to California, came home a few years later with my tail between my legs and got a job. That's the part that gets me in trouble. But I got a job way back in the days of Windows 3.1, and we did everything on paper, and I said, "We can computerize this." I didn't know anything about computers, but I thought I could figure it out. And the computer that I wanted to use, the one that had Excel on it, was always busy. So I went to this other computer; it didn't have Excel, but it had this thing called Access on it. And I thought, "Well, that looks close enough to a spreadsheet to me. I'm sure I can figure it out." And I had a hard copy manual of Access 3.0. I smuggled it home with me and curled up with it every night. I never slept alone.


Steve:              So Access was your gateway to SQL Server.

Jimmy:            Yeah, Access was my gateway. In fact, I remember the first time I saw SQL Server. I sat down and I thought, "Where's the GUI? What am I supposed to do now? What is this crap?"

Steve:              Alright, so I think our last question is if you could have one superhero power. What would it be? And why would you want it?


Jimmy:            Oh, I wish I had more charisma. I know I'm a nice guy, but I wish I were more charismatic. But the truth is, I want to be indefatigable, man. I need my sleep, you know; as I approach my middle age, wink wink, I need not only my 8 hours, I start turning into Mr. Pumpkin without them. In fact, I was invited to a Microsoft holiday party the other night, and my friend actually got a commitment from me that I wouldn't start whining after 9 o'clock, that I would man up and hang out until 11. So that's my thing: if I could get by with six hours of sleep on a consistent basis instead of 8, I'd be a different man.

Steve:              I'm with you there; making it until 11 is much harder than it used to be.

Jimmy:            Yeah, but I will say, related to health, the importance of fitness. I know, Steve, you've lost some weight. Carlos, I don't know what your situation is. I've lost 85 pounds three different times. Speaking of a yoyo, you know, a rollercoaster ride: three different times I've gained and lost 85 pounds, so I can't tell you how many times I've gained and lost 50 pounds. I have finally gotten myself to a point where I am relatively stable and healthy, and I make fitness a priority. My claim to fame: I mentioned my misspent youth a while ago, when I came back with my tail between my legs from out there on the left coast. Richard Simmons, you guys have heard of Sweatin' to the Oldies?


Episode 75: Testing Storage Devices

Storage testing and validation is something we need to add under our roles and responsibilities as DBAs. Every database we ever manage is going to need storage, but how often do we kick the tires? Many times we're basically told to go verify that array, or we're doing a POC testing new storage, but are we really only testing connectivity?  In this episode of the podcast, we chat with Argenis Fernandez about how he goes about testing a SAN array and the important metrics you should consider for your storage.  If you are still using SQLIO or even DiskSpd to test the I/O for your SQL Server, don't miss today's episode.

 Episode Quote

“Storage testing and validation is actually something we end up doing as DBAs.”

Listen to Learn

  • Why the now deprecated SQLIO tool was poorly named
  • Why you should question your storage vendor about allocation unit sizes
  • Why you should consider garbage collection when you test SAN arrays
  • How compression and deduplication make testing more complex.
  • Why testing storage in the cloud is difficult.

Argenis on Twitter
Argenis blog at Pure Storage
DiskSpd Utility

About Argenis Fernandez

Argenis is a Solutions Architect with Pure Storage, a Microsoft SQL Server MVP, VMware vExpert, and a well-known face in the #SQLFamily community, especially since he's now a Director-at-Large of PASS. He's a regular speaker at PASS events, including the PASS Summit. He also founded the PASS Security Virtual Chapter.

Transcription Storage Testing

Carlos:             Argenis, again, welcome back to the podcast.


Argenis:          Thank you so much for having me again. You guys, it's always awesome. You know, I actually requested this one. So, to be upfront with everyone in the audience, I did tell you guys to please go ahead and schedule me again because I wanted to go on yet another rant. Like, I got one rant in on the previous one, and I'm ready to go on another one, man. Let's go!


Carlos:            Very good, and as you all know, compañeros, what Argenis wants, Argenis gets. And of course we're happy to have him back here on the show today. So last time we talked it was kind of just an SSD conversation, talking a little bit about disks. And today our topics might be wide-ranging, but we want to kick off with testing. Testing the arrays or the disks that you're using in your servers.

Argenis:          Yeah, so storage testing and validation, which is actually something we end up doing as DBAs. You know, a lot of times we're basically told, "Yeah, you need to go verify that array. Now we're doing this POC, we're testing this new storage, you need to go ahead and test it." And, you guys answer this for me, what's your favorite tool to use when validating storage?


Carlos:             Yeah, so they deprecated SQLIO, so DiskSpd is the other one that comes to mind there.


Argenis:          Ding, ding, ding, right, so that's what everyone looks for. So SQLIO, I love SQLIO, rest in peace now. It was the completely wrong name to use, quite possibly the most awful name for that tool, because it actually had nothing to do with SQL Server whatsoever. It just so happens that somebody on the SQL team wrote that utility, but it was actually never meant to validate SQL Server patterns. It was just a storage benchmarking tool, basically. It wasn't even good for baselining; I guess you could save the outputs and then compare them to some other output that you took in the future, right, to see how you're doing, etcetera.


Carlos:             Exactly


Argenis:          But you don't see those things in SQL Server like that. For example, you would get all these fancy scripts from people out there that would run SQLIO at multiple block sizes and multiple threads. You know, you would get some numbers, and what's funny is that as DBAs we would actually never know if they were good or not. We would be like, "Yeah, we got some numbers. They looked alright. Let's go." It's actually kind of funny; at least that was my experience way back when I was given an array, I can't remember who the manufacturer was, and I was basically told, "Here, you go test this stuff." I was like, "Okay, what do I do?" So naturally I found SQLIO, and SQLIO is what I ran, and I got a whole bunch of numbers and a bunch of pretty graphs, and then I showed them to the storage guy and the storage guy was like, "Yeah, yeah, yeah." Okay, what does that mean? Is that good, is that bad? If I put my databases on this thing, are they going to work? So, in the era of magnetic storage, and this is changing super fast, like way faster than any of us expected. As you guys know, I work with flash arrays all the time, so I'm not really motivated or particularly interested in validating storage on magnetic anymore. But back in the day, when we had the old HDD-comprised arrays, magnetic arrays or spinning rust or whatever you want to call them, we wanted to test using something that made sense. So SQLIO would be a thing because it would actually generate some workload against the array, regardless of the fact that it would be a patterned dataset, which is actually very important to understand. SQLIO would generate a pattern of data to be sent down to the storage array, not even close to a real dataset. Not even close.
At that point, whatever performance characteristics you saw on your storage array, you would be happy with, because you would basically ask the storage guy, "Hey, I'm getting, I don't know, 800 a second on that thing. Is that good?" The storage guy would be like, "Yup, that's good." "Alright, I'm done. I validated that my performance was okay." You would look at the latency and see if the latency was acceptable at different block sizes. And you would commit the most frequent mistake of all, which would be tying your allocation unit size on NTFS to the I/O block size that gave you the best latency. That's what everyone would do; they would actually make that mistake right there and then. You would go ahead and see what SQLIO told you in terms of latency for a given I/O block size, and you would say, "Yup, I need to format my volume at this particular allocation unit size because that's what's going to give me the best performance." That couldn't be any further from the truth; literally, I have no idea why people got stuck on that myth. And I actually have a blog post, that I'm sure we can reference from the podcast here, that discusses I/O block sizes and SQL Server in general.


Carlos:             So Argenis, just to be a knuckle dragger here.


Argenis:          Yeah, please.


Carlos:             Because everybody knows, isn't it that you just run this test and it's telling me, "Hey, this is the allocation unit size that you should be using"? And then you're like, "Ok, well that.."


Argenis:          That's not what the test is telling you. The test is telling you that at a given I/O block size your storage array behaves in a certain way. The I/O block size, as you all know, has nothing to do with the allocation unit size that NTFS uses. They are two completely different things, so it makes no sense to tie them together. If you got the lowest latency at 4K, you're not going to format your NTFS allocation unit at 4K. That's just not it, right? Because allocation units are meant to be used for allocation; that's what they are for. And so you want larger allocation unit sizes, so 64K, and quite possibly larger than that with newer file systems, like ReFS, which is becoming a thing nowadays. You would not consider using smaller allocation unit sizes because you want less metadata for your file system, as long as that metadata doesn't become a contention point for allocations. This should not be a contention point because this doesn't work like DFS [inaudible –] You're not going to be continuously allocating new extents for your database all the time, like one by one, right? It doesn't become a contention point for your database in terms of growth. Yeah, Steve, you wanted to ask something, go ahead.


Steve:              So then with that, I mean, it used to be you'd run SQLIO and then format the partition on your disk to match whatever gave your best throughput. And then along came the general rule to always format with 64K.


Argenis:          Just go 64K. Yeah, that's what you should do. There are other storage arrays that still tell you to format at a different allocation unit size. I literally have no idea why they ask you to do that, because I don't even know if these people are guided by the same things I've been mentioning here, or are just looking at one thing, right? At my company, we just told everyone: format at 64K, you'll be fine. They are just going to perform just fine. Other storage array vendors tell their customers to go 4K on the transaction log and 64K on data files. I have no idea why they say that. I don't know how they architected things at that granular level such that it actually matters. To me it shouldn't matter on any storage array, for the most part. Now, would you see differences in SQLIO? Yes, because you are allocating a file upfront for SQLIO. That's one of the things that matter, right? SQLIO actually takes time to allocate that file, and DiskSpd does that as well. So were we actually swayed by the fact that SQLIO was creating a file? I think that is actually part of the problem.
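As an aside, if you want to verify how a volume you inherited is actually formatted, one quick check, not mentioned in the episode and shown here only as a hypothetical convenience, can be run from T-SQL, assuming `xp_cmdshell` is enabled (it is off by default) and the service account has the Windows privileges `fsutil` requires:

```sql
-- Hypothetical check: confirm the NTFS allocation unit size of drive D:
-- from inside SQL Server. Look for "Bytes Per Cluster" in the output;
-- 65536 means the volume was formatted at 64K.
EXEC master..xp_cmdshell 'fsutil fsinfo ntfsinfo D:';
```

Running `fsutil` directly from an elevated command prompt gives the same answer without the `xp_cmdshell` dependency.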


Carlos:             Another example might be if I'm moving or upgrading from one server to another. I already have a large file and I have to allocate all of that space, right? In the case of growth, I start with a very small database and I'm growing in chunks. Those patterns are also different.


Argenis:          Well, I mean, at that point you need to start thinking in terms of the file system, right? You have a set of database files that are sitting on a given file system. If you're migrating a database from one server to the other and you're just using a restore, that destination file system doesn't have to match the allocation unit of the source file system by any means. If you're using something like Storage vMotion, or Storage Live Migration on Hyper-V, or whatever it is called in the Microsoft world, you would just migrate the thing and end up with the same allocation unit size, because you're performing basically a block-by-block copy of the entire thing, including the file system. Alright, so, new ways of doing things, etcetera, etcetera, but what matters in the end is, me personally, I'd ask your storage vendor what it is that they like. If they like 64K, it's probably safe to end the conversation right there and format at 64K. If they ask you to format at 4K, ask them why. Why is that a thing? Are they doing the right thing by telling you to format at 4K? Anyway, rant over on that one, guys, let's not go into that. Let's take a step back. We started talking about SQLIO, right? And we said rest in peace, it had the wrong name to begin with, it generates patterned data, it was replaced by DiskSpd. But guess what, I love DiskSpd and I kind of hate DiskSpd at the same time. Because you know what DiskSpd does? It also generates patterned data. Why does that matter? Most of the storage arrays sold today, pretty much every single one out there, have intelligence built into the controllers, so whenever you send a patterned dataset to one, it will basically just mark a little metadata entry for it and nothing else. This is actually super important for everyone to understand: you would not be stressing your final media. So if you're pushing data down to your hard drives, you wouldn't be stressing them. If you're pushing data down to SSDs, you wouldn't be stressing those either, because the controllers will basically drop all that data at the front end and just say, "Yup, I got some activity going on here, but it wasn't enough to merit using the SSDs or the hard drives at all." That's kind of very important, right?


Steve:              So then in that case DiskSpd may still be good for testing, like, local drives that are installed in a server, but not the storage array? Is that right?


Argenis:          Here's where DiskSpd is good: yeah, if you have local storage, obviously DiskSpd is good. However, if you have something like, I don't know, Storage Spaces Direct, or whatever it might be that has some other backend behind the scenes, you still have to go through all of that, right? You may have an intelligent controller on the other side managing disks for you. You don't know that. As a DBA you're not told exactly how your storage stack is laid out. So if you're connected to some storage array, they're not giving you the specifics of whether that stuff is doing something a certain way or not. So your job as a DBA is to make sure that you test with the best dataset possible and the best workload possible. So is DiskSpd the thing that's going to generate the most realistic workload for you and the most realistic dataset for you? I think the answer to that is a flat, resounding no. This is why I only use DiskSpd to validate plumbing, meaning: I'm getting good throughput, I'm getting good latency really quick, my channel is good, I have a good line of communication between my host and my storage. That's all I am testing with DiskSpd. If you have an old-fashioned, just dumb SSD, for example, that you attached directly to a server, and you want to run DiskSpd against that, yeah, that's good. But is that actually going to test everything on that SSD? I would dare say no, because there's more to it. An SSD is actually comprised of NAND, right, like NAND chips. And the NAND actually undergoes garbage collection, right? The SSD has a controller, and that controller will decide at some point to perform garbage collection on that NAND, because NAND is not byte-addressable. NAND is actually addressed kind of like database pages: entire pages are written at a time. You don't write on a byte-by-byte basis. So that actually makes a huge difference, right?
When you're testing something that is byte-addressable, like a regular hard drive used to be, sector-addressable, you wouldn't trigger garbage collection on the HDD because there is no need for it. But on an SSD you will trigger garbage collection, and the only way to stress an SSD so that you make sure you trigger garbage collection on it is by priming it: filling it with data and then writing data to it again, so that you trigger the garbage collection mechanisms. Okay? Do you see how storage testing is a complex story? It's not something that's straightforward. It's not just running DiskSpd and seeing what kind of numbers you get. It is so much more than that. It's running a continuous workload at different I/O block sizes with a given reducibility of the data. Actually, we haven't even talked about this. Reducibility of the data is a thing on all-flash arrays, and it matters a whole lot, because most of the storage arrays out there are doing deduplication and compression of some sort. Ours does, right? The company that I work for does it. EMC does it, SolidFire does it, Nimble Storage does it, THL does it. Everyone does compression, right? So you want to validate that the storage array is doing the things that you want; it's going to reflect certain performance characteristics for a given workload and a given dataset. You want to test with the real workload and the real dataset. So again, DiskSpd doesn't take you there. DiskSpd only helps you test the plumbing, to make sure that whatever is between you and your final storage device is not bottlenecked somewhere. It is really good for that, and it is also really good for testing dumb byte-addressable storage. Here is another thing that a lot of people don't know: DiskSpd actually has a way to generate random datasets. You can tell DiskSpd, "Here, go ahead and generate a random buffer," but getting it to work is actually really complicated; there's actually a huge bug in DiskSpd. I don't remember exactly all the conditions you have to meet. You have to use a file of a certain size, and then the buffer that you pick has to be evenly divisible by the size of the file, or something like that. I can't remember what it was. Because it's buggy and not quite perfect, you have to have all these conditions in place for it to generate a random workload. But then that random workload will be so random that it won't actually be representative of your real dataset either. So it would be the complete opposite of what I said before. You could start with a patterned dataset, which is very easy for the controller to drop, right? And at the complete opposite end of the spectrum you're sending a completely garbled, high-entropy dataset that makes no sense whatsoever and doesn't actually reflect your real workload. This is why I keep telling people: you want to test with a real workload and a real dataset. So restore your database, replay your queries against it. That's the best way to actually validate that your storage array is going to do the things you want it to do. And you can see how it reacts to your workload, and keep it on a continuous loop, right? So run it over and over and over. Increase the frequency of your workload if you can. There are multiple things you can do to replay your workload and kick off more threads and stress the storage array a little bit. Or if you can't do that, there's synthetic testing you can do against the storage array that will kind of mimic database access patterns, but it won't be your real workload.
In the end, what I want people to do is forget about running DiskSpd, forget about running SQLIO. If you want, run DiskSpd really quick just to make sure that you've got good connectivity to your storage array. But in the end, what you want is to replay your production data. So restore your production dataset and replay your production workload against that storage array. That's really what's going to give you the best picture.
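While replaying that workload, one common way to see what the storage is actually delivering, a standard DBA technique rather than something spelled out in the episode, is to sample SQL Server's virtual file stats. For example:

```sql
-- Per-file I/O latency as SQL Server sees it. The counters are cumulative
-- since instance startup, so for a true interval measurement take two
-- snapshots during the replay and diff them.
SELECT DB_NAME(vfs.database_id)            AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
```

Because this measures latency from the database engine's point of view, it captures the whole stack Argenis describes: caches, controllers, deduplication, and all.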


Steve:              So at that point, when you say replay your queries, replay your workload, I assume you're talking about the Distributed Replay controller at that point then.


Argenis:          That's one way of doing it, right? You also have Orca, which is another way of doing it. There are other tools out there, you know, from vendors, that allow you to do that. I believe Quest Software has one, if I'm not mistaken. There are others, right.


Carlos:             What’s that second tool that you mentioned?


Argenis:          I'll just mention the three that I know about. And even if you don't have time to do that, you can just run an ETL, right, which most people have. Or you can run very intensive read queries or write queries, provided that you do things cold cache. So you flush your buffer pool with DBCC DROPCLEANBUFFERS, then run your I/O-intensive queries against that restored copy of your database and see what kind of behavior you get. But more importantly, when you're testing and validating a shared storage array, it's not just one workload that's going to be hitting that storage array, so things get even more complicated. And this is why, you know, I could be talking about this stuff literally for weeks.
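A minimal sketch of that cold-cache approach looks like the following; the database and table names are placeholders, and dropping clean buffers should only ever be done on a test server, never production:

```sql
-- Cold-cache I/O test against a restored copy of a production database.
USE MyRestoredDb;              -- hypothetical restored copy
CHECKPOINT;                    -- flush dirty pages so they become clean
DBCC DROPCLEANBUFFERS;         -- empty the buffer pool: next reads hit storage
SET STATISTICS IO, TIME ON;    -- report logical/physical reads and timings

-- Substitute an I/O-intensive query from your real workload here:
SELECT COUNT_BIG(*)
FROM dbo.SalesOrderDetail WITH (INDEX(0));  -- force a full scan
```

The STATISTICS IO output distinguishes physical reads (which actually touched the storage) from logical reads, which is the number you care about when judging the array.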


Carlos:             Even then, Argenis, you've just flushed the buffer cache, but that's on the database server. And you mentioned the caching at the SAN level. The first time you pull that data it's not going to be in that cache. But how do you?


Argenis:          Well, that's another thing, right? If you have caches on the SAN side, how do you invalidate those caches, right? If you're going to have a cache in your SAN, do you want to use all of that cache for writes, or do you want to use all of that cache for reads? It depends on your workload. So what I tell people is, you know, what kind of workload do you have? If you have a data warehouse, there are going to be times during the day when you do ETL and load data into that data warehouse, when you want that cache to be just for writes as much as possible. And then at some point you're going to start querying that database, and at that point the cache would be better used for reads. Is that the best way to set it up? It depends on your storage array. It really, really does, and this is why working with the system engineers from your storage vendor matters a whole lot. You need to understand what knobs are available to you, and whether you even want to tweak them or not. When you're validating a storage array, it's very important that you understand whether there's a cache or not. So ask that question of your storage guy: "Hey, is there a cache involved in this thing?" Because if there is a cache, you want to make sure that you do enough testing that you overrun that cache, and see what the behavior will be after you overrun it and the array starts operating at a different speed. Because that's something that storage people love to do. They love to abstract you from the fact that you're running on slow storage by throwing another tier of storage in there. Is that solving the problem? Maybe, right? Maybe it does fix your problem for you. But is it a long-term solution for a workload that's continuously growing and continuously getting more aggressive? Maybe not.


Carlos:             Well, it’s the old adage. I mean, it’s just a new version of throwing more CPU and memory at the server, right?


Argenis:          Right. Tiers for storage have always been a thing. And even at the CPU level you have different kinds of caches; you have L1, L2, and L3 caches for instructions and data on the CPU. So this is not a new concept, right, by any means. It's just, you know, how many times do you have to change your storage to get to a working solution, which is also a very big thing, right? How many times do I have to migrate my database before we actually run on a thing that actually supports my workload? It's a pretty common problem out there, right? A lot of people hearing this podcast will probably identify with this: you migrated your database once when you switched to new storage; six months into it you realized that it sucked and you had to migrate it again. So data migrations are very much a thing. But in the end, it's all the storage testing and validation that you did that's going to give you the confidence that you're using that storage array the right way and making the right purchasing decisions. It's very important. And going back to the point I was talking about before, reducibility of the workload is very much a thing, right? If you just create one database that's completely empty, what are you testing there? Absolutely nothing, right? If you're validating the performance characteristics of a TDE-encrypted database that you're placing on an array that performs deduplication, compression, and things like that, they are going to be different than if you send a database that's in the clear, without any encryption whatsoever. The performance characteristics will be different, so you want to validate both. You want to make sure that your array reacts a certain way for your datasets.
So if you have TDE-encrypted databases, make sure that you restore TDE-encrypted databases to that array and run your performance tests on top of that. The future is filled with arrays that perform data reduction all the time, everywhere. And if your array only offers certain features, and you know you're going to use some of them, for example some arrays have data services turned off but let you pick compression only, then make sure that you enable compression on the volumes you're going to use for your databases and drop your data in there in whatever final form it's going to be. If you place already compressed data on a volume that will perform compression, what kind of performance characteristics are you going to see out of that? In general, it can be kind of eye opening to see how storage arrays react to different datasets. A lot of people don't think about this because they figure a quick Diskspd run will get them out of the way quick. It's not what you want.
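For context, the "quick Diskspd run" Argenis is cautioning against usually looks something like the command below. The test file path, size, and read/write mix are illustrative placeholders, not anything from the conversation; the flags themselves are standard Diskspd options:

```
:: Illustrative Diskspd baseline: 8 KB random I/O, 70% reads / 30% writes,
:: 4 threads with 8 outstanding I/Os each, 60-second run, software and
:: hardware caching disabled (-Sh), latency stats collected (-L),
:: against a 20 GB test file that Diskspd creates (-c20G).
diskspd.exe -c20G -b8K -r -w30 -t4 -o8 -d60 -Sh -L D:\testfile.dat
```

Numbers like these establish a raw ceiling for the device, but, per the discussion, they say nothing about how the array behaves once deduplication and compression meet your real data.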


Steve:              So let's say you've got your production server running on a new storage array that's shared, and you start out with SQL Server there and everything is running great. And then over time the SAN administrators decide to load that storage up with additional systems using it: a bunch of backups right in there, or file stores, or other systems. One of the things that I've seen when this happens is that your SQL Server performance degrades over time. And as far as testing, one of the things that I'm really curious about is how do you test on a regular basis to know that your production performance is not degrading on the actual I/O side of the storage?


Argenis:          So, I mean, you nailed it when you mentioned that this happens very frequently, and that's just because people are not doing their homework, right? Every single storage array has limits. There's a given amount of workload that you can fit on it at first and it works just fine. As you add more workloads into that array, with different patterns, you start seeing different performance characteristics from it. That is just natural. All of a sudden you have your data warehouse running at full throttle on that array, you start adding a bunch of VMs that are going to do, I don't know, you're running Exchange on those, or SharePoint, or whatever, and you're going to see a different kind of behavior from that storage array. So how do you test that? The only way to test it that you can be comfortable with is mimicking what it is actually going to be as much as possible, and as much work as that takes, and as painful as that sounds, it is the only way to make sure that your storage array is performing a certain way. So take your real workloads with your real datasets and hit your array with that. If you want to fit an additional workload and you don't even know what that workload looks like, then you do have some homework to do. You need to understand how busy your storage device currently is and whether it will accept that additional workload without tipping over. Or will it actually cause it to, you know, start generating a lot more latency, or will your throughput drop because of that additional workload? Before you had one person drinking from the big fire hose, and now you have two people drinking from it, not just one. It kind of works like that. Every single computing element in your data center has limits, and your storage array or your storage devices are just one more. You need to understand what those limits are, and when you are going to feed more workloads into it you need to do your homework and understand how busy that storage device is, so that when you are ready to drop that new workload in, it fits nicely and it's not going to tip it over.
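On the SQL Server side, part of that homework can be sketched with a query like the one below against sys.dm_io_virtual_file_stats. Note the counters are cumulative since instance startup, so for a current rate you would sample twice and take the difference:

```sql
-- Average I/O latency per database file, cumulative since instance startup.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;
```

A baseline of these numbers taken before new workloads land on the shared array makes the "is it degrading over time?" question answerable with data rather than hunches.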


Carlos:             Yeah, it's interesting. Even recently we put out our SQL Server checklist, and lots of administrators are trying to standardize their processes. You know, cookie cutter in a sense: repeatability, automation, that's the big mantra. And it almost sounds like, "Eerrk, you know, hold the presses here." There are certain areas where, not that you can't automate all of it, but at least you're going to have to understand it. You're going to do some homework first and then choose your own adventure, if you will.


Argenis:          You nailed it. That's exactly what happens. And you know what, a lot of people think that they can get away with not doing this in the cloud. "Oh, it's just the cloud so it's fine." Oh, it's not fine, alright, let me tell you. In the cloud you have a bunch of, you know, Amazon may not tell you, Microsoft may not tell you, Google may not tell you, but they do all these things that we talked about, right? They do their caching, their tiers, and they have these devices behind the scenes that do a whole lot more than just storage. They might actually have data services involved in all that. So it's not as straightforward to test storage in the cloud either. What I like to do is just tell people: remember the basics. You can get some idea of what the throughput and the latency will be like by using kind of dumb synthetic tools like Diskspd, or Iometer, or, I don't know, CrystalDiskMark, which in the end uses Diskspd under the hood, or ATTO, or whatever. There's like a hundred of them. You can definitely use those; they'll give you an idea. But testing with real workloads and real datasets is going to be the thing. I literally have a friend who was spanking me on Twitter the other day. They bought this hyper-converged solution. It looked great when they ran SQLIO and Diskspd against it. But guess what that thing was doing? The exact same thing that I mentioned before, right: it was deduplicating the synthetic patterns at the controller side. So they never actually saw it perform with real workloads and real datasets. They put their workload on it, they started seeing all this latency, and the workload was performing about the same way it had when it was running on magnetic. Because they didn't test the fundamentals, which is: test with your real workloads and real datasets. I keep hearing hyper-converged, by the way. It's a thing now, like everyone is talking about hyper-converged.
I need to remind people that hyper-converged is great, but not necessarily for databases. One big problem with databases is that they need low latency. You're going to be doing transaction log writes. You're going to be doing a lot of reads, and those reads better be performed quickly, because that quick turnaround from storage is the thing that's going to give you better performance.


Carlos:             And so I've been hiding under a rock, Argenis. Describe to me, what is hyper-convergence?


Argenis:          So hyper-convergence is the term that's kind of in vogue now for describing a new kind of architecture where compute and storage are all the same thing, and even networking is kind of hidden. On the same compute nodes you can have storage attached to them, and, you know, everything is kind of flat, and it gets you that kind of cloud experience when it comes to provisioning storage. But in the end you're sharing resources between storage nodes and compute nodes. So you can end up with a situation where your databases are running on the exact same node as your storage device, and the actual serving of storage takes compute resources, and those compute resources end up kind of colliding with your databases. There are obviously other kinds of designs or features in hyper-converged where you can have storage-only nodes. But that's really no different than, you know, having your own storage device out there; they're basically spinning storage back out into its own node. It's basically having fewer things to configure on your network, so it's more like the appliance approach to computing, where you just buy something, hook it up to your network, and two seconds later you have it up and running. Well, provisioning is only part of it, right? Provisioning and getting things to run fast is one part of it, of course, but it's the on-going operations and on-going performance that really matter a whole lot. So if you're looking at hyper-converged solutions, please, please, please make sure you test with the right tools. And if I can mention one synthetic tool that actually works really, really well. I personally hate Oracle, but this is one thing that comes from Oracle that's quite decent. It's called Vdbench, V-D-Bench, so Victor David Bench. That is quite possibly the best storage testing and validation tool; it will allow you to get a better idea of how your workload and your dataset are going to behave on whatever storage device you're testing. It actually allows you to specify the deduplication and compression settings for your workloads. So you can say, oh, this is like a database, so it will dedupe 1.2 to 1 and it will compress 2 to 1. Or it will compress 5 to 1.
Or I'm testing VDI, so I'm going to have a lot of reducibility in my workload; it's going to reduce 19 to 1, so I can test it that way. And then you can also generate workloads by saying, I have these kinds of I/O block sizes at this time of day, I have these peaks and valleys. You can even specify that the target device needs to be filled with data before it can actually be tested. So there's a whole bunch of sweet, sweet features that you would love to leverage when testing and validating your storage.
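To make that concrete, a minimal Vdbench parameter file exercising the knobs Argenis mentions might look like the sketch below. The device path, ratios, and run length are illustrative assumptions, though dedupratio, compratio, and the sd/wd/rd structure are genuine Vdbench syntax:

```
# Illustrative Vdbench parameter file (all values are placeholders)
dedupratio=1.2                                   # data dedupes 1.2:1, database-like
dedupunit=8k
compratio=2                                      # data compresses 2:1
sd=sd1,lun=/dev/sdb,openflags=o_direct           # storage definition: device under test
wd=wd1,sd=sd1,xfersize=8k,rdpct=70,seekpct=100   # workload: 8k transfers, 70% reads, random
rd=rd1,wd=wd1,iorate=max,elapsed=600,interval=5  # run: max rate for 10 minutes, report every 5s
```

The dedupratio/compratio pair is what lets you approximate how an all-flash array with always-on data reduction will treat data shaped like yours, which is exactly the gap a plain Diskspd run leaves open.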


Steve:              Ok, great. So one of the questions I have here is that it seems like, in my experience, testing with an actual real workload often happens after someone has already made the decision on purchasing the storage, and it gets to the environment and, here's what you have, figure out how to best use it. I mean, is there a process that you generally go through, like in pre-sales when people are looking at different storage, where you can usually do this kind of testing?

Argenis:          So most of the storage vendors out there will want you to perform a POC. Some of them actually don't like it because it takes resources away from them, right? Like, you know, they have to ship you an array, they have to be on top of you, and you only have so many days to complete that POC, etcetera, etcetera. But most storage vendors in general, not just storage array vendors, will happily give you something to try, and it's during that period that you are able to validate that storage array or that storage device. If your storage vendor doesn't do that, and your management was sold on that storage array and you get it right away, I need to remind you that most things you can actually return. So if you get something, test it right away. Like, drop everything you're doing and test it, because you don't want to pay for this in the long run. Especially if you're the DBA who's going to be babysitting processes that are going to be running on top of this thing. Because imagine you end up with a storage array or storage device that does not give you the performance you need. You are going to pay for that dearly. You're going to pay for that with your weekends. You are going to be suffering. You are going to have to watch those backups, and watch that maintenance, and watch those update stats. It's going to be painful, right? So make sure you test things as much as possible, that you inject yourself into the process of acquiring the storage as much as possible, and that you become knowledgeable on the storage side as much as possible. Storage is fundamental to every database out there. Every single database is backed by some sort of storage, and if you don't understand how that storage works, and you don't get into it even a little bit, then you're going to pay for it down the road.


Steve:              And I think that where you're working in a shop where there's a wall that's been built between the storage administrators and the DBAs, that's where you get the most trouble. But when the same person is doing both sides of it, or there's no wall and there's very good communication between the DBAs and the storage admins, then you're able to make those kinds of things happen.


Argenis:          This is something that you have to foment. You have to make sure that whatever walls exist in your environment today, you can overcome them. Like, you become best friends with the storage people. You know exactly what they have, and the storage people know why you care about certain things. They understand why databases are the pain in the butt that they are for storage. And they are a pain in the butt; the storage people hate DBAs for a reason, because databases are nasty. Especially, you know, backups, right? Who has a storage guy that loves to see the performance of the storage array when taking backups? Nobody, right? Because backups are just nasty on every single thing. Backups are nasty on the source storage array, backups are nasty on the network, and backups are nasty on the target storage array, or JBOD or whatever, it may not be an array, whatever you end up backing up to. So it's a thing: people need to understand that if you don't talk to the people in your company that do different things than you do, who in the end have power over and oversee the things that matter so much to you, you're just hurting yourself. We could go on another rant about why you should be best friends with your developers as well, but it's basically the same thing, right? Everyone is working towards the same objective, which is making sure that your company continues to operate at the highest level, and that you can crank out features and new functionality as fast as your business wants. So don't become an obstacle right there. That's actually something that I advocate to everyone. Don't become an obstacle on the database side. Don't become an obstacle on the storage side. Don't become an obstacle on the virtualization side.
Offer solutions. Tell people, "Hey, this isn't working, let's go and see what we can do to make it better." Engage more people and make sure that everyone understands what's actually going on in your environment. Because the last thing that anyone wants is surprises: "Oh, we never told anyone that this actually wasn't working." You need to make sure that everyone knows the status of your environment out there.

Steve:              Right, right.


Carlos:             Sounds like very good career advice.


Steve:              Absolutely.


Argenis:          Alright, man, we went on a little rant over there. Man, I need to catch my breath here.


Steve:              Oh, it’s all good stuff.


Carlos:             Well, thanks again for being with us today, Argenis.


Argenis:          It's my pleasure. Thanks for the time, you guys. I really, really wanted to do this because I wanted to make sure people got a taste of what it's like to pay so much attention to the little details on the storage side. A lot of us have gotten kind of complacent in the past and said, "Oh, I just got some storage array, some storage device that I got from this guy. I'm just going to quickly run something. Ah, numbers look good. I'm moving on." That's not it. There's so much more to it. So you need to understand how things work and why testing in a different way matters so much.


Carlos:             That’s right. So before we let you go we’ve had you on the shows, we’ve done our SQL family questions with you already. I guess for those who may not know all of your history. Give us the brief story how did you first get started with SQL Server?


Argenis:          The year was 1998. Google was barely getting started back then. I remember when Google came out I was like, "Oh, I'm not using Yahoo anymore. I'm not using AltaVista anymore," or whatever we were using back then. That felt awesome. It was SQL Server 6.5 that was powering the backend for this product that I was working on. I was working as a data administrator for an internet service provider, the largest one in Venezuela, and the mandate was to migrate from a FreeBSD environment to a Windows NT 4.0 environment. So you guys can imagine, that was really, really controversial, and we got a lot of pushback. But in the end that was the mandate that came from management, and we purchased this product from Microsoft called Microsoft Commercial Internet System, MCIS, right? And the version of that was version 2.0. That suite of products was NT 4.0 with the option pack, so IIS and all of that, and you had an additional set of binaries for running an SMTP server, a POP server, an LDAP server. And that LDAP server was powered by Site Server, Commerce Edition, Site Server 3.0. Ancient folks like me that worked with that technology way back when will remember all of this stuff. Actually, I can't remember if it was Site Server 3.0 or an earlier version on the first version of MCIS that I used. The truth is that it was powered by SQL Server 6.5 at first, and then they developed support for SQL Server 7.0. So that's how I got started with it, maintaining that database. I was one of the, well, there were many of us, accidental DBAs for that database. We were sent up here to Redmond, that was January 1999, when we came here for the first time. We spent a month here training and were basically given a crash course on Windows administration and SQL Server administration. So that's how I got started with SQL way back when. And, you know, I changed jobs a hundred different times; I did sysadmin for a little bit, I did development for a little bit, I did management for a little bit, and I did consulting, even for Microsoft at some point. Through all of those positions I was always working with SQL Server in some way or another. So, you know, it's been 18 years now. It's been a great, great career, and I just went to the MVP Summit. I know some of you guys were there. Man, the MVP Summit was awesome because they laid out in front of us the future of SQL Server and what it looks like. It's now going to run on Linux. So if you guys out there haven't seen that, you need to go take a look, because I think it's going to be all the rage.
SQL Server on Linux is going to be a huge huge thing.


Steve:              Oh yeah, I think so too. I know I’ve got it running on a virtual machine here on Linux and it’s pretty nice.


Argenis:          It is, it is. So that’s kind of the backstory on me starting with SQL Server way back when.

Steve:              Excellent.

Carlos:             Awesome. Well, again, Argenis thanks so much for being here and taking some time with us today.


Argenis:          Hey guys, thank you so much for letting me do this again. I really, really appreciate that you guys gave me another slot on your podcast, which is wonderful by the way, to get some more of my opinions out there. The previous podcast actually generated a lot of controversial discussion out there on where my stance is. I actually owe some people blog posts to follow up on some of the points that we talked about, and people will probably ask me for blog posts on the things that I talked about today. So I promise, I promise I will make those blog posts happen, and we'll get people really interested in testing their storage the right way.

Steve:              Sounds good. We’ll definitely look forward to reading those.    

Episode 74: 2016 in review and 2017 goals

As the year 2016 comes to a close, Steve and I share our thoughts on our accomplishments for the year and what lies in store for 2017.  Steve reviews some of the changes to the database health monitor tool and Carlos shares details about his new book and how it came to be.

 Episode Quote

“2016 has been a busy year”

Listen to Learn

  • Some of the accomplishments Steve and I have had over the last year
  • What we are looking forward to in 2017

Carlos has a new book – Zero to SQL
The SQL Server checklist repository

Transcription: 2016 in review

The transcription will be available shortly.

Episode 73: SQL Server Performance Tuning

At some point all data administrators will have to tune their environments, and there are many ways you might go about that.  In this episode, we talk with Pinal Dave about how he does performance tuning and the thought process behind the approach.  What I found interesting about this discussion is that we each have our own version of how to do things, but there are some basics that everyone should have covered.

This discussion does not get heavy into the how to for all scenarios; however, I think you will find the discussion very informative.

 Episode Quote

“I let them explain in the first five to ten minutes their own problem. Understand their concern first.”

Listen to Learn

  • The first steps in performance tuning
  • Why MSDB can play a role in performance
  • Why server setup is still so important
  • Different approaches to troubleshooting performance
  • Why DTA still gives us heartburn
  • Pinal’s take on how long the server should be up before you use the DMVs to diagnose issues
  • Who Adrienne Frost is

Missing Indexes
Missing Index Advisor
Pinal’s session from PASS Summit 2016
Adrienne Frost
Pinal on Twitter

About Pinal Dave

Pinal Dave (pronounced Da-vey) has been a database professional for nearly 20 years.  He is the author of the very popular blog sqlauthority.com, which just completed its 10th anniversary.  He has written a book on interview questions and is a frequent speaker at conferences.

Transcription: SQL Server Performance Tuning

Carlos:  Ok, well Pinal, welcome to the show!

Pinal: Thanks for having me! It’s wonderful. I’ve been talking to you guys and we wanted to do this, so I’m glad to be here and I’m super excited.

Carlos: Yes, it’s almost a pity that it has taken us this long to have you on the show. But thanks for taking a little bit of time and coming to talk with us. Ultimately, our topic today is performance tuning. And this is one of those topics that it seems like we can never get enough of, right? There’s always something to be tuning, some setting, some tweak, some metric that we need to capture and then compare again. And so, I guess let’s just go ahead and kick it off: what are some of the top things that you look for when you want to start performance tuning a SQL Server database or instance?

Pinal:  That’s a very good question! So, this is interesting: what I want to look into and what people want me to look into are two entirely different things. And I find it funny. I’m an independent consultant for SQL Server performance tuning, so people come to me, hire me, and say the system is slow. So far we are together. And this is where the difference comes in: they come to me and say, “We want you to look at our indexes and tell us why an index is not used by SQL Server.” Or sometimes they come back and say, “Hey, look at this particular query, I believe it’s running very, very slow.” Or they come back and say, “Our application is generating a lot of unnecessary data, can you help us clean it so it only generates necessary data?” They come back with these kinds of requests, and that’s so interesting, because this is what they want me to do. But you know what, I know behind the scenes that most of the time they are just seeing symptoms, and the real problem lies somewhere else. Probably in TempDB, probably in I/O distribution, maybe they are facing deadlocks or [inaudible – 02:14]. So I think this is the very difference. Do you want to know how I approach this particular problem, when customers come and say this is what they want to do, and how I try to bring them to the real problem?

Carlos:   Yeah, I think that’s a very interesting point. Even as database administrators we’re kind of stuck in that rut as well where someone will read an article, right? Some new webinar comes out with a specific metric and they’ve decided that that is the key that will solve all their performance problems.

Pinal:   Absolutely. So most of the time when a customer comes to me and says, “This is the particular problem,” I never say, “That may not be your problem.” I always say, “Well, then we need to fix it.” And I let them explain in the first five to ten minutes their own problem. Once I hear them out properly and they feel confident that they have explained the problem, we try to look into it. I know that first five to ten minutes is always a little bit of back and forth, but that gives me an idea of what level the customers are at, what they have done so far, and where the problem situations are. And then I try to bring them to a couple of very interesting points which are standard in my performance tuning consultancy. The point is, these are the things we need to do to better identify where your problem is. So, definitely wait statistics is where I start. Then I start looking at their I/O distribution, because I have figured out that most people sometimes put all the data on a C drive, or they put everything on a D drive where logs, data, and indexes are all over the place and backups are in there together with them. So if they have a situation like this, ok, we know something can be improved. And the third thing is that I try to find their long running queries, CPU-intensive queries, and I/O-intensive queries, and make a list of those queries. And I have figured out that as soon as I reach that particular point where I start bringing up the I/O-intensive or CPU-intensive queries, customers immediately start recognizing them: “Oh, this makes sense. Oh, this is also our problem.” Yes, this was the reason, and we end up looking at them together comprehensively, and then I put it all together. I say, “Look at this. You are saying you have a problem with this particular query, but in reality this is what is happening.” Yesterday, I fixed a very, very interesting problem, and I was so proud of myself when I fixed it.
Maybe I can share it with you in one minute and that will sum it up. A customer came to me and said, “This particular query suddenly started slowing down.” We were like, “Oh ok, why is it so?” We tried to look into it, and they said, “This query runs once a day only; it used to run for two seconds and is now taking ten minutes.” We were like, “Ok, that’s an interesting problem. Let me look into this one.” We looked there, and it was sending an email. “Ok, it seems like it’s sending an email.” Then I said, “Ok, I do not know yet how to fix it, but let me go and do an analysis of your system.” We did an analysis of that system and figured out that they had a problem sending pretty much every single email, not just that particular one. But the others were asynchronous, so they didn’t care much when the email arrived; emails coming into the mailbox were going out to customers eventually. But for this particular query they had received reports that the email was being sent very late. Now, email was the problem, so what does that mean? Is SQL Server the problem? Well, in this case it was, because they were using SQL mail in the MSDB database to send email. And the MSDB database stores all the history of email by default, and that email table (I forget the exact name) was huge, over 10GB. Inserting data into that table was taking so much time, and this is why there was a problem for this one query. We fixed that problem in MSDB and suddenly the system picked up, and they were saying, “I never knew I was able to get such a big performance gain.” Now, they had a problem in a query; we fixed something else, somewhere else in the database, and got amazing performance. These kinds of challenges make me happy: how we diagnose it, how we figure out where the problem is, and when we fix it, how we get performance.
A long answer to your short question, but I think it was impressive, and I was very impressed yesterday when I fixed it; it was very interesting as well.
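For anyone who wants to check their own server for this, Pinal doesn't name the table, but in modern versions the Database Mail history lives in msdb.dbo.sysmail_mailitems, and Microsoft ships purge procedures for it. A sketch along these lines (the 30-day retention is an arbitrary example) would be:

```sql
-- How many mail items has MSDB accumulated? (sysmail_mailitems is the
-- table that typically balloons when mail history is never purged.)
SELECT COUNT(*) AS mail_items FROM msdb.dbo.sysmail_mailitems;

-- Purge mail items and mail log entries older than 30 days; pick a
-- retention window that matches your own auditing requirements.
DECLARE @cutoff datetime = DATEADD(DAY, -30, GETDATE());
EXEC msdb.dbo.sysmail_delete_mailitems_sp @sent_before = @cutoff;
EXEC msdb.dbo.sysmail_delete_log_sp @logged_before = @cutoff;
```

Scheduling something like this as a regular agent job is a common way to keep the insert path into the mail tables fast, which is the fix the story describes.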

Steve:    Well, I know that definitely sounds interesting to me, because it seems like anytime MSDB grows to a certain size, it’s usually an indication that there’s something in there that is bigger than it should be, whether it’s job history, or mail history, or other histories being kept there. And I think that’s a common thing that people overlook, going in and finding that and speeding up the email, where the assumption is usually that it’s going to be the query, but it turns out that sending the email is the real problem. I’ve seen that more than once in MSDB in my experience.

Pinal:    Absolutely.

Carlos:   And I think Pinal puts an interesting twist on that first step, and that is to let the user explain what they think the problem is. Because, again in his example, it ended up being something different; originally they thought they were going to be doing performance tuning on a query, and it ended up being something very different. But when you report back to them, you need to indicate how what you have found relates to the problem they identified, so that they can feel like, “Ok, yes, now my problem is solved.”

Pinal:    Absolutely. And that gives them more confidence. One of the things which I see with a lot of customers and a lot of consultants is a little bit of arrogance in them, and sometimes I do not like it. They think they know and the customer does not know, and they go there with such a heavy ego and say, “I know what I’m doing, just step aside. Let me fix it for you. Just participate when I’m done and I will show you the internals.” When any consultant, let’s say a plumber who comes to my home, or a carpenter, or maybe a pest control consultant, does this kind of thing, I also feel a little annoyed. Let me explain my problem. And I think the same goes for SQL Server: now that I’m a consultant, I want to make sure that my customer or my client explains the problem and understands it, and I understand it. And I go back to them saying, “By doing this, I fixed your original problem.”

Steve:    Yup. So listen, understand, then go track it down, and put your ego aside, and all things will come out good.
Pinal:    Absolutely. And I have to say, no matter what you do, Steve, you always answer very politely. And you know so much about corruption, which I learned just sitting with you the other day. You were so kind to just open a book and explain, and that motivates me. This kind of thing about the community motivates me. People are willing to share, and that’s what it’s all about, sharing, and that’s what SQLAuthority is all about: what I know, I want to share. And all of you have been doing it too. Like, Carlos, this webcast is sharing, right? People are learning just by hearing it. So, you know, we do a lot for the community and we share. And I think it’s amazing that we have this feeling of giving back to everybody.
Carlos:   So, we’ve talked a little bit about a scenario we dug into. You mentioned wait stats as an area where you start looking for performance tuning, then I/O, and then your data and the file allocation. Those seem like, well, maybe not quite server setup, but when I start thinking about performance tuning, I’d call them the ground-level problems, if you will. You mentioned TempDB as well. It almost comes back to how the SQL Server was set up.

Carlos:   And interestingly enough, we have a previous episode in which we went through the checklist, if you will, of the knobs you should be turning when you install your SQL Server and what you should be looking for. It seems like in certain instances it’s about going back to the beginning, making sure everything is set up properly, and then we can get into the “hard stuff.”

Pinal:    Absolutely, because understanding execution plans and people’s queries is such a difficult thing to do. Even if it sounds simple, we can definitely hit the execution plan and say, “Aha, there is a bottleneck,” or, “Aha, this is not a good operator. What are the alternatives?” But we don’t always have alternatives, because 95% of the time the developer comes back and says, “Well, if I change this I will have to do a lot of testing, and I need approval.” Or sometimes they come back and say, “Hey, we don’t have control over the code because it’s a third-party tool.” Sometimes they say, “Look, we understand what you want to do, but we have this business need and this is how we’ve been doing it. We don’t want to change it.” I’ve figured out that 50% of the time when I want to change code I get pushback. People say, “We don’t want to do this.” And that’s why I think the focus of performance tuning is more about setting things up right. And as you said, I think the beauty of SQL Server, or whatever you want to call it, is that it’s so easy to install. But it doesn’t tell us a lot of things while you’re installing, so you just go next, next, next, done. Wow, it’s so easy to install.

Carlos:   Exactly, everybody can do it. Right, we want to make that as simple as possible.

Pinal:     Right, and that comes with these kinds of issues: your file growth is 1MB; your fill factor has no relationship to your workload; you have the log file and data in the same place, probably on the C drive; and even though you have an extra drive, it never gets used. And a lot of people do not even understand the model database. They should not be playing around with it, or only once they understand it, because when a new database is created, it is a copy of model. These kinds of basic things are missing in the explanation, or understanding, or education, whoever you want to blame. But ultimately you end up on a system where there are a lot of things you can fix. I feel like telling an interesting story, since I mentioned the model database. One of my customers said, “Every new database I create comes out at 1GB.” And I said, “Ok, what are you doing there?” He said, “We are not doing anything. Every database is 1GB.” I said, “Interesting!” I went back and checked: they had done some kind of data loading into the model database in the past, because nobody knew what it was, so there was a huge table being replicated again and again, and there were many stored procedures nobody used. I laughed and said, “Don’t do that.” And now they were very worried, because some of those objects were in use and they didn’t know how to do the cleanup. The older databases were such a mess, and it took hours and hours of consulting, where I was just sitting on the other side of the screen hearing their conversation, and they would come back to me and say, “Should we drop this or not?” and we would do some diagnostics, that kind of thing.
So when people do not know the right thing and make a little mistake, sometimes they pay a lot in consulting hours, or in developer hours cleaning up model afterwards.
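A quick way to spot that kind of leftover clutter is to list user objects in model; anything returned will be copied into every new database. This is a minimal sketch of the idea, not Pinal's actual diagnostic:

```sql
-- Sketch: list user-created objects lurking in the model database.
-- Anything returned here will be copied into every new database.
USE model;
SELECT name, type_desc, create_date
FROM sys.objects
WHERE is_ms_shipped = 0
ORDER BY create_date;
```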

Carlos:    Another interesting point you made, talking a little bit about pride. You talked about it in the developer’s sense, but even as DBAs, right, we have the data, it belongs to us, we’re masters of the domain. And there are all these monitoring tools out there that will help us collect historical information about the SQL Server, and I think at times we shy away from them. I know even when I first started in the PASS community, you start learning about all the DMVs and you’re like, “Ok, I’m going to build this monitoring program myself.” And I actually heard a talk from Adam Machanic, who I have a lot of respect for. He’s been on the show; he wrote sp_WhoIsActive, going through all of those DMVs to collect that information. And I remember him making the comment, “Oh yeah, but we put a monitoring tool on all of our environments.” Now, it happened to be at an event for a company that has a monitoring product, and I thought he was maybe just saying that because he was at their event. So I followed up with him afterwards: “Is that for real?” And he said, “It’s for real. Why do all that hard stuff yourself? People have figured that out. Get something that works for you, that will capture that history, and then you can start digging in,” particularly, I think, in the beginning.

Steve:     And I think that there are just so many different… Go ahead Pinal.

Pinal:     No, please go ahead, Steve. No, no, I think your point is the same thing I was going to say: there are so many different things out there that there is not one solution. But I think you were saying the same thing.

Steve:     Yup. There are just so many different ways to do what people are trying to do with monitoring and tuning that anytime you can avoid reinventing it yourself, by reusing something or using tools that are already there, you can save a lot of time and a lot of money. And I think that’s one of the things on the performance tuning side of consulting: having those tools available to you can really make a big difference.

Pinal:     Right, and Steve, I think you also took an attempt at building this, right? I think you are the right person to talk about it, because you have your own monitoring tool, Database Health Monitor, which you just updated in October. I installed it on one of my computers and it does a lot of things. I’m sure you built it to solve one problem and then kept adding more things as people requested, and I think that’s how the monitoring tools out in the industry were built, too: they came up with one thing for a similar request and kept going. The same thing is happening with my consulting: people keep asking for more things. They always want one little problem solved, but once that problem is solved, they want more. And Steve, maybe one good idea would be, at the end of the webcast or on the podcast page, to add your databasehealth.com link. I think people can try it out, and I see you just updated it in October.

Steve:     Yup. You know, that’s a great idea. Thanks for mentioning that. We try and mention it when we can. There are more and more people using it throughout the world every day. But yeah, it’s one of those things, it’s just a monitoring tool that’s there, and it does a lot of the things people try to invent on their own, and saves a lot of time that way.

Carlos:    So, let’s turn back a little bit. You had a presentation at PASS Summit on performance tuning just a few weeks ago, didn’t you?

Pinal:     Right, I did. Yes, it was fun. I had 400 people and I got amazing ratings. I was so honored looking at the good things people said about me. I was honored, and I’m humbled. People are very, very kind.

Steve:     You know, that’s true. The PASS community, yeah, you are good at what you do, but I think the PASS community is an amazing group. There are a lot of kind people out there, definitely. So one of the things on the list there was transaction durability and its impact on queries. Maybe you can talk about that for a few minutes.

Pinal:     Well, that’s a good point. Durability, I think, is one of the most confusing topics. Every single time I present it, I learn that even though we all believe we know ACID, not everybody really knows ACID, because it’s theory. We keep talking about it: “Oh, ok, ACID, I know about it. The data should be durable; the data we updated should be there.” What people don’t get is that each component of ACID is also related to performance. SQL Server 2014 came up with this new feature, delayed durability of transactions, and it directly impacts performance, not what shape the data ultimately takes. It definitely helps if you have an application doing a lot of transactions. Instead of hardening every statement’s commit to disk, you can say, “Let’s delay committing the data.” Everything builds up in memory and is flushed together later, which reduces the durability of your data. What it means is your data doesn’t go to disk immediately, so your operation completes faster because everything happens in memory, and then it is pushed back to disk. But if something goes wrong in between those little moments, your data durability is impacted. It’s a very, very powerful thing. One of my customers started to use it and said, “Oh, it’s giving amazing performance.” But under the hood, it’s not true that you get performance out of it all the time. It has to be used carefully, on the kind of application where you want control back from your statement or your stored procedure very quickly.
I think this feature can help you very much there; otherwise you would just be over-using it without any advantage, because a lot of operations we do in SQL Server are serialized. Things that are serialized are going to stay serialized, and you will not get performance out of it, so one has to be very, very careful using this kind of feature. It’s a good question, and not many people understand it. People try it and eventually say, “Ah, I don’t understand it. I’m going to quit on this.” You need to identify the right application, and this is true for any feature. A lot of features introduced in SQL Server 2014 and 2016 help only where you identify exactly where to use them. Otherwise, using them does not give you any performance, and people keep saying, “What’s the good of this feature? It doesn’t do anything,” or, “There are so many restrictions.” Just like In-Memory OLTP: a lot of people think In-Memory is not doing anything for them. It got attention when it was released initially, but now I see slow adoption for that particular feature.
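As a rough sketch of what Pinal is describing (the database and table names here are invented for illustration), delayed durability is enabled at the database level and can then be requested per transaction:

```sql
-- Allow individual transactions to opt in to delayed durability (SQL Server 2014+).
ALTER DATABASE SalesDb SET DELAYED_DURABILITY = ALLOWED;

-- This commit returns before the log records are hardened to disk,
-- trading a small window of possible data loss for lower commit latency.
BEGIN TRANSACTION;
INSERT INTO dbo.OrderLog (OrderId, LoggedAt)
VALUES (42, SYSDATETIME());
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);
```

Setting `FORCED` instead of `ALLOWED` would make every transaction in the database delayed-durable, which is exactly the kind of blanket change Pinal warns against applying without understanding the workload.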

Steve:     Yup. I like to think of all of those features, all those new things, all those different ways of doing that, as different tools. Sometimes one of them is the right tool for the job, and sometimes it’s the wrong tool for the job. And I think one of the things it sounds like you do well with your performance tuning work is truly understanding what the right tool is, so you can use it when it’s appropriate.

Carlos:    The absolute worst thing would be to enable a feature that you’re not quite 100% sure about, where you don’t understand what the downside effects are, right? While we might not agree with all of the defaults, there is a reason why they are the way they are. If you don’t understand what changes when you enable something or change a default, then you need to do it carefully.

Pinal:     Absolutely, that’s the very point. And I think our learning is also shrinking, because our attention span is less and less as we go forward. People do not have patience, and I think that is changing how consultancies are shaping up, as well as how these features and tools are coming out. Previously people had the patience to collect all the data and work through things. Now they just want to use the various tools, and that’s why a lot of vendor organizations talk about tools, and we all try to build something that will automate things and make our lives easier.
Carlos:    Ok, so with that, I know you mentioned adding indexes where they’re needed, or if you just have heaps. But could you talk a little about some examples of removing indexes in order to improve performance, maybe specifically around some of the DTA indexes?

Pinal:     Right, so I do not like DTA at all. I can say bad stuff about it, because people can go back through the episodes and see every single good consultant saying bad things about it, so I will save time and just say DTA has a lot of limitations. As a matter of fact, since it was released, to this day I do not see many enhancements in the product itself, either. It will just analyze your workload and, based on each query, come up with indexes with weird names. Then suddenly it will create 10, 20 indexes on your system, and they are not efficient. Sometimes they are on a single column when they should be on multiple columns, and now included columns are here, columnstore indexes are here, In-Memory and columnstore operational analytics indexes. I don’t know if DTA has caught up with all of these; I haven’t checked recently. So the indexes pile up, and the DTA indexes are just there, and a lot of them are no longer used by SQL Server because the workload changes. So one of the queries I give to people, it’s there on my blog, is how to run an analysis and figure out whether an index is really used or not. If it’s not used, they should go back and drop those indexes, so I give them a drop script. Another thing I do is check whether a table has more than 10 indexes. Now, this number is debatable; you can say, “Oh, my number is 8, or my number is 5.” Ok, that may be your number. But I’ve reached the point where, if I see 10 or more indexes on a table, I think that table will have a lot of trouble inserting data. That’s what my experience is. Again, you can sit at 12 indexes or 15, but 10 is the number I start with and then tune from. A lot of people say an index is great for selecting data but slows down inserts, “and we do not care about inserts.” This is what I keep hearing.
You know what, I want to add one interesting point here. If your inserts are slow, then while an insert or update is happening, SQL Server is going to put some kind of lock around that data. It may be a page lock, an exclusive lock, or a table lock; we do not know what kind it is. But if your update locks a certain part of the data, what happens when a select comes for that particular part of the data which is locked? It has to wait. So even though you have a lot of indexes, they slow down your updates, and that may indirectly impact your select statements, and your overall performance will go down. So do not be liberal in creating too many indexes; this is what I try to tell people. There is another script, too, for figuring out which good indexes you should create. I would be happy to share both scripts here, and if there is a place on the podcast page to put them, people can download those scripts and start working with them immediately, and for more detail they can engage me for consulting. But just like you, I would like to give away all the scripts for free.
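A minimal version of that kind of unused-index check, built on the index usage DMV (a sketch, not Pinal's actual script), might look like this:

```sql
-- Sketch: nonclustered indexes with many writes but few or no reads
-- since the last instance restart are candidates to drop.
SELECT  OBJECT_NAME(i.object_id)                                AS table_name,
        i.name                                                  AS index_name,
        ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0) AS reads,
        ISNULL(s.user_updates, 0)                               AS writes
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
        ON  s.object_id  = i.object_id
        AND s.index_id   = i.index_id
        AND s.database_id = DB_ID()
WHERE   i.type_desc = 'NONCLUSTERED'
  AND   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY reads ASC, writes DESC;
```

Note the caveat Pinal raises a little later: these counters reset when the instance restarts, so the numbers only mean something after a representative stretch of uptime.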

Steve:     Yup, so Carlos, we usually have a place we can put links. We can put those links there as well.

Carlos:    Yeah, our episode show notes today will be at sqlpartners.com/pinal, P-I-N-A-L, and we’ll have all the links and everything we talked about in today’s episode.

Pinal:     That’s fantastic!

Carlos:    One question I had on the 99 minutes, on attacking those indexes: you make the point that this gets them 80% of the way there, and I’m assuming you are coming at it from a DMV perspective. SQL Server is collecting that information about how many times an index is requested and the impact it would have on the queries; you aggregate all of that up and then make an assumption as to whether you should deploy the index or not. Do you have a minimum amount of time in the plan cache that you require before you start making those assumptions?

Pinal:       Absolutely, and that’s the very point. I always say, if the SQL Server has not been running for 7 days, I would not like to touch it, because if it has only been up one or two days, that is not going to cut it; it might only reflect a weekend workload. So I want your SQL Server to be running for at least 7 days under business workload, maybe Sunday to Sunday, or Friday to Friday, something like that. Then I can confirm it has the right amount of plan cache in it and do a proper analysis. If it has only run for one day, it’s not going to help at all.
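Checking that uptime is a one-liner; as a sketch, `sys.dm_os_sys_info` exposes the instance start time on SQL Server 2008 and later:

```sql
-- Sketch: how long the instance (and therefore the plan cache and
-- the index usage counters) has been accumulating data.
SELECT  sqlserver_start_time,
        DATEDIFF(DAY, sqlserver_start_time, SYSDATETIME()) AS days_up
FROM    sys.dm_os_sys_info;
```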

Steve:     So that brings up an interesting point. How often do you run across clients that you are working with where they attempt to speed up their SQL Server by rebooting it regularly?

Pinal:     Okay, so the number may be surprising to you. There’s something about Microsoft, right, that you just restart when things don’t work. All of my customers, before they reach out to me, have probably restarted the system at least five to ten times; I just see that. And a lot of customers are happy to hit restart whenever they like. One time we changed something, the cost threshold for parallelism on a system, and the customer said, “Now I have to restart it.” I said, “No, you don’t have to; it automatically takes effect without restarting.” But they would not believe it until they restarted. So after changing every configuration, even though I proved it to them with a demonstration, they would just go ahead and restart. And I asked, “What about your customers?” They said, “They will just think it’s a timeout.” I see this kind of thing, and I think people love restarting. Restarting is like caring for your system; that’s the point people have come to. As a matter of fact, while I was talking to you I just restarted my phone because it was a little bit slow. So I think we have all built this habit: when things don’t work, we just go and restart before we invest our time in investigating the real cause.
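Cost threshold for parallelism is one of the dynamic settings that takes effect without a restart; a sketch of the change (the value 50 is purely illustrative, not a recommendation):

```sql
-- The change takes effect at RECONFIGURE; no service restart required.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```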
Carlos:    Yeah, that may also be because we don’t know what else to do, and rebooting makes us feel like we’re doing something.
Steve:     Yeah, that’s how I feel when it comes to my cellphone.

Carlos:    Yeah, exactly.

Steve:     Alright, shall we do SQL family?

Carlos:    Let’s do it.

Steve:     Ok, so Pinal, how do you keep up with technology, with all the changes that are happening, especially with SQL Server and all the great things that have come out recently?

Pinal:     Wow! This is a tough question. I honestly have a hard time keeping up with all the new things coming out. For example, I haven’t even touched the R language and R Services, and there are so many things, like Power BI, that are still so far from me. So I have a hard time learning, a hard time catching up, and I think I’m doing a pretty bad job at it, but there are a couple of things I have made a point of. I listen to podcasts like yours, and the second thing I do is read a lot of blogs, because I figured out that if I’m not learning something, somebody else is, so I go out and read a lot of different blogs and try to learn what they are talking about. If one of the things interests me, I go and double-click into it. Previously there was a time when I used to open the Microsoft documentation, take each word, try to learn it, dive deeper, and build a demonstration on it. Nowadays that’s just impossible, so I read blogs, listen to podcasts, and see what words people are throwing around. One of those words will catch my fancy and I will invest time in it. That’s the only way I can learn, and when I learn, I blog.

Carlos:    Another good point there, right? When you teach somebody, you learn it yourself.

Pinal:     Absolutely.

Steve:     When you say, when you learn, you blog, well, if you look at your blog, it’s pretty apparent that you’ve learned a lot over the last 10 years, with the number of posts that you have.

Pinal:     Yeah, thank you! It has been 10 years now, and I’ve been blogging every single day since I started, so I haven’t missed a single day. The 10-year anniversary just happened this November 4th, so yeah, I’m very proud, just as a father would be. The blog started as something like a baby, and it became my master. It drives me in different ways now; it keeps me motivated and going. The blog runs the business and feeds my family, so in a way it’s a father to us as well. The blog plays multiple roles in my family right now.

Carlos:    Yeah, that is impressive. Ten years, every day for ten years. You’ve set the bar pretty high there.

Pinal:     Thank you! No, it’s a passion. It’s a part of life now.

Carlos:    If you could change one thing about SQL Server, what would it be?

Pinal:     This is an amazing question, and I have actually thought about it in the past. There are two things I want to change, and they are related. First, I just don’t like the default settings, but they are there for some reason, so I want Microsoft to educate us on what each setting does. The information is on MSDN, but not while you are installing the product. What does max degree of parallelism actually mean, right? You have to know. But there is pretty much no help on the installation page: no link, no pop-up, nothing. What does it mean? What value should I be keeping? I really wish Microsoft would at least give some kind of question mark that, when you click on it, takes you to the MSDN page for what that setting means in your version. That would make things much easier for people to consume, because otherwise we are just doing internet searches on each of those words, and we end up with a lot of bad advice on the internet. So these two things are related: first, I want Microsoft to educate us about what the defaults do, and second, I want them to link us to the right, authentic information.
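In the meantime, you can at least inventory the current settings yourself; as a sketch, `sys.configurations` lists every instance-level option, and its `is_dynamic` column even shows which ones take effect without a restart:

```sql
-- Sketch: every instance-level setting, its configured and running values,
-- and whether it is dynamic (is_dynamic = 1 means the change takes
-- effect at RECONFIGURE, with no service restart).
SELECT  name, value, value_in_use, is_dynamic, description
FROM    sys.configurations
ORDER BY name;
```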

Carlos:    Great point.

Steve:        Ok, so what’s the best piece of career advice that you have ever received?

Pinal:       I see, ok, I have a clear answer for this one. Years ago, when I was attending one of the Summits in the United States, I was very worried about how to present myself, how to talk to people, whether my content was up to everybody’s level, because I saw all these great people in the world producing amazing content. I thought, “I do not know so much; should I be blogging, should I be writing?” I have written 13 SQL Server books, but before that I was so scared to write even one word; I would end up worrying about what people would think. This was the time I met Arnie Rowland, who is from Portland. I expressed my concern to him, and he gave me a one-line piece of advice: “If you stop worrying about what others say, then and only then will you grow.” He said, grow up, write whatever you want, face the consequences of what you do, and improve yourself, but never stop because of what others might say. That was so powerful that since that day I have never stopped; I keep going with what I’m doing. I definitely improve. I listen to everybody, I learn, I pay respect to everybody, and I try to be a person who hears them and improves. I kept learning because Arnie told me I should never stop myself, and that’s why the blog is now 10 years old. So yeah, that’s the best career advice, and I’ll tell everybody out there: “Don’t stop with your passion. Keep on going, but take a moment to improve yourself and fix it if you are doing something wrong.”

Carlos:    Great stuff.

Steve:     I think that’s a great point. I know that every time I talk to Arnie, it seems like he always has good advice for me, so maybe we should have him on the podcast someday for career advice.

Carlos:    Yeah, sounds good.
Carlos:    Pinal, our last question for you today. If you could have one superhero power, what would it be and why do you want it?

Pinal:     Oh, I know the answer to this one. It’s Adrienne Frost; I think she is from the Marvel universe, associated with the X-Men. Adrienne Frost has an amazing psychic power, I think they call it psychometry, where she can touch any object and instantly know the history of events concerning that object. I would love to have that: I could touch anybody’s SQL Server and, without running any diagnostics, I would know what was done to the database last evening. I would know which developer put in malicious code. I would know what kind of workload change was made in the last week. And I would know which hacker, or malicious guy leaving the job, left a loop running inside the system. I want this power of Adrienne Frost, so when I go consulting, I am the man. Every single performance tuning consulting engagement would come to me if I had this particular power, because just by touching their SQL Server Management Studio or server node I would know what happened to it in the past, things that sometimes even the DMVs fail to expose to us.

Carlos:    There you go. You wouldn’t be able to hide from Pinal any longer.

Pinal:     Sweet. Yeah, I really, really want that particular power. For now, I keep telling my customers, “Tell me what you did in the last day,” and then I know: it was working before, now it’s not, so something must have come in between.

Carlos:    Alright. Well, Pinal, thanks so much for being on the show today. We’ve enjoyed it. It’s been great.

Episode 72: Testing Automation for Business Intelligence

In the data space we’re hearing a lot about DevOps and continuous integration, and the programmers are getting a lot of love there, with lots of tools to help them; however, if you’re on the data side, not so much. If you’re running Agile in your environment and trying to do data warehousing or other data development, we’re not quite getting the same love. Sure, there are a couple of unit testing offerings out there, but they’re a little bit cumbersome. In this episode our conversation focuses on testing automation in a business intelligence or data warehouse environment, even in an Agile shop. Our guest is Lynn Winterboer, and she has some interesting thoughts on the intersection of business users, data professionals, and tools.

 Episode Quote

“I would say what we need the business users to do, if we’re on the IT side, what we need from our business users is their specific expertise and we need to be using them for things that require their specific expertise.” Lynn Winterboer

Listen to Learn

  • Where the burden of final testing should be
  • Roles for the business and the IT sides for testing automation
  • Ways to get started with testing automation
  • How to find the product owner for the data warehouse
  • How your test environments might look different than they do today
  • Testing automation is the goal, but there are several baby steps to get there
  • Why automation is not the most complicated piece of testing automation

Lynn on Twitter
Lynn on LinkedIn
Winterboer Agile Analytics

About Lynn Winterboer

“I’m primarily a business person who has also spent over 20 years working in the data warehousing world as a business analyst, project manager or being the business lead on a data warehousing project.”

Lynn is an Agile Analytics coach and trainer with Winterboer Agile Analytics, based in Denver, Colorado. She has two daughters and works with her husband on the business.

Transcription: Testing Automation Business Intelligence

Carlos L Chacon:               Lynn, welcome to the program.

Lynn Winterboer:            Hi there. Nice to be here.

Carlos L Chacon:               Yes, thanks for coming on with us today. Interestingly enough, we met in Kansas City at a SQL Saturday out there, and your session topic intrigued me. In the data space, or I guess in the technology space, we’re hearing a lot about DevOps and continuous integration, and the programmers are getting a lot of love there, with lots of tools to help them. However, particularly if you’re in the Agile space, or running Agile in your environment, and you’re trying to do data warehousing or other data development, we’re not quite getting the same love. Sure, there are a couple of unit testing tools out there.

It’s a little bit cumbersome, kind of hard to wrap your mind around. So ultimately our conversation today is on that thought of testing automation in a data warehouse environment, and then even a step further, for those who are running Agile. I thought you had some really interesting thoughts around it, so we wanted to have you on to talk a little bit about this. First, let’s set the stage and talk about why testing automation is so important.

Lynn Winterboer:            Okay. Thank you, first of all, for inviting me to be on the podcast. It’s great to be reaching your listeners. I do feel really passionate about this topic because … I’ll give you a little background about myself. I’m primarily a business person who has also spent over 20 years working in the data warehousing world as a business analyst, project manager or being the business lead on a data warehousing project. What I’ve found over the years is that, in general, the data warehousing industry doesn’t have a whole lot of focus on testing. We’re very good at things like data quality and data modeling and managing large volumes of data and managing complex transformations so we’ve got some really good strengths that lead to great business results but we also haven’t had as much support in the industry for testing as a discipline and as a skillset.

The reason that I really care about this is that it is not uncommon for the final testing to be on the shoulders of the business stakeholders. They are the ones who know the data the best, and they are the ones who will do the best job of validating that what the data warehouse team has built is correct. However, it is not the best use of their time, nor the best way to build a strong relationship between the business and the development team, if what they're catching are errors that we should have caught early on. We put a lot more burden on our business stakeholders than a lot of development teams do, I think, in my experience. You combine that with a lot of the Agile concepts that are really exciting, that are effective and guiding projects really well: pulling your testing earlier and making testing, or quality I should say, something that the entire delivery cycle is involved in.

Some of the Agile concepts, like having acceptance criteria for each user story or each small requirement, with the acceptance criteria expressed from the point of view of the business: "Here's how I'll know that you've met my need, when you can show me this." If we can take that acceptance criteria, turn it into an actual test that everybody (the business, the developers, the testers) can agree is a good test to prove the development is complete, and then automate that, you have a really nice set of not only automated tests that prove to the business that you've done what they asked, but also regression tests that can be used down the road, which I would say is really critical to being able to move quickly and to be agile, regardless of what Agile framework you're using.

Carlos L Chacon:               No, that's right, and it's interesting that you mention lessening the work of the business stakeholders because, interestingly enough, in our last episode, episode 70, we had Kevin Wilke on. We were talking a little bit about Excel and kind of using it for the analytics piece and, from the tech side, we actually came to it like, "Hey, the business users need to do more," and so now it's interesting to kind of throw that back on us and say, "No, no, no. We have enough to do as well." We need to figure out some way that we can move forward. I think it is interesting that we all do need to kind of get together on the same page and put some of these things in place so that we can move faster altogether.

Lynn Winterboer:            I would say that what we need from our business users, if we're on the IT side, is their specific expertise, and we need to be using them for things that require that expertise. We can pull that expertise in earlier by asking for acceptance criteria, and by clarifying that acceptance criteria. I did a webinar a couple of years ago where I said the most common acceptance criteria in the data warehouse world is the least useful, and that is "the data is correct." No duh. That's our job, right? That is what we do. That's what we focus on. That is what we're dedicated to. How do we know what correct means from your point of view, Mr. and Mrs. Business Person? I do think it helps to pull the users in early to have concrete discussions: I hear that you want this. What do you mean by that?

For example, if somebody says … I'm going to give a very, very simple example here just so that nobody gets lost in the domain. Let's say they want a summary of product revenue by quarter, so revenue by product by quarter. We say, "Okay." We have our common conversations where we say, "What do you mean by product? When you say 'revenue by product' is that the top three product lines the company has, or is that the 30,000 SKUs that we have?" We have that typical conversation, but it's also helpful if what we can come out of that conversation with is them giving us a set of orders, for example: "If you can calculate product revenue by quarter correctly on this specific set of records, I will know that you've calculated it correctly, because these records represent each of our major three product lines. They go down as far as having all the right SKUs in them that I want to look at." And actually, in that conversation we might find out that it's not just product revenue by quarter, which might be, say, three numbers if you have three product lines, three summaries. It may be that they're looking for 30 numbers, because when they say "product" they mean throughout the whole product tree.

Coming up with concrete, actual examples is one of the most important things. Honestly, as a business analyst working in data warehousing, that's been one of my go-to tools since the '90s, and for lots of good BAs who work with data teams, that is their go-to tool: an actual example of what you want. What that does, if they can give us a handful of records, meaning a defined set of records … Even if it's all the orders in Q4 of last year, that's okay. It's a defined set, and they can pre-calculate themselves, so this gets back to maybe the Excel comment you made earlier. If they can pre-calculate themselves what they expect that number to be for product revenue by quarter, or that set of numbers, then we have something we can test against. We can automate it. You have to know your expected result to be able to automate something. If it takes a human to look at it and go, "Yeah, that feels right," it's hard to automate. You can automate the running of the test, but you can't automate the assessment of whether it passed or failed.

Steve Stedman:                Lynn, what you're describing there sounds a lot like what I've observed in development organizations, where you have backlog grooming and planning meetings, and the product owners are there with the developers discussing the user stories and acceptance criteria. Do you see that the business people are getting in the same room with the analysts and the developers that are building out the data warehouses or the BI structure, and working the same way development teams would there?

Lynn Winterboer:            Yes. That is what I'm talking about, and you mentioned the role of product owner, which is an important role, and typically that person is from the business and is very knowledgeable about the business. I shouldn't say "typically"; by requirement they are. A product owner is supposed to meet three criteria. One is the authority to make prioritization decisions, and typically that's somebody who's pretty high up in the business organization who has that authority, or that authority's been bestowed on them by somebody high up. Two, they have the knowledge to do the deep digging to come up with, for example, the subset of records and what the expected results might be in this case. And three, they have the bandwidth to do all that work.

It's pretty hard to find a person in a large data warehousing team supporting a large company who has the authority, the knowledge, and the bandwidth to represent all the various business entities that that warehousing team supports, even if it's a specific project, let's say, for a finance team. A good product owner might be a financial analyst who's worked at the company for a long time and who has the trust and support of the CFO to be making some of these prioritization decisions, or at least to be going out and getting the information on prioritization from the executives and bringing it back.

We do expect the product owner to be deeply knowledgeable in the business and the business needs; however, that doesn't mean the product owner is hung out to dry all by him or herself. It means they are expected to bring in the people they need to meet the need, the need of product ownership. I'm starting to talk about product ownership instead of product owners because I think it's a need that needs to be met. Then you have a product owner who's that one point of contact, but they would bring into this up-front discussion the right people they need to have in it from within the business. Even if we-

Carlos L Chacon:               I think even another critical point that you made there is that a lot of times, depending on the organization, and Steven mentioned some of the steps that Agile takes to try to convey an idea, but it still seems like a lot of times in the BI space it's kind of like, "I need this report," right?

Lynn Winterboer:            Right.

Carlos L Chacon:               They rip off the Excel spreadsheet, “I need this.” The IT folks start thinking about how, “Okay, how’s this going to look? Drop downs, drill downs,” all these kinds of things where they probably used to say, “Stop,” like you mentioned, “Let’s go back. I’d like to see you calculate this.”

Lynn Winterboer:            Yeah. You know, I actually think Excel is an incredibly useful tool for data warehousing. I know there are lots of people out there who would slap me for saying that, but if I can get somebody in the business to do a manual export from their source system and show me, in Excel, without re-keying anything, using pure formulas and referencing of cells, what they need, and that meets their need, that is a great prototype and it goes a long way toward helping us understand what's needed. What I would suggest is that we limit the scope of the input data for acceptance criteria and acceptance tests. And again, it's something you want to be able to pre-calculate.

I had a great customer I worked with. Our product owner at that point was a senior director in finance and, just to make a long story short, after lots and lots of testing and lots of going down rat holes and discovering that the rat hole was because of a data anomaly that's not likely to occur in the future, she finally said, "You know what? Our company's grown by acquisition for twenty years. We have a lot of weird data from the past. We might get weird data in the future, but in doing what we need to do to meet the company's needs right now, I don't want us spending time on stuff that's not going to happen in the future, because we're going to have a cutoff date and then we're only going to be looking at future stuff."

She came up with 16 orders such that, if we ran these 16 orders through and applied all the revenue recognition rules, of which there were many and they tripped all over each other a lot, she said, "If you can get these 16 orders to come up with the revenue recognition that I have allocated in this spreadsheet purely by formula, I will know that you've got 80 to 90% of the need met, and then we can go down the rat holes of the corner cases and decide which ones are important enough to tackle before we go live." She said, "I don't want anybody looking at any other data but these 16 orders until we've nailed them." A big part of that, which is not uncommon in data warehousing, was that the business had a revenue recognition system that was 10 years old. It had been built and maintained by a single individual who had just been taking orders from the business and doing the best she could, and it turned out there were very complicated rules and there were some conflicts.

There were lots of business decisions that needed to be made so this senior director of finance was able to say, “Once you get these 16 orders to flow through and give us this, then I’ll be happy and I’ll be confident I can tell the CFO we’re on track.” I think that was a great example and that was something we could automate. We were able to automate those tests so every time we put a new rule in, we could run the test suite and in a second know whether we had broken a prior rule or not.
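To make the 16-order approach concrete, here is a minimal sketch in Python. The `recognize_revenue` rule, the order records, and the expected allocations are all invented for illustration; in practice the rules would be the real ETL logic and the expected numbers would come from the business's pre-calculated spreadsheet:

```python
# Hypothetical revenue-recognition rule: service revenue is spread
# evenly across the contract months; product revenue is recognized up front.
def recognize_revenue(order):
    if order["type"] == "service":
        return round(order["amount"] / order["months"], 2)
    return order["amount"]

# Fixed set of test orders with pre-calculated expected results
# (in reality, the director's 16 orders and her spreadsheet formulas).
test_orders = [
    {"id": 1, "type": "product", "amount": 500.0,  "months": 1,  "expected": 500.0},
    {"id": 2, "type": "service", "amount": 1200.0, "months": 12, "expected": 100.0},
    {"id": 3, "type": "service", "amount": 900.0,  "months": 6,  "expected": 150.0},
]

def run_suite(orders):
    """Run every order through the rules; return the list of mismatches."""
    failures = []
    for order in orders:
        actual = recognize_revenue(order)
        if actual != order["expected"]:
            failures.append((order["id"], order["expected"], actual))
    return failures

if __name__ == "__main__":
    failures = run_suite(test_orders)
    print("PASS" if not failures else "FAIL: %s" % failures)
```

Every time a new rule goes in, rerunning the suite over the same fixed orders shows in a second whether a prior rule broke, which is exactly the workflow described above.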

Carlos L Chacon:               I was going to say that that's where the next kind of transition or difficult part is, at least when I'm working with data warehouse teams: limiting them to a certain amount of data. It seems like they're always saying, "We want a refresh of this data so we can do our testing, because if we can't have all the data then we can't do our tests." It sounds like if you can identify what that good data looks like, then you can go off of that and, like you said, look at other scenarios further down the line.

Lynn Winterboer:            We do … we still need that good set of data further down the line, and so another challenge that I think data warehousing teams face is that there are typically a couple of problems with our testing environments. The first one being that they're typically shared testing environments, so if you're doing an ERP implementation and you're building reports for that ERP, or off of the data in that ERP system, your testing environment from a data warehouse perspective will typically be the ERP's testing environment.

If you think about people implementing an ERP, they're testing out workflow processes. They want to know that it can handle all these different workflow scenarios, and so they're basically creating a bunch of junk data to prove or disprove that the workflow works. That's not useful to us in data warehousing. We need what we'll call quote-unquote real data, and junk data doesn't do us any good. It's not junk to them; it serves a purpose for them, but we're at cross purposes, so that's the biggest problem I see. It's not uncommon for large ERP systems or CRM systems to not want to refresh the data in their testing environment more than once every six months or once every year, because they've got all this testing in progress and a refresh would interrupt it.

First of all, I think we need separate testing environments. We need one that has a really good set of data that doesn't change and that is reliably realistic as to how the business data looks, and the ERP team needs a different environment. They need a much smaller set of data; they just need to play around with creating records and pushing them through. The other thing is that our testing environments tend to be much smaller and not configured like our production environments, and that causes us a big problem: if they're much smaller, we can't put a whole lot of data in them, and if they aren't configured like our production environment, then we have all these potentially huge problems when we try to deploy to production, things we couldn't catch in test because it's a different configuration.

Then I think the third big problem for us is what kind of data we can actually pull into our test environment. With data breaches and data security being so critical these days, if your test environment is not in the same security configuration as your production environment, which it typically isn't, you're often not allowed to pull quote-unquote real data in to test because of PII data or financial data or HIPAA regulations. That makes it really hard for the data warehousing team to test on quote-unquote real data, because we're not allowed to move it.

Some teams, the teams who get the magic wand and can get whatever they really want, are getting basically a copy of their production data warehouse behind the same firewalls and under the same security standards as the production systems they're drawing from, and are able to have, whenever they want, a refresh of either a subset or all of the data from production. That's pretty hard to get, financially, but I'm hoping that the benefits of test automation, which is one of the key components of continuous integration, will start to resonate within the executive world: if we're going to support our software development teams in continuous integration and configuration management and code management and test automation, we want to support our data teams as well. It might cost something we weren't expecting, but we'll see where that goes.

Steve Stedman:                Okay. Then at that point you’ve got your test data, whether it’s a complete replica of production or some subset of close to production or whatever it may be there, and you’ve got your acceptance criteria that you talked about and then you have the 16 orders that you mentioned that if these orders go through, that’s representative of everything. Then what is the path from there to really automate that and get it into a process where it doesn’t take someone to manually go through and … I don’t know, manually test that?

Lynn Winterboer:            Manually test it … Actually, that part is the easy part, frankly. Right now you have a business person saying that, but the demonstration that you saw, Carlos, at SQL Saturday came from a group of my data warehousing friends. We were sitting around having coffee one day, talking about life and stuff, and I started asking them, "Come on, guys. Why is it so hard to do test automation?" They all looked at each other and said, "It's not that hard. We just don't do it." I said, "Really? Okay." One guy had done some test automation in the past and he was like, "The mechanism's pretty simple." I can honestly say I think the reason a lot of data warehousing teams don't do it is that we're a very vendor-driven community, and have been for years, in the sense that …

The example I give to demonstrate this is hiring. If we're looking to hire an ETL developer, and the software language that ETL developers typically use is SQL, we don't put an ad up for a SQL developer the way other teams put an ad out for a Java developer. We don't do that. We don't even put an ad out for an ETL developer, typically. We put an ad out for an Informatica developer or a DataStage developer or an Ab Initio developer, which are products that these developers use to do ETL, which is extract, transform, and load. If a company's looking for an Informatica developer and you're a DataStage developer, they don't want to talk to you. You won't even make the first cut.

I have several friends who do development in ETL tools across a variety of tools and they say, “It’s not that hard to cross tools. They’re not that different,” however the industry still isn’t there. I think our vendors have done a good job of becoming very sticky. They have differentiated themselves from each other. They each have their own semantics. It was by design and I’m not upset with the vendors. I think they’ve done their job well. I think we need, as an industry, to start challenging that and stepping back and abstracting it a bit and saying, “In the end it doesn’t matter what tool you use. It’s the skill to know how to use that tool to do something as efficiently and effectively as possible.”

That is why, since there are very few tools out there that anybody has delivered as vendors for testing the data warehouse or for doing test automation … There are tools out there that’ll talk about testing the data warehouse but they’re really where you store your tests so you can manually run them and record the results. For test automation there are really only a handful of tools created by vendors that are available to data warehousing teams. They’re really pretty new to the market. It’s really a blossoming market.

Carlos L Chacon:               Right.

Steve Stedman:                Then those tools you’re talking about there, those would be created by specific vendors like Informatica for their specific product. Is that what you’re saying?

Lynn Winterboer:            Yes, Informatica does have some testing capabilities built into its tool, which is great, and for teams who are using Informatica and buy that additional module, I think that's certainly worth looking into. I don't know a whole lot of teams that are actually using it; maybe that's just lack of exposure on my part to the right teams. Frankly, I don't know a whole lot of teams that are doing test automation, to be honest. The other tools that I think are worth looking into: one is called iCEDQ, I-C-E like ice cube, D-Q for "data quality". Another one is called Tosca, by a company called Tricentis. They're out of Europe, and I have European data warehousing friends who are using it to test their data warehouse, so I know for sure that it's being used and applied that way. Another company is called QuerySurge, and then another one is called Zuzena, Z-U-Z-E-N-A. Those are a couple of tools that I've been learning about lately that teams might want to look into.

Back to your question, Steven: "What does it take to automate it?" The automation piece is not the complicated piece. You really just need to define the tests, store that definition in a database, store the test records (which specific test records you want to run through for that test), store your expected result for that test, and then have something do a comparison between what actually happened when you ran the test and what the expected result was, and record the results.

Then you have a tool, something that does the comparison: what actually happened when you ran the test versus what the expected results were. Typically we record all of the results, passes or fails, but it only notifies a human if it's a fail, and there may be certain tests where you say, "I don't really want to know right away if this one fails, but I want to see it in the log," or something like that. For other ones you say yes, you want the developer who last checked in the piece of development to get a notification immediately when something fails. It's really not that complicated, the mechanism. The hard part in test automation, frankly, is deciding what acceptance criteria makes sense to the business, and what the examples of records are that prove or disprove it. Not only do you want to say, "If you can run these five records through and they all calculate correctly, I'm happy," but also, "here are five others that shouldn't be included in the query, and we want to make sure they're left out," or something like that. You have to do your positive and negative testing.
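The mechanism described here (store a test definition and its expected result, run it, compare, log every result, notify only on failure) can be sketched in a few lines. This is a hypothetical harness, not any particular vendor tool; every name in it is made up for illustration:

```python
# Minimal test-automation harness: log every run, notify only on failure.
results_log = []       # in practice, a database table of run history
notifications = []     # in practice, an email or chat hook to the developer

def notify(message):
    notifications.append(message)

def run_test(test):
    """test = {'name': ..., 'run': callable returning the actual value, 'expected': ...}"""
    actual = test["run"]()
    passed = actual == test["expected"]
    results_log.append({"name": test["name"], "passed": passed,
                        "expected": test["expected"], "actual": actual})
    if not passed:
        notify("%s failed: expected %r, got %r"
               % (test["name"], test["expected"], actual))
    return passed

# Positive and negative testing in one query: valid orders must be counted,
# cancelled orders must be left out.
orders = [{"sku": "A", "qty": 2, "price": 10.0, "cancelled": False},
          {"sku": "B", "qty": 1, "price": 99.0, "cancelled": True}]

def q4_revenue():
    return sum(o["qty"] * o["price"] for o in orders if not o["cancelled"])

tests = [{"name": "q4_revenue_excludes_cancelled", "run": q4_revenue,
          "expected": 20.0}]

for t in tests:
    run_test(t)
```

The comparison and logging really are the easy part; as the conversation notes, the hard part is agreeing with the business on what `expected` should be.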

That testing mindset and the skillset that testers bring into a team is something that we really need to embrace in the data warehousing world and get them to come into our team. We have to have patience with them because they don’t understand data versus GUI and they have to have patience with us because we don’t understand testing discipline and strategies. If you work together, you can really come up with some powerful things.

Carlos L Chacon:               I think that's the key point. You said, "Oh, it's not that difficult." I think it's that coming together, because the IT teams can definitely build the tests, but they need to know what the rules are, right? They're going to get the expected results from the business, and they can't get that rule from a "Here's my report, I want you to build it." They've got to understand what that means. It's almost like back to the beginning, what we talked about. It's, again, to use I guess another Agile word, those cross-functional teams that kind of come together and share information. I think that is what will get us headed in the right direction and, of course, the tools and how you go about that, you can fight that another day.

Lynn Winterboer:            Yeah.

Steve Stedman:                It’s almost like, at that point, it’s more of a culture shift for an organization to break down walls and have that cross-functional capability there.

Lynn Winterboer:            It is.

Steve Stedman:                That’s almost more difficult than the technical piece sometimes.

Lynn Winterboer:            Oh, I think it is. I really do, and when we talk about Agile, we talk about even a single data warehousing team or a BI team having cross-functional skills within a single team of, say, seven people. What we mean by that is we'd really love to see these data teams evolve to where, within a single group of seven plus or minus two people, you've got the abilities to do the analysis, the work with the business, then the data analysis, the data modeling, the ETL, the BI metadata work or the cube work, the BI front end, and the testing and quality assurance. It doesn't mean you need a human that does each of those things. It may mean that you grow your team so that a single human has skills across multiple categories there.

Then we could extend it even further and say a really agile organization is going to look at things from the point of view of a value chain. Let's just say "supply chain management", which is one of the typical value chains. What we would do there, if we're going to be a really agile organization, is have the data professionals, the people with these skills from the data and data warehousing perspective, be closely aligned with the people in the ERP system, who are also closely aligned with the people in any other system that plays in that value chain.

What I'd like to see is where data warehousing and business intelligence and reporting and analytics are part of the value stream of a business process. They're not an afterthought of, "Oh, yeah, we also need reports." They're not a red-headed stepchild of IT. They're really a valued part, so as you're building out the software to meet a business need, you're also looking at building out the reporting very soon thereafter, in small chunks.

Instead of an entire ERP implementation, for example, you might start … I'm just going to pick this one out of the blue. You might start with shipping data and say, "Okay, we're going to make sure our shipping components work correctly in the ERP and meet the business's need, and then we're going to do our shipping reporting before we move on to order management," for example. However you order these things, it doesn't necessarily have to be the order in which the value chain flows. It might be the order in which the business rules are more clearly nailed down by the business: you tackle that first while the business is trying to figure out the business rules in the more complicated pieces of the value stream. That's where I'd really like to see it, and I'd like to see it sliced very thinly, running from a business need through front-end software through data warehousing to the end before they move on to the next little slice. That's my dream.

Steve Stedman:                Okay.

Carlos L Chacon:               We should all dream big there, right?

Lynn Winterboer:            Yeah, exactly.

Carlos L Chacon:               To kind of put it all together and to see some of these steps, I think, when we look at implementing data warehouse test automation, the first component is cultural, right? Getting together with your analyst, or whoever's giving you the spreadsheets and saying, "This is what I want this report of," and having them help you identify the rules and the process of how they get to what that report looks like. We can then take that set of data and put it into our test environment, if you will, and that's what I'm going to run my development against. It doesn't necessarily need to be a whole copy of my data warehouse, because I know the rules that are going to be in place and I can test against that core set of data. Do I come up with the same numbers I'm expecting? Yada, yada, yada, why not?

Then when I move on from there, I can do the automation, run it through the build. Then I go to the next step, and when I'm ready to actually present that to the user, let's just say we're going to the next quarter. Because we're sitting here in the fourth quarter of 2016, I've got data for the third quarter of 2016. When I start looking at fourth quarter data, I'm going to apply those rules, show that back to the business, and then they say, "Well, this number looks a little bit odd." Then I can go and say, "Well, I've applied these rules. My third quarter numbers still look good, so what is it? Do we have a change? Is there a new product? Is there some other component or rule that we didn't take into account?" Then we can kind of start from there. Is that a fair kind of-

Lynn Winterboer:            That’s exactly it.

Carlos L Chacon:               Yeah.

Lynn Winterboer:            Yeah, I think it is. I think what’s good about the process you just described is that if you’ve already validated you’ve got Q-3 running right and it looks good and then something looks funny in Q-4, you’re right. What you’re doing is you’re narrowing the range of areas you would look for an issue so it’s going to help with troubleshooting to say, “We’re confident the math works correctly. What else could be skewing this? Is it a change in the business process? Is it a change in our source system, how it records things?” Any of those things, it’s going to narrow down the scope of what you have to go troubleshoot and that is really, really useful to a data team and to the business people because we can have some confidence in certain things and then say, “Here are the unknowns that we’re going to go investigate.”

Then, with regression testing as well, if you have regression testing at a very micro level (not too micro, but unit tests, and then the next level up might be acceptance tests) … If your regression test suite is a combination of unit tests on specific transformations, for example, and then acceptance tests that might be a little bit broader and more business-facing, and you're automating those, you're going to know pretty quickly what broke and why, because it'll be down to, let's say, even a single field level: "This field is no longer flowing through the system the way we thought it should. Nobody intended to change that, but now this field isn't behaving as we expected." It's going to save you a ton of time in troubleshooting.
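A field-level unit test of the kind described might look like the following sketch. The country-code standardization rule and its cases are hypothetical examples, not from any real ETL; the point is that each test pins down the expected behavior of one field, so a break is localized immediately:

```python
# Hypothetical single-field transformation: standardize a free-text
# country field to two-letter codes during ETL.
COUNTRY_MAP = {"united states": "US", "usa": "US", "canada": "CA"}

def standardize_country(raw):
    """One field, one rule: trim, lowercase, map; flag unknowns rather than drop them."""
    key = raw.strip().lower()
    return COUNTRY_MAP.get(key, "UNKNOWN")

# Field-level unit tests: each case documents the expected behavior,
# including the negative case of an unmapped value.
field_cases = [
    ("USA", "US"),
    ("  Canada ", "CA"),
    ("Freedonia", "UNKNOWN"),   # unmapped values must be flagged, not dropped
]

def run_field_tests():
    """Return the cases where actual output differs from expected."""
    return [(raw, expected, standardize_country(raw))
            for raw, expected in field_cases
            if standardize_country(raw) != expected]
```

If someone later changes how this field flows through the pipeline, this test fails on its own, naming the exact field, which is what narrows troubleshooting down the way the conversation describes.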

I also think test automation will eventually help with the cultural changes you mentioned, Steven. It's going to help reduce the amount of finger-pointing, or the anxiety people might feel of, "Oh gosh, something's broken," where suddenly everybody's covering their own bottoms over it. If you've got test automation and the tests you've automated are at a pretty micro level, you're going to know exactly what broke, and there's no need to be guessing or speculating or finger-pointing. You're just going to go fix it. You're going to go solve the problem. I think it does help take the emotion out of the reaction to testing results.

Carlos L Chacon:               One last thought I wanted to ask here. Ultimately, this testing sounds a little easier when we are looking backwards but I know a lot of our listeners are going to be screaming at me and saying, “Well, what about, how do I do that for a new dimension? I want to create something that I don’t currently have rules for.” Does it still apply or …

Lynn Winterboer:            I think it does. I think you have to speculate. In my 16 orders example, you might create orders 17 and 18 that have the characteristics of the new type of order that's going into production, or you might create a whole other set of 16 orders that add these characteristics and then see how that plays out. You have to mock up the data, basically. I think in the software world they use terms like "mocking" and "scaffolding", and we can do that in data warehousing as well, and in BI.

I do know some successful Agile teams where the BI team is separate from the data warehousing team but they’ll have a joint design session and the BI team will do some scaffolding. They will create structures that don’t yet exist in the data warehouse and populate those structures with data as they expect it to come from the data warehouse and then they’ll build from there. It takes some tight coordination between the two teams but at least the BI team can be moving ahead so that when the data warehouse does have quote-unquote real structures and real data for the BI team to pull into their tool set and into their environment, they’ve got a head start. It may not be perfect but then they tune it and tweak it to make it fit instead of starting from scratch.
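The scaffolding Lynn describes can be sketched very simply: the BI team stands up a stand-in table, shaped and populated the way they expect the warehouse to eventually deliver it, and builds against that. The table and column names below are invented for illustration, using an in-memory SQLite database as the stand-in.

```python
import sqlite3

# Scaffolded stand-in for a dimension that does not yet exist in the
# real data warehouse. Names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT,
        segment       TEXT
    )
""")

# Mocked rows mimicking the data the warehouse should eventually supply.
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [(1, "Acme Corp", "Enterprise"), (2, "Bravo LLC", "SMB")],
)

# The BI team can now build and test reports against the scaffold.
rows = conn.execute(
    "SELECT customer_name FROM dim_customer WHERE segment = 'SMB'"
).fetchall()
```

When the warehouse team delivers the real structure, the BI work is tuned to fit it rather than started from scratch, which is the head start Lynn is describing.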

Steve Stedman:                Interesting. Okay.

Carlos L Chacon:               Should we do SQL family?

Lynn Winterboer:            Yeah, let’s do SQL family.

Carlos L Chacon:               How do you keep up with technology changes now? I guess you mentioned you're kind of a business person, but you've obviously been doing a little bit of crossover here and we've been able to get along, so we're going to-

Lynn Winterboer:            Yeah.

Carlos L Chacon:               How are you kind of keeping up with changes, I guess even in your Agile coaching as business methods and things change, how do you keep up with that and kind of stay on your toes?

Lynn Winterboer:            It actually takes a lot of energy to do it. I am looking basically at two industries and then trying to find their overlap or their synergies. I have my Agile world and I have my data warehousing world and it’s important to me to keep up with both of them. I go to a lot of local meetups because that is where not only do I learn new things and hear how people … Somebody might use a term that I don’t understand and I can just ask them and say, “Oh, what do you mean by that?” Then they define it and I go, “Oh, okay, that’s something new that I haven’t heard about.” I also go to national conferences like the one … I guess SQL Saturday Kansas City, for me it was national because I had to fly there from Denver. I try to go to those types of conferences.

In the data warehousing world, I'm an analyst with the Boulder BI Brain Trust, which is a group that meets on Fridays. Not necessarily every Friday, but some months it's every Friday from roughly nine to twelve-thirty, and we meet with vendors and they show us their tools. It's anything related to data warehousing or BI or analytics. We get a portion of time where we're tweeting like crazy. It's a public portion. We record those, and if you're a subscriber to that organization you can go in and view the recordings.

We also then go into an NDA section where the vendors will ask us questions about what we're seeing in the marketplace and what direction they should go, or they show us new stuff they're thinking about rolling out and we can give them feedback. The vendors get some good feedback from this group, as well as us learning about them. I don't go to all of them because it's a big chunk of time, but I really try to go to as many as I can, especially if I think the tool or the vendor that's presenting has anything to do with Agile enablement for data warehousing. I think one of the most important things is I keep a list of experts when I meet them, so when I went to SQL Saturday in Kansas City, I met several people who I will now have in my own CRM as people who are really good with this type of thing or that type of thing. If I have a need or a question, I know who to reach out to, and I maintain those relationships, and I hope they will leverage my insight and knowledge, as well, and reach out to me when they have questions.

Steve Stedman:                Okay. Lynn, what is the best piece of career advice that you’ve ever received?

Lynn Winterboer:            I'm actually going to give you two, but I think they're related. The first one that comes to mind is really about trusting your higher power. The lady who shared it with me was a woman who was very successful in her data warehousing career and with her company, and I asked her at some point, "How did you get your career to be as successful as it is? What is your secret to success?" Frankly, she said, "I pray. I pray a lot." I would translate that to say, pray to your higher power, trust the universe, listen to the yearnings of your heart, because I think those are given from above and it leads to finding the work that makes you want to jump out of bed in the morning, excited to go to work.

I know, for me, that is what led me to bringing Agile and data warehousing together. I love the data world. I am hooked. I'm a true data geek and I love the people who work in that world. People don't go work in the data world if they're lazy or dumb. Data warehousing is full of some really smart, hard-working people. The Agile piece is what brings the joy and the trust and the cultural changes that make it exciting to go to work there, so I really decided about five years ago I just wanted to bring the two of them together, and that combination, those synergies, and bringing those together is what makes me want to jump out of bed every day, so sort of a two-part answer.

Steve Stedman:                Okay. Great.

Carlos L Chacon:               Very nice. Our last question for you today, Lynn, if you could have one super hero power, what would it be and why do you want it?

Lynn Winterboer:            Let's think. That's a great question. I should've known after meeting you guys at the SQL Saturday that that would be one of the questions. I would say I would have the ability to gift to another human being the gift of deep empathy, to really be able to understand your … somebody who's very different from you. Where are they coming from? What are their motivations? What are their fears? I think the world is really a scary and sad place right now, and I think empathy … I'll distinguish empathy from sympathy. Sympathy is saying, "There, there, I feel bad for you." Empathy is saying, "Wow, I can't … That must be really hard. I'm imagining if that happened to me." I really think if I could do that, it would be the ability to give the gift of deep empathy. Knowing myself, I would have to not have a superhero costume or a cape, because I get very distracted by that external thing. It'd be better if I could walk around kind of invisibly gifting empathy to people.

Carlos L Chacon:               I was going to say, if that gets out and there's too much empathy, people might start running away and be like, "Ah, no!"

Lynn Winterboer:            I know. [inaudible 00:51:40]. Yeah, I’d maybe have to have the ability to be invisible and give empathy.

Steve Stedman:                Okay.

Carlos L Chacon:               Lynn, thanks so much for being on the show today.

Lynn Winterboer:            Thank you, guys. It’s been delightful.

Carlos L Chacon:               Yeah, we do appreciate it.