Does not play well with others. Is this description appropriate for some of the queries in your database? In this episode, we explore the role behind isolation levels–why they are important, what they help prevent, and why so many people try to get around the rules they try to enforce.
We want our databases to be responsive and provide us with the data as quickly as possible; however, the database has to juggle everyone trying to make changes (updates, inserts, and deletes) with all the reporting options (selects). To do this, SQL Server has a mechanism to retain order when there are multiple requests at the same time. In some cases this requires locking records to ensure only one request can modify the record at a time. These locks can lead to blocking and this is when people start trying to do crazy stuff.
When you are reading data that is involved with another transaction. You ignore any potential affect it may have on your results.
“Isolation levels are about the degree of isolation you want an individual transaction to have.”
“If everybody is only reading and there is no write to the database then isolation levels don’t matter.”
“An optimistic model does work well if you don’t have to read conflicts and the pessimistic model works well if you have conflicts.”
“It push load on TempDB and if TempDB is already a bottleneck then, yes, you could potentially have issues there.”
“Memory optimized tables supports snapshot isolation, repeatable reads and serializable.”
Listen to Learn
00:05 Introduction of the guest speaker (Jos de Bruijn)
00:24 Episode Topic: Isolation Levels
00:52 What are isolation levels and why it is important?
04:18 Dirty reads and Phantom reads
09:30 Isolation levels when building a database application
11:51 Optimistic Model and Pessimistic Model implementation
18:15 TempDB bottlenecks, locking and blockin user databases
21:09 Difference between the utilization of Read Committed Snapshots and General Snapshot Isolation
26:41 Implementation of repeatable read and serializable isolation on memory optimized tables
34:43 Locking hints and when to use it
36:31 SQL Family questions
About Jos de Bruijn
Jos de Bruijn is a Senior Program Manager in the Database Systems team. He works on the SQL Server Database Engine and the Azure SQL Database service, focusing on query processing and data storage. Jos has led the query and transaction processing for In-Memory OLTP in SQL Server 2014, and delivered In-Memory OLTP enhancements in SQL Server 2016/2017 and Azure SQL Database. His current focus is performance and scalability in Azure SQL Database. In a previous life, he obtained a Ph.D. in knowledge representation and semantic web technology, and worked in academia for several years as an assistant professor.
*Untranscribed introductory portion*
Carlos: Well Jos, welcome back to the program.
Jos: Well, thanks Carlos! I’m happy to be back.
Carlos: Yeah, it’s great for you. It’s always great to talk with a Microsoft folks and we enjoy you taking some time out to talk with us. It’s been almost a hundred episodes, and so it’s nice to have you back on.
So today, we’re going to be talking about isolation levels which might seem like a slightly daunting task. But I think it’s one of those very important features, and I guess let’s go ahead maybe set the stage for why isolation levels are important, and what are they help us to do. Right, so I guess how do we want to frame that in this conversation?
Jos: Yeah. So isolation levels, so we are talking about transaction isolation levels and isolation levels are, as the name suggests are about the degree of isolation you want an individual transaction to have. Because in a database system, database systems, SQL Servers, support a higher degree of concurrency that you can have. Lots and lots of different transactions, user operation happening at the same time. So the question is if you have a transaction that is operating on some data, reading some data, making modifications to what degree it isolates from another transaction that is touching the same data or reading the data and modifying the data at the same time. So that is why isolation levels are important because what you don’t want is that you are making an update on some data. Let’s say you’ve banking software, is a very classical example, you are doing an account transfer between one account at the other. You want to make sure that the overall balance is between the two is zero, right? Subtract 10 from one account then add 10 to another account. At the end of the day, you want to make sure that you have to balance, that’s only $10 a shift, is you don’t want another transaction to come in halfway and look at the data as if there was higher total amount. You want the other transaction to only see the result of your transfer and not…
Carlos: In between state. Ultimately, if you’re writing queries to the database, it’s how those queries behave with each other. Right, that’s kind of the idea of our conversation today.
Steve: And maybe just to phrase that a little bit different from the developer’s perspective, when the developers has an application that’s talking to the database. It is how those queries that are coming from that application are going to behave. And the different transaction isolation levels may change behavior of those queries slightly. Is that fair way of saying it?
Jos: Yeah. It is actually about the interaction between the queries and the write operations. If everybody is only reading and there is no write to the database then isolation levels don’t matter, and there is no difference whatsoever in the behavior between different isolation levels. It’s only in the interaction between the reads and the writes. So if you have a query and some other transaction is doing writes to the database, your isolation level essentially governs what you can see, what’s states of the database you are allowed to see.
Carlos: And what we are trying to solve here is I think help that basic like basic transactional database logic from the very beginning, right from the very beginning, almost the reasons that these transactional systems were put together. These are kind of the rules that it agree that it would follow in correlation from those updates and reads.
Jos: Yeah, that’s right.
Steve: So two terms that seem to always come up when we start talking about isolation levels are the term dirty reads, and the term phantom reads. Can you touch on what those exactly mean from the SQL perspective?
Jos: Yes. So dirty reads means that is reading data that has not actually been committed to the database. So usually when you have a transaction that does writes, you do the write operations. You make the changes in the tables and then at the end of your transaction you’re doing a commit. And then when you receive an acknowledgement from the server that that transaction has been committed, that means only at that point in time is the data really part of the database. At any point before that it, it is actually dirty data and if anybody reads that data it will be a dirty read because before the transaction is committed, it can roll back at anytime. And if the server crashes or failsover before a transaction is committed it has always roll back. So a dirty read means you are reading data that has not yet been committed to the database and you don’t know if it will ever be committed to the database. So that is a dirty read. That is kind of you’re getting in very flimsy territory and usually dirty reads is what you really want to avoid. Phantom reads, that is something very different. Phantom reads, that is data that has been committed to the database, but it has been committed essentially while your transaction has been running. It’s a subtle kind of interaction between transactions.
Steve: When you’re selecting from a table and you’re reading through page by page, as a query does, and something happens that causes pages to shift around in that table. For instance, an index is being reorganized, or somebody has inserted into a page that’s full that causes a page split. What can happen when that occurs, you get records that are either duplicated or missing. So, phantom records are basically when rows are either being counted twice, or they’re not there at all. And what that can do is if you’re doing reporting, and it has a table with this going on, is you can get very skewed numbers on your results. So, if you’re doing, like here’s 10 rows and then at the bottom there’s a summary that says here’s our total line, if your total line is being calculated and a row moves, some rows can be counted twice and some rows can be missed. If you’ve got a clustered index on a GUID or a last updated date or something that can change, and that changes, that causes the placement of that row in the table to move. And if you’re doing something like select count where from that table, with locking, make sure that you get the right number or rows. But if you use select count with no lock, you can then end up with those phantom records, where you have either rows missing or rows duplicated because using no lock says “I don’t care about other people’s locking, I’m just gonna scan through the pages and grab whatever’s there that’s not fully committed.”
Carlos: Exactly, so although that work is happening, and it’s still gonna give you the results, regardless of what other people are doing.
Steve: Yep, and what that means is you can grab the same row twice, or you can miss a row completely.
Carlos: This sometimes can be difficult to pin point when you talk about coming from a developer’s perspective. You know, when you think about testing as to writing your queries against your own database. You know, they happen very, very fast even when you try to “load test” and then maybe you have some different windows open or connections and you’re trying to replicate some of the stuff. But really it gets important when that concurrency, when the number of transactions increases at the same time on the database. And that’s when you start to see all these rules to start to come in force. I know sometimes when I was first starting with this it seemed a little abstract but again that idea of who gets to go first and what’s going to happen plays an important role.
Jos: Yeah, absolutely. But it is important when you are working with a database application especially if you have sensitive data that you are working on especially with things like financial data. It is important to really think through like when you are reading the data. Like how important it is to isolate the reads from concurrent applications. How important it is that when you do for example a range query, when you query all these account data for a single use, how important is it that nobody adds any additional account data to it, right? So that is what you need to think about when you are building a database application.
Carlos: Yeah, and again I think this could be a little bit blurry in the sense particularly when you have all of this, I guess you use the streaming data, right? For all of these, the no SQL Solutions that are coming on the scene. Again it’s that balancing act of how do you want these interactions to play into. And these are again core fundamental issues. Now, while we want our transaction to play nicely together like you mentioned with the account, my bank information, I don’t want somebody to see that I’ve taken 10 out and have it deposited into the other account while that’s going on. But in order for the database to be able to force that one of the things that I’m going to have to deal with then is blocking. That’s saying like, “No, I’m sorry you can’t you until I’m finished.” And that’s kind of the mechanism that the database is going to use to help enforce that traffic.
Jos: Yeah, that’s right. I mean, there are essentially two ways that this can be enforced. So first off is you need to make the choice what isolation level is right for your transactions and for your read operations. Usually you pick an isolation level either for you whole application or for individual transactions. But it is even possible to pick for individual read operations. So that is first part, choice. And then in terms of how these isolation levels are implemented. There are generally two ways to implement. One is optimistic way, the other is pessimistic way. So pessimistic is the default or has been the default for the long time. Optimistic way, that has been introduced first with snapshot isolation for kind of general tables. And then later with memory optimized tables also if you pick the repeatable read or serializable you’ll also get optimistic. But essentially the pessimistic model means that you are going to assume that there are going to be conflicts between read operations or different read operations that will get into each other’s ways. We are always going to assume some other transactions go to write something to the database that’s going to violate your transaction isolation which means that you need to take a lock on a row or range of rows. So basically if you pick for example, repeatable read. A repeatable read scenario if you read a row you will always want to be able to read that same row in the same state or later point of your transaction. Or serializable meaning that you essentially have serialization path between different transactions, right, so you don’t see any interference between transactions, meaning in practice if you read a range of rows or you read an entire table, no rows will have been inserted into or deleted from that range. So in the pessimistic model you actually take a look to make sure that nobody can change those rows or these range of rows. Whereas in the optimistic model, in the optimistic model, what you say is, “Well. I’m going to anticipate that there are not going to be any conflicts between these different operations”, and if there is a conflict so for example two concurrent transactions want to update the same row at the same time, you get a failure right away. If there is some concurrent transaction that commits but violates that serializable isolation then your commit fails. An optimistic model does work well if you don’t have to read conflicts and the pessimistic model works well if you have conflicts. But then of course you have these locks, and especially if you use serializable you can have lots of low ranges of data and then you can have transaction that start to wait for each other.
Carlos: Right, and this is again some interesting ideas that have been floating out there. Steve and I were chatting about this before we get started here and so you mentioned SQL Server is using a pessimistic by default, right? So that isolation level read committed and we know that it wants to protect the reads that are happening. So we are going to start putting locks there then we start getting blocking. Well, we have this, I don’t know, if it’s secret sauce. But there is this idea out there that people can use a function called no lock and that’s somehow is going to get by. It’s going to ease their burden and they are not going to worry about all these things.
Jos: I think when we start talking about this level then you are already kind of assuming that you are already using a pretty low level of transaction isolation. So earlier while we were talking about transaction isolation we were talking about higher isolation levels like serializable or repeatable read. Those are pretty high isolation levels. When you get actually to the lower ends, when you get to read committed which is the standard isolation level in SQL Server. It’s default. That is the isolation level where you still have some integrity in the database where you avoid dirty reads. So every wrote that you read has actually been committed to the database that is why we recommit it so you have a minimal level of consistency. You can still have locking and blocking because when you’re updating a row, you need to take a look on that row so that nobody reads it. But at the same time I will throw it here as well that we do have read committed snapshot isolation which is not used by enough customers of ours. So locking and blocking, read committed snapshots. Unfortunately, there are some horror cases where the application can’t observe differences of behavior between read committed snapshot and traditional read committed. But in most cases you’ll be fine so that is what I would point at first.
Carlos: So I guess what we are suggesting here is that if your organization as it’s rule is using no lock everywhere by default, you probably want to take a look at your isolation levels before you keep slapping that on all of your transactions.
Jos: Yeah, so first I would say, look at the read committed snapshots. What happens is, so in SQL Server, read committed snapshots came to SQL Server in 2005. Unfortunately, because we have legacy it is still not the default. But if you create a new database in SQL Server, read committed snapshots is still of the database. Now, what does it mean to have a read committed snapshot on? Read committed snapshot means that every individual query that you run within a transaction, each individual query operates all the snapshot of the database, the snapshot as of the start of the execution of that query. This means that if there is some rows that are being updated by a concurrent transaction you are actually reading older version of that row. Which means that you can just continue reading your blocks by updates that are happening at the same time.
Steve: Now, that row versioning of keeping track of those older versions of the rows that’s done in TempDB, is that correct?
Jos: That’s correct, yes.
Steve: Ok. I mean, one of the thing that I’ve ran into as I’ve come across database is where I think this will be great to run in read committed snapshot isolation level because people are already doing with no lock everywhere and it would be great to make it so that they didn’t have to do that. But then on those the same systems because they are already overloaded a little bit to begin with and that’s why they started using the no lock, they find out then that I think look at it in the TempDB is their biggest performance bottleneck. And I guess how big of an impact will turning RCSI on. I assume that going to put quite a bit more load on TempDB and if TempDB is already a bottleneck that would not be a good idea to turn on.
Jos: Well, indeed. It push load on TempDB and if TempDB is already a bottleneck then, yes, you could potentially have issues there. However, usually locking and blocking in a user database is often a different performance or a different performance issue from TempDB bottlenecks. So indeed, if you have lots of TempDB bottlenecks and on top of that you have these blocking bottlenecks in the user database. Then yes, maybe it will not help you. Maybe TempDB will become an even bigger performance bottleneck.
Steve: But if you have locking and blocking in your user databases and TempDB is not overloaded and that’s a good scenario to turn on the read committed snapshot isolation level.
Steve: Ok, I just want to make sure I got that right.
Carlos: That’s where we’re looking at things. I think, Steve, to your point the idea is like you’re going to trade one problem for another, right? So I’m going to trade my locking and blocking for potential TempDB issue contentions. As long as I knew that’s that taken care of. Good current feeding there. Maybe you’ve move that over to your SSDs or whatever is going to flip your boat there. There is no magic bullet. I guess the idea is that you want to make sure that you understand the trade-offs and that you’re prepared then to deal with them.
Jos: I do want to put another point to this and there is a difference here over the utilization between read committed snapshots and general snapshot isolation. Because read committed snapshots, you use that snapshot only for the running of single query not for your entire transaction. If you have traditional snapshot isolation you use for the entire transaction. So if you have traditional snapshot isolation, you need to keep the row versions in your TempDB for the entire duration of the transaction, and if you have long range transactions in the system which lot of strain on TempDB. But if you use read committed snapshots you only need to keep the version for the duration of the query which means that there is a lot less pressure on TempDB compared with whole snapshot isolation.
Steve: That’s a great point.
Carlos: I feel like the read committed snapshots however is an introduction to the in-memory isolation levels.
Jos: To some extent, so if you look at isolation levels which are supported by memory optimized tables. So memory optimized tables supports snapshot isolation, repeatable reads and serializable. Read committed is actually not supported, so snapshot isolation. But with memory optimized tables the row versioning is not done in TempDB instead it is done in the memory space of the user database. It is build into the table data structures so the row versioning is a lot more efficient than for traditional disk-based tables.
Steve: So then with the read committed snapshot or the snapshot isolation levels, those could be turn on in existing databases without changing the structure of the tables. But for the memory optimized tables that’s involved in changing over to the Hekaton or the memory optimized style table.
Jos: That’s right, yeah. And when talking about the way this isolation levels work for memory optimized tables, they are optimistic model. So first is you have the basic snapshot isolation on memory optimized tables that works functionally very similar to disk-based tables. So you see a snapshot of the data as of the start of the transaction. If there are concurrent rise let’s say row, you have conflicts that are being detected, throughout your transaction you always read from the same snapshot level of the database. Now, where it gets interesting with memory optimized tables is for repeatable read that are serializable isolation levels, how they are implemented, because they are implemented using the optimistic model which means that you have the same guarantees. So repeatable reads nobody has change, nobody has change this row between the time you read it and you commit the transaction. For serializable you essentially don’t see any other transaction committing data in any of the ranges that you read from. Those are the basic guarantees.
Carlos: Now, you’ve actually been bringing up the serializable and repeatable read quite a bit, but you also mentioned the read committed snapshot something that probably more customers should take a look at. Now admittedly, I think may have had one database that I have to put in repeatable read but I’ve never work on a database that has been in serializable. How common is that. I mean, I feel like I only know it because it will only be used for exams and stuff.
Jos: Yeah, that is true. So if it’s serializable is the only true transaction isolation if you want to look at from a purist kind of point of view. That is where the research comes from. But then when you hit reality then you have performance, right, and concurrency. With serializable, with the pessimistic approach where you take the locks, then you have this range of rows that you’re reading and if your query does a full table scan you take a look on the entire table. And between that time and when you commit a transaction nobody else can make any change in that table. So that will be very restrictive so there is a lot of locking and blocking. And if you don’t absolutely need it for this particular read operation then you don’t do it. That is why the lower isolation levels are more popular, that’s why read committed is the default isolation level in SQL Server. And actually one fact that not everybody realizes is that you can actually pick and choose isolation levels on a query level and even on the table level within. So within your query you can add a hint to the table and say, “The read on this table need to be under serializable for example but the other tables can be read under lower isolation level.” So you could say for example, my accounts table I always want to read that under serializable but other tables I can read them on the lower levels.
Steve: Well, so then with that, I mean independent of what your overall database isolation level is if you that control on a table by table basis then you can, as long as you understand what all those mean. You can get the behavior you need between different tables where you have one table that no lock is ok. You might have another table where you need serializable and make it sound how often things are changing, right? Is that correct?
Jos: No lock is ok. I think there are very few scenarios where no lock is ok. There are some cases that, I’ve heard of some cases where you have an application that for example has two connections of the database and one connection needs to see operations that the other connection did in the database without it actually having commit the transaction. And in that case, no lock or rather I would prefer people use read uncommitted in that case because then you’ll actually say what’s your intention is even though implementation-wise it’s the same. But you will actually say what’s your intention. Then that kind of thing might be ok but otherwise it is usually use as workaround around locking and blocking. So semantically people don’t like to read dirty data usually, right? It is more of a performance workaround. I would say rather read uncommitted, consider read uncommitted only if read committed snapshots puts too much pressure on your TempDB or your application somehow cannot deal with the snapshots semantics of the individual queries.
Steve: Ok, thank you. And perhaps I was baiting you a little bit there with no lock comment because I am a big opponent of no lock. It causes so many issues from what I see in day to day organizations. There is so much misunderstanding from people out there around you just turn it on as to go faster switch. And that is really not the right behavior of it.
Jos: Yeah, read committed snapshot does the go faster switch.
Steve: Absolutely, alright. Love it.
Jos: It’s always good to please your host.
Steve: Well, no. I guess what I said I love it, I love it because now I can refer back to this section of the podcast and help solve debates that I’ll have in the future.
Jos: Let’s talk about, besides no lock which I guess you should never use. There are a number of locking hints that people would be familiar with that are useful in some cases where you can say, “I want an update lock on this row” or “an x lock on this row”. I would suggest before going dead route because these locking hints are typically used as to pick and choose on the query or statement level or table level what you actually want. So I would suggest before going dead route think about whether you can solve the same problem with isolation levels because isolation levels actually, you actually declare your intent what you want to do rather than you specify what is the mechanism that the server should use. That’s often easier to understand by the user.
Carlos: Sure, from a troubleshooting perspective when you somebody come in and take a look at it. They know kind of from a system setting like you mentioned here is my intention rather than having to then find a query, oh in this scenario it’s different.
Jos: Yeah, exactly. I do want to acknowledge that there are always scenarios, there are always cases where people do have to resort to a locking hints but I would say do that as a last resort.
Carlos: Sure. I mean you have that in your bag of options but don’t make that the first one probably.
Jos: Yeah, exactly.
Carlos: Ok, so shall we go ahead and do the SQL Family?
Steve: Yup. How did you first get started with SQL Server?
Jos: I got started with SQL Server, when I started to work for the SQL Server team. That was in October 2010. I moved to Seattle and started working in Redmond on SQL Server query processing.
Steve: And you had not worked with SQL Server before that?
Steve: Wow! Ok.
Carlos: So what made you make that leap, I guess you know when you’re applying for this job and it would be like, “Hey, you want to come work on our database?” “Sure.”
Jos: Yeah. I was working in Academia before. I was working at a university doing research. Doing some teaching, but I’ve kind of decided I wanted to move to the industry, product development to kind of have a more immediate impact on the world and also work in a more closely and collaborative teams. And I want to work in interesting technology, so impact, interesting technology, as SQL Server needs to build there. I knew someone from university who has moved to the SQL team earlier. He told me about an opening. I applied and that’s what happened.
Carlos: There you go. Now that’s interesting. Ok, now I understand why where all the serializable comments came from. Obviously, you are working on a team, right? Maybe we won’t ask what you would change about the SQL Server but maybe we’ll ask you, what’s the next thing you’re working on?
Jos: I actually find the first question, I find it interesting. I do want to give an answer for what I would change in SQL Server. Something I would change if there were no legacy. I guess I know it’s making changes while there are legacy applications very hard but I really wish we could change the default on a lot of things. Like for example the ANSI session settings, like ANSI_NULLS, ANSI_PADDING, all that stuff. I would like to put read, the snapshot that we talked about as the default and a number of other kind of best practices that people have developed over the years. I would like to be able to make those the default to SQL Server. But it’s very difficult because we have a legacy and we cannot break legacy applications of course because it would ruin our reputation.
Carlos: Right, sure.
Steve: Very interesting. Ok. So was there something you want to talk about what you’re working on next or do you want to jump into the.
Jos: No, let’s jump to the next one.
Steve: So what’s the best piece of career advice that you have ever received?
Jos: The best is, work on something you’re interested in. That is the best career advice I’ve received and it’s also the career advice I would give, always do something you find interesting.
Carlos: Our last question for you today, Jos. If you could have one superhero power what would it be and why do you want it?
Jos: What would it be? I think that’s pretty for me. I would like to be able to slow down time or stop time for everybody besides myself so that I could either do some extra work or more likely take a nap while nobody notices.
Carlos: True superhero power, right? You want to take a nap without anybody noticing.
Steve: I would like to take advantage of that one myself.
Carlos: Well, Jos, thanks so much. We do appreciate the conversation and thanks for giving us some of your time.
Jos: Yeah, thanks for having me on, always a pleasure.