Advanced Query Tuning Techniques

Target Audience:

Developers and DBAs with at least 4 years of experience with T-SQL, indexes and query plans


You identified a query that is causing performance issues, and your mission is to optimize it and boost performance. You looked at the execution plan and created all the relevant indexes. You even updated statistics, but performance is still bad. Now what?

In this session we will analyze common use cases of poorly performing queries, such as improper use of scalar functions, inaccurate statistics and the impact of parameter sniffing. We will learn through extensive demos how to troubleshoot these use cases and how to boost performance using advanced but practical techniques. By the end of this session, you’ll have many powerful techniques to apply to solving query performance issues.

Audio podcast:

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS
Leave us a review in iTunes

Why I Want to Present This Session:

I just love query tuning. I have been doing it for the past 20 years, and I still love the feeling of reducing a query execution time from 36 seconds to 420 milliseconds.

Additional Resources:

Session Transcript:

Brent Ozar: Well, in this session at GroupBy, Guy Glantser’s going to be talking about advanced query tuning techniques, so take it away, Guy.

Guy Glantser: Alright, thank you. So, this is me. My name is Guy. I’m a data platform MVP. I’ve been doing all kinds of crazy stuff with SQL Server for the past 20 years or so. Of course, I did a lot of mistakes. This is how we learn and I’m going to share with you, not mistakes, but some of the things that I love to do with SQL Server, which is query tuning. I run Madeira Data Solutions; it’s a company based in Israel. We provide, of course, data solutions, data services, everything around data, mainly SQL Server, but not just SQL Server. And also, Brent mentioned it, together with Matan Yungman, we host the SQL Server Radio Podcast. So visit – we talk about SQL Server, about professional stuff, about our experience. We talk about community events. We interview some guests once in a while; really fun stuff. If you like SQL Server, this might interest you.

This is me, more or less, and the agenda for today is a demo, okay, so that’s it. No more presentation; I’m going straight to the demo. It’s going to be 100% hands-on. Now, before I start, it’s called Advanced Query Tuning, okay. It’s a huge topic. I will not be able, of course, to cover everything. There is no such thing as covering everything in query tuning, but I have lots of stuff. I chose some of the topics that I find are very common and people make the same mistakes again and again, so I think it’s important for people to know about them, and some interesting stuff as well. So I will try to put as much as I can into this one hour or even a bit more, but let’s start.

So, the first thing is, let’s say users complain, they say that there is a specific report. It used to run for whatever, now it’s not as good anymore. It’s slow, please do something about it. So the first thing most people do is either use profiler, or maybe extended events, just to try and track a specific query or the stored procedure that is running behind the scenes and see what it’s doing and what we can do about it. So for the sake of this demo, I’m going to use extended events because it’s like the new thing. I use profiler really, but for the demo I use extended events. Not really, well, I use both, okay, but for this demo, I’m going to use extended events.

So here’s how it goes; create event session – I call this problematic application because we have an application that does problems. And I want to track the RPC completed event – it stands for remote procedure call. Every time the .NET application is going to execute the stored procedure in SQL Server, you’re going to get, when it’s finished, an RPC completed event. And I’m going to put a filter on the client application name, which is my specific application. Now, I have an application here. So here is the application I’m going to write. So it’s running, okay, you will not see anything; it’s just simulating all kinds of activities behind the scenes, doing some stuff like a regular application against SQL Server.

Okay, so it’s running now. Let’s start the event session – let’s create it and now let’s start it – and then, when you use extended events, if you’re not experienced with it, this is going to look ugly to you. Also, if you are experienced with it, it’s going to look ugly to you. Okay, this is how – let me put this down. So this is usually how you query the data in event sessions. You need to use XQuery because everything is in XML. Okay, so without getting into all the details, we just get the data – so we have the events. Each event here is an execution of a stored procedure. We get the text itself. Now, it’s important to get the text itself because if the stored procedure has parameters, like in this case, then we have an execution example, because I’m not familiar with the business, I don’t know which parameter values to use, this is, of course, a simple case; just one parameter, sometimes we might have ten parameters or whatever. So I need an execution example that I can execute myself and then I can troubleshoot and look at the execution plan and so on.

So now I have this and we can also see – if we run it again, we have some more – we can see that the execution time, this is microseconds, so it’s around eight seconds maybe on average, so this is a lot of time for the users. We need to do something about it. So we take the execution example, and here it is, I have it ready here, and we have the actual execution plan, and then I’m going to execute it. Okay, so it takes around four seconds. This is still a lot, but it’s not eight seconds, right. Let me do it again – one second. Again – okay, so the first time here was a bit slower, but we have around one second, two seconds, for this. If we look at the plan, then we see a huge scan. This is a scan of the entire table. We have the invitations table; we have this thick arrow here showing how many rows. In this case, we have the number of rows read is five million rows. So maybe this is why it’s taking long. But again, if we run it here, it takes one second; if we go back to what we’d get from the extended events session, we get around eight seconds, maybe even more.

So this is quite common. I’m sure some people know about it, but I know that a lot of people don’t know about it, that’s why it happens, and it’s important. So I want to show you and also talk about some things behind it. So why do we have eight seconds coming from the application when you track it with extended events or profiler and it takes only one or two seconds when I do it from management studio? It’s the same stored procedure with the same parameter values; exactly the same. So the first thing I want to look at is the plan cache. So I’m going to look at exec procedure stats; this gives me statistics about stored procedures in the plan cache. And I want to also see the execution plan itself, so I cross apply to this function. And if we run this, then we will see that we actually have, for the same stored procedure, we have two execution plans in the plan cache.

Now, one of them up until now was running for 23 times and the average duration was eight seconds. One of them only four times, the average was two seconds. Now, you can guess which one is which, right? So this is coming from management studio with this duration and this is from the application. So we actually have two different execution plans. If we look at the execution plan for management studio, it’s the one I just showed you. And if we look at the execution plan of the application, it’s a different one. But it’s weird because this one looks much better, right? SO the application takes eight seconds, but look at this; we have an index seek. We have very thin arrows here, so nothing compared to the other one; it’s almost nothing, no data. So we have to get ten rows. The estimated number of rows is ten. So this looks much better, but it takes a lot more time to execute. So we need to understand what’s going on.

Let’s answer this one question at a time. The first question is why do we have two different execution plans? Okay, why not one? It’s the same procedure, same parameter values. So what I’m going to do is use another function; exec plan attributes. Each execution plan is stored in the plan cache alongside some attributes describing the execution plan. One of the attributes is the set options. Now, the set options – let me show you. If we look at some plan and we go to the select operator, we do right-click properties, then we have here the set options right here. Okay, so you can see this section here; there are seven set options. These set options affect the behavior of your queries within a session. So you can change them per session and each one has a specific meaning. For example, ANSI NULLS describes what happens when you compare to null. So if you have a select from customers where, let’s say, phone number equals null, if ANSI NULLS is true, you’ll get an [ANSI set always] because according to the ANSI definition, every comparison with NULL is false. Okay, so you need to do actually where phone number is NULL. But if ANSI NULLS in your session equals false, then phone number equals NULL will, of course, retrieve all the null phone numbers; null values.

So as an example, we have seven different set options. Each one has some effect on how the session is running. And the thing is that because it might affect the results of your queries, it also should affect the execution plan for such queries. And then, whenever you have a different set of options for two different sessions, each one is going to get a different execution plan. So this is a hint, and if we go back to our query with the set options – so the set options is actually a set of bits and we get an integer value describing the total value of the bits. So each set option, if it’s true, you’ll get one, otherwise you get zero, and then you get the integer representation of that. So the first thing we can see here – it doesn’t tell us much, but we see there is a difference.

We have two different numbers; it means we have two different combinations of set options for these sessions. And this answers the first question, why we have two different execution plans. Now, we want to see what the difference is; so I prepared a query here for you. Of course, I’m going to share all of this later. So, I’m going to use some bitwise operator to extract the specific set options in different columns, and then we can look at the different set options. And here it is, ARITHABORT – everything is the same except for this. Set ARITHABORT – it’s true in management studio and it’s false in the .NET application. So this is the difference.

Why is there a difference? Maybe first of all, let’s answer the question what does ARITHABORT do? So ARITHABORT affects the behavior of queries when you either divide by zero or you add out of range – range overflow. So you’re trying to do a calculation and put it into a tinyint, but it’s larger than 255, for example; something like that. So for those, ARITHABORT will change the behavior of what happens. If it’s true, you’ll get an error and the query terminates. If it’s false, you get a warning and you get actually a null value in the results of the calculation. So this might affect, of course, the result of the queries.

Why do we have these values? Let me show you. So what I’m going to do, I want to monitor login activity. What happens when we open a new connection? So first, let me open this; monitor logins. Another event session, it’s called monitor logins. We have an event called login and we have something called collect option text, okay. So I want to enable this. I will show you what it means, what it does. I want to add – it’s called an action; it’s like a global field, the client application name. and then I want to filter by the application name, so I only want to get new connections coming from either my problematic application or coming from management studio; a new query in management studio. And then I want to have the SQL batch completed events with the same filter. So I want login event and SQL batch completed event.

So I’m going to create this event session. Alright, let me stop the application and then let’s start this event session, and now I want to start the application. So when I start it, of course, I open a connection and then I want to come back here, and let’s see what we get… Here it is. So, we have the login event. It’s coming from our application. And we have this thing, the network protocol. This is a batch running all kinds of statements. Instead of trying to see what’s going on here, I prepared one in advance for you. This is how it looks like.

So this piece of code is running whenever we open a new connection from ADO.NET using the >NET driver from an application. It’s a set of initialization, initializing all kinds of set options, including this thing. So the .NET application initializes the ARITHABORT setting to false by default. Now, this is part of the .NET driver. It’s been like this for years; forever, actually. It’s hardcoded, it’s part of the driver. It doesn’t matter from where you are going to use this driver and open a connection and how you open a connection; it’s always going to run this piece of code. So this is one thing. This is why it’s false in the application.

But then, if we go back – where are we? If we go back here, we get this for the application. I’m now going to click new query to open a new connection from management studio. And let’s try it again – so we get some more stuff. And this is actually the second login coming from management studio. Actually, when you click new query, management studio opens several connections behind the scenes. It does all kinds of checks. So this is why we have several logins. Let’s look at the first one. This is my main connection, my session. So management studio does the same thing because it’s the same .NET driver behind the scenes. But then, it also runs as several batches after that. One of them is this one and it looks like this.

So after running the .Net driver code, management studio runs this batch to do its own initialization and it does this. So it’s overriding the default setting of ARITHABORT false by setting it to true. So it’s like saying instead of changing what the .NET is doing, I’m not trusting the .NET driver, I’m going to override everything and do it how I believe it should be. And this is practically what’s going on in Microsoft, more or less. The .NET team developed the .NET driver many, many years ago and they put this piece of code in it and at some point in 2005, SQL Server management studio was released for the first time and the SQL Server team said the best practice is to use ARITHABORT true. So they changed it and they put this – instead of going to the .NET team and telling them please change it, they just added their own initialization. And from that point until today, we have different settings in the .NET application and management studio. Okay, so this is the reason why – and by the way, if we go to MSDN, the page describing ARITHABORT, they actually put a warning here.

“The default ARITHABORT setting for SQL Server management studio is on. Client application setting ARITHABORT off can receive different query plans, making it difficult to troubleshoot poorly performing queries.” So instead of fixing the problem, they put a warning here. Now it’s your responsibility. You take care of it, right. And this is how it is for many, many years and I see a lot of people fall in this trap because it might happen to you. You try to troubleshoot something in management studio and you’re actually looking at the wrong plan and you might spend a lot of time and effort trying to improve something that is not really happening in the application – it’s not a real problem – without even knowing about it; so it happens once in a while.

So we answered the question as to why it happens – so what should we do about it? First of all, one thing that is important to say here, that the reason we have two different plans, we’re going to have two plans anyway because we have two different set options. But the reason we have two different plans that do different things is because – it’s not because of ARITHABORT. In this case, ARITHABORT doesn’t change anything in my specific query in this case. The reason is something else. For some reason, when there was a new compilation from management studio – in this case, we got a different plan. So the ARITHABORT setting only created a situation where we needed to compile again.

It’s important to understand. It has nothing to do with ARITHABORT itself, what it’s meaning. So in this case, it was parameter sniffing, which is very common. I use some parameter in management studio which is different from the parameters the application used for the first time and for each parameter we got a different plan. Now, what can we do about it? The best practice again, according to Microsoft, is always set ARITHABORT to on. So what we should do actually, is complain to Microsoft and ask them to fix the [crosstalk]. It won’t help you, okay, so what we can do is have some kind of a mechanism in your application, in your DAL code or whatever, to always set ARITHABORT to on in your connections. But if you don’t do it – almost nobody does that – if you can’t do it or you don’t want to do it, at least when you try to troubleshoot something that’s run by the application, change ARITHABORT to off in management studio – and by doing that, if I run it again, now I’m actually reusing the plan that the application uses.

Now, I can really troubleshoot and investigate what’s going on in the application, why it takes eight seconds or even more in this specific case.

Brent Ozar: We have a question from Steve, “Do you happen to know what the default ARITHABORT setting is in PHP?”

Guy Glantser: I have no idea. Great question, I really don’t know. I can try and check that, but I don’t know.

Okay, so now, I get the plan the application uses. Now I can troubleshoot and investigate and try to understand what’s going on, why it takes so long. In my case, it was even 16 seconds. So let’s try to do that. so maybe the second question is, how come this uses and index seek and it takes so long and the other plan uses a clustered index scan over the entire table, five million rows, takes just one or two seconds? It looks like a different thing. And maybe the hint to the answer is the thickness of these arrows. So if you remember, when I showed you this plan before, it was very thin. It looked like very, very fast and efficient. But now it’s very thick. And the reason is, before I looked at the plan cache and I only showed you the estimated plan. And in the estimated plan, you see the estimated number of rows. And the thickness of the arrows is according to the estimated number of rows.

But now, I’m looking in the actual execution plan and now the thickness is actually representing the actual values. So if we look here, we can see that there is a huge gap between the estimated number of rows, which is just ten, and the actual number of rows which is a million and a half, more or less. And this is the reason for the poor performance in this case. So the optimizer estimated ten rows coming from here for this index seek, and for ten rows, it’s very beneficial to use nested loops because we only do ten key lookups. This is great. In reality, we are doing a million and a half key lookups, which is not very good. We get this number of logical reads, which is very high. I didn’t show you before, but this is very high compared to the other plan.

So this is the reason in this case; this is the problem. While the other plan doing a scan – in this case, a scan is much more efficient than doing one million and a half key lookups. So this is the reason why we have different performance metrics. Now, what we can do about it – we can do all kinds of things about it now that we have a poor plan in cache and the application is suffering from it. The first thing, very easy – it won’t always work. It’s just like trying to do – it’s luck actually. It depends on luck in this case. It’s trying to recompile the stored procedure.

By using sp_recompile, I’m instructing SQL Server to mark this plan as invalid – actually both plans. We have two plans now, so mark all plans in cache as invalid. And the next time a user is going to execute the procedure from the application, it’s going to recompile again and maybe it’s a matter of luck. Maybe we’ll get the good plan the next time. So it’s just gambling in this case. Let’s do this. So now, if we look at the plan cache, we can see that we still have – okay, so the application is running. Now we have just one; it’s the new one. It’s already initiated and now it’s doing the good plan, in this case. So it worked in this case. It will not always work because sometimes you might get again a newer compilation with the wrong value of parameter. And of course, you need to troubleshoot more, so I am not going to go into too much detail here because I have other things to show you.

If we go back to on, run it again – so now, I get the same plan. If I look now at the plan cache, we still get two plans because of the same problem, because now I’m using ARITHABORT on, but now in this case, this plan and this plan are actually the same, but we have still two instances of the same plan because of the different set options. Now, if we look at the stored procedure itself, get invitations by status, let’s look at the code.

Very simple, select top ten from invitations where status ID equals something. Okay, so we have different statuses. I want to look at all the invitations by specific status and only the top ten, order by creation date time. Okay, so if we look back at the plan again – so we’re doing a scan of the entire table, five million rows, and we’re adding a predicate because there’s probably not a good enough index in this case. So we’re doing a predicate, not a seek predicate, on the status. So scanning the entire table – we know we have an index because we saw an index seek before. We have an index on this status, but because now the optimizer estimates the correct number of rows – compare this one to this one – now the optimizer knows it’s not going to be a good idea to use that index, I’m going to use a clustered index scan.

Of course, we can make the index a covering index by adding included columns. This will make things better, but not good enough. What I want to show you is something else. If the query is this one, if we order by this, we get the top ten – we only need ten rows out of five million and we see the distribution. We know there are one million and a half rows with this status. Actually, in this case, you need to know distributions. So I know there are only three different statuses. Each one has around one million and a half values. So if I do an index on this one, if I add an index on creation in the date time, now the optimizer can use this index to not need to sort by creation date time, it can scan from the index in a sorted order, and then just take the first ten after applying this filter. Let me show you.

So if we go back here, we create this index on creation date time. Okay, so we have the index. Now, let’s look at the plan. We have a different plan. We have an index scan – it’s not a seek, it’s an index scan on our index, on the creation date time index and we’re actually reading 30 rows. We need ten. Now we read 30 rows from the table, from this index, but we do it in an ordered fashion, order equals true. So we take advantage of the sort order of this index and we read the first 30 rows, then after the key lookup, we can add here the predicate on the status. So after applying this filter, out of 30, we get the top ten. We get the ten rows we need and we’re done, okay. So we get down to this number of logical reads, which is really, really great. Even if we add an included column to the index we had before in the status, we won’t get the correct result.

So in this case, the thing I want to show you here is that usually when people look at execution plans and they analyze queries, we tend to look at the where clause first. So we have a filter here, we need an index of status ID. Let’s create an index of status ID and then see how it affects the performance – and sometimes people just stop there and that’s it. You need to always also look at the order by or sometimes the GroupBy in the query because as you can see, adding an index on creation date time is much, much more beneficial. Okay, so it really depends on what the query is doing. And don’t just look at the where clause, the predicates here.

Alright, so I think I’m done with this. Any questions, Brent?

Brent Ozar: No, you’re good. A couple of people asked about the code map on the right-hand side in SSMS and I sent them over to the web post about it.

Guy Glantser: Yeah, you just do the right click, scrollbar options – this one, right? Use map mode. Instead of bar mode, use map mode, and now everyone can concentrate on what I’, showing you, not on this one. Don’t look at this one anymore, look at my code.

So the next thing – so let’s just learn from here that the next time you troubleshoot an execution plan in management studio, remember that you need to set ARITHABORT to off or at least make sure that you use the same set options as the application, otherwise you might be looking at a different plan. And I showed you what happens, let’s move to the cost. I’m going to run this. So I have a stored procedure; it’s called send fake invitations. I will not tell you what it’s doing because it’s criminal.

So it’s running. I’m going to show you the results of the plan. While it’s running, let me open the procedure – that’s what it’s doing, okay. So we’re declaring some table variables here, eligible members, potential couples – it’s a dating website. We try to correlate members with each other and create potential couples and then what we do, we correlate it. We take 200 random members and 200 random members from here and then we try to match them based on the same country, sexual preference, their birth date and stuff like that. We delete all the non-eligible members and then eventually we insert the results into our invitations table; those are the fake invitations. We’re trying to fake invitations on our website to make it look like we have some traffic. It’s a legit business – or not…

So let’s look at the plan. So we took 41 seconds. So we have a bunch of statements here. This is the insert of the first table variable; this is the deletes and so on and so on. And as you can see, the first one, if we look at the cost, the first statement 61%, zero, 39% and that’s it. And then we have another one, zero here and a zero here, so we get 61% for the first statement. So this one, the clustered index scan of the hash match aggregate, this plan takes most of the time in this stored procedure, right? Wrong.

Okay, so this is what I want to show you; this is wrong. So a lot of people will look at the cost and based on the cost, they will make a decision, where should I start, okay, because we have lots of statements, where should I start? I start with the highest cost. It makes sense. Let’s put our efforts here and see how we can improve this statement, this insert into eligible members and so on and so on. And of course, the same goes within a specific plan. So inside the plan, we have the relative cost, we have 37% here, we have 59% here. So again, where should I start if we have a more complicated plan like, let’s say, this one here.

So we look at the cost and maybe this 42%, this sort is the problem in this case and we need to look at why is it doing a sort, how we can maybe eliminate sorts and so on and so on. And I want to show you something else now. Let’s open the statement execution and extended events. I need the object ID and the database ID. So it’s 16 and this is the same. Alright, I’m going to create an event session and I’m going to track the event sp_statementcompleted, in this case. So every time a statement within a stored procedure is finished, I get an event. And I only want to focus on this specific stored procedure – the one we’re running, fake invitations. So I don’t want to get all kinds of garbage in there. I will start the session and I will run it again. The same thing, okay – so it will take about 40 seconds, right. But now, while it’s running, I can see what’s going on. Every time a statement is going to finish, I’ll get an event.

So we have three out of five already. This is weird already because I showed you earlier, that’s 61% of the 40 seconds, whereas the first one, right, doesn’t make sense. It’s not done; it’s still executing. Okay, so when it’s done, I want to show you the numbers here and we can more or less see which statement out of the five is really the bad guy, and we’ll talk about, of course, why. So let’s just finish…

So, if you look here, this one is 39 seconds actually, out of 41. All the rest are like nothing compared to it. So it’s actually the fourth statement. According to the actual duration, the fourth statement takes 39 seconds. And again, if we go here, 61%, 39%, the fourth statement, which is right here, is 0%. So it’s a huge difference between what the cost is telling us and the actual duration, at least based on extended events. So what’s going on? The problem here is that when you look at the cost, the cost is estimated. And it’s always estimated. This is something that is confusing because we have the estimated plan and the actual plan. And in this case, I’m looking at the actual plan now. So if I look, for example here, we can see actual values. This is the actual plan. But even with the actual plan, the cost is estimated. Those are the estimated values that the optimizer calculated during compilation, during the generation of the execution plan.

There is no such thing as actual cost, actually. The optimizer never calculates actual cost. It calculates the cost as part of the optimization process to evaluate different alternatives and choose the best one with the lowest cost. So the cost is something that is done only during the optimization; it’s always estimated. Now, there is a paradox here because if we have performance problems, in my experience, in most cases, when there’s a performance issue, it’s because the optimizer is wrong somewhere with its estimations. And if it’s wrong, then when you’re looking at those numbers, they are wrong, they are lying to you. This is not the real numbers because they are based on estimations, right, and you’re trying to understand why there is a gap between estimation and where is the problem and it’s a loop. So you’re in a problem here.

So don’t trust the cost. You can maybe sometimes use them. I will talk about it – but don’t trust the cost. Don’t start with the cost. I would start with looking for the operators where there is a big difference between the estimated and the actual number of rows. It’s not so easy always to look at everything and see where the problem is, so you need to see what the stored procedure is doing, of course, and where to start maybe to look. It’s not that simple, but usually I will just look around and look for the differences. And in this case, I will show you because I know where it is – if we look here for example – so we talked about the fourth one; this is the fourth one. So we now, when we look at the event session, we see that the fourth statement, this one, with the 0%, is actually running for 39 seconds.

If we look here, we can see that the estimated number of rows is just one. The actual is this – and of course, this is a big difference. And again, the problem here, because the optimizer thinks it’s one, it’s going to do nested loops and only one iteration of this table scan – this table scan is going to scan many, many rows and this is the number of executions. It’s actually going to do this number of executions ending up with this number of rows. Okay, so this is the problem. This is why it’s taking so long. Alright, so again, don’t trust the cost. Look for differences between estimated and actual number of rows. And when you find a difference, like in this case you’re halfway to the solution because now you know where the problem is. Again, it’s usually, not always. There might be some harder problem or something else maybe.

From my experience, in most cases, this would be the problem. You need to find where there is a gap and then to understand why the optimizer was wrong with the estimation. Okay, there are all kinds of reasons. We talked about parameter sniffing before. Here is another reason. So again, I’m sure a lot of people know about this. We are using table variables. With table variables, the optimizer doesn’t have statistics to rely on, so it will always estimate one row, a single row, even if there are 2000 rows, like in this case, or two million rows, or whatever; one row, because it’s actually blind. It has no clue about the number of rows in the table.

So this is the problem in this case. What I’m going to do is very simple. I’m just going to replace all the table variables in the stored procedure with temporary tables. We have two – that’s the only thing I’m changing, nothing else. Let’s run it first; it’s quite long… And then let’s do it again. So now, it should be faster.

Alright, that’s it. Instead of 41 seconds, just seven seconds. Okay, much, much better. And now, I can tell you, of course, we need to look at what’s going on inside, I can tell you that now there is no difference in most cases – not a big difference at least – between the estimated and actual number of rows. So if there is no such difference, it means that now the cost values are more or less accurate. So now, really, the first one is taking 57%, but now it runs in a different way. If we look now for example – if we look now in the fourth statement, now it’s 2%, but it’s really around 2% because now the optimizer knows the number of rows in here, so it doesn’t choose nested loops, it chooses a hash match algorithm, which is much better. In this case, we just scan each one of those tables just once.

And if we look here again – so forget about the create table – form here to here, this is the fifth statement in the second execution, so now you can really see that the first one takes a lot of time. Okay, so this is like six seconds out of seven. So the number s now are more accurate than what we had before. So don’t trust the cost and look for those differences. Of course, other reasons why the optimizer might be wrong, it can be statistics are not updated, or maybe statistics are not accurate enough because we’re using a sampling rate that is too low. You might want to try full scan for updating statistics or maybe even we have statistics that are with full scan and they’re updated, but still, they’re not enough for SQL Server to estimate correctly because we have all kinds of skills in the distribution of the data. This can also happen and there are all kinds of other reasons, of course. Maybe you’re using local variables, for example, or all kinds of expressions within your queries like IS NULL. If you use IS NULL in optimization time, it’s an unknown value, just like using local variables. So without all of these things, SQL Server will have a hard time estimating the correct number of rows.

So, the next thing, this is something that I encountered not so long ago. I helped a client with some performance problem and this is what I got; this query here. So it’s selecting from invitations, inner join member sessions, and then we have two predicates – two filters in the where clause. One of them is on the receiving member ID. Another one is on the status. Now, let’s run it first and then analyze it. Let’s try to run it with actual execution plan.

Now, in this demo, it runs very fast, but in my client’s environment, it was a much, much bigger table and it used this query in several places, so the impact was very dramatic. What we can see here – I want to look at this specific index seek. Let’s concentrate on this one. So we have an index seek, but we have a seek predicate and also a predicate. So first of all, let’s talk about the difference. The seek predicate is how SQL Server uses the index to seek through the index to filter for the specific rows using the index keys. Okay, so in this case, we have an index – based on the name, I can tell you we have an index – on status ID and then receiving member ID. So this is a composite index, composed of two index keys, first status and then receiving member ID. And we have an index seek operation – it’s not so simple to read it but it’s actually filtering by status and receiving member ID, because those are the two index keys. And it’s actually a range scan, so you’re doing a seek, but then it finds the first row in the leaf level of the index, then it’s doing the range scan, a partial scan, from this value, one, two, three, four and status one ending in this value, status three and one, two, three, four. So this is actually how it translates our predicate. So it’s a range scan between one and three, which is the first index key, and one, two, three, four, to one, two, three, four, which is the second one.

So if we go back, what I want to show you is that the number of rows read, this one, is around three million. And we only need 46 seconds; not very efficient. So we have to scan a lot of rows, read a lot of rows. We have a great index, right. We have a composite index on the exact two index keys that we use in the query, so we expect it to be very efficient because it only should return 46 rows and it uses both index keys in the index, but still, we have to scan three million rows, so it’s not very efficient.

The question is why, why and how we can improve it. The reason it happens is because of this range, because the status is the first column in the index and status is a very non-selective column, meaning there are only three or four different values for the status, and out of five million, there are one million or even more, one and a half million rows, for each value. So it’s very non-selective. While receiving member ID is a very selective column – there are many, many unique values in this column and for each unique value, there are maybe ten rows, something like that. So because the status ID is the first one in the index and we have between operators, SQL Server translates it during the range scan and it actually has to scan all the rows in the index between status ID one and three.

And then, applying the second filter, which is part of the index seek, but it’s in the second place, it doesn’t help as much. So the first thing we can do – of course, the best thing to do is to reverse the order of the columns. So this is very important to show you that you actually need to choose very carefully the order of the keys, of the columns, in a composite index. If you can, there are all kinds of considerations, but if you can, choose the more selective columns to be first. So I would create an index on member ID and then on the status. This will change things. But maybe, sometimes I can’t, like in this case.

In this case, I can’t do it because it’s a client, it’s a production environment, it’s not something you can do just like that, reverse the index maybe. They need the index for other reasons and they don’t want to create a new one. There’s all kinds of considerations against, so this is what we have. One thing we can do, instead of between, we can do this, which is actually the same thing, right. What’s the difference between one and three? It’s an integer value, the optimizer knows it’s one, two, three; it’s the same thing. But if we do it, then now, boom, this is what we want.

This is what we expected. I have a composite index on the exact things I filter on. I get only the rows I need from the index – 46 rows, that’s what I’m going to read. So it’s much, much, much more efficient now. And now, you can see here, again, it’s hard to read, but we have actually three different seek operations. So this is one, then we have two, then we have three.

So we have actually three index seek operations, one after the other, and each one is filtering, not a range scan, but the specific equality predicate. So status equals one and member equals one, two, three, four. And in the next index seek operation, It’s two, one, two, three, four, and so on and so on. So now, we are doing specific seek operations. Each time, it’s reading only the ten or something like that rows we need. Altogether it’s 46 and we don’t have to scan the entire range of statuses between one and three, which is actually almost the entire table. So the optimizer is not – in this case, it’s not smart enough to understand that both queries are the same actually and to treat between in the same way. So we need to do it for the optimizer. So this is where you might want to prefer in over between.

Usually, people say that between is preferred over the in, and in most cases, it doesn’t really matter. It’s not such a big difference, but in such a case where you have an index that is not optimal for this query because of the order of the keys, it’s a big difference and you should prefer in, in this case. Of course, if I create – just to show you, I create a non-clustered index on the receiving member ID and then status, on the opposite order, and then we’re going to run it again, now I’m using between again.

But now, SQL Server can use the new index, first receive member ID, and then status ID, and now we can see here that because we first do – it’s a single index seek, but we first do an equality predicate on one, two, three, four on the member ID. And only then we do a scan between one and three, and this is much, much more efficient and we get to the same result of this; actually reading only 46 rows. So if you can create the right index, of course, this is better, but if you can’t, at least in this specific case, prefer in over between.

Brent Ozar: Steven asks, “How about if you used or? Like or equals one or equals two or equals three…”

Guy Glantser: Alright, great question. So or, in this case, is just like in. so if you do or, the optimizer translates this to in one, two, three. It will actually do three seek operations, just like before. So it’s the same thing. Great question, thanks. Alright, I think we’re done with this. Any more questions?

Brent Ozar: Nope, you’re good.

Guy Glantser: Excellent. So let’s talk about implicit conversions. So, in this case, we have members, we have payments and I want to look at all the payments made by members on the website and I want only members who registered in a specific date; in this case, January first 2012. I don’t know why; whatever. And only payments below $10, let’s say. I want to order by member ID, payment date time – let’s run this… So we get nothing. We don’t care about the results; we care about the plan. We don’t have a plan, so we do it again. Now we have a plan.

So, we have, in this case, a very simple plan, not very optimized. We have a scan on the members table, we have a scan on the payments table, we do a hash match join between them, so we actually scanned everything. If we look at the predicate, we scanned the entire table. In this case, the table is 100,000 rows and we have a predicate here to apply all the filters; in this case on the registration date and time. And we have another scan here below on the payments table, and again, we scan the entire table, we apply a predicate, in this case on the amount, but we read everything, essentially.

If you look here, then we read 100,000 rows, but we actually return only ten. So the number of members who registered on January first 2012 is ten; ten members. And if there are ten members, then if I would have an index on member ID – because this is how I join between them – if I had an index on member ID on the payments table, then the optimizer would be able to use nested loops and then use this index seek only ten times, which is very efficient. But because there is no such index, it has to do a scan. Now, doing this scan ten times with a nested loops is not such a good idea, so it’s a hash match.

So I’m going to create this index on member id on the payments table, and then let’s try again the same query – so we see the same plan, it’s not using the index. I don’t know why. We have ten rows, a very efficient index seek here, I would expect to see nested loops and then an index seek, okay. So one thing we can do in those cases – and this can be very powerful when you try to debug things and try to learn and troubleshoot, is to use an index hint. Now, don’t do this in production, but in your test environment, if you try to troubleshoot, to learn some things, to see what’s going on, this is a great way to learn and to understand why the optimizer didn’t choose to use this index.

Okay, so let’s instruct the optimizer to use this index. So we get worse performance than before. It takes some time. So there is a reason why the optimizer couldn’t use it. So now, we don’t have an index seek yet, we have an index scan. So we forced the optimizer – the optimizer said, I don’t have a choice. I cannot do an index seek for some reason; we don’t know, still, why, but I can do a scan. And then we need to scan the entire table and we need to do a key lookup to get the other rows – so this is very inefficient because we get two million rows from here, the entire table, we have two million key lookups to get the extra columns; very, very inefficient. So yes, in this case, of course, don’t use the index.

But we want an index seek. So we can also try to add forced seek. So we can force the optimizer to use a seek on this specific index, and then we get an error, “Query processor could not produce the query plan because of hints defined.” So in this case, the optimizer says, I’m done, I can’t do this, this is too much. I can use this index if you really want me to, but I cannot use it with an index seek. Now, it doesn’t tell us why we can’t, but this is where we need to start analyzing the query better. Let’s go back without all the hints and try to understand for ourselves what’s going on, why the optimizer cannot use index seek on member ID in the payments table.

So, we have a hint, we have this warning here. You should look at warnings. If you see warnings, please read them. So in this case, the warning is type conversion – convert implicit; we take the member ID from the payments table and SQL Server has to implicitly convert it to integer. And the warning is, because of this implicit conversion, this may affect two things; one of them is cardinality estimates. So the optimizer says, I might be wrong estimating the number of rows because of this implicit conversion because I cannot use statistics efficiently. I need to implicitly convert each value to look at the statistics. It doesn’t really help. And also, this may affect seek plan, which is exactly our problem. I might not be able to use an index seek operator in my query plan because of this implicit conversion.

Now, unfortunately again, it doesn’t tell us where in the query we have this operator and we can either look for it manually or we can look at the XML and do a search on this ugly text here, but I can tell you where it is. So if we look here, we do a scan again on the payments table and then we have a compute scalar, so this might be a good place to look. The compute scalar, if you just look at the hint, it will tell you nothing. You need to look at the properties. And then there is defined values, so the defined values, we can see here – yes, we took the member ID and we increase conversion to integer. So this is where we do it. So this means that for every row coming from this payments table, we need to implicitly convert the member ID to integer so that we can do this join, because of course, this is integer, but what is this?

So of course, we go to the table, we look at payments, and unfortunately, it’s not an integer. So of course, this is a design problem because if the member ID is an integer in the member’s table, it should be an integer here if you want to join between the tables. It doesn’t make sense to make it something else. But I see it on all kinds of occasions, so it’s a design problem in this case. And again, the right way to deal with it, the best way, is of course, not to do this mistake in the first place, or if we can afford it, let’s correct it now. So let’s change this data type to int. of course, this can be a very heavy operation in a production environment if the table is very large. There are all kinds of ways to do it in chunks or using partitions or things like that, but this will be the best thing to do. If we can do it, we have to live with this mistake – using NVARCHAR 20 – then at least what we can do, we can change the order of things because what SQL Server does – think about it, we have ten members coming from here with integer values and then we have lots of rows coming from here. It’s 20,000 rows in this case.

And SQL Server implicitly converts each one of those 20,000 rows in order to get this. So if SQL Server would convert these ten rows into NVARCHAR, into strings, then we could actually use the nested loops and do an index seek only ten times, just like would do before. This would be much better. But the problem is that SQL Server does not have a choice in this case. When SQL Server has to covert between different types, there is a specific order. There is a table of data types, you can search through it in MSDN and SQL Server must always convert form the lower data type to the higher data type, okay, there is a hierarchy of data types in SQL Server. So in this case, it will always covert form an NVARCHAR to integer and not the other way around. And this is not good for us, but we are smarter; we know that we can do better.

So we can actually instruct SQL Server to do it the right way by doing this – so instead of letting SQL Server do it implicitly, we can explicitly convert the member ID, in this case from the members table, to NVARCHAR 20. So by doing this, let’s see what happens now. So now we get the same scan as before in the members table; we get the ten rows. But look at this; now we can finally get a nested loops and a nice index seek on our new index we created, which is NVARCHAR. Of course, we added another key lookup, but it’s just for the ten rows. Altogether, after applying the filters, we have just zero rows, so it’s like nothing. So it’s much, much better. We still get the warning, but in this case, the warning is not about the implicit conversion, it’s about a conversion, because we still have a conversion, of course, but this time we control the conversion; we control the order of it, it’s explicit. So it’s much better in this case.

Just another example, a very simple one, for the same payments table. I do something like this. Again, this is NVARCHAR, this is integer. I don’t tell SQL Server it’s integer, but it interprets – actually, it interprets it as smallint in this case, based on the value. It can be either tinyint, smallint or int or bigint. So this is smallint and again, now we have two incompatible data types, NVARCHAR and smallints. SQL Server has to make a decision. It does not have a choice. It always has to convert this one to the integer, so think about it. It’s so stupid. Instead of just converting this to string, SQL Server has to go through the entire table, do a table scan and convert each value to integer just to see if it’s the same value as this.

So the result is a big clustered index scan and then a convert implicit of the column itself to see if it equals five, four, three, two, one. Very, very inefficient, okay. So by just doing this, okay, now I’m telling SQL Server, this is actually NVARCHAR, so now SQL Server doesn’t really have to do any task because it’s the same data type and why are [unintelligible] in the index now? Okay, so now the reason is that for this specific value, we have many, many rows coming out. And actually, this is what I wanted to show you before – I forgot to show you. When we do this without the proper conversion, then not only we can’t use an index, we also can’t use statistics.

So this is another problem. SQL Server thinks that 21 rows, because this is the average in the table, while in fact, for this specific value, there are many, many rows. So this is also statistics and the statistics might affect other decisions made by the optimizer and get given a worse plan. So it’s not just the index, also the statistics. So the thing to do here, of course, is to – through the design process – be compatible about the data types, use compatible data types and also, when you write your code, always make sure that you compare between compatible types in your predicates, when you define parameter types or local variables. Wherever you use data types, use the same types when you need to compare between them, and if you have a problem like this one and a redesign is out of the question, then at least use explicit conversion to make sure you do it in the right order so this would be better. Any questions?

Brent Ozar: No. Wendy says she really likes that trick, so very cool.

Guy Glantser: Which one, sorry, which trick?

Brent Ozar: The one you just did – not now but the one before of casting one side of the join with the data type you wanted.

Guy Glantser: Okay, right. So I just want to summarize. So I lied before about no more slides; I have some summary slides just to summarize the takeaways from this session. And again, there are so many things to talk about, but within this one hour, I tried to cover some of the more common things that I see. So again, one of them – remember about the set options, the session settings. Make sure that you have the same session settings between different sessions, or at least be aware of the differences so that you know what you’re looking at. Whenever you have different set options, you get different plans. They might end up the same plans or different plans, but make sure that you look at the right one. If you can, use ARITHABORT on or true in the application, in your .NET applications. If you can’t, at least when you troubleshoot something in management studio, change ARITHABORT to false. Okay, remember that the cost is always estimated, even when you look at the actual plan, the numbers are always estimated numbers. And again, from my experience, usually when there are performance problems, you’ll see differences between the estimated and actual number of rows; look for those differences. They will tell you where the problem is. Don’t trust the cost values; they can be very misleading.

So yes, I talked about it, I think this is the most important tip, for me at least, when you do query tuning. When you need to analyze plans, the first thing I do is to look for those differences. If there is a performance problem, most likely I will find someplace where there is a big difference, then I just need to find out why. Again, that statistics, not updated statistics, parameter sniffing, table variables and so on and so on. Those are the reasons why you might have different things. Now, nested loops – I forgot to show this I think. I just told you that you should look for the differences between the estimated and actual number of rows. There is a problem with nested loops. Let me try to find – where was it?

I just told you to look at the differences, so if we look here, the estimated number of rows is one; the actual number of rows is 46. For the sake of argument, let’s say that this is a big difference, but what I want to tell you here is this is not exactly a big difference. It’s not really a big difference. It’s something that Microsoft did by design. When you look at the lower end of the nested loops operator – so just for the nested loops operator, we have the upper table and the lower table. When we look at the lower table, the lower path of the nested loops – so just for this path, the estimated number of rows corresponds to the estimation of the optimizer pair iteration; what the optimizer estimates will return in each iteration, while the actual number of rows is the total number of rows for all the iterations together. So it’s very confusing.

So because we have the number of executions at 46, then actually, what you need to do is multiply the estimated number of rows by the estimated number of executions – this number, around 50. So this is 50, and compare this to 46, which is the actual total. This is the estimated total and this is the actual total. So this is very misleading. And everywhere else, you just need to compare between the estimated and the actual, but when you look again at the lower path of the nested loops operator, then you need to multiply the estimated number of rows by the estimated number of executions. I hope this was clear because this is very confusing.

Alright, never use table variables. I love to say that; never use table variables. Well, I’m not exactly kidding, but avoid table variables as much as you can. I think this would be a better way to put it. There are some rare cases where you might want to or need to use table variables maybe or where a table variable will perform better, but those are really rare cases. There are so many cases where a temporary table will perform much better than table variables. I can tell you from my experience, I work with clients for many years, I get lots of calls from clients with performance problems, I look at the client environment, I look at the stored procedure, for example. And even before trying to understand the business logic and what we can do about it, the first thing I see is table variables. So I say okay, hold on a minute, I just replace all the table variables with temporary tables and boom, everything is so fast compared to before by just doing this.

So many times it happens and before, not even once where I did the opposite, by replacing a temporary table with a table variable I get a better performance. So I got actually two feedbacks from people who told me they had a case where a table variable performed better and we analyzed it and we realized that yes, indeed this is correct, but this is, so rare. The reason is, table variables don’t have statistics and temporary tables do have statistics. There are other differences between them, but this is the most important one. For table variables, SQL Server never calculates statistics. This means the optimizer, like I said before, it is blind actually. It always estimates a single row. And if there are many rows, it’s now going to do a nested loops and it’s going to do something on another table if there is a join, like a big table scan for example, many, many times instead of just one time, okay. And this can be very, very catastrophic for your performance.

So this is very, very dangerous. Of course, using temporary tables, we have statistics, we don’t have this problem because the optimizer knows how many rows are in the table, but of course, this also means there is some overhead in the transaction log. There are recompilations and update statistics. There are some other operations associated with it, like we need to pay, but the payment is so much lower. The risk is so much lower compared to the risk of using table variables. From my experience, from others as well, so unless you really know what you’re doing and you have a really edge case and you know that you need it, prefer to use temporary table. You would be surprised how much performance gain you can get by just doing this replacement.

Okay, I didn’t talk about this, but it’s something that you need to know how to apply. I used to look at plans only from right to left. I think that most people still do it. When we analyze a plan, we read it form right to left – from the right most operator – because first we do a clustered index scan for example, then we do a join, then we do maybe a sort, then maybe top ten, whatever – and then we get the result. So this is the data flow. But you should also read execution plans from left to right which will give you the execution flow, because actually, at run time, we start from the left most operator and then each operator calls a method to the operator to the right saying get me the next row. I didn’t show you this because I have some demo, but I don’t have enough time for this. Sometimes, you will have a better understanding of what’s going on in the plan if you read the plan not just from right to left but also from left to right and then you realize why it did what it did.

So one very simple example is if you have a select top ten order by something and SQL Server uses a clustered index scan which is ordered by the order by key, then if you just read from right to left, people usually look at clustered index scan and say, okay, we do a scan; we scan the entire table. But in reality, if you look at the actual number of rows or the number of rows read, you see that SQL Server actually read only ten rows and that’s it. If you read it from left to right, then you get it because you get the first row, then the second row and the third row and so on. And after ten rows, you can actually stop because you get the top ten that you need.

So this is, again, important in some situations. Tooltips – I see a lot of people, they look at execution plans, they know they can hover with the mouse and they see the Tooltips. They know what to look for in Tooltips, but this is where they stop. Okay, so don’t just look at the Tooltips; there are many, many things. You can look at the properties for each operator, right click and click properties. There are many things that you can see in the properties that will not show up in the Tooltips and they are important. So remember that you have the properties also.

In some rare cases, but sometimes, you might want to look at the XML. It’s right-click, show plan XML and there are some things that will show up in the XML, but not in the properties or anywhere in the graphical execution plan. Again, this is rare, but sometimes remember that you can look for things in the XML itself. There are many, many trace flags – again, I didn’t show you this session – there are many trace flags you can use to extract more information about what the optimizer is doing. For example, which statistics were evaluated or chosen during the optimization of this execution plan, there is a trace flag for some debugging information that you can extract and troubleshoot and analyze. There are many, many trace flags you can use and you can also use debugger, but only if you’re Paul White.

So if you’re not familiar with Paul White, I really recommend you to follow Paul White. Paul White is my guru about anything related to execution plans and this guy just sits with the debugger and SQL Server and it he knows exactly what’s going on behind the scenes for every step in the optimization process, but don’t do this; it’s too much.

So again, this is something that I didn’t to show you because I have a huge demo for maybe like five days. I just had one hour so I removed some of them, but I left the tips in the summary because those are important things I can talk about. Scalar functions. Scalar functions are also bad for performance because they are a batch. Each scalar function is a batch and SQL Server cannot optimize it within the query so it has to run this function for every row in the query. So it’s essentially like a cursor, okay. So instead of a set-based operation, now for each row in your query, you have to run the function, call it to the stack, execute the code, return the result and so on for each one. So it’s really, really poor for performance and either just avoid using scalar functions; embed the code inside the query where you can. In some cases, you might be able to convert the scalar function to an inline table function, which will perform much better and will still retain the same business logic and then you can reuse the function and also get good performance, okay. So this is about scalar function, then I think that’s it. Any questions?

Brent Ozar: Awesome. No, there’s a bunch of people talking about how in Slack that, I guess, scalar functions are so CPU intensive because they’re mining query bucks. They’re just like a crypto currency, continuously mining query bucks. [Sasson] says, “Great presentation. [Breck] says great presentation. Nice job. J.D. asks a question. He said, “What do you do if you absolutely must have a scalar function?”

Guy Glantser: If I must have a scalar function? So first, it’s a good question. Why or if I must really have a scalar function, because form my experience, in many cases, I don’t really need to have a scalar function. There is something that is really – it’s quite annoying because we need to choose between good programming practices, because using functions is really a good thing – you can reuse your code, you don’t need to maintain your code in different places, and performance, because they don’t work together. So you need to make a choice; good maintenance or good performance. In most cases – sometimes it doesn’t conflict. But if you really must have a function then maybe, like I said, you can convert it into an inline function. It will not always work but you can try and then you’ll get great performance. Maybe, if it’s a batch doing some business logic, maybe you don’t really need it – maybe you can convert it into a stored procedure. Again, it depends on how you run your code, your workload. We have to analyze it a bit more deeply. There is something you can do starting with 2016, I think. In 2016, you can use a different function – I forgot the name of it – with memory-optimized table. What’s the name…

Brent Ozar: In memory OLTP or…

Guy Glantser: Yes, but the stored procedure…

Brent Ozar: Natively compiled.

Guy Glantser: Yes, thank you. So you can use a scalar function which is natively compiled even if you’re not accessing memory-optimized tables. So if you’re just doing all kinds of calculations, you can natively compile this function and it will run faster by just doing this. Maybe it’s not fast enough, but it might be really a big improvement, so you might want to try this. It’s not the goal of natively compiled functions; they’re used for in-memory OLTP, for memory-optimized tables, but you can take leverage of that for just better performance for your functions.

Brent Ozar: Perfect, well thank you very much for teaching today, Guy. Always a pleasure to listen to you speak. I always look forward to hearing the new SQL Server Radio episodes as well. Thanks, everybody.

The following two tabs change content below.
Guy Glantser, Data Platform MVP, is the leader of the Israeli PASS chapter and also the CEO and founder of Madeira Data Solutions. His career has been focused on the Microsoft Data Platform for the past 20 years, performing various database roles as either an on-site DBA, an external consultant or a speaker. Guy is involved in many activities in the Microsoft Data Platform community. He occasionally speaks at community events, such as PASS Summit, SQLBits, SQL Saturdays and user groups around the world. He also co-hosts the SQL Server Radio podcast.
Previous Post
How to Monitor Everything
Next Post
Announcing the GroupBy September Lineup

6 Comments. Leave new

hi Guy, where can I download the scripts you used in the presentation? thanks


Great presentation! Are the demo scripts going to be made available for download?


Hi everyone, I have just added a link under “Additional Resources” to download the session materials.


Thanks for all the great demos Guy! Very enjoyable and informative session.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.