Enjoy the Podcast?
This session is intended mainly for SQL Server developers who write code using parameters (or not using parameters). If you are a developer and you encounter performance issues related to parameters, then this session is for you. Attendees should be familiar with indexing in SQL Server and tell the difference between an index seek and an index scan.
Parameters are a fundamental part of T-SQL programming, whether they are used in stored procedures, in dynamic statements or in ad-hoc queries. Although widely used, most people aren’t aware of the crucial influence they have on query performance. In fact, wrong use of parameters is one of the common reasons for poor application performance.
Does your query sometimes run fast and sometimes slow – even when nothing’s changed? Did it happen to you that a stored procedure, which had always been running for less than a second, suddenly started to run for more than 5 seconds consistently – even when nothing had changed?
In this session we will learn about plan caching and how the query optimizer handles parameters. We will talk about the pros and cons of parameter sniffing (don’t worry if you don’t know what that means) as well as about simple vs. forced parameterization. But most important – we will learn how to identify performance problems caused by poor parameter handling, and we will also learn many techniques for solving these problems and boosting your application performance.
Why I Want to Present This Session:
I work a lot with customers on performance issues, and from my experience, in many cases, performance issues are caused by mishandling of parameters due to lack of knowledge for developers. I have presented this session many times in the past, and I always hear people in the audience say “Wow, I had no idea” during the session. This drives me to continue to present this session, and as long as I hear that “Wow”, I won’t stop…
You can download all the session materials from here.
I also wrote a complete series of blog posts about SQL Server Parameterization, which I recommend you to read:
- Plan Caching
- Seven Ways to Execute Your Query
- Parameter Sniffing
- Handling Non-Uniform Data Distribution
- Two Common Mistakes
- Simple vs. Forced Parameterization
Introductions and Agenda
Brent Ozar: So hi everybody, welcome to GroupBy. Today, the first session kicking things off is going to be Guy Glantser talking about how to use parameters like a pro, which even if you think you know how parameterization works in T-SQL, you are going to learn things in here and this is a wonderful session that you can pass on to your developers and get more people involved. So Guy, take it away.
Guy Glantser: Thanks, Brent. I have to say first, I really love this concept of the GroupBy conference – I mean, I’ve been speaking in many events and conferences around the world but this time – it’s the first time that the people actually chose to watch the session, to watch me. It’s not the event organizer or some company you know, it’s the people themselves. It’s a great thing, I’m really excited about this. So thank you for voting and thank you for attending and let’s kick off.
So my name is Guy Glanster, I’m at Data Platform MVP, I’m based in Israel, I work in Madeira Data Solutions. We provide data solutions and services, many services, professional services to our customers around the world. We also run SQL Server Radio. It’s a podcast, me and Matan Yungman, my colleague, we talk about SQL Server stuff, our experience with projects, events in the community, interesting blogs we’ve read, everything around SQL Server – SQLServerradio.com. Oh, another thing, before I go on, can you see this?
Brent Ozar: There you go, there you go, yes.
Guy Glantser: It’s the bunny sticker. I told my son, my four-year-old son that I’m going to present a session before many people, so he gave me this sticker, the bunny sticker and he said put this on so people can see it, they’re going to love it.
Brent Ozar: Well, it’s bunnies and stickers. That’s two things that you just can’t lose with.
Let’s Start with a Story
Guy Glantser: Alright, so I would like to begin with a story and the story goes like that. I got a call from a customer in the middle of the night, it’s always in the middle of the night, right? And he said, we have this stored procedure, this stored procedure used to run for like 50 milliseconds on average each time and starting one hour ago, it started to run for like four minutes – sorry, four seconds each time. Steady four seconds, it used to be 50 milliseconds, now it’s four seconds each time. Something happened one hour ago, we don’t know what and we have to solve it very quickly. So I started asking, okay, what has changed? Did you upload a new version of the application? No. Did you upload a lot of data? No. Did you make any changes to the stored procedure indexes? Did you any indexes? Removed any indexes? No, no, no, no, nothing. He swore they didn’t change anything and something happened, of course, so I’m going to leave it this way, we’re going to go through the session. By the end of the session, I’m going to tell you what happened and of course, it has to do with parameters, parameterization.
So I’m going to talk about what parameterization in SQL Server means, why is it so important? Now, some people might say what’s the big deal? It’s just parameters, I have a stored procedure, we have parameters, what’s the big deal? What do you have to say about parameters for an hour? Well, I have a lot to say, it’s a very important topic. Parameters in SQL Server from my experience, most performance issues that I encounter with customers, they have something to do with parameters one way or another. So it’s very, very important to understand how it works and what you can do about it.
Working with parameters might be great in some cases, it might be a very bad thing in some other cases, and then we need to do – to understand why and what we can do about it and how can we deal with it in order to make the performance better. So when I talk about parameters or parameterization, I actually talk about – in this case, a literal or constant but it can be something else, it can be the parameter. I’m talking about the predicate in the workloads, when you can put anything here, where column name equals or maybe not equal or like or whatever, something.
Today’s Demo Data: Customers by Country
This something can be a parameter, it can be of course, a literal and for the sake of this session I’m going to use a case study called the customers by country, a very simple case study.
This is how it looks like. I have a customers table, I have a country column with two characters for the country and I’m going to select all the customers from a specific country. That’s it, very simple query. I’m going to use this one throughout this session to demonstrate parameters. Let’s go to management studio, here it is. Okay, so that’s the query. I get all the customers, in this case from Israel. IL stands for Israel, we have 74 customers in Israel in this case and I’m going to show you later on, of course, the execution plan and what happens when we change things, but this is it.
Intro to Plan Caching
Alright, before we move on, I want to talk about plan caching. It’s a very important concept in SQL Server. SQL Server has an area in memory called the plan cache. The plan cache is used to store execution plans. Every time you run a stored procedure or an ad hoc query, it has to get compiled and execution plan is generated for that query and this execution plan is going to be stored in the blank cache in order to be used again and again. Now, this is a great thing because if we have to compile and generate an execution plan each time in every execution, it’s a lot of resources. A lot of CPU resources mainly, but not only CPU and it’s going to bring your server down if it has to run a lot of queries each second. So we want to reuse plans, this is a great thing. We have the plan cache and what happens when you send a query – okay, so I write the query, I send, I hit execute or maybe it comes from an application, I send the query to SQL Server.
SQL Server first needs to check in the plan cache whether an execution plan already exists for this plan. If it already exists, it still has to verify that the plan in cache is still valid. Now, a plan might become invalid because something has changed since the last time it was complied and generated. Maybe a new index was added to the table, maybe the scheme of the table itself has changed, a new column was added, maybe an index was removed, maybe the statistics that the execution plan relied on are out of date, they’re not updated, okay, and now there’s a new statistics that we need to take into account. So in all of those cases, the plan is invalid and this time when the user tries to execute the same query, which already has a plan in cache, the plan has to be recompiled and generated again. If the plan exists in cache, and it’s valid, SQL Server can go ahead and just execute it, bypassing all the process of optimizing and generating a new plan.
Using a View to See What Plans are in Cache
Let’s go to the demo about plan caching. So I’m going to create a view here and this view takes some data from sys.dm_exec_cached_plans, which returns all the execution plans currently in the plan cache, and I’m going to cross apply to sys.dm_exec_query_plan, this function takes the plan handle and it will give me the plan – the execution plan itself, the XML presentation of the query plan and also sys.dm_exec_SQL_text, which will give me, of course, the text of the statement, the batch itself behind the plan, and I also join it to sys.dm_exec_query_stats, just to get the last execution time of each query. It’s going to be convenient to show you this way.
I’m going to quit the view because I’m going to use it many, many times in this session and let’s just see what’s going on on my laptop in the plan cache.
Okay, so I can see all kinds of statements, each statement has its own execution plan. If I click that I can see the execution plan itself. Of course, this is the estimated plan, not the actual one. It’s what’s stored in the plan cache because they haven’t executed it yet, and I have the execution count of each plan in the cache. In order to reuse plans efficiently, I want to see high numbers here. Of course, this is my laptop, it’s not the production systems so you see nothing, but this should be high numbers. If I don’t see high numbers, this might indicate a problem. I might have a lot of ad hoc queries not using parameters with different values. I’m going to talk about that in a second, it might be a problem. I can see the object type of execution plan, whether it’s an ad hoc query, a stored procedure or if referred. I’m going to talk about what referred means, not a good word but it means the size of this execution plan and the last execution time of the query behind this execution plan.
Now, we can empty the plan cache using DBCC FREEPROCCACHE and if we look at the plan cache again, it’s now empty. Now, of course, you have to be careful, don’t run this in production. Seriously, don’t do that.
Let’s go back to our case study query. All the Israeli customers, and if I look now at the plan cache, I’m using the view I had before and I’m filtering all the text of the query just to get my own query, not other things in the plan cache. So I see my query, I see the plan, which is index six on the customers on the country column, I have an index, execution count one, it’s an ad hoc query because it’s not a stored procedure. I’m not using any parameters, okay. If I’m going to run this again the second time, the same query, I’m going to see what’s in the plan cache again. We have a reuse, we don’t have a new plan, execution count two, SQL Server found the plan in the cache, it was valid, great, it could execute it again for the second time. Excellent. I’m going to do it the third time and I want to see execution count three, oops, no. We have a new plan, it’s the exact same plan, but it’s a new one. SQL Server had to go through the process of compiling the plan again.
Again, this is an expensive operation and the reason I have another one and not three here is because I cheated. I added the space here because I wanted to demonstrate that the way SQL Server looks for the plan in cache is by calculating the hash value over the text of the query and looking for that hash value in the plan. Now, every single space or top corrector or anything that is changed in the text will produce a different hash value and SQL Server will not find the plan, it will have to generate a new one. Now, of course, in reality, our applications, we don’t add spaces but what we do is this: I now want to look for the French customers. So the text of this query is also different, which means I’m going to get another plan, which is still the same plan.
So I have now three execution plans which are identical but they consume space in the plan cache. I had to invest some resources in order to generate them and I’m not reusing the same plan as I would like to do.
Brent Ozar: There’s a great question in Slack, Nicky says he had no idea – he said, what else, does like returns and comments matter too? How much formatting…
Guy Glantser: Yes, everything. If you add a comment, for example, if I run this query, the same query as before, but with a comment, so now I have another one with the comment. Although it’s the same query, the same plan, but I have another one, okay? Every single change in the text produces a different hash value. Are there any more questions at this point?
Brent Ozar: No, you’re good.
Guy Glantser: Okay, great. So Paul Simon says there are 50 ways to leave your lover, I say there are seven ways to execute your query. Now, at this point I usually play the song but I’m not going to do that now. I hope you know the song, it’s a great song. I love Paul Simon and I want to show you seven ways to execute your query. Here it is. I’m going to empty the plan cache…
Brent Ozar: I think you have fans of the podcast because they’re asking you to sing. Warren’s asking you to sing. So for those of you who haven’t heard, Guy does a fantastic job on singing intros on the podcast.
Option 1: Non-Parameterized T-SQL
Guy Glantser: Yes. So the first way I can execute my query, I call it non-parameterized T-SQL query. It’s what we’ve seen already. So I have an ad hoc query without parameters, just literal value, going to look at the plan cache. There it is. If I’m now going to use FR for French customers, as we’ve seen before, I’m going to have a different entry in the plan cache, okay. So that’s not new, but it’s the first option.
Option 2: Dynamic SQL with a Local Variable
The second option is the non-parameterized dynamic SQL query, so what I’m going to do here, I’m going to use dynamic SQL. I have local variable for the country, with IL in this case, I have a local variable for the gray text nvarchar max, and I’m going to put the query text inside my query text variable and I’m going to concatenate the country here.
So the content of this string, of this variable is going to be the query itself with IL here. It’s not the parameter, it’s just the constant IL just like before, and then I’m going to execute it and what happens in the plan cache – this is the third one now. Essentially, the text that was sent to SQL Server and executed is the exact same text as this one, the first query with IL. The reason we have a third one is again, because the wide space is here. You can see it’s not exactly the same way.
If I would build the dynamic text the exact same way as the original text, I will get a reuse of the plan, okay? And again, if I did the same with FR, just change the content of this country variable, FR, since it’s not a parameter, I’m going to get yet another plan. Everything is one here, not good.
Option 3: Parameterized Dynamic SQL
The third option is a parameterized dynamic SQL query. This is the first time we’re going to use parameters in this session. So now we’re going to declare again the variable for the country, for the query text. Another local variable for the parameters definition that I’m going to use in the dynamic batch – and it goes like this: now, I’m not going to concatenate the value of the country IL, I’m going to have to declare actually a parameter inside the dynamic batch. I call this @pCountry, it’s not this one, this is the local variable at country, this is @pCountry, this is a parameter inside the dynamic batch, and now I’m going to define this parameter here in my parameters local variable.
At@pCountry nchar(2) in this case and I’m going to use this sp_executeSQL, a system stored procedure to execute dynamic batch with parameters. So I have the statement in my query text variable, the parameters definition is here and then for each parameter defined inside the dynamic batch, I’m going to bust the value, so in this case, the@pCountry parameter will get my at country local variable value. Now, this is the parameterized batch and it’s going to look different. Okay, so there’s a new entry here.
Notice that this is a prepared object type. A prepared object is actually a parameterized query. Again, not a good choice of word, prepared. It’s a parameterized query and it means it has parameters and we can also see that in the query text we have first, the parameter definition and then comes the step in itself.
If we look at the plan – I didn’t show you before, if we look at the ad hoc plan, and we go here, then you can see that we have the value here in the seek predicate. Okay, so country equals FR. Now, on the other hand, if we go to the parameterized plan and we go here, you can see that the country now equals@pCountry. This is a template actually. The plan does not include the actual value I used in the execution. In this case, it was IL.
It’s a parameter; it’s a template. This means that I can run the exact same batch now with FR and hopefully I will be able to reuse my plan. Great. Two – because it’s the exact same plan, it’s the exact same template. Only the value has changed but it’s the same plan, I can reuse it again and again for any value that I want to use for the country.
So for any country, I’m going to be able to reuse the same plan.
Now, my example is very simple but think about transaction ID, not country. You have a very large transactions table with billions of rows and you have a very simple query, okay, select from transactions or from whatever where transaction ID equals something. If you don’t use parameters, your query looks like where transaction ID equals 17 and then 23 and then one million and then whatever. Potentially you have billions of different queries with different text for different execution plans, it’s going to kill your server performance, assuming you’re going to do that a lot. If you replace that with a parameter, you’re going to have one single execution plan but use it many, many times. Okay, so it’s going to be much better of course.
So this was number three, let’s go to number four.
Options 4 & 5: Client-Side .NET Queries
Non-parameterized client side query, so I want to show you how it happens – how it goes in the application. I have here a very simple .NET application that I wrote. I’m going to put the country here and I have two options. First, non-parameterized query, let’s execute it and let’s look at the plan cache. There it is, it’s an ad hoc again. Let me show you very quickly the code.
So this is my non-parameterized query when I click the button, so select blah, blah, blah, and I concatenate the value of the text box, just like I did with the non-parameterized dynamic query and when I’m going to use the parameterized query in my .NET application, I’m going to use the same concept as with the sys.sp_executeSQL. I have a parameter here @pCountry and I’m going to declare this parameter, I’m going to add it with the data type and the size and the value and there actually this is an input parameter and then I’m going to execute.
Okay, so this is how it looks like in the .NET application. If I go now and change this to FR, still with the non-parameterized query, I’m going to execute this again, I get another one, so the parameterized query, this is IL, this is FR and now, if I switch to the parameterized query option, so first with IL, it’s a prepared now and actually what happens behind the scenes when I do that in .NET application with the parameter declaration, what happens behind the scenes, it’s going to use the exact same method using sys.sp_executeSQL. It’s the exact same thing.
Of course now, if I’m going to change it to FR and execute, I’m going to get a reuse again. So that’s how it looks like when you do it from a .NET application. So this was number five, the client site parameterized query.
Option 6: Stored Procedure
Number six, stored procedure. How obvious. Okay, we have a query procedure with the country, parameter and then I’m using this parameter in the query inside the stored procedure. I’m going to execute with the value from IL first and now, in the plan cache, it’s going to look like this. Now it’s a PROC. I have the code of the stored procedure, the plan as before, so it’s a different type of object, it’s a procedure plan and if I’m going to execute the same procedure with a different value, FR in this case, then, of course, I will get a reuse, okay, execution count two when I use the procedure.
Option 7: Looks-Like-Parameterized T-SQL
Last one, number seven, I call it looks like parameterized T-SQL query, and it goes like this. So I’m using here a local variable with a value IL and I’m using this local variable here.
Now, if you look just at this portion of the batch, without the declare, if you look at this, it looks exactly the same as the statement inside the stored procedure because – this is very confusing, this at sign, we use the same sign for parameters and for local variables but they are completely different things and they behave in a very different way.
If I use this one, first of all I want to show you, for the first time I got it clustered index scan in my execution plan. It’s not the right execution plan, so we have a performance problem to begin with when we use a local variable like that, and I can show you the reason.
The optimizer thought there are going to be five thousand something rows, while we only have 74.
So for this number of rows, the optimizer said well I better use a clustered index can, and of course it was wrong. And if I look at the plan cache now, although it looks like I’m using a parameter here, it’s not a parameter so the plan cache is going to have another ad hoc. The entire batch is going to be here and it’s an ad hoc batch and if I use the same batch with now FR, then again it’s for the parameters so we have a different text, we have a different hash value, we have a different plan, and not a good one.
It’s a clustered index scan and I’m going to show you by the way, why it happened. So it looks like a parameter but when you use local variables it’s really not a parameter and I’m going to talk about why that happens. It happens because of parameter sniffing. Now, this is very fundamental, a very important topic in SQL Server you need to understand.
Parameter sniffing means the first time that I’m going to run a stored procedure or a parameterized query that has parameters, maybe it’s a dynamic sp_SQL batch, we fire parameters, the first time I’m going to execute the query, SQL Server will going to sniff the values of the parameter. The optimizer actually has access to the values of the parameters when it compiles and generates the plan.
So by looking at the values, the optimizer can use statistics and the histogram and it can come up with a pretty good plan for those values, okay, which is great, because then at execution plan which run time, you’re going to get probably a good plan and good performance for the execution of that query. So that’s great. With local variables, you don’t have parameter sniffing.
It’s not local variable sniffing, it’s parameter sniffing. With local variables, you don’t have that so the assignment of the local variables, set local variables equals something, that happens at run time. At compile time, which happens before that, SQL Server, the optimizer has to generate the plan but it has no clue about the value of the local variable. It’s called an unknown value and it has all kinds of rules how to treat unknown values but in most cases, you’re not going to get a good plan. Like in our case, we got it clustered in the scan because the optimizer doesn’t know what’s the value and it has to generate the plan based on some general average of the distribution of values in the column, which is not good.
Is Parameter Sniffing Good or Bad?
So the question is, whether parameter sniffing is good or bad. In some cases, it can be a good thing, in other cases it might be bad. The reason is – think about it, it’s actually like a casino. It’s gambling, because the first time a user is going to run a stored procedure, we’re going to take the value that user chose to run the stored procedure and based on that value, we’re going to generate a plan. We’re going to then put the plan in cache and we’re going to reuse the same plan again and again and again, many, many times. Now, what happens if that user that runs the stored procedure for the very first time uses a value, it’s not an average value? This value might produce a plan using SCAN, while in most cases we actually need a SEEK for most other values. So the next users are going to use different values and they might get a poor plan because we’re reusing the same plan again and again, based on the initial value.
It’s a matter of luck. If we get lucky and the first user uses a good value, okay, so we’re going to get a good plan for most cases, but if not, we’re not. In other cases, if the distribution of values in the table is more or less uniform and for every value, we’re going to get more or less the same plan, then we’re fine. So sometimes it might be good, it might be bad.
Think about it this way, let’s say you got a new job in a big factory, in a big warehouse, and your job, it’s a big production line and this factory produces all kinds of products, all kinds of goods and your job is to wait at the end of the production line when the product comes along, you need to find the best package for the product and put the product inside the package. Very simple, but they tell you there is one important rule. In the beginning of the day, when the first product comes out, you can pick any package you like for that product, but from that point, you have to use the same package for all products that come out. It’s a weird rule but that’s the rule of the house. So you get the first product, it’s your first day at the job, you got this bunch of pencils, you say okay, let’s look for the product and you find this one. It takes us a while to find the product but you’ve got it – well, maybe it doesn’t look like they fit but let’s say they fit because you know there’s this rule, you’re going to have to use this exact same package again and again, you get a bunch of packages, the same type so that the next time, if you get another bunch of pencils, you do it very quickly. Performance is much better, but then, you get this, and you have to put it inside this because that’s the rule, okay? What do you do?
Well, you can do this for example, but it will take a while. The performance sucks. So it’s crazy, but this is how parameter sniffing works in SQL Server. As crazy as it might sound, okay?
Brent Ozar: You got a lot of presenters chiming in in the Slack channel saying this is a fantastic analogy. This is very good.
How Parameter Sniffing Looks in the Plan Cache
Guy Glantser: Thanks. So let’s see how it looks like in SQL Server. Let’s again empty the plan cache. First of all, with DBCC show statistics, we can look at the statistics of the histogram for some statistics, objects or an index or a column in this case, my country index, so we can see here the histogram for the country. Again, this is a very simple example for the sake of this demonstration.
We have all kinds of countries and you can see that I created a distribution that is not equal or uniform. We have all kinds of countries with around, let’s say 70 on average, and then we have China with 30,000 and we also have UK with 29 and the US with around 40,000.
So there is a skewed distribution here on purpose of course, and another thing, we can look here, the old density, this value here, if you multiply this value by the number of rows in the table, in this case, 100k, you get the average number of rows per unique value, and this might look familiar to you. When we use the local variable, we get an estimation of this number, 5555 something. It counts from here because this is the average value in the table and when we don’t have parameters and SQL Server uses an unknown value that can like a local variable, it cannot use the histogram, it can use this one. So if we have an equality operator like we had country equals a local variable, all it can do is to estimate this number and of course in our case it was very far from reality. So it comes from here. Now, let’s use our stored procedure that we created in the previous demo. First of all with IL and I get an index seek which is great, we have 74 rows. In this case, if I’m going to use it now with US, which has 40,000 – we get 40,000 rows, we still get an index seek. Now, in this case, this is not a good plan. We get a very large number of vertical reads because we have many, many key lookup operations, 40,000 key lookups actually.
Seeing the Properties on a Select
And I can show you also if you go to the select operator and you click properties, you have this parameter list. It will show you for all parameters that are involved in this query, what is the parameter name, of course, what is the parameter compile value and what is the parameter run time value. So you can see now the reason why we have a problem here, because the value it used for compilation was IL and currently we’re executing with US and this is the reason why we have an estimation of 74 and an actual number of 40,000. So the reason we see this gap is not because SQL Server tried to estimate how many US customers and it thought there are only 74. It’s because this is what it has in the plan cache, it’s only reusing that plan. That’s it. We’re not even going through this process of generating a new plan again.
Okay, so what can we do about it? If we have such a skewed distribution, what can we do about it?
Solution #1: Solving Skewed Distributions with Recompiles
First thing, solution number one is to execute sys.sp_recompile. Sys.sp_recompile we can give it a third parameter and the name of our stored procedure, it will mark the plan of this stored procedure in the plan cache as invalid. Remember we talked about valid, invalid. This is one option to mark a plan as invalid by just running sp_recompile. So this means it is now invalid. The next time someone is going to execute this stored procedure, the optimizer will search for it in the plan cache, it will find it but it will say that it’s invalid. It’s going to recompile and generate a new plan. So now, if for example, the next user is going to use US, we’re going to have a new compilation and we get it clustered in the scan, which is good for the case of US. But now, we are stuck with the plan for US and if someone wants to get only the Israeli customers, only 74 rows but we still get it clustered in this scan because we’ve reused the plan again.
So using sp_recompile, it’s the casino effect again but you know, it’s like a free roll of the dice. You get another option, another chance. Maybe it will work, maybe not. Not a very good solution, so we can go to solution number two.
Solution #2: Put Recompile on the Stored Procedure
Solution number two is to add with recompile with the stored procedure itself, inside the stored procedure like that. This means that in every execution, we’re going to get a recompile of the entire stored procedure. It actually means that SQL Server will never put the plan in the cache to begin with. There’s no reason to consume space in the plan cache for this plan because we’re not going to reuse the plan by definition.
So now I’m going to execute it first with IL and of course, I’m going to get an index seek and then I’m going to execute it again with US and I’m going to get a scan. So for every execution, I’m going to get the best plan for that specific execution, which is really great. It’s the best thing we can ask for, but of course, there’s a drawback. If I’m going to run this stored procedure a hundred times per second and I’m going to have to generate a new plan every single time, it’s not going to be very good for your CPU, for your server.
In most cases it’s not going to work, so it’s not very good for OLTP workload where you run the same procedures many, many times per second or per minute but for example, if you have a data warehouse and this stored procedure runs a very complicated report that users run maybe once an hour, there’s no reason not to use with recompile. It’s a great thing because executing this stored procedure might take, I don’t know, five seconds, maybe 30 seconds. Adding some more for compilation to get the best plan is a good thing in this case, so it really depends.
Solution #3: Add Query Hints for Recompiles
Now, solution number three, instead of adding with recompile here at the stored procedure level, we can add a query option, a query hint called option recompile, like this. This belongs to the statement, the select statement, it’s part of the statement.
So this means that not the entire stored procedure is going to be recompiled, we have a plan for the stored procedure in the plan cache but its run time, when SQL Server gets to this statement, think about maybe we had like 20 statements in this stored procedure. At the point that it gets to this specific statement, it’s going to stop execution, recompile only this specific statement, generate a new plan for it for only the statement, run the new plan and then continue to the rest of the plan for the entire stored procedure.
So there are two differences between option recompile and with recompile. One, it’s at the statement level, not the stored procedure level, and second, it happens at run time and not compile time. This is a very important difference I’m going to talk about later on.
So, alter procedure and now again, I’m going to execute first with IL and I get index seek and then with US I get index scan. Again, the same idea, it’s a great solution unless you run the stored procedure many, many times and you’re going to have to pay a lot of resources for generating new plans all the time.
Solution #4: Optimize for a Specific Value
Solution number four, another query hint you can use is option optimize for. I can instruct the optimizer to ignore parameter sniffing, disable parameter sniffing and use a value that I will chose for the sake of optimization, for the sake of generating a plan. In this case, what I’m saying is most of my users are going to use – most of the time, they’re going to use US as a parameter so in order to prevent the casino effect, I don’t want to be unlucky and maybe some user will use another value when the stored procedure has to be compiled.
I’m going to tell the optimizer always use US. Now, I know that if someone is going to look for the Israeli customers, they’re going to get a poor plan for that, but I know that it’s going to happen once in a while. So I prefer to disable the casino effect and instruct the optimizer which one to choose each time, so of course the results will be – if I use now IL I get the result of the Israeli customers, 74 rows but the execution plan is based on the value US. Again, we can go here, you can see here that we have – although this is the first execution, the compile value is US because I instructed with the optimize four option and the run-time value is IL.
If I run now, run US at this point, we are reusing the plan from cache, which is already clustered in this scan, which is good for US of course. That’s what I wanted.
Solution #5: Build Your Own Histogram
Solution number five, now, this is the most complex one but it’s also the best one. Potentially, you can get the best results if you do it right, and it goes like this. The idea is to create my own users table and the scheme of the table is just like the scheme of the histogram, the statistics that SQL Server maintains and I’m going to run this statement to insert the result of the histogram of the country, colon, into my user table.
So what I’m doing here is DBCC showstatistics just like before for the country column. I’m adding here with histogram so it’s just I only get a single record set with the histogram and through insert execute, I’m going to insert the results into my table. So now, if I select from my table, I have the histogram. So I can now manipulate it and do something about it.
What I’m going to do about it, in this very simple case again, I’m going to count how many rows are in the table and then I’m going to delete all the rows from the histogram, which have less than one percent of the rows in the table. Those are going to be the uncommon countries and I’m going to leave only the common countries, those with more than one percent of the rows in my table.
So if we look at the table now, we have only China, UK and US. Those are my common countries. Now, again, this is a very simple example. It’s going to be much harder in reality to come up with some algorithms, with some logic here to distinguish between common and uncommon values. It requires a lot of trial and error and tuning but I can tell you from my experience, we’ve done that several times with customers with great success. As soon as you do it right, you can get very good results.
Solution #5, continued: Building Code to Hit Your Histogram
Now, what I’m going to do, I’m going to create a new stored procedure, another one with the same name, the same parameter, the same select statement with _common as a suffix. Another stored procedure identical, only a different name, _uncommon and now I’m going to alter my original stored procedure.
This is the stored procedure that the application uses so that it’s transparent to the application and in this stored procedure, now it becomes like a dispatcher. This stored procedure, it’s rule now is to decide which one of the other stored procedures to execute. So I’m going to look for my common countries table, whether my currently parameter used by the user is in the common countries table. If it is, if it exists in the common countries table, I’m going to execute the _common stored procedure. Otherwise, I’m going to execute the _uncommon stored procedure. Now, what’s going to happen here, each one of these stored procedures is going to have parameter sniffing. So the first time someone uses an uncommon like IL, like Israel for example, we’re going to use this stored procedure.
Thanks to parameter sniffing, we’re going to get a plan with index seek, because we based the optimization on the value of IL. We’re going to put that in cache and every time we get an uncommon country, we’re going to reuse the same stored procedure, the same plan here. The first time a user uses a common country, this stored procedure is going to execute this one, the underscore common stored procedure, which also has parameter sniffing and it’s going to produce in clustered index scan.
Hopefully, put it in plan – in the cache, sorry, and then what we have is two execution plans. Only two execution plans in the plan cache – one for each type of parameters and we’re going to be able to reuse the appropriate – the right plan for each case so we don’t have to recompile for each execution, we just compile twice and we don’t have this casino effect so we get the best solution. Let’s see it in action.
Brent Ozar: The best summary of where this is going is in the Slack channel. Chuck says – he captures everyone’s emotions very well when he says, “this is just barely not over my head. I’m going to have to watch it again though.”
Guy Glantser: Okay. So first with the IL, if we look at the plan, this execution plan is for if it exists. It has some overhead, you need to test it. In some cases, it’s very simple if exists select null statement might think here’s some overhead, which might mean the solution is not going to be good enough for some cases maybe, but it should be very, very fast, very quick compared to the actual statement you want to run. This is the actual statement, the select from our customers table and I get an index seek from my IL parameter. Now I’m going to use US and I’m going to get a scan. So we have the right plan for each choice. We had of course to recompile the plan twice, but that’s it. We’re not going to recompile any of those again until we have to because the statistics are up there [inaudible 0:37:33.8], or something other different happens so this is the five solutions.
Recapping Your Options So Far
Let’s summarize that. Okay, so again, first, we can use sys – sp_recompile to recompile, which means the plan is marked invalid in the plan cache. The next time a user is going to run the same stored procedure, it’s going to recompile the stored procedure with a value used by that user, and again, we might get lucky and get a better plan. For most cases, most executions we might not. Maybe in some cases, we’re going to even get a worse plan than before when we do that. It’s the casino effect again, it’s gambling. We can use with recompile at the stored procedure level, which means the plan for the stored procedure will never be put in the plan cache because we’re going to compile and generate a new plan every single execution. And then we use parameter sniffing for every single execution and we get the best plan we can for every execution, which is really great if you can afford it.
Affording it means you don’t execute the same stored procedure many, many times per second and you can afford to waste CPU resources on compiling the plan again and again each time. In some cases you can, in some cases, it might be very problematic. Instead of using with recompile at the stored procedure level, you can use option recompile at the query level, which also happens at run time, so only when SQL Server gets to the point of running this specific statement, it will generate a new plan for this statement, execute it and continue with the rest of the stored procedure. We can also use option optimize four to instruct the optimizer to generate the plan based on some value that I will choose and effectively disabling parameter sniffing.
And the last solution using multiple stored procedure, we can distinguish somehow by some logic that we need to develop between common values and uncommon values or different groups of parameter values and then we can generate a different plan for each group, which will be only compiled and generated once and reused again and again. So those are the ways to handle parameter sniffing issues.
Gotcha: Changing Parameters Inside Code
Next, the next thing is changing parameter value. This is something that I see a lot with developers. A common mistake developers do, they have a parameter in stored procedure, they change the value of the parameter inside the stored procedure and then they get poor performance and they don’t understand why, so I want to show you what’s going on in there. So, let’s empty the plan cache again. A new stored procedure this time, customer by last purchase date, and I have the parameter date with the default value of null and if the date is null, this is something that they do inside stored procedure, I’m going to set the date to last month for example.
Now, the story behind this is that we have an application, we have a report, one of the forms of duplication is the report that our users use many, many times and before we had this stored procedure, the reports – we had a date picker in the report, then users can choose a date. I want to get all the customers according to the last purchase date based on same date that I choose and we didn’t have this default value here and then we had select from customers where last purchase date is greater than date, okay, that’s it.
At some point, users told us, the developers, listen, we usually want to get the last month. I don’t want to have to pick the date for last month every time. So we want to have a default value for this report, when we enter the form and we don’t choose anything for the date, give me the last month. This is going to be the default behavior. Unless I choose something else, I want to get what I chose. So what the developer did, he added a default value of null and if we get null, we just change it to last month and we run it. This is the scenario, the story behind this thing, and I see that a lot in many, many places. Of course, let’s add an index for the last purchase date. And now, first of all, let’s execute the stored procedure with some value and we get many, many rows. We get in this case, a scan because actually I think it’s the entire table in this case.
Let’s execute the stored procedure again with a default value, so I’m not specifying anything. I’m adding here with recompile on purpose because I want to generate a new plan to see what happens when we do it for the first time. So with the default value of null, we get almost 9000 customers from the last month. If we look at the plan, the estimated number of rows is only one, while we have 9000. Not so good, so using index seek in this case is not a good plan and the reason is we have parameter sniffing.
Parameter sniffing happens at compile time, before this happens. This happens at run time. At compile time, the value is null, so with parameter sniffing, now the optimizer has to generate a plan for this query using the value null and of course, whenever you compare something with null, assuming you use the ANSI_NULLS setting set to on, to true, every comparison will null returns false. So the optimizer estimated this is going to return between zero or in this case, one row and of course this is not the case because we changed it and we want to use one month before. So that’s a problem.
Whenever you change parameters in stored procedure, you have a problem because the plan is going to be based on the parameter value passed by the user and not the actual value used at run time, which was changed. What we can do, first of all, we can do this: instead of changing it before, we can just put the expression here. We don’t change the parameter value, we use the expression with the ISNULL and everything inside the query. Let’s try this. Okay, so again, we execute it with a default value of null, we had 9000, the actual number, this time, the estimated number of rows is 30,000. Again, it’s not good but it is not good for a different reason. In this case, the reason is because of this expression, because of ISNULL actually, ISNULL function, SQL Server treats this expression as unknown. It doesn’t know how to evaluate this expression.
It’s unknown just like local variables, and with unknown value, they test some rules, like as I said before. In this case, the rule when using the inequality operator, the rule is to estimate 30% of the table. You have 100k rows in the table, this is why it estimates 30,000 rows in this case, which of course has nothing to do with reality. So not good.
Using Local Variables to Avoid Changing Parameters
Solution number two, we can – instead of changing the parameter value, we can use a local variable. I also see many developers do that. Use a local variable, I put the expression in the local variable and then I’m going to use the local variable in my query instead of the parameter. We talked about the problem with local variable before and we can see it here again.
In this case, we get the same problem with the 30,000 because when we use local variables, it is an unknown value, so it’s the same thing. The same rule of 30% happens here again. Not good.
We can add option recompile. Whether we use a local variable or we change the parameter, we can use option recompile. Now, I told you before, option recompile happens at run time so this happens after we change the parameter value, which means it solves the problem. If we look here, you can see that we have a very good estimation, very close to reality, when we use option recompile. But again, the downside is if you’re going to use this one many, many times, it’s going to kill performance because of CPU resources.
Creating Inner Stored Procedures to Leverage Sniffing
Another option we can do, similar to what we did before, we can create another stored procedure with _inner here. It’s an inner stored procedure. It only gets the parameter and then executes it without the default value, without changing it. Very simple, and then we’re going to modify the original stored procedure. We’re going to get the parameter with the null default value. Okay again, this is the API, the stored procedure, the application users, so we’re not changing it. It’s transparent to the application.
We’re going to change the value here and then instead of running the query right away, we’re going to execute the inner stored procedure with the parameter after the change. And what happens here, the other stored procedure also has parameter sniffing, it’s also going to compile a new plan but it’s going to base the plan on the new value. So now, again, we get the same estimation as before, the good estimation.
So the right way to go if you can afford it, just use option recompile. Very simple and a great solution. If you can’t afford it and you need to test it and see if you can or not, you can use this option of running an inner stored procedure or instead of running an inner stored procedure, if you don’t want to maintain many copies of stored procedures, you can use of course, dynamic batch inside the original stored procedure, a dynamic batch with parameters of course, the parameterized dynamic batch. This will also work because you also have parameter sniffing there. So remember not to change parameters inside a stored procedure unless you handle it correctly.
Simple vs Forced Parameterization
The last topic I want to talk about related to parameterization is simple versus forced. SQL Server has two modes of parameterization. The default is simple. If you don’t change anything, you’re under the simple parameterization mode and with simple parameterization, in some cases, although you have a query using a literal, a constant value, not a parameter, SQL Server, by itself, without asking you, is going to replace that literal value with a parameter.
Now, this might sound like who asked you to do that, but SQL Server will only do that when it is absolutely sure that no matter what value you’re going to use, the same plan is going to be the best plan for all values. So in that case, it has to be stupid not to do it. Essentially it means you were stupid not to do it, I will do it for you, and it’s a great thing. It’s a good thing. Again, remember the example of where transaction ID equals something, you have potentially billions of different values, billions of [different plans 0:47:11.9] and in this case, hopefully, SQL Server is going to parameterize this query under the simple mode and you’re not going to get billions of plans. We’re going to see when that happens.
You can change to the forced parameterization mode. In this case, for every query, every time SQL Server is going to parameterize your query, even if for different values, different plans are going to be the best plans. This can be sometimes good, sometimes it can be very risky. You can apply parameterization either at the database level; for each database there is a property parameterization, simple or forced and it will affect the entire database.
Or, you can do it at the query template level. You can take a specific template of the query with some placeholder for the value and say, whenever this query comes, even if there is a constant value there, please replace it with a parameterized value, but only this specific template. Other queries inside the database are going to stick to the simple parameterization mode. So doing it at the database level is very risky, I don’t recommend it. If you can, do it at the query template level and then it can do some fine-tuning and only in specific places. You can say I want it here but I don’t want it there.
It’s quite complex, of course the best solution will be to go to the application and rewrite the code to use parameters so you don’t have to change it behind the scenes because it has some side effects.
I’m going to show you a very quick demo and then we’ll summarize the session. So I’m under the simple parameterization right now, I’m going to use the same query or IL and of course, as we’ve seen before, we use a constant value here, not a parameter, so this is what we expect. If we look at the plan cache, we see that it’s an ad hoc query, not a prepared query.
But look at this query, the same customers table where ID equals 1,2,3,4. Now, ID is the primary key on this table and we have a clustered index on the primary key, and although I use 1,2,3,4, which is a literal value, a constant value, I get a prepared statement. And SQL Server decided for me that this is going to be a parameterized query, it replaced the 1,2,3,4 value with a placeholder, with a parameter F1. It chose the type small INT based on the value I used, 1,2,3,4 and now I have a placeholder. I have a template.
So when I run the same query with 2,3,4,5, of course we get a reuse of the same plan behind the scenes, which is great because if you think about it, if it’s the primary key, no matter what value you put here, you’re going to get, at most, a single row and you’re going to get a clustered index seek as the plan. There is no other choice. So there’s no point in generating the plan again and again for different values. Great stuff doing it with simple parameterization.
Brent Ozar: You got a lot of questions about that in the Slack channel. The explanation was very good because everybody was like, wait a minute, what does this fix or cure, so good haul there.
Guy Glantser: Okay, if we look at the databases; the databases view, we have is parameterization forced, so it’s currently zero because we’re under the simple parameterization and we can do this: alter database at parameterization forced. So now, even with my country equals IL, if I look now at the plan cache, we get a prepared statement and it shows X0 as the parameter name with nvarchar 4000, which is also a problem because in some cases it might be not an appropriate type, it has some side effects as well.
Let’s go to the simple parameterization mode and then if we want to do it at the query template level, we have this: sys.sp_getquerytemplate. I’m going to pass it my query with some value here, which has no meaning in this case. It doesn’t matter what value you put here, SQL Server will extract the template from this and replace this value with a parameter placeholder and it’s going to output the statement with a parameter and the parameter’s definition. I’m going to show you the output here.
So this is the output with some parameter name at zero and the definition of that parameter. Essentially, this is what it does behind the scenes when we use forced parameterization, and then, I’m going to use sp_create_plan_guide and this plan guide is a type of template and I’m going to pass it the statement, the parameter definition and the hint, in this case, option parameterization forced.
This way, I’m instructing the optimizer when you see only this specific template, regardless of the value here, please parameterize it, replace it with a parameter and reuse the same plan again and again, but only for this one. Because I remind you that we are still under the simple parameterization mode at the database level, but under the simple, because this is the template I chose before, I run it now three times with different values. I look at the plan cache, we get a prepared statement with execution level three. So we get the reuse for this specific template even under the simple because we have a plan guide for that. Let me drop the plan guide. So I know this was quick, you get access to of course all the scripts, you can go through them later. You have the set up of the database, you can play with it to get it to learn better.
Recapping What We Learned
Just before time ends, I want to summarize everything. So again, parameterization, I hope you agree with me right now, is a very important topic. It has many side effects on performance. If you don’t handle parameters appropriately, you’re going to suffer from performance issues. Some tips I want you to take away from this session, first of all, in most cases you should prefer parameterized query. Prefer to use parameters in your queries and not ad hoc queries dynamically generated with the different literal values because in most cases, our distribution of data is more or less uniform and for different values, we want to reuse the same plans again and again. So for most cases, use parameters. If the data in your column is not uniformly distributed, then you need to apply some methods to fix that and we saw some options like using option recompile, optimize four, multiple stored procedure with some logic to distinguish between groups of values and all of that stuff. You need to apply those.
Avoid using local variables as an alternative to parameters. They are not parameters. If you use local variables, they are unknown values to the optimizer and in most cases when you use local variables, you’re going to suffer from poor performance because it can only use general statistics, which has nothing to do with reality. In most cases, don’t change parameter values inside stored procedures because parameter sniffing happens at compile time and the optimizer uses the parameters as they are passed by the users, not after they are modified. If you have to change them, use option recompile or use an inner stored procedure or a dynamic batch in order to solve this problem.
If you want to force parameterization, be careful. Do it only if you have to, if you don’t have access to the code and can’t rewrite the code and use parameters appropriately, you still have a choice by using forced parameterization. It’s very risky, please test it and if you have to do it, prefer to do it at the create template level and not at the database level because it might be very risky.
Back to Our Story: The Customer with the Suddenly Changed Plan
And now back to the story with our customers, we had a stored procedure running for 50 milliseconds each time and starting one hour ago, for some reason, it started to run four seconds each time. Now, the customer was right. They didn’t change anything and it happens once in a while. What happened is one hour ago, the statistics has crossed the threshold to get automatically updated, and it was updated automatically one hour ago and it was updated the plan that was based on that statistics was marked as invalid and then it was recompiled the next time the user executed it and we had the casino effect. We were unlucky. That user used some very ware value, parameter value, which produced a different plan as with the plan we used to run and now we’re stuck with that plan in cache and from that point onwards, we get an execution plan of more or less four seconds each time, which is a problem. It happens once in a while, it might happen in every system because of how SQL Server works. You have to understand what to do with those things, and again, we talked about all the options that we can do to handle it so you should be aware of that.
I wrote a complete series about this, you have the link here. You can access it on my blog. Everything that I talked about here in that blog post series, including all the code, the demos, so you can go through that, and if there are questions that would be a great time for that.
Questions & Answers
Brent Ozar: There is. You know what you need to do, you need to switch back into SQL Server management studio for a second because there’s a huge question that a bunch of people asked and I believe Paras Matalia was the first person to ask it. Over on the right hand side, you have that beautiful graphic. Everyone wants to know how to turn that on.
Guy Glantser: Okay, so you can just right click on the scroll bar okay, and then you have scroll bar options, or you can access it with the tools options. You can get to text edit for all languages, scroll bars and then instead of use bar mode, you can click use map mode and it’s going to look like that. I love it. I think it’s much better.
Brent Ozar: Yes, lots of people asked for that. Let’s see here. Several people said we should probably stop calling it vNext, because Guy and I were talking about that before the show as well. So now if you’ve got any questions, feel free to ask them either on Twitter or in Slack or in the Q&A in GoToWebinar. You should throw your contacts slide back up too so everybody can see how to follow you on Twitter too.
Guy Glantser: Yes, right here.
Brent Ozar: There we go. Beautiful. So several people said very nice presentation, thanks. Gerta says, “Great presentation.” They’re also posting links over in chat. Reduk says, “I know you can’t hear me but I’m clapping. Very nice, excellent session.” G. Bernett says, “Great presentation, I’ll re-watch this one.” Elsmer says, “Youtube the next day or so?” Yes, we record all these – it may take three or four days, depending just on how long it takes us to encode and upload and all that good stuff. But all these will be available on our Youtube channel in Groupby, in the podcast, we have an Android app now and then over on Groupby.org, the whole thing will be transcribed as well too. Chen Du Xi said, “Very nice presentation, thanks, I love it. Vladi says, “I’m just glad I won’t be the only presenter with an accent” and then Mike says the accent made the presentation. Yes, these two Americans, anytime somebody has an accent, it sounds smarter so yes.
Guy Glantser: Really? No.
Brent Ozar: Oh yes, no, it’s true. Especially the British accents, we’re just like oh, they must know everything.
Guy Glantser: Yes.
Brent Ozar: Well thanks a lot. James Ukanah says, “You should sing before you leave.” So I don’t know if we can – on demand at the end of the session though, that’s when it’s really tough.
Guy Glantser: [Sings] “There must be 50 ways to leave your lover…”
Brent Ozar: And if you want to hear more of his excellent singing voice, SQL Server radio, the podcast SQL Server radio. He sings a lot of the intros there.
Guy Glantser: Oh yes.
Brent Ozar: Excellent, well thank you very much, sir. Really appreciate you presenting today and big round of virtual applause for you. Nice job, excellent job.
Guy Glantser: Thank you very much.