Gems to help you troubleshoot query performance

Audio podcast:

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS
Leave us a review in iTunes

Target Audience:

All roles that want to learn what’s available to aid in SQL Server query troubleshooting space. That will be the developer that needs to squeeze every ounce of performance from his queries, to the DBA that has to understand the problem space of an ongoing performance issue.


Query Store in SQL Server 2016 is one of the most relevant improvements for production DBAs. It introduces the concept of click-once mitigation approach to query performance regression.

But that’s not the sum of all enhancements in the query performance analysis and troubleshooting space introduced in SQL Server recently.

In this session we will learn about other enhancements that can help you troubleshoot query performance.

Ranging from new xEvents to Showplan improvements, from LQS (and underlying infrastructure) to the revised Plan Comparison tool, learn how these can help you streamline the process of troubleshooting query performance and gain faster insights.

Why I Want to Present This Session:

The SQL Server engineering team has been adding so much to SQL Server, in terms of new capabilities and tools to streamline the query perf troubleshooting, that we want to showcase how it all comes together.

Additional Resources:

Session Transcript:

Brent Ozar: Welcome to the next session at GroupBy, in which Pedro Lopes – oh no, Lopes, I’m going to get it right here. So now the next session, Pedro is going to cover Gems to help you troubleshoot query performance and you notice that his slide deck has a Microsoft thing on it. That means he’s not like the rest of us who are faking it just to you know – trying to get in from the outside. He’s actually inside the big monster here, so take it away Pedro.

Pedro Lopes: Thank you Brent. So welcome everyone. This session will mainly – is mainly a vehicle to let you know and then be kind of evangelization of the work we’ve been doing in the last year or so. About adding troubleshooting artifacts, if you will, to the engine, all round to make just the ability to troubleshoot query performance just that much faster and that much focused. So the objectives I have set for today would be to show you what new diagnostic improvements we have in SQL Server engine, some of you will already have contact with some of these, others hopefully this will be news to you and you can start using them. We’ll learn how to use some of this new diagnostics to troubleshoot the common performance issues and I’m going to take you through several of them.

So here’s a generic question though, when was the last time that you actually dealt with a query performance issue? We here get – the Tiger team, one of the activities that we have is actually to receive escalations from the field and from customers. And a lot of them have to do with performance issues, namely query performance issues. And those can have a different variety of root causes but it is very common that even our CSS gets a heavy load of query performance troubleshooting topics. So I’ll take you through a couple of fundamentals, just to make sure that everyone’s on the same page as we progress through the session.

So how does – why does query slow down? There are a number of questions that you must ask yourself and a number of root causes that may explain why generically speaking, a query slows down. So most often, you will have excessive resource consumption, so that can be either your CPU is running high or you’re doing a lot of I/O and every of – all of those aspects, even running with a high memory, all those aspects that you actually feel in SQL Server as the instance, it all boils down to ineffective queries or ineffective use of resource inside your server. So those can be driven by poor indexing strategy for instance, whereas your index design was probably – once a upon a time it was good enough for your work load but as your work load kind of changed in its profile or actually your data size grew, that index may prove not to be reliable anymore. And therefore it will trigger a number of other issues, namely excessive resource consumption.

So another topic is you may lack useful statistics and the engine has been becoming smarter – has become smart in using, for example in 2016, it’s using again, multi-column statistics and we’re only loading the statistics that are actually found to be needed for the optimizer, rather than in previous versions where we just basically loaded everything. Or you may be facing for example, lack of useful partitioning, and that has to do directly with the case where your data has grown and that has like, tipped over some performance aspect and therefore you need to start using strategies like partitioning to handle your data. Now, another category may have to do with blocked queries, so not necessarily excessive resource consumption but just queries take time and time forever because somewhere there’s a blocking chain and some queries blocking a number of others. And this is – the last one is rarer, but it happens. For example, incorrect server configurations, and those may have to do with, for example, you are running a lot of ad hoc workload and you hit perhaps some compilation bottleneck so you can change some server or database configurations to handle those sorts of scenarios.

And the context has to be set when you’re analyzing slow queries, so it may not be necessarily that SQL Server is the culprit when a user calls in and says, hey my application is slow, but it can be related to another component that is outside SQL Server itself. For example, slow network performance or I can have some – I have a service running in the server that may be causing performance declarations. For example, taking a lot of memory or consuming a lot of CPU, whether that is becoming concurrent with SQL Server process itself. But then if it is SQL Server itself then obviously SQL Server only has an issue if you’re running queries, and therefore the query plan shape might be an aspect to look at and you need to understand if you have issues with queries all across the board or if you’re able to identify specific subset of those queries. And if so, you then can drill down and start to look at whether you have optimized the query with useful statistics and we actually have some useful information here. Do I have suitable indexes available? Is my index design still up to date and that relates to my predicates I’m running? Do I have any hot spots in my data or when I’m looking up my query performance am I hitting some parameter sensitivity issue, for instance? And those are a number of question you need to kind of find the answer to scope down from a big sentiment, if you will, which is a user calls in and says, my application is slow or your database is slow, and drill it down to some aspects that you can actually tackle with.

So how I do I analyze the performance of a slow running query? At this point I’ve gotten to a point where I was able to drill down and understand it as a subset of queries for example that is slow running. So here are some of the tools that we’re going to be using today. Showplan XML is something that we’ve been investing heavily in the past year. We’ve added a lot of relevant information to Showplan, essentially to make it a one-stop shop to get the context as we’re analyzing a query plan and kind of avoid going back and forth getting data from other DMVs or getting data from other sources to kind of complete the picture. Because it is important when you’re analyzing query performance to also understand the context in which the query execute. We’ll also be using Query Store, the plan comparison tool, which is kind of an Easter egg we kind of released a year and a half ago and we’ve been investing on it to make it smarter.

Live query stats, again, we have made several investments in the engine itself to make this feature more usable in a day-to-day basis and xEvents, obviously we’ve been investing in xEvents, that’s the de facto troubleshooting artifact if you will, that we have all across the engine and across most new features. So a query plan will include a lot of useful information. By looking at a query plan, you can understand how the data is accessed by looking at the baseline operators, the seeks and the scans if you will, how the data is joined so what type of physical join are we using to resolve what kind of logical join. Also the sequence of operations because unless we instruct the engine otherwise, if you’re joining several tables, there will be some join optimizations happening so the sequence of the upper engines may not be exactly the same as you’ve set textually in your query.

And you may see for example, work tables or other artifacts showing up in the engine, in the plan that you were not accounting for and those may be relevant whether or not you have warnings or not. Also, understanding what size of data flow. So not only the estimations but the actual data flow, so the amount of rows that are flowing from one operator to the other and the cost for each of those steps and the cost that goes upstream. That is super important to understand. For example, in a very complex plan, what are the nodes that are most relevant that I really should focus on? And again, we’ve been making some investments there to make it easier for you to identify that.

Whether you’re using parallelism or not, do I have warnings? What are the execution stats? So a number of relevant information is right there in Showplan. That will allow you to not only understand how the query is getting executed but hopefully – and with information we’ve been getting, and I will show you today why certain decisions were taken and how to mitigate them if you want.

So – and this will be my last slide on the fundamentals if you will, it’s the notion of understanding fingerprinting. So query fingerprints and query plan fingerprints, and why is this important? Because as we move through several DMVs, as we compare plans with the output of certain DMVs and DMFs, and even xEvents, we need to correlate the information. So if we’re talking about queries and query plans, so we need to – some way to fingerprint them and to univocally identify them in the system. So we have the query hash for instance that will explicitly identify a specific query in the entire plan cache and that’s – we designate that as a query hash, it’s a query fingerprint, essentially. You can use them to filter for example, ccmexec request, ccmexec query stats and get some insight into other – your inflight request or queries that have already ran and we are caching some performance stats. Now, there is also the SQL handle. Basically, this is the token that identifies the SQL text that relates to a batch. So for example, you can use the SQL handle to join different DMFs and DMVs. ccmexec SQL text is the DMF that is mostly used for you to grab the exact query text of a certain output for example, from ccmexec request or query stats. Then there is a notion of fingerprinting with query plans and query plan hash will be that piece of data. So it will be the specific fingerprint to determine a specific execution plan within the scope that you have in the plan cache. You can even, for example, just with these two pieces of data, do some interesting exercises. For example, I may be able to identify that one query hash actually may have more than one query plan hash in the plan cache. For example, if two different users are connecting independently, and they have different set options in the connection, those may drive different decisions by the optimizer and therefore you may even find two different plans in the cache for the same specific query hash. So that can be useful exercise for you to determine whether you have this kind of scenario.

And finally, the plan handle. This, similarly to the SQL handle is a token for that specific cached execution plan that again, you can use in DMFs or DMVs such as you see on the screen. So establishing these fundamentals because we’ll talk about some of these topics as we progress through the session, we’ll use this information so I just wanted to establish this.

So we move on to our diagnostics and troubleshooting in-house since we’ve been doing over the past year. Now, taking the – we’ve just talked about query hash and query plan hash and those are actually actions in xEvents because if you’re capturing xEvent collection, you can capture that piece of information that you can then later correlate to some DMF or DMV that we’ve seen. Now, one problem that you guys in the community have identified and opened a connect item that’s been there for some time that we have taken care of now is, hey, when I collect my xEvent session, actually the query hash, the query plan hash is there, I can’t use it because – I mean, when I try to use it to filter out for example, ccmexec query stats, I can’t find that query there.

So that actually made it difficult to correlate the information and I know that – I think a couple of years ago or three years ago, there was someone actually in Microsoft CSS that came up with a CLR to kind of mitigate this difficult correlation, but it was still something you needed to deploy on your server and actually know about it. So what have we done? Into the 2016 RPM and we back ported this in 2014 SP2, one of the main missions of our team, the Tiger team, is actually to understand not only what we can improve in vNext. In whatever comes in the next version but also understand what we can bring back as much as possible to add value to in market versions and this is just another case that we did. So not only did we ship this in 2016 RPM, but we back ported to 2014 SP2. So we added two new actions, as you can see that on the screen, query hash signed and query plan hash signed and those will actually be the same data type and the same exact hash that you will find in the engine DMVs. So now it becomes much easier to correlate this information between xEvents and like RPC Completed, for example, and some DMVs like we’ve seen, and I’m actually going to have one small example of this for the [inaudible].

Another problem space we set out to resolve in this past year is try as much as possible to avoid round trips. Meaning you have a query performance issue, you are a DBA or a Dev in your organization and someone complains and you get the actual execution plan for example. But that doesn’t really have all the information that maybe you require, so you do another round trip. And this is especially harmful when we’re talking about you opening the case with Microsoft and you send them the actual execution plan for example, thinking that you have all you need to tell me why my query is behaving badly and give me a mitigation for that. But then you get a phone call saying, hey we need to collect information from this DMV, we need to understand if trace flags were active or I need to understand what I/O we did in this query, so please collect statistics I/O. So, a number of round trips are perhaps required for every engineer that is analyzing query performance to get the full scope and the full context and that is kind of frustrating.

So we set out to correct that as much as possible. And for example, up to SQL Server 2016, the three pieces of information you see both there, actual rows, actual end of scans and actual executions – those were the only pieces of data you had in the scope of an operator in Showplan XML. So I would click on scan and I would see how many rows I’ll put from that scan, how many scans were actually did and how many executions were done. That’s it, no more information. So I needed to resort to other DMVs and other artifacts in the engine to actually get some more information.

So what we did in 2016 and 2014 SP2 was to include a lot of this missing information by default in the actual execution plan. It’s actual because these are all run time performance counters, so they’re only collected at run time. So as you can see, we’ve added the elapsed milliseconds and the CPU time that each operator took, we now have information about the logical reads and the physical reads and read-adeads; so all the I/O information for each operator that actually does I/O. And that’s all included under the run time counters per thread inside Showplan XML.

Actually in 2016 SP1, we included some more information in this case, about memory grants. You can see them in the bottom there. For certain operators like sorts for example, or hash operations, we now have information about the input memory grant and the used memory grant. That is super important to get the context of whether that plan executed with a reasonable amount of memory, or better put, if it was given, if it was granted a reasonable amount of memory for what it actually used. If you see a huge gap there, that’s a symptom of, perhaps, miss-estimations and you need to take some action there.

Just a note, when you’re actually looking at these run time counters, is that if you’re looking at row mode operators, for example, you have a couple of seeks and they’re being joined. You will have this run time counters for each of the seeks and when you look at the join, you will actually see the values are being pushed up the tree, meaning the join will have run time information on itself, summing with the performance counters of the children. Now, for batch mode operators and that has to do with the way we measure performance for batch mode inside the engine, you will actually see singleton values, so you won’t see the values summing up the tree. Just a quick note there in case you kind of find the strange when you’re analyzing mixed mod plans. So here’s how we actually see that information exposed.

So in – you can – in this case, I was clicking on a clustered index scan and I can see information exposed about the actual I/O statistics. This means that for all intensive purposes, I can do away with the round trip of requesting someone to run the query again with set statistics I/O. I don’t need them anymore, I have the information right there. I can see there on the right how that is exposed inside the Showplan XML, in case I’m actually looking at the XML but it’s very comfortable if you will, to actually look at the properties inside SSMS.

So another thing we added, like I’ve said a minute ago, is the amount of rows that are flowing between each operator. And in this case you see here, I was running this query in parallel, this – actually this scan was running in parallel. It was running in four threads and I can see that actually at this point in time, how many rows each thread was reading. And in this case, I see it’s not very well balanced. Obviously in a perfect world, with a perfect data distribution, and this is what the optimizer attempts to do, is to evenly distribute in this case in a scan, each of the threads that is running, the scan will take more or less equal part of the workload. And therefore, you are able to finish the threads at essentially the same time, given there are no external conditions occurring here.

And in this case, what I can see is that thread one actually read over 40,000 rows and thread two only read 17,000. That may be again, right here, a hint that we may have outdated statistics for example, or statistics with a wrong sample mode so when you have … statistics or created statistics, those did not have a good enough sample. And you now have this kind of information right here, so you can start to proactively look into what you can act upon in your system. And another thing we added that can do away with the need to collect set statistics time is actually the overall time statistics for a specific operator.

So you can see here and this is not collapsed, this is collapsed but you would see again, per thread, you could see the elapsed time that this operator took and the elapsed CPU time for this operator. Again, you – as you move up the tree, if you’re in row mode, you would see these values growing as you move up the tree, so you could easily identify which nodes and which branches in your plan actually weigh the most at run time and make those the primary focus of your attention when you’re trying to do some query tuning. But like I said, we have been investing in xEvents, and this kind of information is also in the new xEvent query trap profile.

Now, this xEvent as you may notice on the screen, is a debug channel xEvent, so if you want to use it you need to pull down the debug channel, but as you can see, you can collect this kind of information per node, per operator. So it can be kind of chatty so you can use it in your test server just to get a feel of what output you get. But if you’re running a very heavy workload, a lot of executions coming in and you’re trying to collect this, you’ll see a very chatty output. So this is meant to be actually for debug purposes, exactly. One caveat again, is that while the Showplan timescale you’ve seen just now is showing in milliseconds, the xEvent is actually shown in microseconds, and that’s running microseconds for CPU and total time. As you can actually see on the screen – let me put my mouse there. You see here that total time and CPU time is specifically in microseconds and that’s also in the name of the event field itself, so it’s easier to understand what unit we’re looking at.

Okay, so more into Showplan, what else can we put into Showplan that makes it the one-stop shop for the query execution context? We’ve added information about trace flags, namely to understand which trace flags are active as we execute a specific query. Because as we know, a number of trace flags may actually influence the optimizer to take different decisions on the plan shape and the – and on a number of rules that are enforced. So it is very useful to understand what we have active at a certain point in time, when you’re running a query because for example in this case, I see on the top that I have three trace flags active in the system as I ran this query and two of them, as I can see in the bottom, were not compile time trace flags, meaning they did not affect compilation of plans, but actually one did – 9481. 9481, is in this case, is enforcing the old CE. So it is very important for me for anyone analyzing a query plan to understand, hey, did I have any outside interference, so to speak? And again, it’s all about not only getting the context just right at the first time but avoiding a round trip, because let’s say we’re doing a query analysis on this plan, you’re doing some recommendations and someone all of a sudden says, hey, but you know what, I was actually looking and I have a 9481 trace flag running globally on my server. Or in this case, it was a session scope – hey, is that important for you? Well, yes, that actually affects query optimizers so it will be interesting to know that beforehand. Now you have that information right there.

So yes, so I know if it’s the compile time trace flag or if it does not affect compilation. And speaking of trace flags, in the last few months, we’ve been revamping the trace flag list in Microsoft docs. You have this short URL here you can make use of. So any and old documented trace flags that apply to current team market versions and many of them were scattered throughout some obscure KBs. We’ve pulled them all together and you’ll see documentation under a single URL, so it becomes easier to understand what is supported versus some other trace flag you found somewhere. And continuing on the trend of adding more context information to Showplans, we’ve also now added the top waits. Now, again as a context information, it’s super important to understand if when I run a certain query, what was the query on – in this case, the session in which the query was running on – what was it waiting on?

So it’s super relevant to understand this information, and again, being in Showplan precludes you from having to go to a different DMV, getting the information from there and then trying to more or less match the time codes that you got in the DMV with the time that this query ran, so it’s now right here. So for example, recently we – our Microsoft CSS got a case where just by using this – for example, they were seeing they had a lot of CX packet waits in that specific execution, and when they moved to the plan to look at it, they actually saw that most of the – sorry, they have SOS casual yield, and most of the CPU was being taken actually by an exchange operator. Now, as a short term mitigation, you can just run that with MAXDOP 1, because it was the exchange operator itself that was taking a lot of time and impacting the weights on the execution so just by – as a short term mitigation, running MAXDOP 1, you do away with that kind of operator and the query performance stabilized, and then you can go in and do some more deeper query tuning. But this is the sort of information that being right there, provides you a context that allows quicker action, understanding what you’re seeing.

Another very – I say very old connection request connect item that we’ve dealt with right now, is actually adding the parameter data type to the parameter list in Showplan. So it gets easier to detect when you’re looking at type conversion issues. For example, you have a seek plan warning or something like that – it gets easier, again, not leaving the context of Showplan. And especially if you’re analyzing remotely, you now have the information about the incoming parameter data type right there as you’re analyzing in line with other pieces of relevant data, as you can see here, and we did not stop. For example, we have this kind of information, which is the query time stats, this is found in the root operator of the plan, if it’s a select you will see in the select node – the root node, and this is the overall CPU time and elapsed time for the entire query.

So not only did we add CPU time and elapsed time at the operator level as we’ve seen a few minutes ago, but we actually also added for the overall query. It’s super important for me to understand, again, when I’m looking at a query – a query plan, an actual query plan how much time did it take. Because that’s the useful little piece of data that again, being in Showplan, I don’t need to actually ask someone, hey don’t forget to make a note of how much time the query took and send it to me when you send me the query plan. It’s now right here, so you have that information from the get go. And, well, you can see we’ve done a lot of changes to Showplan lately.

Another piece of information we added is about how resource governance settings are impacting the use of memory in a specific execution. So for example, we’ve added to the optimizer hardware dependent properties that max compiled memory and that tells you what is the maximum memory that the optimizer had to compile, at compile time, obviously. But what this gives you is, if that query was running for example, under a specific resource pool, then if it is let’s say a lower value than what you would expect if you were running on default for example, you can understand if you’re having for example, compiled weights or something like that, you can see if you have a very small amount of memory for example to run compilation, and that can have adverse effects. So, this has a direct correlation to what resource governance setting you have set for the context of this query specifically. And we also added max query memory. And this one, again, can map to a resource governance setting, so we have the max memory percent hint and the resource governor, so what you see here under max query memory is the potentially max memory that can be granted if needed, for a specific query. So these are a system or slash resource governance settings that are right here, and again, make up some relevant context information when you’re analyzing a query plan.

So let me quickly show you how this can work. Brent, are there any questions or comments you would like me to address?

Brent Ozar: Yes, Hugo has a really good one. Hugo says, “With all of the additional run time counters in the plan, are there plans to change the logic that computes percentage of cost? To like then say, this operator is actually 70% instead of running off the estimate?”

Pedro Lopes: No, that is still not changing, that is an artificial computation made from compile time costs, so we’re not addressing that. Hugo, if you care to email me with some more details on why you would see that relevant or what use case you would have, I would love to chat more about you. Is this Hugo Kornelis by the way?

Brent Ozar: Oh yes.

Pedro Lopes: Great. So I’d love to chat with you about this. Okay, so in this case, – oh yes I have a note here because I ran this demo once, I had optimized for other [crosstalk] sort of stuff happened, so let me create a session here. As you can see, I’m using the query thread profile event. Can you see the – is the size enough?

Brent Ozar: I’m fine with it, usually people are you know, watching on good monitors, except Hugo. Hugo for some reason is using a 12 inch screen, but everyone else is fine.

Pedro Lopes: Can you use a zoom it or something? I’ll use it as often as I can. So I’ve just created an event session with this query thread profile event. As you can see, I’m already collecting with a query hash sign and a query hash plan sign properties instead of the ones that don’t have a sign. Because I will want to correlate information I collect here with information from DMVs, so I need to use that. So let me try and get an execution plan – okay, let me run this – as you can see, there’s a very simple query here on AdventureWorks and I’m enforcing Max UP1 just for the simplicity of the output and you’ll see this in the xEvent trace. And actually just for the – I can then run it with removing the synth just for the kick of it because we’ll see how chatty this xEvent can be.

So I’ve collected this trace here, let me open that, and you can see it collected one event per note. So I have note ID one and so on and so forth and this is note ID six for example, and I’ve collected a ton of information about I/O, CPU and other information relevant for the context of the execution, of this specific note. Now, why I did it on Max UP1? Because let’s say I was running with – I have four CPUs here and if I set Max UP4 and this could use four CPUs, I would see four times the number of entries here. So just for the sake of the demo, I didn’t want to visually pollute this as much, so I kind of used Max UP1, but you see, if you were running with Max UP8 or even more than that and you actually are able to run with that degree of parallelism, you will see this event becoming very chatty very quickly, so please mind that when you’re using it.

So I could see here that note ID six in my plan was returned actually 120,000 rows and it did 1200 logical reads and it ran in row mode, so I have all sorts of relevant information here, even the CPU time in microseconds. But let’s say I want to correlate this information with some DMV or I want to go to ccmexec query stats. So I can, for example, grab the plan handle from here and go back and I’m able to easily correlate with information in my plan cache. So I have the other SQL handle, I have here all the relevant information about – so for example, I can use the start off set and the end off set to cross apply with ccmexec query text and get the text from there using in this case, the SQL handle, so there are all sorts of joins and cross applies you can do between relevant DMFs and DMVs to get your own flavor of output.

So I can see here that this has executed once, I can see the total worker time, less worker time, the output in terms of I/O, so I can see in my logical reads here – so these are all again, new columns we’ve added to this DMV. I can see the grant information, so memory grant information is here, of course this was just one execution. But as you look at this holistically over a period of time, if more than one business cycle has gone through the server, I’m able to understand even by looking at this DMV whether I may have a memory miss-estimation issue in some query for example. So there are all sorts of insights I can have from this DMV here. So I can go in and get the plan from the cache using this – so I can go in and grab the cached plan. Obviously, the cached plan will miss all the run time information because I’m getting this from the cache, but we’ve still made some additions to the cached plans. I’ll show you that later on.

So you can finally also use the information about query hash and query plan hash that we capture in the xEvent trace. So for example, I’ll go in and grab the query hash signed that I’ve captured before and I’m going to use it here – query plan hash, was that the query hash or the query plan hash? I kind of got lost. Query hash. So it’s here, and I’m now able to correlate directly. So there’s no more need for CLR function or guessing. We now have a direct correlation between information we collect in xEvents and being able to retreat that information from several DMVs. Any questions here?

So moving on, more investments in Showplan. We’ve added – so this is possibly a known warning that you might have seen, we’ve been – we’ve added this in SQL Server 2012, but basically you had information about hey, memory grant took x amount of time to be given to this execution and therefore for example, in this case let’s say we ran the query in 41 seconds, but we see the warning here that it needed to actually wait 40 seconds for memory grant, so that’s a relevant piece of data again when you’re analyzing this, but that’s basically what you had. However, so we’ve added more information about memory usage and I’m going to show you more warnings going forward. This is what I’ve shown you just now live, so in new columns in CCMExec query stats, everything about memory grants. So for example, here I have a query that was run, again, only once, this was part of the demo, but I see hey, the last grant was almost 800MB and how much memory did it use? Not even 1KB. Talk about misestimating memory or misusing memory, and why is this relevant? Think about a number of queries running in your system that are for example, taking 1GB of RAM and using a couple of kilobytes. That will hinder the ability to actually concurrently run workload in your server. So therefore, you will want to be attentive to this sort of misestimations because waste is – it’s not only about wasting memory, it’s about not being able to have your SQL Server run other queries as concurrently as you could potentially, if you kind of have more reasonable memory grants as it relates to the actual usage of the memory. And if anything else fails, we’ve actually added a couple of hints that you can use, so the min grant percent and the max grant percent that you can – if anything else fails in terms of trying to adjust the – your query and the underlying statistics and what not to make sure the engine actually gets a better estimation. You will then be able to administratively if you will, be able to drop the ceiling, for example, the max grant percent to a value that is more close to the actual usage of memory that you’re seeing. But we’ll see more about this later. So that kind of information is obviously then also inside Showplan XML itself. Yes, this is what I was going for, so having this information right there in Showplan, it’s super important to detect is this sort of scenario is happening. So not only if you look at your workload holistically through ccmexec query stats, but if you’re analyzing one single plan execution, then you’re able to also get that specifically from Showplan.

But, although this information is there, it may be missed with all the other stuff that’s inside Showplan that you’re minding. So, what we’ve added is also another warning that has to do with memory grants and this one has three conditions. I’m not going to bother with the conditions here, I’ll make the slide deck available so you can go and have a look at it later, but you can see here in the right side of the screen where I was running a certain query and I got a run time warning saying this query detected excessive grants, and this may impact reliability. Reliability meaning in this case, the ability to do more concurrent workload. That’s a kind of measure of reliability. So in this case, I can see the initial grant was almost 300MB and guess what, I used the 176KB. Again, this is not a good use of memory for my – good sound server that is trying to serve a lot of concurrent workload.

So if anything else fails, I could for example use the max percent hint here and if I kind of have the notion that this kind of query profile is always overestimating memory and I have no other way of addressing this, I can just go in and say, okay, from the executions I’ve seen, I take a max of let’s say 200KB of memory, I can set an artificial ceiling for memory grants and just make sure that the grant is not higher than a certain amount and therefore, SQL Server can have more queries running concurrently, as long as work is not available, obviously.

So we’ve added also in memory grant usage xEvent 2016 to detect this kind of scenarios. Now, mind you that for this xEvent, so it doesn’t become too chatty, we actually only fire the event if the grant as a minimum of 5MB. So only those fairly larger queries if you will in terms of memory grant, if we detect inaccurate and insufficient grants here, then we’ll fire the event. So yes, and these were the options I was talking about. So in SQL 2016 and 2014 SP2, we give you an option to administratively address grant sizes. So you have this query hint here, the max grant percent and in this case, if I set it like option max grant percent equals 0.1, I’m saying that at max, 0.1 of the available memory under my specific resource governor resource pool that I’m running on may be available for this query. So mind that the usage of this hint has to be taken into account where in resource governor am I running this. If I haven’t set up resource governor, I’m running under default, so therefore, the max available would be potentially the limit of max server memory. So you’ll find there from that perspective, but if you’re actively using resource governor, mind you that you need to take those configurations into account.

So obviously the valid input here would be between zero and 100%, as you would expect, max percent needs to be bigger or equal than min percent and the reason we have a floating point value here is we want to allow it to be very granular in the amount of memory you set for either min or max. For example, if we just have it as an integer, let’s say the minimum we allowed would be 1%, think of one terabyte memory machine, and that’s not so uncommon nowadays. In the case that we’ve just seen that I was taking 800MB to use 200K, well 1% was still overkill. So in this case, I would be using 10GB, so that’s not good. So that’s why we started to use floating point value.

Okay, so this is a super artificial query, I’m actually forcing as you can see here, but forcing a memory issue. But to not only give you a gist of how the warning surfaces, but also for you to see how you can use these kinds of hints, in this case, min grant percent, but you can use max grant percent, how you can use it in line. In this case, I’m artificially setting that hey, the min grant percent would be 20% of the available memory – in this case I’m not using resource governor so I would fall into the default pool and in this case, I’ve set 16GB as the max server memory for this specific instance. So when I do this, and let me let it execute – so you can see it’s absolutely bogus data here, the point was really to create a memory condition. So I as I go to the actual execution plan here, I already see that I have warnings and now if I expand on this, I can see that hey, you’ve hit an excessive grant warning here and this will impact the reliability of your server, and see in this case, 600MB, almost 600MB of grant, and in the end I only used 350K. So talk about a misestimation. Obviously, in this case, I fabricated it, but if you find this in the wild, one thing you can do is as you try to at least mitigate it administratively is set max grant percent to whatever, let’s say I would be happy with 0.01 of memory on my server for this query, and if anything else fails, you still have this fallback strategy to deal with these misestimated grants.

Now, I if I look at the properties of my plan, I will see that in terms of optimizer hardware, the properties I have here, the max compiled memory available at compile times so as I’ve shown you in the slide deck previously. And if I go specifically to memory grant info, I am able to see a lot of information here like the granted information – let me bump it up, the granted information here, I can see the max query memory, I can see the max used memory, so these two pieces of data is where – are being taken into account to trigger, or not, that warning and I’m also able to see the max query memory as I’ve said, is part of your resource governor setting if you have set one. I can also set an event session with that query memory grant usage xEvent, so let’s set one. So let me set up this xEvent session and I’m going to run this – the same query in the scope of this collection and let’s see what fires up inside the plan.

So as you can see, obviously I have the same memory grant warning here and looking at the xEvent trace, I can see that this fires a memory grant usage event. Again, remember that it only fires if the grant memory was above 5MB, which in this case it was, and I can see here – let’s say if I’m collecting this kind of event over a larger period of time, this is – being in xEvent is also very lightweight, which means that if I’m firing this every now and again, I may be able to get a gist of whether my workload is kind of suffering from this concurrency problem if you will, often or not. So I can see that this is the granted memory and I can see the used memory here and I can see the usage percent of the granted memory was not even 1%, so it’s very skewed here. So I also have the query plan hash signed and the query hash signed that I can take this, go into my plan cache or even take the plan handle, go into my plan cache, retrieve the plan from there, retrieve the query text from there and then I can go into my test server and what not and kind of find what’s happening with this one. Again, taking advantage of all the information I can have in the actual plan when I run it, when I collect it.

So in terms of warnings, and yes, I’m – almost half of my session is all about Showplan. That’s where we’ve done extensive investments here. In the left, you can see what kind of information you have. For example, for a sort warning up until SQL Server 2016. You kind of saw a sort warning, you went to see the properties of that operator and it told you well, this spilt data using a spill level one, which didn’t tell you a lot unless you went to documentation, and that’s it. You had no other information. So to kind of understand how costly that spill would be or not, even though in principal a spill would be costly anyway, you would maybe have to collect sets statistics I/O, that we’ve seen that we can do away with that kind of collection right now. But in the scope of the sort warning itself, we can now see on the right that it gives you a lot more information. I can see that okay, it spilt with level one, meaning I only needed one pass through the data to build the sort table. But I’ve spilt through 12 threads and I wrote x amount of pages, 4400, and I’ve also used a lot of memory to execute this sort. So I have a lot more complex information on how relevant it is in the context of let’s say a very large and complex plan, for me to address sooner rather than later, the fact that I’m doing a sort spill here.

The same kind of context information was added to hash spills. So you can see on the left there, the warning before was just hey, this hash match spilled with spill level one, oh great, okay, but no more information, and now, since 2016 RTM and we’ve back-ported this to 2014 SP2 and we’re actually moving back a lot of this information that you’re seeing up until now, we’re actually back-porting to an upcoming service pack for SQL Server 2012. So we’ll have as far as much as possible, we’ll have a match in terms of troubleshooting artifacts between different versions.

So you can see, as I was saying, on the right, how much more useful information you get for the context of this hash spill in the newer versions of SQL Server, and again, as we would, that has its respective counterparts in xEvents and you can see in the top, the only kind of information we had, much like Showplan in the scope of the xEvent hash warning was the recursion level. That’s it, that’s the spill level ID, and now we have much more information as you can see, highlighted in the bottom, in more recent versions of SQL Server. And similarly in sort warning, the same thing, so we get the same kind of information, namely, we even get the specific information on what kind of I/O we did on the work table that is supporting in this case, the sort warning. Okay, so let’s move on to another problem space, which is understanding how to detect if our predicates are being used efficiently by the engine to do as little I/O and use as little resources as possible.

So when you’re looking at the query plan and as we’ve seen several examples today, when you see the actual number of rows returned by any operator, that means only that those are the rows output by the operator after whatever predicate needed to be applied was. So those are not the actual number of rows you get from the underlying scan table or seeking on an index or whatever, and that may be a huge thing to look at. This is usually a scenario that is hidden from the actual execution plan. For example, I see a scan or even I see a seek that is underlying doing a range scan, and I see it’s returning ten rows, but it’s taking a long time or I even see a lot of CPU or a lot of I/O if I’m for example, collecting set statistics I/O or nowadays, looking at the I/O statistics for that scan. They kind of don’t match, why am I retrieving ten rows and I’m doing 100,000 reads? So the query plan doesn’t really reflect that very explicitly and so what can we do?

Let me just – before I go into what we’ve done to address this problem space, kind of give you a visual representation of what it means to search without predicate pushdown. So in this case I have here my sales order detail table in my database AdventureWorks, and this is the query that I’m running. So I’m selecting product ID from this table and I have a couple of predicates there, modified table where I have between – with a couple dates and what are the quantity, whether it’s bigger or equal than ten. Now, when I run this query, I can see that I’m doing an index seek here and by the name of the index ix_modifydate and some more, I can see that – or in this case, you’ll see that the modified date is the first column in the T of this index. Meaning that, as you might know, the underlying histogram will be built over modified date. Now in this case, this seek is actually representing a range scan over the modified date part of my predicate.

So between January 2011 and January 2012. And now – so this would be the actual rows returned by that index seek, and now I would need to apply a second operator, in this case a filter to go and take the output of that range scan and be able to filter down the order quantity part of my predicate. So I would have the actual rows be trimmed down to that part of the predicate itself, and I would actually have the output of my query. Now, this is not very efficient, so by default, what SQL Server – yes, so this would be my results. So by default, what SQL Server does as often as possible, and this is actually very difficult to artificially show you an example where SQL Server is not doing this, is pushing down our predicates to the storage engine as much as possible. So instead of needing filter operators and wasting CPU time there to doing, after the fact, filtering all my data, SQL Server tried to resolve that at the storage engine level as much as possible.

So here looking at the same table, I have the same exact query and I’m doing my range scan, but see here, how my range scan is actually already looking – scanning through the data that’s matching both of my predicates. And that’s because this index is first column on modified date, second column on order quantity, so SQL Server detecting this, detecting that I have the right index for my predicate is able to push down both predicates to the storage engine and is able to read only the relevant data for my output, meaning the actual rows that I see in this case, in a perfect scenario, will be the actual rows that were read, and not much difference there. Yes, so I can see that in this plan I actually done away with the filter operator becauseI was able to push down both predicates here.

So let me show you this in action and why this is relevant here. Okay, so I created here – I bumped up the sales order detail table so I won’t do that anymore. Let me clear the cache here. Okay, so in this execution here, in the first one, I will be disallowing non SARGable expressions to be put being pushed down to the storage engine. So, you see here, this is the query that I was running in that visual animation. I’m using here an undocumented trace flag just for the sake of not pushing down the predicate to the storage engine. Like I said, it’s super difficult to actually – at least for me, I couldn’t come up with a case where I could artificially – or naturally, sorry, prevent the storage engine from pushing down the predicate, so that’s why I kind of force it. Please do not for the life of you run this in a production server because you’ll be hurting yourself a lot.

Okay, so now and the second one, I’m allowing whatever SQL Server wants to do by default to do it. So I’m going to grab the actual execution plans for both of them – so I got two execution plans here. These are actually the execution plans we were seeing before. I see here the – my clustered index scan and how can I detect if I was doing a good use or not, of predicate pushdown? In this case, I know I know I was not doing a predicate pushdown.

So what new stuff do we have here? We have – we used to have only estimated number of rows and the actual number of rows, right? And so in this case, the scan was actually scanning everything and I see that the engine was estimated to read the entire table and the actual output was also the entire table. I now have two extra properties here, which is the number of rows read, actually read by the engine, in this case it’s the same so it’s not a very good use of any predicate pushdown because I actually did none, and the estimated rows to be read.

So both of these actually will be present in the cached query plan, meaning for example, if you do some sort of analysis over your plan cache, you can look for where you have a large difference between these two – between the estimated number of rows and the estimated number of rows to be read. That, even from a compilation perspective, will give you already a hint if the engine is expecting to do a good use of predicate pushdown or not.

Now, in the second one, I was able to see that the estimated number of rows was 2100 but the estimated number of rows to be read was still 121,000. So again, although I did use predicate pushdown, not that efficient use of it. Why? Well in this case, remember the first column in the index that was also part of the predicate was the modified date, and it goes – it spills over the tipping point for a seek, so in this case, that just read the entire thing. So this can also tell me that hey, if this predicate is very recurrent, I may need to kind of change my index a little bit to be able to have a more focused use of predicate pushdown. Now, I can see here that although the actual number of rows from this scan were 1600, I actually still had to read through the entire thing. So again, in this case I was using predicate pushdown, but from an I/O perspective, not very relevant. Now, I still have – let me go here and look at these two – I still have a relevant difference. Where is it? So if I go to the root node of my query one, the one I’m not doing the predicate pushdown, I go to query time stats and look here – this ran in 200 milliseconds and my CPU time was 26. Yes, it’s a very small query but try to imagine something that really running in production concurrently over lodger tables really bumps up these numbers.

And on the second one where I did away with the filter, I actually cut down CPU time by 50%, actually a little bit more than that, and the elapsed time was cut down four times. So yes, this is not the perfect example, I will have one in a few minutes – this is not the perfect example where you’ll see predicate pushdown actually relieving your I/O usage, but by the simple fact that I didn’t need to use a filter in this case, I was able to cut down on other kinds of resources like CPU and that also cut down on the elapsed time. So it is still a relevant strategy to allow SQL Server to do this as often as possible.

Okay, so let’s try to unleash predicate pushdown to see it in all its glory and to understand what it’s doing right now in each and every SQL Server that you’re running on. So again, very simple query on AdventureWorks, this one is – so we’ll be looking at where last name is S something and first name will have to be John, and in the second one, I get all the Smiths, whatever starts with J. So let’s look at the actual plan for these ones.

Okay, so I see the output for each one as I would expect, I only have two Johns here, one of them is actually a Smith, on the other I have all the Smiths, whatever starts with a J. Looking at the execution plans, now let’s see information about the range scan. So I’m doing index seeks on both, but as I see by my predicates, I’m actually applying predicates here so I’m actually doing a scan on a range of rows in the scope of this seek. And I’m able to see that in this case, I read 2100 rows and the – I can see the table cardinality also, so I can see that the number of rows that were read were 2100, but the actual number of rows was just two. So not the best use of predicate pushdown per say, although as I saw, I had my table cardinality is about 20,000 so I was still able to – the optimizer looking at the predicate, looking at the existing indexes, was able to do an efficient enough use of I/O.

I can also see here how the estimations were very close in terms of the amount of rows to read in the underlying table – were very close between the estimations and the actual rows that were read. I can see that I/O was estimated 2100, and that’s basically what I did in reality. So again, by scrubbing your cache and looking for this sort of scenarios, may give you a hint of whether you’re doing better or not so efficient use of predicate pushdown.

But let’s look at this one: so in this one, I’m able to see – I can see here that the actual number of rows was 14 and the number of rows read was also 14. And by the way, this is using the same index. And I can even see by my estimations that they were matching, which means that this is a perfect example of using my underlying indexes line efficiently to cut down as much as possible on I/O. In this case, I was able to do a very efficient range scan and in case you were wondering where this is coming from, for example in the first one, where you can see that my estimation was around 2100 rows here. Let’s look at actually the statistic that supports the index I’m currently seeking on.

So if you notice here, I’m actually using a new DMF that we’ve released just now in SQL 2016 SP1 CU2, which is a CCM DB stats histogram. So this will essentially give you the same output as running DBCC show statistics with histogram, but the difference is I’m able to use it programmatically and therefore apply some computations or whatever I want to do with that thing.

So if I look at the range, when I was looking at all the S’s because the last name is the first column in this statistics. Therefore it’s where the histogram is built on, that my first query look at the last name predicate so I can apply the same predicate to my query that’s looking at the histogram and actually get a feel of how many rows are coming from there. So pretty similar, and I know that for that query, I was actually looking at the histogram and retrieving the estimations from there.

Now, remember the second one? Let’s look at it again. This one I was estimating 14 rows to be read overall, which is actually the actual number of rows I got. Sorry, I was estimating 35, and that was also the estimation of rows to be read, 35, right? So let’s keep this number in mind, 35.32. Now, let me look at the histogram for – so remember the last name here I’m looking for Smith, let me look at the histogram to see if I’m getting the 35 rows from there. Well no, I kind of have 103 rows that cover all the Smiths, so I’m not getting a direct estimation from the histogram itself. So where is this coming from? I can do something like – okay, so where is that coming from? For that, we need to dive a little bit deeper.

So for example, I’m able to use the query optimizer estimate cardinality xEvent to actually see where that is coming from. So I’m going to create this xEvents session, now I’m going to run it, run the same query where last name equals Smith and first name anything starting with a J, and let’s see the output here. Got it – still have my 35.32 estimated number of rows there, and let’s look at the output of this.

So I can see here from the stats collection that I have a – I can see the estimation already here. So I can look at the calculated list, I can see that hey, I was using exponential back off, this was the calculator used for this one. Here are the selectivities that were used for each of the predicates. So for last name, I loaded up stats ID three and this is the selectivity I got and that’s coming directly from the stats, but now I’ve calculated a try-based selectivity for my first name column inside the person table.

So what does it all mean? I’m using exponential back off, that’s the relevant piece of data, here are my selectivities, 0.005 and 0.118, so how do I use this to understand where that came from? So I can do something like this: so exponential back off is as simple as taking the – in this case, I’m looking at the lowest selectivity times the square root of the second lowest selectivity and I will be doing this until the fourth one times the table cardinality, I’ve seen before, it’s 19972. So if I run this, I actually get the round up number of the selectivity I got. So this is where my selectivity is coming from. So I’m able to get all sorts of relevant information just by looking at the execution plan and understanding these differences between estimated rows and estimated rows to be read. The number of actual rows coming after the predicate was applied versus the overall number of rows that I had to read to resolve whatever predicate, and that becomes very relevant when we’re looking at these sort of scenarios.

So in the final – we’re approaching the end, I’ll actually take the full 90 minutes. To move a little bit – analyzing specific query performance, let’s move back a little bit. This is something that I’m sure possibly every DBA in the call, if any, has experienced. Someone calls, my application is slow, all eyes are on me, everyone’s pointing at the DBA to fix whatever problem there may be there and he doesn’t even know what’s happening. So possibly you’ll grab your toolbox, running TSS diag, running your pre-customized xEvent traces, maybe running profiler – hopefully not. But the point is all of that is collecting data after the fact. Meaning as the problem is occurring, you are collecting data so that you can then take that data, go somewhere else, re-pro, do some analysis and then if you find a mitigation, deploy it. But while that’s happening, you have possibly someone screaming at you because they are desperate, “Hey, our business is down because there’s some problem with the database and we can’t put our finger to it.”

So what if I could do actually live query troubleshooting. And although yes, you might have understood at this point I’m talking about live query stats, it’s much more than that. I’ll show you why. So the point is to have the ability to do in-flight query execution stats, we need to enable something that is actually a mouthful: the query execution statistics profile infrastructure. This needs to be enabled on demand, in the engine to be able to start up tracking live query execution progress. But it has a payment, it’s a costly proposal. When we enable this profile infrastructure, the cost overhead can go up 75%. That’s what we’ve measured with TBCC-H kind of workloads.

So if I’m trying to do live troubleshooting in already brought down system, it makes bad things actually worse. So this is why customers, and I’m sure that you guys are not running this older time in your servers, and you’ll still revert back to that pattern of trying to collect data, moving it to some other server that has the same database and trying to see if there’s something wrong with the plans, if an index would help you there or whatever other mitigation you’re trying to deploy because you don’t have that ability to do low cost live troubleshooting in your production servers.

So, enter lightweight profiling. This is what we’ve – the kind of situation we’ve unleashed. There is the same red alert, everyone calls in, my application is slow, everyone is running at their DBA, but now the DBA can tap into an in-flight execution, find a hot spot, deploy a mitigation and did away with a lot of time needed moving back and forth some data and going to a second server and understanding what was going on, and why is this possible?

So what’s the impact with running like query troubleshooting – so live query statistics, with this new lightweight profiling infrastructure? So as you can see on the screen now, and this was measured with TPCC-like workloads, running the normal – I’ll call it here legacy but that shouldn’t be, it’s a normal because it will exist side by side – the normal profiling infrastructure, with no running xEvents, just running the infrastructure, will have an overhead up to 75%, as I’ve said. In service pack one – 2016S service pack one, we’ve made enhancements to side-by-side lightweight profiling infrastructure that has an overhead that we’ve measured, between 1.5% to 2%. Now, for the majority of workloads out there, if you’re not running already CPU bound and if your workload for example, a trade market – stock application for example, that needs very low latency and even adding 1% there is already an overhead.

So if you’re not running in this kind of edge scenarios, our recommendation is have this profiling infrastructure active all the time because it unleashes the possibility of going in and doing live troubleshooting with a very, very small overhead, as you can see here, up to 2%, which most workloads can survive with a 2% overhead on this.

Now, let me show you how you can actually do this in minutes. Okay, so let’s say I’m running some workload, I’m the DBA of this system and I’m experiencing this scenario there. Someone is calling me saying hey, my system is down, or that part of my application is very slow, please fix my problem. So what I can do is I can go in, in this case, let’s say I’m using activity monitor, I don’t know how many of you use this or not, just for the kick of it. So I’m using activity monitor here, let me look at active expensive queries – oh wow, okay, few here. I can look at any query here, right click and show live execution plan.

So I’m able to see – right click, show live execution plan, so this is what enables me by running that lightweight profiling infrastructure all the time, I can tap into any running execution, in this case I’m using activity monitor, there are obviously ways to do it with DMVs. Show live execution plan and – so I’m looking for the query that someone is complaining is taking a lot of time. Look at this one – elapsed time is already huge, a lot of logical reads, a lot of allocated memory here, so right click, show live execution plan – and while I was talking I lost my query. Okay, here it is.

Brent Ozar: This is exactly what a real life one is too.

Pedro Lopes: Found it, okay, this doesn’t look good, right? Just look at these estimations here. This is ginormous, right? Probably something is less than right with this execution here. I’m estimated a gigantic number of executions, the operator cost is through the roof. Yes, this actually doesn’t look good. And as I navigate through the plan – okay, I see it’s table spool here, I see all sorts of wrong with this plan, right? And this is the plan that someone was complaining of. So what do I have here? Also have a gigantic estimated number of rows, the subsidiary cost is huge, so yes, that’s probably the thing I need to be looking at.

Let me look at the query. So one thing I can do is – by the way, this button, if you miss it, it’s everywhere in SSMS. You can just click on it and have a properly formed query appear so you can add it. Okay, so this is the query I’m running. Oh, okay, my Devs don’t have a lot of faith in a new CE apparently. Someone deployed this and they’re using the old CE. I’m a bit more optimistic, let me try to actually go to the right database and run this. I’ll leave out the forcing of the old CE there. Oh, I’m done? Not even one second, and okay, this was done also, but it took a – but it took several minutes to run. It’s hurting my performance in production and I was able to with this case, see well, just do away with this trace flag and you’ll be fine. My execution plan is much simpler, it does not have the – for example if we go here, I was looking at this nested loop with a lot of miss-estimations all throughout the three and I can see here that I’ve changed – the join order is different, the type of joins is different, all the plan is not suffering from huge differences between estimated rows and actual rows. For example, here I see almost 300 actual rows and 500 estimated rows. So it’s fairly close and the plan just – I just was able to in this very simple case, do live query troubleshooting just by having the power to go in and see live what’s happening in my server.

Now, how did I do that? Live query troubleshooting, it can be – yes, so before I go there, just a quick difference. You still have regular profiling that is fine to get full run time statistics for a query plan, you can enable processional globally – actually if I click in – let’s say I’m SSMS and I click the button show live query stats for that query that I have in my window right now, so not through the DMVs, not through activity monitor like I saw now, I’m still using the normal profiling, or the regular profiling. I’m using lightweight, if I enable it and then I go to the DMVs or in this case I go to activity monitor and I’m able to plug in to a running execution. There’s one caveat with using lightweight profiling is that as you can see on the screen here, we do not do CPU tracking. That has a huge cost, so we still track I/O, we still get all the other execution statistics that I’ve been showing you so far, apart from CPU.

So if you really need to track your queries and get the CPU time, at any point in time, then you need to enable regular profiling, although it has that huge overhead. So again, I fall back to me previous recommendation of using lightweight profiling all the time. You can enable it by – for example, with trace flag 7412. It’s a fully documented trace flag, you can go to your trace flags docs page and see it there. I also have a blog post on this that has other ways of enabling both the regular and the lightweight profiling infrastructure. So you can get more details there.

So we’re almost at the top of my time. Just I want to leave you with some other thoughts before we close here. The Query Store UY you might be familiar with it at this point in time, we have several let’s say, features inside Query Store UY itself, for example, the ability to look at two plans for the same query hash and compare them both. And like we’re seeing on the screen here, I’m able to understand what there is, I have similar, what is different, I can even see in the properties window here between a couple of plans that I’m looking at, which properties are different, because I have this mathematical different symbol here. In this case, I can see the cache plan size is different, I can also see the CE version between these two plans is different, so I have a number of important data between – different between the two plans that just jumps out to me and we’ve enabled this in the scope of Query Store, but also stand alone.

Just some more investments we’re doing in SSMS itself, for example, in the current release, the latest V17 that you can download, Showplan comparison already has supports for multi-statement Showplans, meaning if I have in the previous versions a – if I’m looking at a plan that has multiple statements, I was only able to compare the first statement. Now think of where the first statement is a used database; that pretty much killed it, right? So we have support from multi-statement, we now have as we’ve seen, the operator level performance stats is shown in the properties window in Showplan. If you’re still using an older version of SSMS you won’t see that, you need to go and look at the XML itself.

Query Store, on all the reports there, we’ve added the ability to filter by number of different plans. For example, I just want to filter for anything that has two or more plans. So I’m able to see that kind of scenario right there just by filtering on Query Store on any kind of report. Upcoming, and you can already see it in the RC versions of V17, we have a couple of new reports in Query Store. You have their queries which force plans, for example, if you’re using Query Store and you forced a few plans, it was kind of cumbersome for you to quickly understand hey, which plans are forced and how can I do something about them? So we now added a new report that will show you all the forced plans. And we also have something else baked into SSMS, which is the ability to open up a plan and start introducing something we call query analysis scenarios. We will look at the plan and we will give you some hints on what you can improve, or try to improve your plan by looking at some heuristics in the plan itself. For example, I see CE differences here. Please go in and try to update statistics or maybe this has something to do with an input parameter. Go and see if we’re doing some conversion there.

So we’re starting to build intelligence into SSMS that you can use with any version obviously, and that will ship already in V17. So I still had a demo on Query Store and plan comparison, I’m on the top of my time, so I’ll skip that one.

Hopefully I’ve been able to give you enough information about all the stuff that’s new in Showplan and in terms of troubleshooting and diagnostics artifacts that you can leverage for multiple scenarios, and I’ve shown you how you can use some of them to gain the specific kind of insights to then take the next step, which is to mitigate it. I’ll leave you just with a few bookmarks. Our GitHub has a ton of solutions there, we have also our team blogs, like I said before, for the lightweight profiling you can get a lot more insight if you look at a blog post I’ve done recently, so there’s also the trace flags here and thank you for all that were able to stick until the end.

Brent Ozar: Very cool, thank you, sir, and lots of people chiming in with they’re going to use this on query tuning. So good stuff, very good stuff.

Pedro Lopes: That’s what we like to hear. Please, if you have any feedback, through the blog, through Twitter, reach back to us. We love to get feedback, we love to also understand what’s working right but also what’s working wrong and we can try to address it. So thank you very much.

Brent Ozar: Well, I mean I would just like to say, in terms of feedback, I know especially with Erik being on here too, the new stuff that’s in execution plans is fantastic, and same thing with the memory grant fields and the DMVs. It’s just been wonderful. It makes performance tuning so much easier. You guys are amazing.

Pedro Lopes: Well, on behalf of the team, thank you very much.

Erik Darling: You mind if I ask you a quick question about V Next?

Pedro Lopes: Of course, if I can answer it.

Erik Darling: I got the joins.

Pedro Lopes: Yes.

Erik Darling: There’s a new query plan operator for those.

Pedro Lopes: Yes.

Erik Darling: Are those going to be in cache plans or only in actual plans?

Pedro Lopes: That is a run time…

Erik Darling: Bummer. Is there going to be anything in the plan that says if there was an adaptive join use or is that only one-time stuff? It’s fine if you don’t – I’m just poking. You can’t say.

Pedro Lopes: Yes.

Brent Ozar: Cool, well thank you so much. Pedro Lopes everybody, fantastic, good job, round of virtual applause. Thank you so much man, that was fantastic. Thanks for volunteering your time, we really appreciate it.

The following two tabs change content below.
Pedro Lopes is a Senior Program Manager in the Microsoft Data Group, with special focus on the Relational Engine for all in-market versions of SQL Server. He has over 13y of experience with SQL Server, and before moving to the US he worked as a Premier Field Engineer based in Europe, supporting mission critical deployments of SQL Server. He blogs about all things SQL at and, the latter home to AdaptiveIndexDefrag, where he also publishes the SQL Swiss Army Knife script series for everyday troubleshooting and performance analysis.
Previous Post
Worst Practices & Less Known Limitations for Columnstore Indexes
Next Post
Crash Course on Better SQL Development

2 Comments. Leave new

This seems similar to “SQL Server 2016 Features for Performance Tuning Lovers”. If this is different, the abstract should make that more obvious.


I consider myself to be good at performance tuning and spend a lot of my time doing it, but I know there’s much more for me to learn even in the 2012 I use all day. I haven’t worked with 2016 yet, and am interested in what new goodies await me. I don’t want just a list of what’s new, I want to see how to use the new features effectively. So I hope there will be good demos.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.