Are you a:
- Production DBA who wants to know which new xEvents and DMVs/DMFs are available to gain further insight into SQL Server operations?
- Production DBA who needs to troubleshoot a performance issue and collect performance data for others to analyze?
- Developer who needs to analyze query performance and already has basic knowledge of how to read an execution plan?
Then this session is for you!
It just works – performance and scale in the SQL Server 2016 database engine, and what is being added to in-market versions. This session showcases several improvements in SQL Server, focusing on the latest enhancements that address some of the most common customer pain points in the Database Engine, involving tempdb, the new Cardinality Estimator, memory management, and T-SQL constructs, as well as diagnostics for troubleshooting query plans, memory grants, and backup/restore. Understand these changes in performance and scale, and the new and improved diagnostics for faster troubleshooting and mitigation.
Why I Want to Present This Session:
Evangelize the power of SQL Server 2016 and what possibilities it unlocks! The SQL Server engineering team has been adding so much to SQL Server in terms of new capabilities and tools to improve scalability and diagnostics, that we want to share how it all comes together.
Brent Ozar: Well, with no further ado, ladies and gentlemen, welcome Pedro Lopes for our last session today, SQL Server 2016 Service Pack 1 Enhancements that will make your database engine roar. Take it away, Pedro.
Pedro Lopes: Well, hello everyone. Probably some of you have seen some flavor of this session before if you have attended in person at some venue. My name is Pedro Lopes, I am a senior PM with the SQL Server Tiger team. We’re a part of the SQL Server engineering group, and we specifically own every in-market version of SQL Server, which right now means 2012 to 2016 – soon to be 2017.
So, let me dive right into it. We’re talking specifically about enhancements we’ve made in Service Pack 1, which has been out for a few months. We’ve already announced Service Pack 2 – we don’t have a specific date yet, but it will come very soon, at which point we will probably have a big overhaul of this session. But let’s dive into this one.
So, one of the main changes we introduced in Service Pack 1 stemmed from a number of trends we were observing with a lot of our customers. It’s very common for businesses to grow from small – maybe a couple of servers, a small shop – into larger businesses, which means that users and developers specifically need the flexibility of not tying their code, or the way they build their applications, to a specific SKU of SQL Server. The SKU typically dictates the capabilities of the engine itself, in terms of scalability for example.
So, it was very common to see customers whose initial deployment target was Standard Edition to keep costs low – maybe a small ISV that works in a niche market, or someone developing for a small or medium company. But as the business grows, so does the need to scale up SQL Server, and we had a number of features that were available only in Enterprise Edition. Because initial development ran on Standard Edition, or even Express, whatever features were used at the database layer were bound to Standard Edition. And if the business grew enough that it required scaling up SQL Server, and therefore migrating to Enterprise Edition, we found that a lot of customers were actually reluctant to go in and modify their application to make use of the Enterprise-only features just because they had changed editions. An example would be using disk-based tables for OLTP workloads when, in 2016 for example, you could instead use In-Memory OLTP for such workloads – and these were bound by edition.
I mentioned ISVs; that was a major target also. One thing we’ve observed: if an ISV targets Enterprise Edition, then it may be precluded from using the same code base for customers that deploy lower editions of SQL Server, so it is limited in the spectrum of customers it can deploy its application to. And obviously, the ability to maintain a single code base is more important for ISVs than for anyone else – a single code base that does not depend on the engine edition being deployed.
So, we set out to remove these limitations, which led to something – it’s a mouthful – that we call the Common Programmability Surface Area, which means that a number of programmability objects, if you will, were unlocked for lower SQL Server editions. It’s the ability to have an optimal design – you can think about and architect your application around the workload and the capabilities of the engine, not the SKU you are deploying – and also the freedom to deploy the same application, the same code base, on any SQL Server edition. So you, or any customer, will choose the SQL Server edition that fits your requirements for performance, scalability, even high availability, but not according to the features you want to put into your programmability objects: the code that runs the workload in the applications themselves.
So, this gives a quick gist of the differences in feature surface area between SQL Server 2016 RTM (and previous versions, obviously) and Service Pack 1. As you can see here, for the most part, all these features are available across many editions of the engine. You see two exceptions there, because those have to do with the architecture of the engine itself. For example, for Express Edition and LocalDB, we don’t have Change Data Capture, because that requires SQL Server Agent, which is not part of Express Edition or LocalDB. Another limitation happens with In-Memory OLTP in LocalDB: as you might know, when you create an in-memory table, you need to create it on a specific filegroup that is enabled for in-memory tables, and that relies on FileStream technology, which is not available in LocalDB because LocalDB does not have enough permissions at the OS level.
So those are the two exceptions there. Everything else, as you can see on the right side of the table here, we’ve unleashed or unlocked across all these editions. You may find, if you go and look at the documentation, that you have the choice to use all these features. Some of them will differ in performance and scalability – for example, columnstore differs between Standard Edition and Enterprise – but the use of the feature is there. So, if I’m maintaining a single code base, I don’t need to be concerned about those topics that have to do with scalability, because those will be directly tied to the engine SKU I’m deploying, and those will be automatically handled by the engine, if you will.
By the way, at any point Brent, I’m not actually seeing the chat window, so feel free to interrupt me with questions if there are any.
Brent Ozar: Yes, I can throw – you want to see the chat window? I can put it up there if you want.
Pedro Lopes: Yes, sure.
Brent Ozar: Hold on a second, I’ll make you the organizer too. So now you should be able to see the questions pane.
Pedro Lopes: This is a previous question, right?
Brent Ozar: Everything that’s in there is previous, yes, you’re good. You can ignore anything that’s in there so far.
Pedro Lopes: Do we have anyone in this session?
Brent Ozar: Yes, like a hundred – 95.
Pedro Lopes: Just checking. So, I’m dividing this session mainly into two parts: the storage engine and the relational engine. I’m specifically the PM for the relational engine in SQL Server in-market, but the storage engine and relational engine – how queries get executed and how the access methods return our data – are very intertwined, so I like to talk about them together.
So, for the storage engine, specifically in Service Pack 1, with one of the enhancements we set out to answer a question: does the user or customer ever need to create a copy of the database? Let’s say I need to troubleshoot – this kind of thing becomes our bread and butter here. We do a lot of troubleshooting exercises with a number of customers, as does our support. And part of that actually has to do with troubleshooting query performance, as you do, Brent, and other folks on the call. So, the ability to actually look at a query plan with actual runtime stats is super important.
If you can’t look at runtime stats, at least understanding how the optimizer dealt with a specific plan and how that plan came to be is important, and that stems from a number of pieces of metadata that exist around the engine, which I’ll talk about later in this session.
So, my point being that having that database readily available for us to iterate on while troubleshooting, for example, is important from that perspective. Now, if we’re talking about a database that has terabytes, or has millions of objects – and I’m talking millions of tables, stored procedures, whatever – that becomes a big bottleneck, because we really can’t take a backup of a multi-terabyte database and send it over the wire to someone just because they need to restore it and do some query troubleshooting.
So, this is a scenario we set out to resolve. Another one has to do simply with testing. The ability to get the database schema, deploy it on my test server for example, and populate it with dummy data – while keeping exactly the same schema, so I’m sure I have the same schema on which I can run functional and performance tests – also becomes important at certain points in the lifetime of an application.
So, we set out to do something about this, and we introduced the command DBCC CLONEDATABASE. Now, this was built to be fast, as minimally invasive as possible, and to provide a consistent, schema-only copy of the database. We introduced it first back in SQL Server 2014 SP2. It wasn’t in 2016 RTM; it was only introduced in 2016 SP1.
We also enhanced it in 2016 SP1 to support a number of objects that were not supported when we first released the same command, DBCC CLONEDATABASE, in the previous version: support for CLR objects, FileStream, in-memory tables obviously, and – this becomes very important when you’re talking about 2016 and above – the ability to bring in Query Store. If you’re using Query Store in your database, you can bring it along with the schema-only copy. And again, because Query Store is like a flight recorder for your workload, it becomes very relevant when you’re doing performance troubleshooting, specifically query performance troubleshooting.
So, these were the enhancements made to clone database in 2016 SP1. Now, I do call out that KB you can see on the screen now. We don’t support – let me clarify what I mean by “not supported as a production database”. The way DBCC CLONEDATABASE works – it’s a very simple syntax, which I’m going to show you in the next slide – is that when you run it, you actually generate another database on your server, which is a clone of the source but with just the metadata. Then you can back it up and send it to someone, or restore it on your test server, whatever you want to do with it.
The point being that, at this point in time, we still don’t have a 100% reliable copy of all metadata objects from the source in the clone; for example, we are missing statistics over columnstore indexes. We have a published script as a workaround: after you generate the clone, you can still add the columnstore statistics from the source to your clone, but again, this is a workaround. We are still working toward 100% coverage of all the potential objects that may exist between a database and its clone. So that’s one of the reasons we don’t support it as a production database: let’s say you clone a database, there is some type of object that we don’t clone, and you’re relying on your clone database to run some part of your business – well, that may yield strange results or unwanted behaviors. So that’s why we don’t support this in production as of yet.
Just to give you a few numbers here, because numbers are obviously important – you’ll want to understand why we released this command when there are other ways of cloning a database. Let’s look here, and I’m going to actually talk more about the last example you see in the table there: a very large ERP database, from a very well-known ISV that makes this software, where we were attempting to get a schema-only copy of a database that had over one million objects. Now, when you use the option of scripting out a database with the metadata in SSMS, do keep in mind SSMS is a 32-bit application, and therefore limited in the memory it can use – so not long after, we got an out-of-memory error from SSMS and SSMS crashed. It was just a very large database with too many objects to script out, and it ran out of memory.
Now, we also had another method of creating a schema-only copy of a database that has been out there for some time, and we previously used it to generate the schema-only copy when a customer came to us with a performance issue and we needed to run the same type of exercise I mentioned earlier.
But as you can see here, it took quite a few minutes to run sp_clonedatabase on a database that had over one million objects. So, it was still a hurdle, because you had to wait a long time before you were able to get the schema-only copy of the database. Now, with DBCC CLONEDATABASE, because this works at the access methods layer – it’s not working through SMO or any high-level-over-the-engine kind of artifact – we’re able to do these kinds of operations much faster, and for the same database, multi-terabyte, over one million objects, we’re able to do it in just over two minutes, which is a 60-times improvement over cloning with sp_clonedatabase.
So, what I mean by this is that it now becomes easier – and again, I’m referring back to the troubleshooting aspect of it, although testing also applies. When you need to troubleshoot, and you actually need the metadata from your source database with all the tables, statistics, indexes and whatnot, you need to generate this fast and with minimal intrusion, if you will, in terms of resource usage on your source system.
So, by doing it at the access methods level, we are able to get extreme performance when generating this database clone. Let me rephrase that: the clone itself is actually very small, just a few megabytes, but we’re able to iterate through a ginormous database with more than a million objects, as you can see here, very, very quickly.
So, let me move on from here. I said I would show you a few ways you can use clone database. The syntax is very simple, as you can see here: DBCC CLONEDATABASE, give it a source name and a target name, and that’s it. I will see a new database created on my system, which will be marked as a clone, and it will be read-only, by the way. It will be generated with all the statistics and with Query Store metadata, and I’m done. But if I want, for example, a schema-and-Query-Store-only clone, I have a few options here. As you may know, statistics histograms do have the potential of containing data. Let’s say I’m creating an index over a table that has a first-name column: in the histogram, I will see a few of the names that are in that column, making up the several buckets in my histogram that allow the engine to check selectivity and a number of other things.
Well, if you need to provide a clone of the database for whatever purpose, but you have limitations on the kind of data you are allowed to give to someone else, you may not want to include statistics in that clone. This is more for functional tests, for example, because for performance troubleshooting you would need the statistics. But if I’m doing functional tests and I need a copy of the database, then this is enough: I don’t need the statistics there, so I can create the clone with the NO_STATISTICS keyword. Or, for example, if I actually need to run performance testing – which means I would need the statistics there to simulate the quality of the plans in my source system – but I don’t want Query Store because I’m generating a new set of performance test cases, I could do WITH NO_QUERYSTORE, as you can see in blue on the screen there.
Or if I don’t want statistics or Query Store – let’s say I just want to generate the schema-only copy of the database, and my purpose is to populate it with dummy data so that I can run some kind of test that has nothing to do with what the data looks like in the source system – well, then you can do without statistics and without Query Store, and still generate that clone very, very easily.
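Taken together, the variations discussed above look roughly like this (database names are placeholders; NO_STATISTICS and NO_QUERYSTORE are the documented keywords from the 2016 SP1 syntax):

```sql
-- Full clone: schema, statistics, and Query Store metadata
DBCC CLONEDATABASE (SourceDB, SourceDB_Clone);

-- Functional testing: skip statistics (histograms can contain data values)
DBCC CLONEDATABASE (SourceDB, SourceDB_Clone) WITH NO_STATISTICS;

-- Performance testing with fresh baselines: keep statistics, skip Query Store
DBCC CLONEDATABASE (SourceDB, SourceDB_Clone) WITH NO_QUERYSTORE;

-- Schema only: skip both
DBCC CLONEDATABASE (SourceDB, SourceDB_Clone) WITH NO_STATISTICS, NO_QUERYSTORE;
```

The resulting clone is created read-only on the same instance; from there you can back it up and ship it for troubleshooting.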
These are just some of the improvements here. In terms of supportability improvements that we’ve also added to Service Pack 1: for example, in sys.dm_os_sys_info, we now offer a very easy way for DBAs to see whether the Lock Pages in Memory privilege has been given to the SQL Server account or not. This was a privilege you would very often give to the SQL Server account back in, for example, Windows Server 2003 and even 2008 because, given the way the Windows memory manager worked then, SQL Server had a much higher potential of being paged out than it does nowadays on Windows Server 2012 and above.
Still, if you want to make sure the OS cannot page out part of your SQL Server process – which, if you’re running low on memory, may be a very good idea – you can see if the SQL Server account already has the privilege or not. If it doesn’t, you can act accordingly. So, it becomes as simple as running a query over sys.dm_os_sys_info, and you’ll get an indication of which memory model you are using right there. Before this, you would have to run some WMI query, or use some UI tool or PowerShell, to be able to find that out.
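The query Pedro describes is a one-liner; the memory model columns were added to sys.dm_os_sys_info in 2016 SP1:

```sql
-- sql_memory_model_desc returns:
--   CONVENTIONAL: Lock Pages in Memory not granted
--   LOCK_PAGES:   Lock Pages in Memory privilege granted
--   LARGE_PAGES:  locked pages with large-page allocations
SELECT sql_memory_model, sql_memory_model_desc
FROM sys.dm_os_sys_info;
```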
Another one is the ability to see whether the SQL Server service account has the proper permission to use the Instant File Initialization feature. This is very commonly used out there in the wild. It is in fact an efficiency that’s part of the Windows OS, which SQL Server can take advantage of when it is, for example, growing files – data files specifically. Again, you would previously need some kind of UI, PowerShell, or WMI query to find out if this was available, which meant a DBA going outside the scope of SSMS or the tools a DBA usually uses. More so if you are in an organization with a very strict separation of duties between a DBA and a sysadmin, in which case you probably wouldn’t even be able to go into the OS and look at the permissions of the SQL Server account to check the status of these two permissions that any DBA will want to know about.
So, by adding this data to the sys.dm_server_services DMV, we’re now able to see, as you can see on the screenshot right here, that Instant File Initialization has not been enabled in this case, for this default instance. So, if you’re using this as part of any script that collects information about an instance, this also becomes a relevant data point to consume.
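A minimal sketch of that check; the instant_file_initialization_enabled column (added in 2016 SP1) returns 'Y' or 'N':

```sql
-- Check Instant File Initialization status for the database engine service
SELECT servicename, service_account, instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE N'SQL Server (%';
```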
For this next one, well, I’m purposely leaving those fonts very, very small. It’s not for you to read the information there, but rather to show another way that we always listen to customers’ frustrations and pains when we’re thinking about what to improve next in the engine for in-market versions – which, by the very nature of versioning, also makes it into later versions of SQL Server. In this case, if you are using In-Memory OLTP in SQL Server 2016 RTM, or any update after that but before Service Pack 1, more often than not you would see your error log polluted, if you will, with a number of debug messages that we had left there. If you were making heavy use of this feature, your error log would be too noisy, and if you rely on the error log to look for relevant information, you would be distracted by that. So, in Service Pack 1, we just made sure those noisy messages are turned off by default. As you can see at the bottom of the screen there, for the same operations running on In-Memory OLTP, you see much less information.
Okay, so more supportability improvements that we’ve added to the storage engine have to do with understanding the status of your tempdb. Starting with SP1, the error log messages regarding tempdb contain more information. You can see here the number of tempdb files, and whether any of the tempdb data files has a different size or a different autogrowth setting from the other files. This boils down to the fact that in tempdb, even more than in any other database, you will want all the data files – if they need to grow; obviously I would expect you to manage sizes manually – to grow with the same autogrowth setting across all files, and for the sizes of those files to stay the same, because that’s how the proportional fill algorithm is used most efficiently. So those messages became part of the error log.
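You can approximate the same consistency check yourself. This is a sketch of my own (not the engine's internal logic) that lists tempdb data file sizes and growth settings so any mismatch stands out:

```sql
-- List tempdb data files; sizes and growth settings should all match
SELECT name,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + N'%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + N' MB'
       END AS growth_setting
FROM tempdb.sys.database_files
WHERE type_desc = N'ROWS';
```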
So, a few other performance enhancements have to do with migrating large data sets. This is a statement from a post on the SQL Server on SAP blog about an enhancement we’ve made to speed up migrations of large data sets compared to previous versions. Working specifically with SAP, we were able to reduce migration time by about 60%, as you can see here, when moving data between heaps. And how did we do this?
Well, bulk inserts specifically are able to use an automatic TABLOCK via trace flag 715, and this was introduced back in previous versions, as you can see on the screen there. Enabling this trace flag allows the performance of bulk inserts to be much higher than before, and I do have some comparisons here.
As you can see on the screen – this is an SAP database using Transparent Data Encryption, which is used by default nowadays – you can see the difference between not using the trace flag and using it, both in terms of time to run the operation and in terms of throughput per second. By enabling trace flag 715 and using the automatic TABLOCK, we’re able to roughly triple the throughput of rows flowing between heaps in this SAP migration scenario. So, this is probably something you can use in your own data migrations if you have the same kind of scenario that SAP uses, moving data with bulk insert.
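A hedged sketch of the pattern (table and file names are hypothetical): trace flag 715 makes bulk loads into heaps without nonclustered indexes behave as if TABLOCK were specified, enabling minimal logging without touching each load statement.

```sql
-- Instance-wide: bulk loads into heaps behave as if TABLOCK were specified
DBCC TRACEON (715, -1);

-- No explicit TABLOCK hint needed here once TF 715 is on
BULK INSERT dbo.TargetHeap            -- hypothetical target heap
FROM 'C:\load\data.csv'               -- hypothetical source file
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
```

This is useful when the load statements come from a migration tool you cannot modify; otherwise an explicit TABLOCK hint per statement achieves the same effect with narrower scope.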
This is not so much an enhancement; this is more of a change. Here’s the thing: back in SQL Server 2016 RTM, the behavior of INSERT…SELECT was different between user tables and local temp tables. Essentially, yes, this was a bug that was there by default. In essence, if you were running an INSERT…SELECT operation on user tables, a parallel insert would require a TABLOCK hint. But if you were inserting into a local temp table, the parallel insert happened by default, whether you used the TABLOCK hint or not.
Now, this was an inconsistency of behavior when using INSERT…SELECT, depending on the type of underlying object. So, we resolved that in 2016 SP1: to be able to run parallel inserts in that kind of operation, you will need to hint the query with TABLOCK. But if you do so, you will see that you’re able to do parallel inserts, and in that sense, if you are ingesting large data sets and you can take the TABLOCK hint, you will see that the runtime for that data ingestion becomes much smaller.
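The pattern looks like this (object names are hypothetical; for the insert itself to go parallel the target should be a heap without nonclustered indexes and the database at compatibility level 130 or higher):

```sql
-- TABLOCK on the target enables a parallel insert in 2016 SP1 and later
INSERT INTO dbo.TargetHeap WITH (TABLOCK)
SELECT col1, col2
FROM dbo.LargeSource;
```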
And if you want to administratively turn off or disable parallelism during inserts, whether a TABLOCK hint is used in some query or not, the DBA still has the option of turning on a trace flag that disables parallel inserts. By the way, this trace flag and a number of others – even as recently as this week, we’ve documented yet another trace flag on the trace flags page in Books Online, just to make sure that page is, as much as possible, the reference for what trace flags exist in SQL Server.
And we do have a KB on that. So, let me just give you a quick peek on – how am I doing on time? I still have to speed up.
Brent Ozar: Well, remember you can go up to 90 minutes here if you wanted.
Pedro Lopes: I can?
Brent Ozar: Yes.
Pedro Lopes: I’ll probably take close to that. Thank you.
Brent Ozar: There you go.
Pedro Lopes: I was imagining 60, I don’t know why.
Brent Ozar: I know, everybody thinks that.
Pedro Lopes: So, let me give you a quick peek at what’s coming next for the storage engine. “Next” means not only SQL Server 2017, but also, when applicable, a future service pack – in this case for SQL Server 2016, for example. We’re trying to make sure that backups become smarter: whether you’re using your own flavor of script, an out-of-the-box solution, or some vendor application to manage your backups, as long as it supports this kind of operation, you’ll be able to manage your backups in a smarter way.
What do I mean by this? Look at this slide: what I have here is a workweek and my backup strategy – which actually should be a restore strategy, but let’s not even go there. The way I’m doing backups is: my differential backups start on Monday, when let’s say 10% of my database changed, so that’s my first differential backup – and I’m assuming here that I take my full backups on Sunday, obviously. So, I’m taking differential backups every day of the week, but it comes to a point – in this case, already on Wednesday – where most of my database has changed, and still I continue to take differential backups throughout the rest of the week, which are basically full backups.
The caveat being that when you actually need to restore, your restore chain is now too long and it will impact your recovery time objective. Why? Well, because you obviously need to restore the full backup first, which was taken on Sunday, then jump to the last differential backup – which by now is exactly the same size as your full database backup, or even bigger because the database actually grew – and then any transaction log backups taken after that. A better way of managing this would be: on Wednesday, I should already be taking another full backup, because too much of the database has changed to justify continuing with differential backups. And had I taken my full backup on Wednesday, then on Thursday I would probably be back at 10%, on Friday probably back at 40%, and on Saturday again, another full backup. If I had a disaster at that point, then based on the last full backup, taken on Wednesday, I would restore that, then a differential at 40% of the size of the full backup, and then any transaction log backups. So the chain is smaller, and I would have a shorter RTO.
Now, we’re attempting to make differential backups smarter, because we’re now reporting in the DMV sys.dm_db_file_space_usage how many extents were modified since the last full backup. So a backup solution that takes this into account will be able to make a call at any given point in time: if too many extents changed – and “too many” is your own choice, obviously, we’re just providing the data – take a full backup instead. Let’s say my rule of thumb is that if 50% of the database has changed, I don’t take a differential but rather a full backup. At that point, using the data made available here, my backup solution can see that, of the total number of extents in the database, more than 50% have already changed, and it’s time to switch from a differential backup to a full backup. So it just makes this smarter.
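A sketch of the decision a backup script could make with this data (the 50% threshold is arbitrary, as Pedro notes; the modified-extent column lives in sys.dm_db_file_space_usage in the versions where this enhancement ships):

```sql
-- Percentage of allocated extents modified since the last full backup,
-- for the current database
SELECT SUM(modified_extent_page_count) * 100.0
       / SUM(allocated_extent_page_count) AS pct_changed
FROM sys.dm_db_file_space_usage;
-- If pct_changed exceeds your threshold (say 50), take a FULL backup;
-- otherwise take a DIFFERENTIAL
```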
In essence, what this means is: I’ve taken my full backup on Sunday; I have 10% changes on Monday, so a differential backup; another differential on Tuesday, when 40% of my database has changed; but now on Wednesday, because 85% has already changed, I take a full backup instead of a differential, which means that by Thursday I probably only have 10% changes again. If the data changes in my database were this predictable, I would have 40% again on Friday, which would mean that on Saturday I would possibly also be taking a differential backup. With this ability, I’m now being smarter about when to take my backups and what type of backups, keeping my recovery time objective in mind rather than the size of the backups themselves, or the cadence at which I take database backups.
So, not only would I have faster restores, but I would also be saving on storage, because if you look above, you will see a number of days where I’m taking differential backups that are 100% of the database size, whereas by using this ability, and making my scripts and backup solutions more adaptable, you also get some storage savings here.
That had to do with full database backups. Now, we’ve also made some enhancements to allow smarter transaction log backups. You see a timeline here, and in this specific timeline, as my workload progresses and I get more log records into my transaction log, these lines represent the size of each transaction log backup. So you see the sizes – I’m taking backups at 15-minute intervals, but obviously, depending on the workload that ran in the previous 15 minutes, my transaction log backups will have different sizes, right? And if I don’t back up in time, if you will, for my log to be able to wrap around, then I will have some autogrowth kicking in, which will obviously also delay my workload itself.
So, this becomes unpredictable, because I can’t really predict the transaction log activity. I will have different sizes of transaction log backups, and I will potentially have autogrowth kicking in, so it’s not a good place to be. So, we’ve introduced a new column in the DMF sys.dm_db_log_stats, with which you’re able to see how much log has accumulated in that database since the last log backup.
So now, for example, you can be smarter about your log backups. In the same timeline, I’m now able to take my log backups based on the amount of log, and not at fixed intervals where I have absolutely no predictability of the size of the resulting log backup. Let’s say these lines represent 100MB: whatever script or backup solution I’m using, if it makes use of this new information, can now trigger a transaction log backup once the amount of log I preset has accumulated. That means I get a consistent transaction log backup size, and I can stave off autogrowth, because I can now see the rate at which log is being generated and I know the full size of my transaction log. Therefore – if nothing else stands in the way, obviously; there are a number of caveats that may prevent log truncation – I’m able to kick off my backups just before the transaction log would have to grow. And that makes it more consistent, but also more predictable, from that perspective.
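The check a backup script could run looks like this (the 100MB threshold mirrors the example above and is arbitrary):

```sql
-- Log generated since the last log backup, for the current database
SELECT log_since_last_log_backup_mb
FROM sys.dm_db_log_stats(DB_ID());
-- If this exceeds your preset amount (say 100MB), trigger a log backup
```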
Okay, I'm actually going to skip this demo in the interest of time, but again, like I did last time, I'll put all my demos in our GitHub, so you'll be able to go there and take a look through those. Okay, another one that's coming in is the ability to do SELECT INTO on a specific filegroup. So, as you can see in the syntax on the screen here – this would be your current way of doing SELECT INTO. You create this database, and then I would select into that specific table from a specific source, and that's it.
Now, we are adding syntax that allows you to specify on which filegroup you want that SELECT INTO to drop the rows that you're moving over. So, it becomes easier to manage if you're managing a database that has multiple filegroups: you're able to direct some read workload at parts of the database, and direct a SELECT INTO of a large dataset specifically somewhere else.
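A minimal sketch of the syntax difference – the table names and the [FG_Archive] filegroup are hypothetical, and the filegroup must already exist in the database:

```sql
-- Pre-2017: SELECT INTO always creates the table on the default filegroup.
SELECT * INTO dbo.SalesArchive FROM dbo.Sales;

-- SQL Server 2017: the ON clause targets a specific filegroup.
SELECT * INTO dbo.SalesArchive ON [FG_Archive] FROM dbo.Sales;
```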
Something else we've done to improve setup time concerns the tempdb configuration step. So – again, we've heard the community. The overall tempdb configuration page you see here was added in 2016 RTM, with different limits for the options you see here, and we've listened to the community and changed those. So now the max initial file size for tempdb in setup has been increased to 256GB. But do keep in mind that larger values will increase setup time if Instant File Initialization is not turned on, and we will warn you, if Instant File Initialization is not enabled, when you set an initial size over 1GB. If you see that warning, you can still go back in the wizard – there is an option in setup, since 2016 RTM, to grant the permission that allows SQL Server to use Instant File Initialization – and at that point you can size your files the way you want without setup taking a long time, because if those files need to be zeroed out, it can take quite a while. We will also warn you if the log size you've set for tempdb is over 1GB, and that's irrespective of Instant File Initialization, because log files cannot use it.
Yet again, still in the storage engine, some changes we've done, or will introduce – this is not in Service Pack 1, obviously; these are future changes, and you can already take advantage of them in 2017 – include, for example, understanding how your version store is being used within tempdb. So, we're introducing a new DMV, sys.dm_tran_version_store_space_usage. It's quite a mouthful. But in essence, it will allow you to see, in the context of each database, how much of the version store in tempdb is being used.
It also performs a lot better than the previous way, if you will, of understanding which session was using the version store, which was sys.dm_db_session_space_usage. With this new DMV, you have not only better performance in retrieving that data, but you get it per database, which means you are now able to very quickly pinpoint which database, for example, is using the version store – or is using more of the version store than the others. Maybe that's warranted behavior and maybe it's not, but you now have that insight where it wasn't there before.
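A minimal sketch of how you might use the new DMV, following its documented column shape:

```sql
-- Which databases are consuming the tempdb version store?
SELECT DB_NAME(database_id) AS database_name,
       reserved_page_count,
       reserved_space_kb
FROM sys.dm_tran_version_store_space_usage
ORDER BY reserved_space_kb DESC;
```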
Also on transaction log monitoring and diagnostics: maybe you've used DBCC LOGINFO before. As a DBCC command, it's not really a programmability object that you can use inline and derive insights from programmatically, and it's also a command that is somewhat intrusive in terms of the access methods. So, we've released this new DMF, sys.dm_db_log_info – a DMF because it takes a database ID as a parameter – and it will give you a lot of the information you would dump DBCC LOGINFO to obtain. You will get, for example, the VLF information. This is one of the main things you may be using DBCC LOGINFO for nowadays: to understand how many VLFs my transaction log has, how many of those are active, how many are inactive, and even the size of those. You can also see information about checkpoints and recovery, so a lot of the information that was available in DBCC LOGINFO is now available in a cleaner way in this DMF.
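For example, a sketch of replacing a DBCC LOGINFO dump with a set-based VLF summary, using documented columns of sys.dm_db_log_info:

```sql
-- Count active vs. total VLFs for the current database,
-- programmatically rather than via DBCC LOGINFO.
SELECT COUNT(*)                                        AS total_vlfs,
       SUM(CASE WHEN vlf_active = 1 THEN 1 ELSE 0 END) AS active_vlfs,
       SUM(vlf_size_mb)                                AS total_log_mb
FROM sys.dm_db_log_info(DB_ID());
```

Because it's a DMF, this can go straight into monitoring queries or alerts, which is exactly what DBCC LOGINFO made awkward.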
In terms of some backup improvements that we've done: this has to do with the fact that, as you may see from the screen here – this is the current behavior if you are on 2016 RTM, 2016 Service Pack 1, or previous versions – as your database size grows, the time it takes to do a backup also increases. And you actually see it increasing steeply when you start getting into the multi-gigabyte range. Compared to that, SQL Server 2017 takes less time to do the same backup. Why is that?
Well, if you think about what happens when you do a database backup, something you need to do is scrub all the memory to understand which pages in the buffer pool are dirty or not, because you need to take those with the backup, obviously. Now, we've enhanced the way we do that scrubbing of the buffer pool so that it is faster. Before this change, if you had a very large buffer pool – let's say a buffer pool of 1TB – but only very small databases in your system, we would still need, for the most part, to scrub the entire buffer pool to find the pages that belong to that very small database.
For example, take a 1GB database that has, let's say, just one table inside, taking almost the full 1GB. If I had changed all the pages there, I would need almost two minutes just to scrub the memory space to retrieve information about dirty pages, before being able to do the backup. That goes down to four seconds on SQL Server 2017, for example.
So, maybe you've gone through an issue in your own production server that is similar to this: I have a very large buffer pool, a very large server in terms of memory, and I don't have very large databases, so why would a 1GB database take two minutes to back up? That should be almost instantaneous, right? And yet, this was the outcome we were seeing. That stems from a number of things, but namely from the fact that only in recent years have servers grown this much in terms of hardware, and all this code was architected back when we had a few gigabytes of memory.
So nowadays, as you can see, as your database grows, the time to back up obviously also grows, but it's significantly less than in versions prior to SQL Server 2017, and it's on small databases that you actually see the most gain in this scenario. So, this is something we've done here. And just to finish the storage engine part, another improvement we've made was to add more information to sys.dm_os_sys_info. For example, it became kind of cumbersome to understand how many cores you really have on your SQL Server by doing T-SQL queries against runtime system views. We had the hyperthread_ratio column, which is not really a very trustworthy way of understanding how many cores you have in the underlying hardware. So, we're exposing three new columns so it becomes much easier and more transparent to understand how many sockets I have and how many cores per socket I have in my server. That information is very important for DBAs, and it now becomes apparent right there.
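A quick sketch of reading the new columns – socket_count, cores_per_socket, and numa_node_count were added to sys.dm_os_sys_info in 2016 SP1:

```sql
-- Physical topology straight from the engine,
-- no more guessing from hyperthread_ratio.
SELECT socket_count,
       cores_per_socket,
       numa_node_count
FROM sys.dm_os_sys_info;
```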
For clone database in 2017, we're also doing a couple of enhancements – support for full-text indexes, for example. And one other topic here: when you create a clone database in 2016 SP1, if you have Query Store running in that database, some runtime statistics may still be in memory and not yet flushed to the Query Store tables. So, what happens right now on 2016 SP1 is that when you take a clone database, you may see that some data is missing from Query Store; that all depends on the setting you've configured for flushing information from memory to Query Store storage. But we've made sure that in 2017, if you take a clone, we'll actually flush the in-memory Query Store data to storage before generating the clone.
Okay, so moving on to the relational engine, which is actually my favorite area of SQL Server. We also set out to address – let's call it that, because these aren't exactly fixes – a number of inefficiencies you had when trying to diagnose the performance of query plans. So, up until SQL Server 2016, we already had a lot of very relevant information in Showplan, in the execution plan that you can retrieve from cache or at runtime.
We already exposed the max memory that was enabled for the query, be that in terms of the grant or in terms of the memory for optimization. We also added information about the trace flags that were enabled – this is something we introduced in 2016 and also in the latest service pack for 2014. We also added a memory grant warning that I've talked about on previous occasions. But we were still missing a number of other relevant pieces of information that we could use to make Showplan almost the one-stop shop for anything query performance related.
So, what we've done is to add some more information here. We've added information about the parameters' data types. That may seem kind of redundant if you're looking at a plan from a stored procedure, for example: if you have implicit conversion issues, you will see in the implicit conversion warning which data type you're converting to, and since it's a stored procedure – something where you can see the incoming parameters – you can also see the source data type. In that case this information would indeed be redundant. But if you're in a case where you're using prepared SQL, where we're only getting the plan from an execution handle, you don't really have that data. So now, when I'm looking at an implicit conversion warning that may affect the ability of the optimizer to do its job properly, I can see exactly which data type this was converted to, and inline, which data type the incoming parameter was. So, I can understand whether I can do something either in my code base or even at the table level.
I also now have CPU and elapsed time for the entire query execution. This is located at the root node of your plan. That kind of makes using SET STATISTICS IO and SET STATISTICS TIME redundant: if you collect the actual execution plan, your actual CPU time and elapsed time will already be there in Showplan. We've also introduced – and this is based on the sys.dm_exec_session_wait_stats DMV – the capture of up to the top ten waits that a specific execution waited on.
I know there are a couple of Connect items that have to do with a couple of wait types that are not – or may not be – included in Showplan. We are working our way through those, just in case you were wondering, and they will be dealt with very soon. So, here's the kind of information you had up until SQL Server 2016 in terms of understanding, when looking at a plan, what performance metrics you could get for each operator.
For example, think of a scan, an index scan. So, you see here, three rows are highlighted – I bolded them. These are the three data points you could get up until SQL Server 2016 RTM for a scan. You would be able to see how many actual rows were output by the scan, the number of executions, and the actual end of scans, but that's it. So, not really a lot of actionable information when you were doing query troubleshooting, and that's probably why you were collecting SET STATISTICS IO – to be able to see, more specifically, the I/O characteristics of this scan.
So, what we've done in 2016 RTM and 2014 SP2 was to extend the data points we have for each operator, with runtime counters per thread. So, everything you see here is available in these versions when you collect an actual plan. For the scope of the same scan, for example – actually, let's use another example. I'm looking at an index seek operator that is actually a range scan. Now, I would be able to see not only the actual rows that were output by this operation, but also the actual rows that were read by it. Those may differ, and I will have a quick example of that in the next couple of slides.
I would be able, for example, to see how much CPU or elapsed time that scan took. I would also be able to look at the logical reads, physical reads, read-aheads – a lot of relevant information for analyzing the performance of this specific scan. This information is available for a number of data-access operators. And in SQL Server 2016 SP1, we've actually added three new properties to the runtime counters per thread. These apply to hash matches and sorts, where we are now also outputting the memory grant information for these two types of operators. So again, hash matches and sorts.
Do mind this note here: these runtime counters per thread apply to row mode operators. If you are looking at batch mode operators, you will not see the same per-thread values from the chart. So just keep that in mind in case you're analyzing a plan that has both row mode and batch mode operators.
So how are we actually exposing these? They are in the Showplan XML, obviously, but you can see them in SSMS, as you can see here for this index scan, just by looking at the properties of that clustered index scan. I'm able to see here all the I/O characteristics – the logical reads, the read-aheads, the physical reads – so that means I don't need SET STATISTICS IO anymore. I don't need to ask anyone: hey, when you run this query to collect the actual plan, please keep in mind that you need to set STATISTICS IO beforehand and then collect the plan.
I also have information about the actual number of rows – in this case, per thread, if I'm running a multithreaded operation. And what I can see from here, for example – and this may already be an insight you derive from looking at this data – is that the output, the actual number of rows, was 121,000, but you see some imbalance between these threads. Now, that is for the most part natural, in the sense that most tables will not have a perfect data distribution, and the gap between the optimizer's estimate of how many rows each thread would handle versus what they actually did has very much to do with how accurate your statistics are as they relate to the underlying data distribution. If they are not that accurate, then you can have this kind of imbalance. So, if I see this kind of example, one thing I would want to look at after the fact is how accurate my statistics are. Are they being updated with a proper sample size? Am I updating with a manual sample size but then overriding that with auto update stats – which, by the way, is something we're also addressing in the near future. So, I can already derive that kind of insight about parallelism imbalance, if you will, when I look at this. But then, like I said, we also have the actual time statistics, and these are per operator too, which means I no longer need SET STATISTICS TIME either.
Okay, and we have the corresponding xEvent that has all this information, so let's move forward. Another topic that is important here – and this is something that Rob Farley is quite keen on, so shout out to him, because he helped us decide what we could do to make this easier to troubleshoot – is how to detect whether our underlying index design is efficient enough for the predicates we're using. In all essence: are we using our predicates to search indexes efficiently or not? So, if you think about the concept of what "actual number of rows" means – if you look at the output of any operator, like the index scan we had a minute ago – the actual number of rows is whatever rows are output by that operator. That does not mean it's the number of rows that the operator actually touched, that it actually scanned in the case of a scan. We're only seeing the number of rows that are returned after any predicate, if any, is applied to that operator – and typically that means we were able to push that predicate down to the storage engine, rather than filtering after the fact.
This is a scenario that’s usually hidden from the actual execution plan, or was, should I say, because for example, it would be – it could be common for you to find a plan where you would see a seek operator that was returning ten rows, but it was taking a long time to execute. Any maybe you were collecting the set statistics I/O and you were seeing that the seek…
Brent Ozar: Lost you there for a second. There you go. Did you lose the internet there for a second or?
Pedro Lopes: I don’t know, this just said we lost your audio. Did you lose my screen also?
Brent Ozar: No, the screen’s still good. So, you were in the middle of – last thing you said was – take a drink now or whatever if you want. Moments of excitement there, I thought it was my internet. I’m like, did I?
Pedro Lopes: Well, okay. So, as I was saying: a scenario where you're seeing a seek returning ten rows but taking a long time – or maybe you're even seeing a lot of CPU being used, but the query plan didn't really reflect that; you only saw a seek with ten rows coming out of it – what gives? So, what can you do to actually troubleshoot this kind of scenario? This is what we set out to improve: the ability to troubleshoot these sorts of scenarios.
So, think about this query here over AdventureWorks – it's easy enough – and let's keep these two key stats in mind. The actual number of rows in the result set is 39, but the actual rows read from the underlying table was over 100,000. This is being done by a clustered index scan. So, keeping those key stats in mind, let's look at the type of information I would have – let's say in SQL Server 2014, pre-SP2 – for this scan. I would know that this scan returned 39 rows, and that it estimated 1500. That's it. I have no more information here. So, in the scenario I just mentioned, I was seeing a plan that had very few rows coming out of this scan, but I knew it was taking a long time, and I couldn't really connect one piece of information to the other. Maybe by looking at SET STATISTICS IO I was able to see the I/O, but that didn't really tell me much more than that there's a large disconnect between the number of rows I'm outputting here and something that's happening at the storage engine level.
Well, in SQL Server 2016 RTM, we introduced a new property there: the number of rows read. So here – in both cases I'm still looking at the actual execution plan, obviously – I'm now able to see: okay, I estimated 1500 rows, I output 39 rows, but in essence, this scan actually read through over 100,000 rows. So, I now have more actionable information: I have a better understanding of why this scan that only output 39 rows took as long as it did – it was actually reading much more. This already gives me the ability, when looking at an actual execution plan, to see this difference and form a hypothesis about the ability SQL Server has, or doesn't have, to properly push down my predicate. What do I mean by this?
So, my predicate – whatever the predicate is in this query, these two clauses here – matches 39 rows in my transaction history table. Now, whatever index I'm using – in this case a clustered index scan – to filter, even at the storage engine level (and by the way, this predicate is being pushed down, because I only have a clustered index scan; I don't have a filter after the fact), I'm still reading a lot just to satisfy the predicate I have. So, this is not very efficient. A more efficient situation would be to have, if not the same, then a very close number between the actual number of rows and the number of rows read. That would mean that not only do I have the best possible index for these predicates, but also that the execution engine was able to push the predicate down to the storage engine, in which case I only read the number of rows that were needed for my output, instead of this imbalance you see on the screen here, where I read many more rows than were needed for my actual output.
And many times, this kind of skew already exists at compile time, which means I should be able to somewhat detect whether I'm prone to this situation at compile time. But alas, in 2016 RTM I only had runtime information. So, in 2016 SP1, we've added this extra property: estimated number of rows to be read. This tells me at compile time, which then means that I can, for example, scrub my cached plans and do an exercise of understanding: do I have plans there with a large skew between the estimated rows and the estimated number of rows to be read? Those will potentially be queries that I can, without even going into runtime, take aside and try to tune – or tune the underlying index design – in order to close this gap. So, we've added this extra information here. These two properties are available in the cached plan, which means I can now programmatically scrub my cache and do some exercise there.
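A hedged sketch of that plan-cache exercise – this assumes the EstimatedRowsRead attribute appears on the RelOp elements of Showplan XML, and the 100x skew threshold is purely illustrative:

```sql
-- Find cached plans where an operator's estimated rows to be read
-- dwarfs its estimated output rows (predicate pushdown skew).
WITH XMLNAMESPACES (DEFAULT
    'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       cp.plan_handle,
       op.value('@EstimateRows', 'float')      AS estimated_rows,
       op.value('@EstimatedRowsRead', 'float') AS estimated_rows_read
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//RelOp[@EstimatedRowsRead]') AS n(op)
WHERE op.value('@EstimatedRowsRead', 'float') >
      100 * op.value('@EstimateRows', 'float')  -- illustrative threshold
ORDER BY op.value('@EstimatedRowsRead', 'float') DESC;
```

Queries surfacing here are candidates for index or predicate tuning before you ever attach a profiler to them.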
Okay, so again, in the interest of time I’m skipping this one. All my demos are running either on databases that are created in the script or on AdventureWorks, so I’ll make sure all those are available again in our GitHub. Brent if you will, once you post the session or whatnot, if you can just be so kind to put the link there, then everyone can run the demos and if they have any questions, Twitter, any other means that you have to reach me, please do and I’ll be happy to address those.
Brent Ozar: Perfect.
Pedro Lopes: Okay, so another pain point that we set out to address in Service Pack 1 was the ability to affect, or influence, query execution without being sysadmin. Now look at this simple query here, and let's say that, because I went through some tuning exercise, for some reason I need to use a trace flag at the query level – in this case, 9481. If I'm a sysadmin, this runs just fine. However, if I'm not a sysadmin, that query cannot run, because the trace flag part of it – the QUERYTRACEON hint – cannot be run by a user that does not have sysadmin context, and therefore I will not be able to run the query.
We have since added a framework of new classes of hints for the optimizer, which are exposed through this new USE HINT notation. Specifically, to use the legacy cardinality estimation, instead of a trace flag I now have a plain-English – or almost plain-English – hint that tells me exactly what it is doing. Not only that, but if you go and look at the Books Online page on query hints, all these hints are documented, so you can use them inline in your query without requiring the user context this runs under to be sysadmin.
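As a sketch against AdventureWorks (the table names are assumptions for illustration), the trace-flag form and its USE HINT equivalent look like this:

```sql
-- Sysadmin-only, trace-flag based:
SELECT p.Name, SUM(s.OrderQty) AS total_qty
FROM Sales.SalesOrderDetail AS s
JOIN Production.Product AS p ON s.ProductID = p.ProductID
GROUP BY p.Name
OPTION (QUERYTRACEON 9481);

-- SP1 equivalent, documented and runnable without sysadmin:
SELECT p.Name, SUM(s.OrderQty) AS total_qty
FROM Sales.SalesOrderDetail AS s
JOIN Production.Product AS p ON s.ProductID = p.ProductID
GROUP BY p.Name
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```

The second form also self-documents its intent, which is the point Pedro makes next about readable knobs.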
There are overall nine different hints supported here that were previously only available as trace flags, and we've also provided this DMV, sys.dm_exec_valid_use_hints, so that on any given engine version, you're able to check whether a specific hint is available or not. Let's say, for example, you're developing code that will run on SQL Server, and a future version ships a hint that does not exist on a previous version. Right now, we haven't made changes to the list, but suppose we introduce new hints, and in a future version there is a hint that is not in SP1. With this DMV you would be able to programmatically understand whether the hints you're using are available on the engine version you're targeting, before you generate the SQL that will run with that USE HINT. So, it's just a programmatic way – for ISVs, for example, or anyone maintaining a code base used across SQL Server versions as users apply cumulative updates and service packs – to make that call.
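A minimal sketch of that version check – the fallback logic is hypothetical, but the DMV and its name column are as documented:

```sql
-- Guard a hint behind a check that it exists on this engine build.
IF EXISTS (SELECT 1
           FROM sys.dm_exec_valid_use_hints
           WHERE name = N'FORCE_LEGACY_CARDINALITY_ESTIMATION')
    PRINT 'Hint available: emit the USE HINT version of the query.';
ELSE
    PRINT 'Hint not available: emit the query without the hint.';
```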
And I lost audio there for a second. Okay, so these are the USE HINT options that exist, and as you can see, for now they all map to specific trace flags. That was the first wave, if you will; we can at any time introduce new hints – which will be documented – that may not have a trace flag counterpart. So, in all essence, what I mean is: if you need, for any reason, to hint the optimizer towards some behavior, do use hints instead of trace flags for that purpose.
You also see three here that have a database-scoped configuration counterpart, for example. So, as much as possible, we're moving in the direction of: if we need to add some knob, do it in a way that is at least clear when you are looking at the code. I don't have to keep in mind that 4138 means disabling the optimizer row goal – I will see that in plain English when I'm looking at a query plan. And this is the short URL you can use for reference, which has all our trace flags documented.
Okay, so I'll keep just one demo for the end and skip this one for now – again allowing you guys to run the script later. So, CREATE OR ALTER is something else we've introduced in SP1. It's not really anything but a time saver for developers, if you will. Writing IF EXISTS for a specific table or stored procedure or function, then DROP, then CREATE – it's just too cumbersome to write. So, to increase productivity for developers – for anyone that writes code – you're now able to use CREATE OR ALTER to very quickly write your stored procedures or functions or triggers or views, and with a single expression, CREATE OR ALTER, take care of the fact that the object may or may not exist, and of whether you would otherwise need to drop or alter it.
Now, keep in mind that it's available only for those specific types of objects – stored procedures, triggers, and so on – because those are the kinds of objects where, if you look at the T-SQL that creates or alters them, nothing really changes but the CREATE or ALTER keyword. That's not the same for tables, or even schemas. For example, if I create a table, I can see here on the left side the T-SQL to create my table, but if I need to alter a column, as you can see, it's very much different. So, I still need separate CREATE and ALTER commands for those types of objects.
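A minimal sketch of the boilerplate this replaces – the procedure name and body are hypothetical:

```sql
-- Before SP1: drop-and-recreate boilerplate.
IF OBJECT_ID(N'dbo.usp_GetOrders', N'P') IS NOT NULL
    DROP PROCEDURE dbo.usp_GetOrders;
GO
CREATE PROCEDURE dbo.usp_GetOrders
AS
    SELECT 1 AS placeholder;
GO

-- SP1 and later: one statement, works whether or not the object exists.
CREATE OR ALTER PROCEDURE dbo.usp_GetOrders
AS
    SELECT 1 AS placeholder;
GO
```

A side benefit of CREATE OR ALTER over drop-and-recreate is that existing permissions on the object are preserved.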
And this is something that's very exciting for me, for us here, and for a lot of customers that are starting to use it: the ability to track query progress. This slide is obviously showing something it shouldn't – the [1:13:38.2] is bad – but let me go straight to the point.
So, if you're running into a query performance issue – or you think it's a query performance issue – the ability to do in-flight troubleshooting is very important. Being able to connect to any server, look at a specific session, understand what that session is doing, and even look at the plan as it progresses in its execution, is important to detect and troubleshoot, namely, scenarios where you have long-running queries.
So, this can already be enabled, actually since SQL Server 2014, if you do one of these three things. If you are using SSMS, you point it at 2014 or 2016 and click the "Include Live Query Statistics" button – that enables this ability in the engine. Or, inline with your query, if you run with SET STATISTICS XML or SET STATISTICS PROFILE, you're also turning it on. But the point is, what all of these do is turn on something we call the profiling infrastructure, and this has a very high overhead. As we've measured on TPC-C-like workloads, the overhead can actually be over 75%. So that's probably why you don't have this running all the time on your server. You could also run an extended events session with this event – which, again, I would not advise you to do in production all the time – and what that does is enable the profiling infrastructure in the background; then, while it was running, you'd be able to tap into any running session, not only the one you're running on, which would be the case here, and look at the live plan as it progresses. But again: very high overhead, probably why you're not running it all the time, and that's a good decision.
So, what we've done is introduce something we call the lightweight version of this ability – lightweight profiling, if you will. This dramatically reduces that performance overhead, and now allows you to run this infrastructure continuously, which means that at any point in time you can do live query troubleshooting. This can be enabled with global trace flag 7412. Unless your server is already CPU bound – like you're running all the time at 95% CPU – my advice would be to turn this trace flag on on any server you have, because it enables that lightweight profiling infrastructure, and you'll see in a few minutes what it unleashes. So, one thing that happens when I enable lightweight profiling is that the sys.dm_exec_query_profiles DMV – which is what actually populates the Live Query Statistics feature of SSMS – is now also populated by lightweight profiling. That means, in all essence, that we are now able to run Live Query Statistics on all sessions at any given point in time, and this is extremely useful for, let's say, a production DBA that someone calls and says, "Hey, you have a problem" – to tap into a running system and look at what it's doing.
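A sketch of turning this on and peeking at another session's progress – session 53 is a hypothetical session_id:

```sql
-- Enable lightweight query profiling globally (2016 SP1, TF 7412).
DBCC TRACEON (7412, -1);

-- Per-operator progress of whatever session 53 is running right now.
SELECT session_id,
       node_id,
       physical_operator_name,
       row_count,
       estimate_row_count
FROM sys.dm_exec_query_profiles
WHERE session_id = 53;
```

Comparing row_count against estimate_row_count per operator, while the query is still running, is essentially what Live Query Statistics renders graphically.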
We've also, by the way, added this new DMF, sys.dm_exec_query_statistics_xml. It takes a session ID, and when you call it, you get a snapshot of the actual execution plan, with actual runtime metrics, as it is running right now in SQL Server. So, this is also extremely useful – not to watch the query progress live, but to take snapshots of it, if that's something you want to do.
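For example, a one-line sketch to snapshot an in-flight plan – again, 53 is a hypothetical session_id, and the target session must have profiling active:

```sql
-- In-flight snapshot of session 53's plan, with runtime counters so far.
SELECT query_plan
FROM sys.dm_exec_query_statistics_xml(53);
```

Saving the returned XML at intervals gives you a crude time series of how far each operator has progressed.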
Okay, so some numbers here, because numbers are important, before I go into the quick demo of this. With the same TPC-C-like workload, the regular profiling infrastructure, as you can see here, has at least 75% overhead. The lightweight profiling, as introduced in Service Pack 1, has 1.5% to 2% CPU overhead. So that's what I meant: unless you're already running on servers that are CPU bound, if you can take this overhead and trade it off for the ability to do live troubleshooting at any point in time, I would advise you to turn on trace flag 7412 and leave it on on any of your servers.
Okay, so just a quick comparison between regular profiling and lightweight profiling. While regular profiling has the full runtime statistics for any plan that is running, we do have one limitation with lightweight profiling: we're not tracking CPU. We're tracking every other bit of information you can see in Showplan, but not CPU. If that is paramount for your query analysis, you can still run the query by clicking "Include Live Query Statistics" in SSMS, which turns on regular profiling, and for that query you will see CPU usage. But if you're tracking queries that are heavy on I/O, for example, or have a lot of waits – whatever it is – you don't really need CPU tracking for that, and you'll be able to use this feature to troubleshoot those live in production.
So, this one I’m going to demo very quickly. I’m already using the latest version of SSMS, which is 17.2, but actually this works in any version that has what I’m about to show – any version that has live query statistics. So, I will kick off some workload in the back, and let me use this one. I’m kicking off some workload in the back, and I’m the DBA, and I got a call, and the call says, “Hey, as usual, my SQL Server is slow” – because as any production DBA knows, the database is guilty until proven otherwise, right? So, it’s up to the DBA to prove whether it is a database problem or not.
So, what can I do? One of the things I can do, which I’m already giving as a sneak peek, but I’ll be talking about it in a couple minutes: if I’m using the latest version of SSMS, I can right-click my server, go to the reports, and I now have this here – the performance dashboard. That’s something that has existed for some time now, but in 17.2 we’ve made it a native embedded report in SSMS. For anyone that has used the performance dashboard before, you needed to deploy a schema into master, and then you would run the report from, say, your laptop against whatever servers had that schema pre-deployed. No more. Everything is self-contained here, so I can click and go check what is happening on my server in terms of performance. I can see that I have parallelism waits, and I can see that I actually do have a lot of CPU usage right now. In this case, most of it is not SQL. Let me do the disclaimer here: I’m running the client application that generates the workload on the same server. That’s probably not the best scenario, and that’s why I’m seeing a lot of CPU that isn’t SQL. But if you’re running on a server that runs other services besides the SQL Server engine – for example, SSIS – you can use this performance dashboard to understand whether you’re looking at a CPU problem inside SQL Server, or something else running on the box.
I could perhaps go here to the user requests and see what is running, and maybe I would find some query that may not be running so well; session ID 58, for example. But maybe I like other flavors. I can right-click here and use Activity Monitor, for example. In Activity Monitor – let me go to the recent expensive – actually, the active expensive queries – I have a few active expensive queries here. I have this one that’s been running for 69 seconds now – well, that’s the CPU time, actually; the elapsed time is over two and a half minutes. So, I’m a production DBA, I have trace flag 7412 on by default as a startup trace flag, which means I can look at any query execution. I can right-click on this and show the live execution plan. And when I’m looking at this – well, not the right one, that’s not 56 – I’m able to see the plan as it is progressing. So, I can go through the plan, and I can see I’m building this lazy spool here, and a table spool here, and this one is quite big. You can see how many rows, out of how many I’ve estimated, are coming through, so this is actually quite a nice plan to look at. I have something like 43 quintillion rows estimated here, so yes, it seems I really have a query performance issue.
Now, I was able to quickly do this in production, rather than collecting a trace, or collecting PSSDIAG, or collecting xEvents and then taking them to my desktop or my laptop, cracking open those traces, and trying to see this. No – I am looking at it as it progresses, directly in production, with nothing more than 1.5% or 2% overhead, so very manageable for the most part.
What I can also do is see the query that’s running here. I can’t see everything, but maybe I can right-click here, see properties, go to my statement, copy – actually, I don’t like this, hang on, I have something better. In case anyone noticed, for a few versions we’ve had this little button here, which allows me, with any plan that I’ve opened, to click on it and actually see the well-formatted T-SQL. So very easy and very cool to be able to do this. Yes, it’s in the wrong database, I know, let me put it in the right one. I’m looking at my query here, and I already know my plan because I’ve looked at it in the live query statistics. I’m looking at this and I see – hang on, remember in the slide some half hour ago, I had a query running with 9481; this actually means it’s forcing the legacy CE. Probably this query went through some tuning and someone found that the legacy CE gave better performance than the new CE we introduced in 2014 and above. I’m not so sure about that at this point, given what I was seeing in the live query statistics. So, I will just comment this out, and let me execute it – this is a simple select, so I’ll just execute it in production. And let’s see what comes out of it.
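The pattern found in the demo looks roughly like the following – a minimal sketch, where the table and column names are illustrative, not from the actual demo script:

```sql
-- As found: query pinned to the legacy (pre-2014) cardinality estimator.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '2016-01-01'
OPTION (QUERYTRACEON 9481);   -- forces the legacy CE; requires sysadmin

-- The fix tried in the demo: drop the hint so the new CE (2014+) estimates.
-- In 2016 SP1+, the legacy CE can also be requested per query without
-- sysadmin rights, via a USE HINT:
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '2016-01-01'
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```

The USE HINT form is generally preferable going forward, since it is self-documenting in the plan XML and doesn’t need elevated permissions.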
Well, first of all, it executed in less than a second, or about one second, and I can see that the plan is very different from the one I was running – or am still running, actually – in live query statistics. For example, if I look at the estimated number of rows here: instead of those 45 quintillion rows, with the new CE I’m estimating 500 rows. And the plan did execute in about a second, right? So as a DBA I was able to see the problem, and in this simple case, hypothesize what was going on. In this case I actually have a fix, and I will send it to my devs to push up to production so they don’t use the legacy CE here anymore. So very, very powerful to be able to do live query troubleshooting. All of this is available because the underlying profiling infrastructure that unleashes it is now very lightweight, and you can enable it all the time – this is super important for production DBAs. So, let me stop my workload here, and I’ll go back to my slides. Any questions here?
Brent Ozar: Only one. There’s one person asked, Chitan asks, “Does trace flag 834 large pages apply for SQL Server 2016 SP1 still?”
Pedro Lopes: It does. Yes.
Brent Ozar: That’s it. Lots of people are excited about what you’ve been showing and stuff though.
Pedro Lopes: Well, I’m glad. I’m hoping that you start using it, and feel free to flood us with feedback – the good, the bad and the ugly, we want to hear about everything. So, still some things we’ve added in V16 of SSMS: support for multi-statement Showplan comparison. If you’ve used Showplan comparison, it’s the ability to pick two plans and derive insight by comparing them. I’ll still do this since we have time. This is actually in Query Store – the UI part of Query Store. In the last version of V16, so three or four months ago already, we introduced the ability to filter by the number of different plans. For example, let’s say you crack open a report in Query Store and you only want to see those queries that have more than two plans stored. You can do that very easily by applying these filters, which are available in all Query Store reports.
In V17, we’ve added these new reports. For example, one of the things about Query Store is that from there you can force plans, right? If you found a plan regression, you can force the better plan to persist for all subsequent executions. But let’s say something changes – you move your database to SQL Server 2017, for example, or you applied some CU that has a fix for something that may be impacting your query plan shape – and now you want to remove that forced plan. You want to let the optimizer do its work again; maybe it will come up with a better plan going forward. You didn’t really have a very explicit way of doing that in the UI – with SSMS you would do it in a script. So we added this report, Queries With Forced Plans, where you can clearly see which queries have a plan you previously forced.
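The script route Pedro mentions can be sketched like this – a minimal sketch, where the `@query_id`/`@plan_id` values are placeholders you would read from the first query’s output:

```sql
-- List every plan currently forced through Query Store in this database.
SELECT p.query_id, p.plan_id, p.last_force_failure_reason
FROM sys.query_store_plan AS p
WHERE p.is_forced_plan = 1;

-- Un-force one of them so the optimizer is free to choose again
-- (42 and 17 are example ids, not real values).
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 17;
```

The new SSMS report surfaces exactly this information graphically, with the un-force action a click away.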
And there’s also this new feature of SSMS that I’m going to show, the query analysis scenario, where we’ve introduced one key feature: the ability to troubleshoot cardinality estimation differences, which I’m going to demo in a minute. I’m in the last few slides of my presentation.
So, what’s next for the relational engine? Meaning, what’s in SQL 2017 and beyond, and maybe even in-market with the upcoming SQL Server 2016 SP2. For example, something we have in 2017 is the ability to also have, in Showplan, information about which statistics were used by the optimizer when it was compiling that plan. So I’m now able to store that information in Showplan too, again iterating on the notion of Showplan being, as much as possible, the one-stop shop for everything related to query execution. If someone sends me an actual execution plan, I’ll be able to understand what is going on – what trace flags are active, what statistics were loaded – the full spectrum of the ecosystem that impacted this execution.
So, for example, in this plan that I’ve collected, I was looking at a large misestimation on an operation running over this index, which obviously has a statistic with the same name. And I was able to see that, okay, I’m using a full scan, so 100% sampling; why do I see such skews in the estimation versus the actual rows? With that, I was also able to see that I have a modification counter of 19,000, which in this case is more than the row count I had in the table. So, at some point, auto-update statistics was probably turned off, and somehow this statistic was not getting updated.
So, the direct insight I got from looking at the plan was that I needed to update these stats, which I did. Running the query again, I actually got slightly better estimations versus actuals, and I could also see that yes, these stats were updated – I can even see when they were last updated, as you can see here. Seems I was working at 3 am.
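The same check can be done outside Showplan too – a minimal sketch, with `dbo.Orders` as a stand-in table name:

```sql
-- Inspect last update time, row count and modification counter
-- for every statistic on the table.
SELECT s.name, sp.last_updated, sp.rows, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Orders');

-- If the modification counter is high relative to rows, refresh:
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```

The point of the Showplan enhancement is that this evidence now travels inside the plan file itself, so whoever receives it doesn’t need access to the live server to reach the same conclusion.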
Okay – I also have scripts for this demo, and I will again leave them on our GitHub. Go grab SQL Server 2017 – it’s on Docker, which is something you can easily run on your own laptop – and get a feel for the stuff we’ve added in Showplan in SQL 2017, even at this point, because you’ll be pleasantly surprised.
Other enhancements in 2017: obviously we have automatic query plan tuning. This is the ability, if you’re running Query Store on a given database, for us to detect regressions, automatically address them by forcing the plan that seems to be the best one, and then observe performance; if it turns out this was not a good decision, we’ll just roll back what we did. We also have adaptive query processing. That would be at least a whole hour of discussion, so I’m obviously skipping it here. But if you go to PASS, for example, I will be having a session with Joe Sack there that dives deeper into these topics.
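For reference, enabling the automatic plan correction described above is a single statement in SQL Server 2017 – a minimal sketch:

```sql
-- Requires Query Store to be ON for the database.
ALTER DATABASE CURRENT
SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);

-- Regression detections, and any automatic actions taken, show up here:
SELECT reason, score, state
FROM sys.dm_db_tuning_recommendations;
```

With the setting OFF, the same DMV still surfaces recommendations, so you can review them and force plans manually before trusting the automation.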
In SQL 2012 SP4, we are bringing back to the last service pack for SQL 2012 a number of enhancements that we’ve added to later versions, like Service Pack 2 for 2014. So a lot of those enhanced extended diagnostics in Showplan XML will be available in 2012, and the memory grant warnings and diagnostics will also be available there, so you have a better story in terms of Showplan diagnostics across several versions. And we do have some more exciting improvements coming in an upcoming SQL Server 2016 SP2; those again would be a whole new session, so I’m keeping them in my pocket for another occasion, but tune in for that.
This is just a number of bookmarks I want to leave you with, and Brent, you’ll be the judge if I have three more minutes to do a quick demo on the query plan comparison or not.
Brent Ozar: Yes sure, one more demo and then we’ll bail. Go for it.
Pedro Lopes: So, let me bring up my SSMS again. This still picks up on the same scenario that we had. I have here a slow plan that I’ve captured – it’s the actual execution plan, which I collected and saved; or I could have just run the query here. Let me see how much time it takes. Yes, it takes 30 seconds. Not long. So, I’ve captured this plan that took 30 seconds to run, which I consider slow for my workload. And I’ve gone through some exercise where I was able to make some changes, and the query became faster. But I want to compare them both; I want to understand what can be happening there. So, I can compare Showplans – let me run that again. I right-click on any open region of the plan, and I have this compare Showplan. I also have this new feature here, analyze actual execution plan – let’s see if this gives me anything. It actually does.
It’s telling me that – so I’ve opened this plan which I previously saved; let me close this. I’ve started the single plan analysis, which for now only has the inaccurate cardinality estimation scenario, and I’m able to see that I have a 14,000% difference in this clustered index scan, which is this one, and going down the tree I can see that, for example, in this clustered index seek here, I also have a 12,000% difference between the actual rows and the estimated rows. So, this is the information it’s giving me. Now, as I click on each one of these – let me stay on this one for now – I’m able to see some details of my findings.
First thing I’m seeing is that, okay, the predicate for this operator depends on the parameter, start order date, and the compile-time value was either unknown or different from the runtime value. So that’s probably why there is a skew between estimations and actual runtime values. This isn’t absolute – it gives you, the user, some actionables that you can go and try, or change, or observe, or analyze in a given plan; it gives you hints on what needs to be looked at.
This is especially useful for folks that maybe have a bit less experience looking at query performance. And this feature in SSMS, by the way, works offline: I opened an offline .sqlplan file, which could even be from SQL 2014 or 2012, and SSMS itself, offline, gave me some insight on what to look at.
Okay, so it’s telling me I have a difference between the compile value and the runtime value for this parameter. Okay, let me zoom out and go look at that, because I do have information about the parameters in the root node. So, I select the root node, look at its properties, and one thing I can look at is the parameter list. I do have start order date here, and I can see that the compile value was this date, but the runtime value is this one. What I have here is obviously a classic case: this parameter was sniffed at compile time, the query was optimized for that incoming parameter, but apparently the data distribution is so different between buckets – between dates, if you will – that using another parameter (the plan was cached, so I’m reusing the same one) yields what is perceived as not-so-good performance for this same query.
So, I did, however, apply a few possible workarounds, and I did come up with a better plan. What does that better plan have? Let me compare – I’m going to close this, right-click, and compare Showplan with the one I got after applying some optimization – for example, an optimize-for-unknown hint, or recompiling every time; something that made the plan faster, if you will, for that other incoming parameter.
So, this plan was faster. As I can see here, in the slow plan, the actual number of rows out of this clustered index scan is 400,000 against an estimate of 2,800; in the good plan, for the same 2,800 estimate, it actually got 2,000 rows. And why is that? Well, this is actually the plan that I got, so I’m looking at the bottom plan – here’s the good one, the bottom plan here. I’m looking at the parameters there, and yes, the compiled value was the same as the runtime value. So I’m able to see that for the compiled value, I have a very good plan, but for other data distributions I don’t. Now I could apply a number of known techniques to fix this – and this is in the demo, by the way; you can run it on your own and do some work here. I could use option recompile; I could optimize for unknown; or I could optimize for the start order date my users use the most, and that way I would optimize for the majority of my users, while some other users with a different order date would perhaps get worse performance. Or I could alter the procedure to use a local variable, or I could use this hint – one of the new USE HINTs – which disables parameter sniffing. So, I have a number of options I could use to fix this scenario, and these were more easily unleashed by these features in SSMS: I started by analyzing the single actual plan, and then compared it with the other one, which led me to understand exactly what the optimal plan was versus why this one went so slow.
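The workarounds listed above can be sketched on a hypothetical procedure – the object and parameter names are illustrative; in practice you would pick exactly one of the OPTION clauses:

```sql
-- Illustrative procedure showing the parameter-sniffing workarounds discussed.
CREATE OR ALTER PROCEDURE dbo.GetOrders @StartOrderDate datetime
AS
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= @StartOrderDate
OPTION (RECOMPILE);   -- 1) re-optimize (and re-sniff) on every execution

-- Alternatives, each replacing the OPTION clause above:
-- OPTION (OPTIMIZE FOR UNKNOWN);
--     2) use average density instead of the sniffed value
-- OPTION (OPTIMIZE FOR (@StartOrderDate = '2016-06-01'));
--     3) favor the date most users query (example value)
-- OPTION (USE HINT ('DISABLE_PARAMETER_SNIFFING'));
--     4) 2016 SP1+ per-query hint, equivalent in effect to trace flag 4136
```

Each option trades plan stability against plan quality for outlier parameter values, which is why the comparison workflow in SSMS is useful for deciding which one fits.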
Okay, and that kind of concludes my session for today. Again, thank you very much for sticking around.
Brent Ozar: Thank you.
Pedro Lopes: I always have this problem with my time management.
Brent Ozar: No, you’ve got so many cool things you get to talk about. Jeez, SQL Server team’s been on fire lately, so nice job sir.