Anyone working with SQL Server and interested in where Microsoft is going with the data platform.
A new release of SQL Server is like a new season of your favorite TV show. Some releases in the past were lacking; I always felt like SQL Server 2008 R2 was about as exciting as the Seinfeld finale. But Microsoft hasn’t disappointed in recent years, and SQL Server 2019 is no exception.
There are some really exciting features here. Some are big ticket items, like support for Secure Enclaves, which provide all the benefits of Always Encrypted but without the client round trips. Others are smaller but powerful improvements, like deferred table variable compilation, and better error messages. Some are in between, like enhancements to batch mode, more online operations, and query hints to target specific compatibility levels.
With plenty of demos, I’ll show how these features will improve your performance, security, workflow, or all three. And when necessary, I’ll point out pitfalls or gotchas that you won’t find in the official documentation.
These and many other features are worth talking about, so let’s talk about them!
Why I Want to Present This Session:
I was lucky enough to play with several of the early CTPs, and have insight not only into what’s already been announced, but also some enhancements you haven’t even heard about y
String or binary data would be truncated in table ‘GroupBySubmissions’, column ‘WhyYouWantToPresent’.
Truncated value: ‘I was lucky enough…’.
ERIK DARLING: Alright, so folks, coming up next at GroupBy is Aaron Bertrand with new engine features in SQL Server 2019. Take it away Aaron.
AARON BERTRAND: Alright, so good morning everybody. Hope everyone’s doing well. I’ve been on vacation all week and I’ve shaved for the first time since SQL Intersection. So I feel a little weird. I’ve had this beard for a while. Alright, so my name is Aaron Bertrand. I’m on Twitter at my name, Aaron Bertrand. I’m a product manager for SentryOne. I work with Andy Mallon who’s also on this webinar. I’m an author at MS SQL tips, I was born in Canada, which is really exciting for virtually nobody and I write some articles and manage content at SQLperformance.com and also our team blog at SentryOne. I’m a community moderator at Database Administrators, Stack Exchange, and the product that I own in SentryOne is Plan Explorer. So if anybody is doing any plan analysis and you’re using Management Studio, stop doing that and use Plan Explorer. It’s going to give you a lot more information.
So today we’re going to talk about new engine features in SQL Server 2019. Not really deep dive, I just want to kind of give an overview and tell you what’s coming so you have an idea of what these things are all about. I want to talk about these in five different areas, and they are administration, so administration and system configuration, those kinds of things. Security, there’s some pretty cool security features that are coming. Availability, a lot of really important enhancements here for availability, and then performance, there’s some performance enhancements. Some things that you can take advantage of with intent and some things that will just run faster by default. That’s been a pretty common theme for the Bobs. Bob Ward and Bob Dorr at Microsoft have been trying to do a lot more things where systems run faster by default, and then programmability. So not a lot of T-SQL changes but there are some enhancements to programmability that will make your life easier.
So for administration, one of the first things that you probably – unless you’re at the South Pole or something, you’ve heard about these new things called big data clusters. And this is – I don’t know exactly how to describe this but it’s basically building out a whole system of SQL Servers that help you provide an application that will deliver – that will accept data and deliver data in a much more scalable way. So if you take something – it’s a concept similar to availability group. Not exactly the same thing, but similar to that. And then you have integration that you can add to that so you can pull data in from Oracle and Spark and Hadoop and Teradata and all these other different data sources. And you have direct access to different kinds of unstructured data from the centralized system.
And then we’re using things like Kubernetes so you have containers and pods and compute notes. And then on top of that you throw a caching layer. So you can have a bunch of SQL Server instances that essentially act like a caching layer so they’re kind of dumb terminals that just surface a bunch of relational data so that you can have really high scale out reads. So a bunch of SQL Servers that just act as a caching layer, and then you can scale out any of these however you like. So if you need more storage notes you can do that, if you need more caching layer you can do that. You just add instances to SQL Server.
There’s a guy in the community named Chris Adkin and he’s starting a great blog series. When you are able to download these slides, every slide has – maybe with the exception of one, every slide has a bunch of links to a lot more information than I can share when I have 30 slides that I deliver in an hour. So there’s a lot more reading you can do on that stuff.
One thing they haven’t announced yet is how much will this thing cost. So don’t ask that, that’s not anything anybody can answer yet. So I can imagine it won’t be cheap. And this is kind of the overall architecture of this thing and I’ve – I could spend a whole presentation going into details of all these little aspects here, the Spark, the SQL Servers, the HDFS, the persistent storage and everything else, but I just like to say this is magic here because that’s easier and there’s a lot of content out there that can show you this. There are sessions at Ignite that you can watch that go into details of all the little aspects of this.
Next thing is SQL Server on Linux. On the last presentation, Tracy was talking about running a bunch of things on Linux and SQL Server is no exception. You can run SQL Server on Linux now and they’ve made a bunch of enhancements, but what I like about what they’re doing now is previously they’d introduce a new feature and they’d have one blog post and there’s one page and books online that explains how it works, and if you need anything that’s outside of the lowest common denominator that made – that was important enough for marketing to make the blog post or important enough to the product to make the documentation, you had to rely on outside bloggers or inside information or trial and error. And what they’re doing now, again, I mentioned the blogs before, they are writing up just about everything and explaining all the background and the technical details of how all these things work under covers so you can understand them better. And they’re written like humans, not like documentation. So this is just – again, don’t try to memorize any of these links. You can download the slides and click on all these and I do like too in the blog format the title of the blog post is basically the slug at the end of the URL. So it’s very easy to tell what each link is about. And if you really want a deep understanding of how SQL Server works on Linux, get the book. Written by Bob Ward, it’s very good. This is – actually he sent me this copy. I just got it yesterday. So I’m kind of – excited is the wrong word but I’m looking forward to reading that and seeing a lot more about what SQL Server on Linux is all about.
So one of the things we can do as administrators is set scope configurations at the database level. So we can set a bunch of options that we used to have to set using server wide trace flags or server level options or individual – things at the end of individual queries or using compatibility level as the shoehorn thing that makes a bunch of behavior changes, including the one that you want but maybe a bunch of other things that you don’t want. So they’ve made this a lot more granular so that at any single database you can set things like max stop or the parameter sniffing trace flag 4136, or query optimizer hop fixes, that’s trace flag 4199.
And then you see 17 new ones here that are almost all about how optimizer, query optimization works. So you have new things like turning on and off batch mode adaptive joins, turning on or off optimize for ad hoc workloads, turning on or off batch mode on row store. You have a bunch of new things that you can control at the database level so it’s very easy to force a certain behavior or prevent a certain behavior without affecting anything else. You only affect that one single behavior, so that is pretty powerful stuff.
Another thing that I used to do a lot when I was troubleshooting things whether I was looking for what resource belonged to a – what page was in a specific resource that was mentioned in a blocking chain or in a deadlock where all you have is this page resource ID, which is a binary string, figure out how to parse and pull the information out to find what data is causing the problem. And so we have these functions called DBCC ind and DBCC page, which allow you to go in and look at individual indexes, find all the pages that belong to an index and then look at the contents, the actual contents of a page.
So the problems with this, one, you can’t do this in a set-based way. You can write a loop if you had to inspect a bunch of pages. You could write a loop and do DBCC page for a bunch of things and then you’d have to build some kind of parser because the output is really awful. And then it also requires sys admin. So anybody trying to troubleshoot a deadlock had to have sys admin to be able to do this kind of thing and figure out what page was involved. So first thing they did, they added a new dynamic measurement function called sys.dm_db_page_info and into this you can pass, if you have the page number, you can pass that in, then you can get from that the object ID and the index ID. So it’s very easy to resolve things from a deadlock or some other sys exec request will sometimes show you the page resource ID that someone – that a request is waiting on, but it doesn’t tell you the object or the index. So this is one way that you can look that up once you have the page.
If you don’t have the page and you only have the page resource, you can use the new function sys.fn_PageResCracker and what this will tell you is what database file and page page resource belongs to. So the big benefits here is now you can do this set-based. So if you have one piece of information or the other, you can use cross apply to put these functions together and pull out all the information you need. It’s documented, it’s supported, so DBCC ind and page, they’re not documented and they’re not officially supported. So if you’re using those for some specific process or you build it into your product like, I don’t know, a monitoring vendor might do, those things aren’t supported so it can get you into trouble. And the biggest thing there is you no longer need sys admin. So you can have more granular permissions that are assigned there.
They have – now the column store is actually getting some huge adoption. They changed the stored procedure sp_estimate_data_compression_savings, so now you can pass in types like columnstore and columnstore archive, so you can determine in advance how much space you will save by using one of those compression methods. Previously, you just had to create the columnstore index and see what it did. And then the output is determined by the input. So if you use either a heap – if you have a heap or a clustered index or an already existing clustered columnstore index, what you get out from the procedure is a clustered columnstore index. Is you have a non-clustered index or a non-clustered columnstore, you actually get the compression savings estimate for a non-clustered columnstore index because in those cases, you wouldn’t be able to add a clustered columnstore to those objects. So that’s just something to keep in mind if you’re running this against a bunch of tables, the source of the table or the index that you pass in determines what your actual estimate savings refers to.
Alright, so some security features. First one is always encrypted, and that’s not the new feature. Always encrypted was introduced in SQL Server 2016, but the current implementation requires you to encrypt and decrypt at the client, so that’s great for security because no one on the DBA or the sys admin on the SQL Server box can’t see any of the data because it’s encrypted in the database and it’s encrypted on the way to the database. So even if you had man in the middle, they can’t see that data. It’s completely protected and SQL Server has no way to show you what that is. It relies on client drivers.
But this makes things very hard if you want to do range searches or pattern matching or anything like that, you can’t do that in the database. So what you have to do is if you want to perform a range query against something, you’re going to have to pull all the data back to your application and perform the search there after you’ve decrypted all the data. So when you have a table of employees, then you have 100 employees, that’s not a big deal. If you’re dealing with financial data, that might be a problem.
The other thing that you can’t do in that case is if you want to encrypt a column, so if you use the wizard in Management Studio to encrypt a column that already exists, what it has to do is it has to shuffle all the data outside of the database, encrypt all of the data and then put it back, and that’s very expensive. So what they did is they introduced a new concept called secure enclaves. So there’s a new concept called secure enclaves, and what this is is a protected and an isolated area of memory on the SQL Server machine. So what this does, and I’ll show a little diagram here, this allows you to do things like encrypt in place. So if you need to do key rotation and you have to re-encrypt the column of the new key or you have a column that you realize you have to encrypt and you hadn’t encrypted it before, you can do this on SQL for several bucks in this little black box on the machine that’s still protected from sys admins and DBAs. Only the security, the person who manages the certificates can have access to the things that are on client and on the SQL Server box. There’s no way to see that it’s actually built into the hardware to do this. So this allows SQL Server to have information put into the secure enclave box and have all these rich computations perform there.
And there are two types of encryption. There’s deterministic encryption and random encryption, and this allows you to even do searches on things that are encrypted with randomize encryption because it’s all decrypted when it’s in that secure enclave. So you can actually see the plain text and you can actually run queries against that without any fear of the data being leaked. And the only thing you have to do when you’re creating your column master key and there’s DBL for this as well is check this box, allow enclave computations.
Now, having that box show up inside of Management Studio and having the feature actually work requires -in currency to these requires a server level configuration. I forget what it’s called. It has enclave in the name if you look in sys.configurations. You have to turn that on, and of course, you have to have hardware that supports this. This is only supported on a specific set of server class hardware. So tough to do. I spent about a day trying to get this working in VMs and I wasn’t able to get it to work.
PINAL DAVE: Yes, so one of the thing about secure enclave, this is very, very interesting. I don’t know, I heard about this word earlier when I was reading about GDPR and they said this is something to do with the iPhone and they had some local memory. They called it secure enclave and where there is – they store all the fingerprints so they cannot get access except the product itself cannot get access of it. And when I hear this one, it sounds very similar and I’m sure this would be helping to the GDPR people, people who are interested in it very much. Eventually, it could be interesting to see a demo but as you said, it may be very, very difficult to set upon our [inaudible].
AARON BERTRAND: Yeah, if you’ve got some budget to get me some hardware that supports this, I’ll be happy to do a demo. I don’t have that kind of hardware. I mean, I run Macs here so.
PINAL DAVE: So while when you said VM, I had one question also coming up so that is do you use a virtual machine or do you use a Docker? What is your preference?
AARON BERTRAND: I use a virtual machine.
PINAL DAVE: I see.
AARON BERTRAND: I’ve tested – I’ve played with Docker but I prefer virtual machine because well, one, I like to stay on top of Management Studio and what has changed in Management Studio, and also I manage products that I have to run on Windows. So a Docker container can’t run the SentryOne client application for example.
PINAL DAVE: Alright, thank you very much. I think it’s a good question, many people would like to know how do they install this new feature when Microsoft is releasing continuously new CTPs and as well as other things, so thank you.
AARON BERTRAND: No problem. Alright, the next one. Add sensitivity classification. So Pinal just brought up GDPR and this is a hot topic because a lot of companies are rightfully concerned about what’s going on. So in SSMS added a wizard, a little screen called identify and classify data or something like that and you can go in and you can – it was kind of useful at pointing out columns that might be GDPR sensitive. So it would identify columns by name like if it’s a first name, last name, email, social security, national ID, all these different – it has a bunch of catchphrases that it looks for and identifies those. If you come from an Oracle background and your columns are random eight character strings, it’s not so good at finding those. But in any case, what happens is you pick out these columns in the wizard and you say these are my GDPR related columns that I need to pay attention to, but all that does is add an extended property to those columns. So it’s not very discoverable because extended properties are just kind of this thing on the side that aren’t scripted by default and it’s just not discoverable. It’s very hard to find that information and there’s nothing that helps you go back and say was this column known to be GDPR related at the time of some of that. So if you’re auditing for example, and you see that someone accessed a column, could they have possibly known that that was GDPR sensitive? And you don’t know because who knows if they look at the extended property or when the extended property was added, that kind of thing.
So this new add sensitivity classification DDL, this is actually creating a first-class citizen. So in – there’s a new DMV or a new catalog view called sys.sensitivity_classifications and it stores information and you can give it labels like say it’s GDPR or it’s just PII or it’s HIPAA or SOX or PCI, any compliance things that you have to worry about. The beauty here is once you’ve created this, you’ve classified data as GDPR or PII or HIPAA, this classification automatically gets picked up by audits of the data. And so that information is in the audit. Yes Pinal?
PINAL DAVE: There are a couple of questions just popped in and you started this very interesting questions so I would ask both of these question, otherwise we’ll lose the track of it. So one of the question was ask about secure enclave and said how much – I think Jenny Walker is asking how much latency does it to the application? Or is there any metrics or something? I searched in between on internet I couldn’t find anything so I just thought now I will ask you.
AARON BERTRAND: How fast is your memory? It’s how fast your memory is and there’s not going to be much latency at all. Very fast.
PINAL DAVE: Alright, okay. And the second question…
AARON BERTRAND: There’s no network, there’s no transport, there’s – it’s just going in and out of memory.
PINAL DAVE: I see. Alright, so that’s good and that way it’s faster. And now second one is I think it’s the same one, he works for USDOD and he wants to know about custom sensitivity tags. Is there any way we can specify the custom sensitivity tags or something when you are talking about it or…
AARON BERTRAND: Yeah. I’m going to show a demo of that. So you’re talking about this feature here, right? I have a demo for this.
PINAL DAVE: Look at that Jenny Walker, you should be happy right now, we have a demo coming up for you so thank you.
AARON BERTRAND: Alright, so demo. Sensitivity classification. So let’s say I have a table called contractors and I have a contractor ID, which is just a meaningless – can pretend it’s an identity column. First name, last name, social security number, and the hourly rate. Now, when I have that table and then I can add these sensitivity classifications and IntelliSense so if you can see here, not sure if you can tell, so the IntelliSense is highlighting incorrect syntax near label because the IntelliSense in Management Studio hasn’t caught up yet.
So you’ll see things like this and you’ll think, oh, this code won’t compile, but it will. It’s just IntelliSense being silly. So I can add – the syntaxes add sensitivity classification and then I give the three-part name. So schema, table, column name. And you can do two columns at the same time or 10 columns at the same time so you don’t have to do an add sensitivity classification command for every single column. And then you are able to add these things that are – this is just arbitrary text and I believe that’s what the question was about. So you can say this is confidential GDPR and you can say the information is personal info. If I want to change this to really personal info, I can do that. It’s just whatever – you can come up with whatever labels you want for these things.
And so we have sensitivity classification on first name and last name and this is just GDPR info, and then we have SSM, that’s highly confidential because it’s a national ID. You know people like to protect their social security numbers. You can’t really because you have to submit it to just about everything you apply for ever, but we can label that as highly confidential and then we have the hourly rate, and we want to make sure that this is really sensitive, and so we call it lawsuit waiting to happen. And so we can add these three classifications and then…
PINAL DAVE: Jenny Walker is extremely happy, is extremely happy, said I can program classified label on the system. Woo-hoo, game changer, thank you very, very much.
AARON BERTRAND: Awesome. Alright, so now we can look at this catalog view that shows these are the object IDs and the column IDs and then the labels that I put on them and then the information type. And there’s also label ID and information type ID. You can actually build a hierarchy of these things but I’m not going to do that for this demo.
Then I’m going to go and I did this before so hopefully I’m not – nothing breaks because I’m overwriting a previous audit. So I’m going to create an audit, and this is just a very simple serve level audit. I’m going to turn it on and then in my database, I’m going to create a database audit specification, and this is something that’s probably pretty common in organizations that deal with data that is very sensitive to various type of compliance. I just want to know any time anybody accesses the data in these columns. So we use schema object access group and so that’s my database – well, I have files already so I can skip ahead and show that what I did was I came in, I created this database audit, I selected from the contractors, so I just did one simple operation and then if I look into the files, which I hope still exist – alright, so here we go.
So this is what was audited when I was running through these demos last night. We’ve got the host name, the server principle ID, the object that was accessed. So some of these are system things, I was doing catalog few ways, but here’s my statement. I’m interested in the select star from dbo.contractors and then you have this new GDPR. So this is just an alias that I created GDPR_yaya try and convert XML, this new column that’s exposed in the audit called data sensitivity information. And so in here is an XML column. I have to convert it to XML because it comes out as plain text by default.
And if we click on that, I can see that this person at this time running on this host name accessed from that table confidential GDPR information and highly confidential information. So these are the things that – and obviously they don’t match the names that I just changed during this demo because it was something I did last night, but that shows you that the audit, I didn’t have to do anything to the audit to say pick up these classifications. The audit just automatically pulls those in and has that information there for you, so even if someone goes and removes the sensitivity classification later, this stuff is still plan text in the audit file. So as long as they can’t tamper with the audit file, you can always get back to this. So sorry about the broken demo.
Alright, next, static data masking. So in 2017, Microsoft introduced dynamic data masking and one of the problems with this is it’s not really a way to completely hide it. It’s very run time specific and very prone to leakage. There were plenty of ways to get around – find out what the data was or at least in what range the data was. Now, static data masking is a way to solve those issues. So there’s a new wizard in SSMS and this is kind of – this is not really a feature in SQL Server per say, it’s more related to the tools around SQL Server but they’re pushing this as CTP, a feature that was introduced in the CTP even though it’s a wizard in SSMS.
But basically, what this wizard does is it creates a database copy with truly masked data. So you can point it at the tables that you want and it will create a copy, so this way you can have a backup that you can send to devs and the data is actually static. It’s hardcoded as masked. It’s not something that’s done at run time and that sys admins can bypass and all of that.
And it’s pretty easy to reverse engineer what they’re doing through the wizard but it’s not super complex. So you basically go in and I’m only not showing this in a demo because I have a dark clean in Management Studio and the menus are unreadable for some of these sub-menus off the database. So you have the screen where you pick the tables that you want to mask and you pick the columns and you can do various different things. So a string composite is – it allows you to set a pattern so you can set – you can hardcode values and then you can have things like random digits and random letters and you can do patterns. I didn’t do a repeat pattern there but you can repeat patterns and pretty powerful masking there.
And then what happens if you look at – if you go down the UI, so you tell it which columns to mask and then you go down and you say where do you want the clone backup and then what do you want the masked database name to be called. And so you can have the backup – what this wizard does is first it puts the backup on the drive, wherever you say to store it and then it restores it to the local system as well. So it creates a copy of the database locally, which has all the masked data, and then it also creates the backup file. So if you need to ship it to a UAT system or acceptance system or the developer’s work station, they can then restore that backup and they don’t need a new copy of SSMS, they just need a current version of SQL Server and it restores it as masked data and so there’s no data leakage or anything like that. So that’s pretty cool even though it’s just an SSMS feature for now, but it’s easy to see what they’re doing and you can build and extend on that.
Next section is on availability. So first thing is resumable and online operations. In 2016 or 2017, must be 2017, they added resumable online index rebuilds. So you could have a maintenance window and say your maintenance window is two hours and a rebuild of one of your big tables takes five hours. You could say okay, do two hours of work and then stop at the end of the maintenance window and the next maintenance window you can resume and pick up where you left off. And it only costs you the disk space of how much work was done during the first window.
And now they added that for resumable – they created resumable index create. So I can actually create the index and if the server crashes or I decide I only want it to run for 10 minutes or whatever, I can save the work that I’ve done, kind of like defrag has always worked, and resume it when I have time to do more work. This also allows you to have better control of your logs because the data is actually stored, and so you don’t have to maintain information in the logs about what you’ve done. So you can continue with your typical log maintenance and that index – the index creation that has been paused doesn’t impact that at all. But keep that disk space for paused operations in mind if you have a 10GB table and you get 90% done and then you pause, you’re taking up that nine gigs of data until you can finish the job and it creates the new index.
One limitation is you can’t sort in tempdb. That’s the same with online rebuild because obviously if you are sorting in tempdb and you’re storing the work in progress in tempdb and the server restarts, well, that kind of screws the ability to resume because the data’s gone, because tempdb is rebuilt.
They also added online create and rebuild of clustered columnstore so today in SQL Server 2017, if you wanted to build a clustered columnstore, table’s offline while you’re building that, and for a very big table, that means you’re offline for quite a while. So right now, the existing index, whether you’re creating a clustered columnstore on a table that doesn’t have a columnstore or you’re replacing a traditional clustered index, that existing index is available during the rebuild or the create.
And then one of the database scope configs that I talked about – that was on a slide earlier is the ability to make these the default. So you can elevate the default to online. So if you create an index or you are rebuilding an index, it will automatically do that online for you. Enterprise only, of course. And we can do a little demo of this. So if I come in here I’m going to create a slightly large table. I’m just going to put 250,000 rows into a table called new table because I’m very creative, and it’s just a sys all columns cross join sys all objects. Then it takes a few seconds.
And then I’m going to start a long running index create. So create clustered index on dbo.newtable, that’s on object ID column ID with online equals on, that’s necessary for resumable, and then resumable equals on. And I’m setting a max duration here even though this is never going to last 10 minutes, but just so that – to show the syntax so you can say this will run for 10 minutes and then it will automatically pause after 10 minutes, which is pretty nice. You don’t have to remember to turn off your maintenance.
Alright, so now the trick is I have to switch to a different tab and pause this operation before it’s finished, and it’s pretty fast. So I’m going to have to do it again because I wasn’t fast enough and I screwed up my cursor. So demo fails all over the place today.I don’t know what’s going on, wrong with my demos, they’re all failing today, but basically what happens is there is a DMV here that has resumable operations and it shows things like percent complete and page count, total execution time, the start time, the last pause time, the actual SQL text of the create index operation. And so you can use that to determine maybe if you have five of these index operations that have happened and you have a 10-minute window where you can have some downtime, you can pick an index from that resumable operation catalogue view and you can say okay, I have 10 minutes, I’m going to finish this one that’s at 99% done and you just say alter index on that table, resume, and it will pick up where the index – the create index operation left off. I promise you, all of these demos work. I’m just having repeated failures today and I don’t know why.
PINAL DAVE: One comment Kendra has given is very interesting, said we’ve never seen your demo ever failing, so this is definitely not usual. And she also recommended that you do not cross any street today.
AARON BERTRAND: Okay. That’s pretty sound advice I think. Anyway, these things work. They really work, I promise, and I’m just – I don’t know, not having the day I guess. Alright next one, accelerated database recovery. So in the past for things for recovery for database were highly dependent on the longest act of transaction at time of the crash. So there’s all kinds of information stored in the log and the recovery process has to read the whole log and scan the whole thing and do these undo and redo phases that really took a lot of time. So very busy system or a very big transactions and you have a failover or a restart, recovering some of that information can take a really long time.
Well, they now use – they added a thing called the persistent version store, PVS, which it actually keeps versioned transaction log information, and they used concepts like aggressive truncation, sort of very tedious about constantly truncating the information in the log that’s no longer necessary, and then they added a secondary log that’s in memory. And so what this does is it allows most of redo and undo to be instant. So that means that if you have big databases or databases with very busy and long transactions, you’re online much faster. And there’s a little diagram here, I’m not going to walk through all of the things. There are links in the notes for the slide that explain – help you explain and go through all of this thing.
But basically, you don’t have to do anything to get this. In SQL Server 2019, any time you need to recover, this stuff just happens automatically. There’s nothing to turn on or turn off or you don’t have to do anything manually to set – to make this behavior work.
So availability group enhancements. So one thing they added was the ability to perform secondary to primary connection redirection. So if you have things like the secondary goes down, even if the connection string says I want to read directly from the secondary, you can override that and say well, redirect to the primary because the secondary is down. So you can actually offload read-only activity for example, to the primary if the secondary has a problem. Previously what would happen is it would just fail. There was no retry or anything like that. It would just fail. So this is useful in scenarios where you don’t have a listener or if you’re on Linux and you’re using clusterless AGs, things like that.
So this one, Bob Ward actually mentioned this at SQL Intersection at the beginning of the month and this was all he said. He said system databases in an availability group. And so I have a lot of questions here. I don’t have a lot of answers because I’m having a hard time seeing exactly how this will work. If I have jobs for example – a typical thing, the things that people want availability groups to handle automatically are jobs, logins and link servers. Those are the three things, the three top items. I’m sure there are plenty of others that I’m not thinking of, but those are the three top ones.
And so if I have jobs on multiple instances, I don’t necessarily want MSDB to just failover to a different instance because it’s going to replace. I feel like it’s going to replace the jobs that I’ve set up custom on that instance. It’s separate from the availability group. So I’m not quite sure how it would work. You would have to have things like conditional properties on jobs and conditional properties on logins even. If I have a login on one instance who’s a sys admin but on the secondary I don’t necessarily want that person to be a sys admin, I might want to keep those properties as well.
So I think – I like the idea of this, I like the idea of making that whole process seamless where you’re not manually fixing things every time you have a failover and you’re synchronizing logins between – if you have jobs and PowerShell scripts that sync logins and jobs between instances when availability groups shift and that’s why I have a lot of hope for this feature. I hope to learn more about it soon.
Let’s move on to performance. So a lot of performance enhancements. One of them is in-memory OLTP for system tables. So the one that Bob Ward talked about this month was tempdb. So in tempdb, you often have bottleneck on – there’s contention on system tables. So if you have a heavy temp table or a table variable workload, you have a lot of contention because everybody’s fighting to register the object they created in sys.objects and things like that.
So what they did in a lab environment is they put all of the system tables that are underneath sys.objects – sys.objects is just a view and it’s over a bunch of system tables. And so they put all those system tables in-memory using Hekaton and it was night and day. It was a completely – it was a different system. Tempdb was no longer a bottleneck for anything and I thought that was pretty cool, but again, I don’t have a lot of answers here because there isn’t a lot of information about this yet. Just that they had tested it and that it’s coming and I thought it was very interesting. I think it’ll be useful for other things like row versioning and work files and work tables.
Imagine if all your work files and work tables, your store operations even that had spilled to tempdb would automatically go in-memory. I think that’s pretty powerful. Anyway, more questions for now than answers but it’s just a little heads up that they are going to be doing further things to improve the performance of tempdb.
Hybrid buffer pool. So this is not your dad’s buffer pool extension. That feature was kind of a joke and people make fun of it that it’s only useful in very limited scenarios. This is using persistent memory for buffer pool, which is pretty powerful. So memory-mapped I/O, they call it enlightenment. If you have this hardware, awesome. That’s awesome. Again, like the secure enclaves, if you have the hardware where you can test this and use it, I mean, all the power to you. I can’t. I obviously can’t demo this because I have a Mac Pro. It doesn’t have persistent memory, but it does pretty cool things. This is super fast because everything will just be on this persistent memory.
The limitations, it’s clean pages only, but it does work on both Windows and Linux. So there is an implementation for Linux as well. I know sometimes some of these underlying architecture things, it isn’t clear how it will work on different OS, but it definitely does work there. If you have this hardware and you want to test it, you need to use trace flag 809.
Intelligent query processing. So Pinal asked about this and there’s a whole family of features in the optimizer that fall under this category and it’s things that people like Joe Sack and Pedro Lopes, these guys are working to make the optimizer and just make SQL Server’s overall execution much more intelligent. The first thing was batch mode on row store. So basically for certain operations that are typically done in serial and operate on one row at a time, these will be able to process 900 rows at a time instead.
And so we’re already enjoying that benefit with batch mode when you have columnstore but now you’ll be able to do that on row store tables as well. And there’s table variable deferred compilation. So the table variable estimates have had a pretty long and storied history. For a long time the estimate was always one, so if you put a lot of rows into a table variable, you would always get an estimate of one and you’d get bad plans because it’s basing everything on that estimate of one. And then they added a trace flag that did sniffing or set the – in a lot of the cases that I tested, it set the estimate to 100. So 100 was better than one but if you put a million rows in there or only one row in there, 100 is not a great estimate either.
So what they’re doing now is you can turn this on where it will defer compilation of a plan that involves a table variable until the first time it’s executed and it will actually sniff and see how many rows did you put in that table variable first time, and it builds a plan based on that. So this is great if you’re constantly putting in the same number of rows or roughly in the same range or same order of magnitude at least into that table variable. But if the number of rows skews widely, that’s not so good because the compilation is only going to happen once. So there still will be cases where you want to use option recompile and I have a blog post where I talk about that, it’s in the notes.
Next one is approx count distinct. So this is a ballpark estimate on distinct values in a table and it uses stats to determine roughly how many distinct values there are in a specific column and this is – it is pretty powerful and it’s very accurate. So I found in all of my tests, and I did a bunch of tests on this. In all of my tests, it was within one percent. Roughly one percent of actuals and it required almost zero memory to do that.
So it wasn’t any faster. In some cases, it was even slower than the explicit count distinct, but it allows for much more concurrency. So if you have a large table, every time you do a count distinct, it’s going to have to read all of that data into memory and so for big tables, you can’t get much concurrency because you only have so much memory where people can do that. And so what this does is it really puts that footprint down to almost nothing, and so it allows a lot of people to be running these. If you have dashboards and a bunch of people are running dashboards against the same data and the dashboard show things that rely on count distinct, now you can use this new function and it will be much better.
And one of my favorite topics, scalar UDFs. So they actually implemented scalar UDF inlining, which means that instead of doing row by row every time you run a query, it calls the function for the number of rows that are returned by the query and hides all of that information from the execution plan and stuff is – it’s really hard to discover where all of your time is going. Now they actually fold that logic into the execution plan so it actually works a lot like in-line table value functions. So it actually just becomes part of the plan and that’s pretty nice. This is just a diagram of things that they’ve done so far. This is the intelligent query processing family.
Some new use hints like disallow batch mode, so if you actually – if you have a plan where you’re getting batch mode and you don’t want it or you just want to test if you have a table where you want to test what it would be like if you didn’t have columnstore on that table or I guess with batch mode on row store, if you didn’t want to use batch mode for some reason, you can test what that would be like without it. You can disable interleaved execution TVFs, you can disable batch mode adaptive joins, and then they also added new use hints that you could apply at the end of a query that say use this specific compatibility level. That’s pretty powerful. They also backported that to 2017, I think in CU10, but it basically allows you to instead of changing the whole database compatibility level for something, you could do that on a query level, and that has to do with more than just the cardinality estimator. There are a bunch of other behavior changes that can happen at certain compatibility levels. So that’s pretty nice for very specific use cases.
Lightweight profiling. So back in 2014, they added a DMV called sys.dm_exec_profiles. What this did is allowed as an execution plan or as a query is running, you can actually peak into the execution plan and see how much progress has been made inside of each operator. It’s very powerful. The problem is it’s expensive and it was only accessible from sessions that actually had showplan enabled when they started. So if you’re trying to troubleshoot, if you have some query that’s taking up all the resources on your system and you’re trying to peak at that, you can only do it if they actually turn showplan on before they ran. You can’t go apply that after the fact.
They added a trace flag. I honestly forget now which version they added this in. It might have been 2016 but they added trace flag 7412, which turned this on for all sessions. And it was a slightly different implementation, it was a little more lightweight, so you could turn this session flag on for your server and allow that so you could peak into any execution plan that was running. Now that will be on by default in SQL Server 2019. You can turn it off at the database level, so if you wanted to only have this running for a specific database, or if you wanted to turn it off for a specific database, maybe you have one where everything is ad hoc queries because it comes from a really bad ORM or whatever, you could turn it off for that specific database.
Better diagnostics when waiting on sync stats. So a lot of times when you’re running a query on Management Studio and you don’t understand why data is not returning, sometimes it’s because you had a synchronous stats operation that it has to wait for. So the query isn’t running because it’s waiting on the stats operation and Management Studio doesn’t really expose anything to tell you that. In Plan Explorer we do show that. If you have a select that triggered a stats we show a statman lying underneath but again, we can’t show that to you until after the query has returned.
So now sys.dm_exec_request will actually show a select stat, a row for select statman operations so you know that what’s going on right now is actually a statman operation in the command so you can see that that’s there. There’s also a new wait type. Wait on sync statistics refresh, so you can actually track this on the entire server or by session.
And some other hidden not yet announced gems like streaming statistics. I don’t want to really try to explain what that might mean but what I do, every time there’s a new version of SQL Server and in fact, every CTP, this is one of the things I look at, sys.messages because oftentimes information about features that are upcoming are put here first because they code these messages into the system and so you kind of get a – not a cheating look, but a peek ahead at what kinds of things are coming.
So I compare what’s in sys.messages to what used to be in sys.messages on an older version or on an older CTP and it gives me access into what are the new messages that are coming and you can see things like snapshot views and the streaming statistics and distributed exchange, and it starts making you think what is that feature that’s coming, what could that possibly mean. So I always find that kind of interesting. Alright, some programmability stuff. So we…
PINAL DAVE: Excuse me, just to break this conversation, there is interesting conversation going on in Slack about compatibility level. So I just thought we will just raise it in between so the question is actually from Lee said why would you want certain queries to execute with a different compatibility level? I would think there would be a risk involved with the query result is not all the queries function the same way. Now, there is already discussion going on, some right answers, some not so great answer but one of them is mine. What is your thought about it?
AARON BERTRAND: My thought is that if you’re doing that at a query level, you’re isolating it and if you have – you shouldn’t ever get different results by adding a hint to a query. So there shouldn’t be any risk on results. It’s just if you get a certain behavior in a specific compatibility level because of some behavior that that compatibility level enforces or prevents…
PINAL DAVE: So more than the result you mean to say it’s more of the performance of the query or how the query returns the result. Maybe the internal than what we see outside.
AARON BERTRAND: Right. If it changes the results, that’s a bug. If you add a hint to a query and it changes the results, that’s a bug and I would be very surprised to ever see that. There’s more about how does it go about optimizing and compiling a plan for that query, and it follows the rules set by that compatibility level.
PINAL DAVE: Fantastic, thank you very, very much. I think that was a summary and you just said it so thank you. Let’s continue with the not had [inaudible] features. Thank you.
AARON BERTRAND: Alright, so a couple things about programmability. So in 2016 we sort of got UTF-8 support. You could provide UTF-8 data through bulk load operations and then you would store it in a Unicode column using BCP or bulk insert and you could do that. in 2019, there are new UTF-8 SC collations, so you can actually create a column that will store varchar data. It will store UTF-8 data but it will store – you can store all of the Latin characters without the Unicode tax. So it’s almost like using Unicode compression, and you can create columns with this specific collation or these specific collations. I think there are 64 or something, a lot of collations that will now support UTF08 by default.
Graph support. This is – I can spell graph but I had no idea what this is. If anyone on the webinar is using graph, hopefully, they understand what these things mean. You can now have edge and node constraints and it also now supports merge in 2019.
And then the last one is Java. So if you are using anything with data science or AI or machine learning, typically most of these things that people have implemented have used R or Python. Well now, if you have some application or algorithm that’s written in Java, you can install a Java engine if you will, just like you could with R and Python in previous versions.
And then the last one, string or binary data would be truncated. This has been the bane of many developers lives for years. Absolutely worthless. I mean, aside from the fact that it gives you information that there was in error, that’s about as far as it goes. It’s really hard to track this down.
So now if you turn on trace flag 460, the message is a lot better. This was also backported to 2017 CU 12 and then I have a quick demo of that. So string or binary data would be truncated, so I’m turning the trace flag off in case I left it on before. I’m dropping the table if it exists. Now, I’m just creating a table with the varchar two column and I’m inserting a value that’s obviously too long, and this is the default behavior today and then I’m turning the trace flag 460 on and then I’m trying to insert that same value again.
And then the output for these, this is what you’re used to. String or binary data would be truncated and now you actually get an extended information in the message. So it tells you the database, the schema, the table, the column, and the truncated value. That’s kind of useless too I think. I feel like they should have at least taken this string and put it in this message instead of taking the output of the truncation. So if you had a varchar one for example, what’s going to happen is all you’re going to get is truncated value of t and now you have to go find all of the values that start with t and figure out – it’s not as good as it could be but this is 9000 times better than it was. And that’s all I have for today with two minutes to spare.
PINAL DAVE: I think it’s fantastic. People are loving these features and I think many people are saying you changed their life with the presentation. A couple of people said they learned something new from the answer. Actually, I learned quite a lot of things about actually I didn’t know about truncating is only showing that one word. I also didn’t know – I tweeted a couple of things I didn’t now. There are 17 new compatibility level. I had no idea about graphs just like you so that’s very…
AARON BERTRAND: I still don’t know what graph is either.
PINAL DAVE: So that is good. Amazing, so I don’t see any other question because you are very kind enough to answer during the time, during the presentation itself. So I think it’s fantastic. I think I have a lot of things to play personally. Anything else? Otherwise, super fantastic. I love it. And then one person has complimented, I think it’s SQL Gene how you have been using the various references or attributes to the images and I think one of the bike image I think has created a lot of conversation clearly. 10 or 12 replies in the Slack so…
AARON BERTRAND: Wow, I have to go check that out.
PINAL DAVE: You must do that. So it’s good. Thank you very much. I’ll just keep quiet because I enjoyed and so does people too.
AARON BERTRAND: Alright, thank you.