Target Audience:
Developers and DBAs who want to learn more about SQL Server 2016 features.
By far the most important SQL Server release since the heady days of 2005, SQL Server 2016 represents a huge leap forward in a number of key areas. You’ve no doubt heard plenty about such features as Query Store, JSON support, Polybase, and the other big-ticket items. But there are also a number of very cool features that don’t usually make the top 10 lists. In this talk you’ll learn about enhancements to indexing (including both Columnstore and traditional b-tree indexes), DMVs, compression, string handling, encryption, configuration, and even timezone support. These smaller features may not be highlighted in the latest glossy ads from Microsoft’s marketing department, but they’re the kinds of things that add up to make your life easier. Attend this session to learn all sorts of new and different reasons to upgrade as quickly as possible.
Why I Want to Present This Session:
I’m super-excited about what SQL Server 2016 has to offer! I’ve presented this session at a number of user groups and SQL Saturday events this year. I’ve had great feedback, and the content is still very relevant, but not quite as timely as we get several months away from the release date. So I’m about to retire this session, but I’d love to give it a final big delivery. I figure GroupBy is the perfect place for that.
- Slides, demos, and links: http://sqlblog.com/blogs/adam_machanic/archive/2017/01/13/sql-server-2016-hidden-gems-resources.aspx
- CONTEXT_INFO – more info by Aaron Bertrand
- Tiger Team discussing lightweight query progress monitoring
SQL Server 2016 “Hidden Gems”
Introduction and Session Overview
Brent Ozar: In this session, we’re going to have Adam Machanic, the famous inventor of sp_whoisactive, ladies and gentlemen, and the man behind sqlblog.com. He will be talking about the hidden gems in SQL Server 2016. Adam, I’ll let you take it away.
Adam Machanic: Thank you so much, Brent. I’m really excited to be part of the very first GroupBy conference. Pretty cool idea I have to say. It seems like it’s been quite successful so far.
This session is something that I’m really happy to be able to create because there’s so much in SQL Server 2016. It’s an amazing release if you think about it. We’ll think about that actually in just a moment. There’s so much there that I’m actually able to create a session which is almost entirely comprised of stuff that they didn’t talk about in the big marketing sessions and shiny brochures and things like that. It’s pretty cool to actually be able to do this.
Let’s jump right in here. A little bit about me. I live in the Boston area and I am an independent consultant. I help a lot of companies, both locally and not so locally, with data architecture, database development, and performance tuning. Let me know if you need any assistance in those areas. I’ve written several books and some magazine articles, and I have a website that I write for called sqlblog.com, which is also where you can find some of the information on sp_whoisactive, which Brent just mentioned.
There’s actually a new website for sp_whoisactive, which is whoisactive.com. Do not be afraid when you see the horrible website design. I call it minimalist, which maybe is a little bit charitable. Don’t worry, it’s real. It’s my site. I’m going to be uploading new versions of sp_whoisactive to that. I promise I’ll actually be uploading new versions more often than I have before. In other words, you won’t have to wait five years between each build from now on. One of the reasons is there’s so much cool stuff coming in SQL Server, and already there in 2016. So I think I’m going to cut it off and make just a 2016 version, and start working on that from now on. I think that will be a lot cooler.
If you want to contact me, my information is at the bottom of the screen. E-mail: AMachanic@gmail.com. @AdamMachanic on Twitter, though I’m trying not to use Twitter as much these days, so maybe that will actually be good for those of you who are following me, because you won’t have to see quite as much garbage go through my stream.
A Brief History Lesson
Let’s talk SQL Server 2016. Before we talk SQL Server 2016, let’s talk prior versions of SQL Server and just look at the lay of the land over time. It all started back in 1989. SQL Server 1.0 and SQL Server 1.1 were released in this timeframe. I can’t say that I remember that. I was, I believe, in middle school. Probably listening to “You Can’t Touch This” or something like that around that time. What I can say is that I have Google, and I googled, and I found not much. These were all old products on OS/2. So I’m going to rank these releases as, okay, whatever. We’re going to go through the releases and we’re going to see a pattern form, a sort of release cadence. That’s 1989 to 1991.
1993 came and we had this thing, SQL Server 4.21. I also was not able to find much information about this release, but I did find one really key thing. That’s that this was the first version of SQL Server released on Windows NT. That’s basically where we still are today. So I’m going to go ahead and call this one major in terms of release. So now we have minor release, we have major release.
Now SQL Server 6.0 comes along with kind of iterative enhancements, little things. Again, I googled and didn’t really find much. This is just before I started using the product, so I didn’t hit this one either. So I’m going to call this one a minor release.
Then SQL Server 7.0 comes. Now, this one I actually used. This one you can find lots of information on via Google. This was a huge, huge, huge overhaul of everything from that old OS/2 Ashton-Tate code base. There were all-new on-disk structures for storing all of your data, a new memory manager, a new lock manager, new query optimization. Huge, huge, huge. Enterprise Manager and Query Analyzer both came with this version. So this is a gigantic release. Now we’re seeing minor, major, minor, major.
SQL Server 2000 hits. SQL Server 2000 was kind of a big release in that everyone kind of settled on it, but let’s actually look at the features. User-defined functions—interesting feature, great way to shoot yourself in the foot. Indexed views—interesting feature, great way to shoot yourself in the foot. Cascading DRI—do I have to say it again? XML support. I mean, everyone kind of likes XML, but great way to just break the rules of normalization and shoot yourself in the foot. There’s a little theme. Then the biggest shoot yourself in the foot feature arguably ever released in SQL Server—the GUID, released in SQL Server 2000. So everyone settled on these but I’m going to say not too major. Okay, we’re still following our pattern.
Then we wait five years. SQL Server 2005 comes along: SQLOS introduced, dynamic management views, Service Broker, window functions introduced in T-SQL, a bunch of huge features. And there’s more. SQLCLR, better XML support for those of you who do like that, partitioning introduced into the product. Huge. Mirroring, which has now morphed into availability groups. Huge, huge, huge feature. SSIS and SSMS both shipped with SQL Server 2005. SSRS, for those of you totally keeping tabs, was actually released as a post-SQL Server 2000 add-on pack or something like that. But I’m going to roll it on in here because it was kind of all part of the same SQL Server 2005 release. This was a huge, huge, huge, huge release and follows the pattern.
Then 2008 comes along. What did 2008 contain? Compression, kind of an interesting feature. Resource Governor, good v1 feature. Maybe not game changer, but good. The date and time data types, very useful, no doubt. Transparent data encryption, it’s a good feature for those of you who have to check a checkbox somewhere for that. CDC and change tracking, also decent features. Policy-based management. All in all, not bad features, not great features. We’ll just say it’s decent, follows the pattern.
Next, we were ready for our big release and 2010 rolled around and Microsoft announced SQL Server 2008 R2. There was this new report builder, master data services, which was a complete failure. StreamInsight, which is a very cool product that Microsoft completely mismanaged and never really went anywhere. Then some SSRS stuff, like there was some gauges or something. I don’t even really remember. Ok, where’s our major release? We’re still waiting.
Then SQL Server 2012 comes. Columnstore v1, interesting feature. So much a v1 feature, so full of different places where it couldn’t work, that it was pretty much unusable. Better window functions, that was nice. In T-SQL they enhanced that feature, which was a good thing. DQS, which, I don’t even know still today why to use it or where to use it. Tabular SSAS, for those of you in that realm I know that this is a big feature; not so big for me because I’m not in that realm. Always On Availability Groups v1, good stuff there. So all in all, not bad. A lot of people settled on this release. Still no major release, though.
Then 2014 comes around, we’re still waiting for our major release, and what do we get? We get v1 of memory-optimized tables. This is kind of a big feature, but again, there were so many limitations that it was not very usable. We get v2 of columnstore, still with so many limitations that it’s not usable by most people. We got some availability group enhancements and this thing called delayed durability, which, if you google for SQL Server 2014 enhancements, always comes up. I actually still haven’t seen anyone using it. So guess what happened? No one adopted the release. Why would they? There’s like nothing there. So we’re still waiting for our major release. The last one was back in 2005. What happened to the release cadence?
Finally, 2016 rolls around. Query Store, huge enhancement. Polybase in the box, huge, huge, huge enhancement for all sorts of interrelation of different types of data. R in the box, very much a v1 feature but still a huge enhancement. JSON support, which developers have been asking for for years. Always Encrypted, much better encryption. Row-level security, which we’ll talk about in a couple of minutes. Temporal tables as well, another huge feature. v3 of columnstore, now it’s absolutely usable, fantastic feature.
Memory optimized v2, which is actually finally a usable feature as well. Stretch database, which is very much a v1 thing, but I think this is going to be revolutionary. I’m going to be watching that space very closely as time goes on. So, yeah, we’re going to call that another major release. This doesn’t even mention what happened in SP1, where Standard Edition now has full access to all the programmatic features that used to be Enterprise-only. So this is a huge, huge, huge release that we’ve been waiting for for over ten years now.
As a result, I can create this talk. The last time I actually had a hidden gems talk was in 2005 for SQL Server 2005. Today, for SQL Server 2016 there’s so much stuff in the release, we’re going to talk about hidden gems in a bunch of different areas. We’re going to talk about programmability, security, query processor, manageability, and data storage. There are hidden gems in all of these areas and we’re going to go through those. Then talk a little bit about the major features as well just to make sure we’re all on the same page.
With that, let’s get started. Let’s talk about programmability first. Some of the major features first, we’ll just kind of touch on them. JSON support: this is very much like XML support was back in the SQL Server 2000 days, maybe a little bit better. For those of you who remember SQL Server 2000 XML, you had to do FOR XML EXPLICIT, or FOR XML RAW I think was one of the other options. It was not easy to control. It wasn’t until SQL Server 2005 that Microsoft added FOR XML PATH.
The JSON support here in 2016 is a little bit closer to the FOR XML PATH sort of model, but it still has this OPENJSON function that you have to call in order to shred the data. There is no built-in data type, for better or for worse. Microsoft is calling that a feature; I will leave that to you. There’s no indexing. I think XML indexing wasn’t a very useful feature anyway, but it’s not there for JSON right now either. Still, developers have been asking for this for years, so this is a pretty big thing.
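The shape of that surface area can be sketched like this; the table, column, and property names here are invented for illustration:

```sql
-- Shape rows into JSON, in the spirit of FOR XML PATH
-- (dbo.products and its columns are made-up names):
SELECT product_id, product_name
FROM dbo.products
FOR JSON PATH, ROOT('products');

-- Shred a JSON document back into rows with OPENJSON;
-- the WITH clause maps JSON properties to typed columns.
DECLARE @json NVARCHAR(MAX) =
    N'[{"product_id":1,"product_name":"Widget"},
       {"product_id":2,"product_name":"Gadget"}]';

SELECT product_id, product_name
FROM OPENJSON(@json)
WITH
(
    product_id   INT           '$.product_id',
    product_name NVARCHAR(100) '$.product_name'
);
```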
Analytics with R
Analytics with R in the database engine. Once again, this is kind of a v1 feature. You have to call into a stored procedure to really exercise your R script. It’s not a first class citizen. You actually have to develop your script somewhere else if you want any kind of debugging support or anything like that. But, I think this is another v1 that’s going to be really huge to watch. Might be even revolutionary as we look over the course of the next few releases. So I’m pretty excited about this.
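The stored-procedure call pattern he mentions looks roughly like this. This is a sketch, not part of the session's demos; it assumes R Services is installed and the `external scripts enabled` option has been turned on via sp_configure:

```sql
-- Calling R from T-SQL goes through a stored procedure rather than
-- first-class syntax. The R script reads InputDataSet (fed by
-- @input_data_1) and writes OutputDataSet back to SQL Server.
EXEC sp_execute_external_script
    @language     = N'R',
    @script       = N'OutputDataSet <- data.frame(mean_val = mean(InputDataSet$val))',
    @input_data_1 = N'SELECT CAST(val AS FLOAT) AS val
                      FROM (VALUES (1), (2), (3)) AS t(val)'
WITH RESULT SETS ((mean_val FLOAT));
```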
Temporal tables. I’m actually going to show you a quick demo of this because this is my favorite feature in the product, period. Love temporal tables. Every single database out there that I work with has some form of history retention requirement. People want to be able to look back and see what changed. We’ve had change tracking, and then change data capture, in the product; both of these features were full of issues. Temporal tables fix most of those issues. There are some v1 limitations yet again, just like everything else, but I am really hopeful that this is going to be a foundation for the future.
With that, let’s jump out to Management Studio and look at some hidden stuff. So I’m jumping over here, going into tempdb. The first thing I want to ask is, who out there has one of these? This is a stored procedure, dbo.get_products. It takes a list of products as its input. Inside of the stored procedure, we say select * from dbo.products where the product ID is in (select value from dbo.split). Dbo.split is some split function that someone wrote. It splits up your comma-separated list and outputs all the products that are in there. A lot of databases I look at don’t have just dbo.split; they’ll have dbo.split, dbo.split2, dbo.splitbig, and all these other weird variants.
The reason is everyone first of all has written their own version of this. I’ve written like five of them myself, four in T-SQL and one in SQLCLR. Different people out there have written various versions, optimized for different scenarios, and blah, blah, blah. So in every database, it’s just a bunch of garbage. All kinds of string splitters. This is not a problem that as database developers we should be solving. It’s ridiculous. Finally, Microsoft got the message. Years and years and years later and we have this thing now.
Brent Ozar: Vivek says at work he’s got several, including realsplit.
Adam Machanic: Realsplit, so the fake one will just give you garbage data, maybe some GUIDs; realsplit gives you the real stuff. Anyway: SELECT * FROM STRING_SPLIT, built right into the engine. It just works. That’s it. Love that. Not only is it there and does it work, it works really well and really fast. If you look at my resources page, which you’re going to see on the final slide when I’m done, you’ll see a link to a blog post by Aaron Bertrand. He performance tested this thing. It’s way faster than anything else out there, including the CLR version that I wrote. So I’m not going to use my CLR version anymore. I’m going to use this.
Now, that said: when I write a string splitter, what I’ll do is have it eliminate consecutive delimiters, get rid of white space, and things like that. This one isn’t quite up to par on that yet. Maybe they’ll get there, maybe not. Luckily, it’s T-SQL. We actually have the ability to do this ourselves. If we want to get rid of white space, we have RTRIM and LTRIM. If we want to handle consecutive delimiters, we just get rid of empty results. So we can do that.
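The cleanup he describes can be sketched like this (SQL Server 2016 doesn't have TRIM yet, so LTRIM and RTRIM are stacked):

```sql
-- STRING_SPLIT is built into the engine; the output column
-- is always named "value".
SELECT value
FROM STRING_SPLIT('1,2,3,4', ',');

-- It doesn't trim white space or collapse consecutive delimiters
-- on its own, but that's easy to layer on in T-SQL:
SELECT LTRIM(RTRIM(value)) AS value
FROM STRING_SPLIT('1, 2,, 3 ,4', ',')
WHERE LTRIM(RTRIM(value)) <> '';
```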
The one thing that isn’t here, that we can’t do quite as easily is row numbering. I can already sense the T-SQL purists are going to kill me for this next demo but I’m going to do it anyway. If you want a row number, this is a trick you can use. What this says is row_number() over (order by (select null)). What this basically tells the query engine is order the rows, generate a row number based on however the rows happen to fall, however they happen to come out of the function.
Now the T-SQL purists will say, “There’s no guarantee that the data will come out of STRING_SPLIT in the right order.” That’s true, but is it really not going to? Can you imagine that Microsoft actually came up with a string split algorithm that’s going to process the last element in the list first, then come back to the first element, then skip around or something? No. It doesn’t do that. It goes from the front to the back, therefore this works. This will generate our row number. That’s what we want. If you’re not confident, don’t use it, but I’m pretty sure it works. So enjoy. It just works. Anyway, that’s good stuff. That gives us everything we need for row numbering. Hopefully Microsoft will build some of this stuff in in future releases, but for now, we have a totally workable solution. I’m quite happy with this.
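The trick he's describing looks like this. As he says, the ordering is not formally guaranteed, so treat it as a pragmatic workaround rather than a contract:

```sql
-- Row numbering over STRING_SPLIT output. ORDER BY (SELECT NULL)
-- tells the engine to number rows in whatever order they emerge
-- from the function, which in practice is front to back.
SELECT
    value,
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS element_number
FROM STRING_SPLIT('alpha,beta,gamma', ',');
```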
Brent Ozar: You even have somebody clapping with text. You had someone type in, “Clap clap clap clap.”
Adam Machanic: Excellent. I like to hear that. So, here’s another feature. This is an old feature that’s been in SQL Server forever. It’s called context info. What you can do is declare something up to 128 bytes, stuff it into varbinary, and set it. So here I’m putting the string “hello” into varbinary and then I’m stuffing it into this thing called context info by just saying SET CONTEXT_INFO @info. Now it’s set. Now it’s in my session. It’s available and I can pull it up elsewhere. There it is. That’s actually the string hello in varbinary. It didn’t keep my data type of course, which is kind of annoying, and it’s only 128 bytes.
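The old mechanism he's demonstrating looks like this:

```sql
-- CONTEXT_INFO: a single 128-byte varbinary slot per session.
DECLARE @info VARBINARY(128) = CAST('hello' AS VARBINARY(128));
SET CONTEXT_INFO @info;

-- Retrieve it later in the same session; the original data type
-- is lost, so you get raw varbinary back.
SELECT CONTEXT_INFO() AS session_info;
```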
Nonetheless, I’ve had to use this a number of times, in situations where I’ve had to call one stored procedure and then later in the session call another stored procedure and have some data shared between them. You know, there are plenty of ways to do that. I could have used a table, or maybe a temp table, or maybe a global temp table, depending on the scenario. But what we really want is a place in memory to just store some stuff sometimes, if we’re doing really complex development in SQL Server. Obviously, over time SQL Server has become almost an application server versus just a database. So things are getting more complex and we need better ways to store data. Now we have that.
In SQL Server 2016, what we can do is call sp_set_session_context. We can set a key; rather than just setting a single value in a single place, we now have a key-value store. Here my key is called “my favorite password.” My value is my favorite password. So you can go ahead and hack my Google account. Now, when I retrieve it, I do SELECT SESSION_CONTEXT. I ask for the key and I get it back. That’s it. Notice it actually says password. It’s not some binary thing. The return type on this is sql_variant, so it’s not really a string, but it’s convertible into a string quite easily. What we can store in this new system is not just 128 bytes of data but actually 128 kilobytes of data, across all of the keys, effectively. I don’t recommend that you necessarily store 128 kilobytes of data, but you can if you want to. So, you know, rules sometimes are meant to be broken. Do what you need to.
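The new key-value mechanism he's showing looks like this:

```sql
-- The 2016 replacement: a per-session key-value store, with
-- up to 128 KB available across all keys.
EXEC sp_set_session_context
    @key   = N'my favorite password',
    @value = N'password';

-- SESSION_CONTEXT returns sql_variant, so cast for display.
SELECT CAST(SESSION_CONTEXT(N'my favorite password') AS NVARCHAR(128)) AS v;
```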
This feature has another option and it’s this guy right here—read only. This is a really cool addition to this feature. What this does is it allows you to set a value and then you won’t be able to override it. So here I’m saying my key is called access control key. I don’t have access, 0x00, and it’s read-only. So let’s pretend that my application has connected, has set this value. I’m going to go ahead and set it. Later, I can ask for that value and get it back.
I don’t have access, but if I somehow, if I’m an attacker and I somehow gain access to the database after the application has connected me, maybe I want to try and change my access control key and I can’t do it. So that’s a pretty cool little feature that was added. This feature by the way, session context was added for row-level security. That’s why it’s there. But, there’s so many other places we can use this. It’s pretty amazing. I’m looking forward to doing some development in 2016 when I need something more complex just so I can play with this.
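The read-only variant he's demonstrating looks like this:

```sql
-- Once a key is set with @read_only = 1, it is locked in for
-- the remainder of the session.
EXEC sp_set_session_context
    @key       = N'access control key',
    @value     = 0x00,
    @read_only = 1;

-- Reading it back works fine:
SELECT SESSION_CONTEXT(N'access control key') AS access_key;

-- But this second set attempt raises an error, so an attacker
-- cannot overwrite a value the application established.
EXEC sp_set_session_context
    @key   = N'access control key',
    @value = 0xFF;
```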
Brent Ozar: James says—oh man, you’ve got all kinds of questions coming in now—James says, “How do you clear it if it’s read only?”
Adam Machanic: You don’t. That’s the whole point. It will clear next time someone logs in with that session. So as soon as the user logs out—assuming that you’re using connection pooling, the connection will go back into the pool and then when the connection comes out of the pool, there’s this stored procedure that gets run behind the scenes which is called sp_reset_connection. That will reset it and clear it out. So it’s up to your app to take control of it and set it as soon as it connects if you want to use this from an app, which is kind of where it’s designed to be used.
Brent Ozar: Sean Kelly also asks, “Is the key data included in the 128k or is it only for the value?”
Adam Machanic: I don’t know the answer to that. I believe, actually, once again, Aaron Bertrand has a blog post on this where he tested it, on sqlperformance.com. So go search over there. I couldn’t find it when I was setting up the resource page, but if you find it, let me know and I’ll put it on there. I’m pretty sure—actually, I have no idea. But I’m going to ask: does it matter?
Brent Ozar: Yeah, getting pretty fancy there. Gary asks, “What is the scope of session context?”
Adam Machanic: The scope is the session. So you connect and you have some keys and they just hang out until you log out. That’s it.
Brent Ozar: Cool.
Adam Machanic: So it’s just there. That’s what’s cool about it. Makes SQL Server into an actual kind of programming environment.
Brent Ozar: All right, cool. I’ll shut up now. There’s other questions but we’ll hold them until the end.
Adam Machanic: Fair enough. Now, let’s go and talk about temporal tables a little bit. Temporal tables were listed as a major feature but as I said I’m so excited about it, I want to actually show a quick demo of it. Then I want to show one of the things that was added for temporal tables which is an absolutely gigantic feature, also kind of hidden though.
Here I’m going to create a table called temporal test. Temporal tables are really quite simple. The goal here is just to track history as we change rows in our table. Our table really only has two real columns, I and J, both integers. Then I have these two other columns that I’ve added here: a start datetime and an end datetime. Both of these are marked up in a very specific and very special way. Both of them are datetime2; that’s required. Both of them are marked as GENERATED ALWAYS, which means that SQL Server will be in control of the values of these columns. You will not. That’s key, because we want to let SQL Server automatically handle the values that go in and out of our history. This one is marked AS ROW START. This one is marked AS ROW END.
Just in case we weren’t specific enough by marking row start and row end, we also have to say the period for tracking time is between start datetime and end datetime. The ANSI standard guys are very explicit it seems. This is an ANSI standard kind of feature. Now, I’m saying with system versioning = on. That says go ahead SQL Server, track history for me.
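The DDL he's walking through looks roughly like this. The table and column names are reconstructed from his description, and a primary key is included because system versioning requires one:

```sql
-- A system-versioned temporal table. The period columns must be
-- datetime2 and GENERATED ALWAYS; SQL Server maintains their
-- values (in UTC) automatically.
CREATE TABLE dbo.temporal_test
(
    i INT NOT NULL CONSTRAINT pk_temporal_test PRIMARY KEY,
    j INT NOT NULL,
    start_dt DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    end_dt   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (start_dt, end_dt)
)
WITH (SYSTEM_VERSIONING = ON);
```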
I’m going to go ahead and create my table. Now I’m going to insert some rows. Not too interesting data-wise but what we’re going to see when we select from the table is that we have a start datetime and an end datetime populated. The start datetime is 2017-01-13 at 6:55 p.m. Local to me right now it’s only 1:55 p.m. so this is UTC. When we work with this, we’re going to have to get used to seeing and working with UTC data. That’s what we’re going to talk about next, but let’s talk about temporal for another moment first.
So we have a start datetime and an end datetime. The start datetime is UTC. The end datetime here is 9999-12-31, which as far as SQL Server is concerned is basically the end of time. What that means when we see it in a temporal table is that this row represents the current version of the row in our data. It’s kind of an open interval. Now I’m going to go ahead and update J where I = 1, and then I’m going to delete the row where I = 2. Now I’m going to select from temporal test. You can see that my row has disappeared. My start time, if you remember, the last one was 6:55 and now it’s 6:56, so it’s kind of updated itself a little bit. Now I can start querying it.
This is the wonderful thing about this feature that makes me so excited. I can SELECT * FROM temporal test, or I can simply add this new T-SQL: FOR SYSTEM_TIME. There are a number of predicate options; I’m just going to show BETWEEN for now. This says give me all the rows that were valid at any time since January 1st of 2016, including the ones that are currently valid. I can run this and I can run this. It’s the same query. It returns the same exact columns in the same exact format, but one returns history (that one) and one doesn’t. Notice that my ORDER BY will work in either case. The cool thing here is that I can give my users access to the history if I want, and they don’t have to know where it lives, or anything about it, or how to properly write historical data queries. All they have to know is how to use FOR SYSTEM_TIME. It’s very simple. FOR SYSTEM_TIME BETWEEN, or FOR SYSTEM_TIME AS OF, there are a couple of different options. That’s it. This is just huge in my opinion.
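The pair of queries he's comparing looks like this, assuming the reconstructed table names from the DDL above his demo:

```sql
-- Current rows only:
SELECT i, j
FROM dbo.temporal_test
ORDER BY i;

-- The same query plus history: every row version valid in the
-- window, including rows that have since been updated or deleted.
-- FOR SYSTEM_TIME AS OF '<datetime>' is another predicate option.
SELECT i, j
FROM dbo.temporal_test
    FOR SYSTEM_TIME BETWEEN '2016-01-01' AND '9999-12-31'
ORDER BY i;
```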
The data is automatically maintained for me. I don’t have to understand how to write a proper temporal query. It’s not that difficult but one less thing I have to worry about and it’s all completely automatic. That’s kind of why I’m so very very excited about this feature. By the way, if you’re wondering the data is actually being stored in this table dbo.MSSQL_TemporalHistoryFor… blah, blah, blah. That blah, blah, blah is the object ID of the primary table I created. This is a table that was created behind the scenes automatically for me by SQL Server when I turned on the feature. I could create my own table too. That’s another option.
The important thing to understand is that this is just a table. You can go ahead and attack it all you need to in order to make this perform well. You can index it. You can partition it. You can move it around to a different file group. You can rename it. I highly recommend renaming it, obviously. So this is good stuff and it’s all, as I said, pretty much completely automatic and just kind of built in. So very, very excited about this feature.
Brent Ozar: Do you want to tackle questions on that now or do you want me to hold them until the end?
Adam Machanic: I’ll take a couple. Why not?
Brent Ozar: J.D. says, “How do you keep the table size from getting too big?”
Adam Machanic: Well, there’s no built-in pruning but you can prune it. So google and you will find out how to do that.
Brent Ozar: Mark is going to ask a question that is probably near and dear to your heart. He says, “Temporal tables don’t seem to tell me who made the change. Is there a way for me to find out who made the change?”
Adam Machanic: No there is not. I have a—well, in theory, there is—which is you create a column on your table and you keep it up to date but then you have this issue which I refer to as the trust issue, which is you have to trust everyone to follow the rules. I don’t buy that. Do you?
Brent Ozar: No.
Adam Machanic: So what you do is you go to my blog, sqlblog.com, I have a list of connect items for temporal tables. You go find the one where I said, “I need the ability to find out who made the change” and you vote for it please. Thank you.
Brent Ozar: Last one we’ll take. Dmitri asks, “Are we supposed to pronounce it tem-pore-al or tem-per-al?” I don’t know the answer to that one either.
Adam Machanic: Where I am it’s tem-pore-al, how about that?
Brent Ozar: My wife calls them tempura tables, like as in tempura.
Adam Machanic: I wish we had that in the product. That would be great. A crispy snack while we’re working. Anyway, as a quick aside, this is one of my favorite parts of the feature actually, which is kind of pretty small but I’m going to go ahead and drop the main table and it doesn’t work. I love this. Let me explain why. A lot of features in SQL Server will automatically kind of try to do the right thing for you behind the scenes. So the expectation would be it would go ahead and drop the history table for me and then let me continue on with my life. This doesn’t do that. I love that because it means Microsoft understands that the history has value and I don’t want to accidentally drop it. So what we have to do if we want to drop the table is we have to turn off system versioning, then we can go ahead and drop the table. Now the history table is still there actually. So I can keep it around if I want to. I can go delete it if I want to, whatever. But it’s just great to see Microsoft actually not auto-deleting my history for me, for once.
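The drop sequence he's describing looks like this:

```sql
-- DROP TABLE fails while system versioning is on; you must turn
-- versioning off explicitly first.
ALTER TABLE dbo.temporal_test SET (SYSTEM_VERSIONING = OFF);
DROP TABLE dbo.temporal_test;
-- The auto-created history table still exists at this point and
-- can be kept, renamed, or dropped separately.
```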
Anyway, we were looking at the temporal data and it was all in UTC. Now we need to be able to deal with UTC, and not just in temporal tables but in any kind of international application, and that’s the next thing that I want to talk about. Back in SQL Server 2008, we received this data type called datetimeoffset—I think it was 2008. Was it 2005? Doesn’t matter. It’s called datetimeoffset. Datetimeoffset gives you a datetime plus an offset from GMT. That’s kind of sort of a time zone, but not really, because it doesn’t know anything about daylight savings time and it doesn’t really know how to convert the time back to UTC. So this only gets us sort of partway there.
It turns out in the Windows registry, for years and years and years now, every time you install Windows, if you’ve installed it recently you might remember, it actually asks you what time zone you are in. It gives you a list of time zones and then Windows automatically changes your clock for you. Well, it turns out all that data is sitting there in the registry, with about 15 years’ worth of time zone history as things have changed, and now it’s all exposed by sys.time_zone_info.
So these are all of the time zones that my Windows registry knows about, along with information on the current offset. The important thing is that the offset changes over the course of the year; it changes based on whether or not it’s daylight savings time. Windows knew it, C# has known it for maybe five years now, and now SQL Server knows this information. And now there’s a function to actually manipulate this information.
This is the name of the function, AT TIME ZONE—now, I know that doesn’t look like a function, but it’s actually considered to be a function, and it’s based on the ANSI spec. What this query is saying is: take SYSDATETIMEOFFSET, which gives me my local time, and convert it to Central Standard Time. So we find out it’s 2:04 p.m. right now here on Eastern Standard Time and it’s 1:04 p.m. Central Standard Time. So the function seems to work, and I can ask what time it is in London. Over in London it’s 7:00, time for a beer. And, daylight savings time awareness.
On March 26th at 5:00 a.m. Eastern Standard Time—and notice I’m stacking the calls to the functions. So here I’m converting March 26th 5:00 a.m. I’m converting it into Eastern Standard Time. Then I’m going from there over to GMT. So on March 26th at 5:00 a.m. it was 9:00 a.m. in London, or in GMT at least. On the 27th it was 10:00 a.m. Notice that the offset changed, plus one there, plus zero there. That’s what makes this feature amazing is that now we have full daylight savings time support. As the rules change, Microsoft can issue patches, update your registry, and everything will just continue to flow and that’s a wonderful thing. So I’m very excited about this feature.
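The calls he's stacking look like this; the zone names come straight from sys.time_zone_info:

```sql
-- Convert the current local time to another zone:
SELECT SYSDATETIMEOFFSET() AT TIME ZONE 'Central Standard Time' AS central_now;

-- Stacked calls: interpret a plain datetime2 literal as Eastern
-- time (first AT TIME ZONE assigns the zone), then convert it to
-- the London zone (second AT TIME ZONE converts). The result
-- carries the destination zone's offset, including any DST shift.
SELECT CAST('2017-03-26 05:00' AS DATETIME2)
           AT TIME ZONE 'Eastern Standard Time'
           AT TIME ZONE 'GMT Standard Time' AS london_time;
```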
This is added to the product for temporal tables but it’s not just for temporal tables. Literally any internationalization you’re doing, you should be using these functions. In fact, even if you’re not doing internationalization I would recommend starting to use these functions. If you’re just calling get date to get dates in your database, do you actually know what that means? If your server is in southern California today in some data center and then you move it over to Florida, is get date going to do the same thing? I don’t know. Depends on how you configure it and who is managing it and all kinds of stuff. So be careful there. Brent, did you have a…?
Brent Ozar: Yeah, Hugo asked, “Can you use standard time zone abbreviations with AT TIME ZONE?”
Adam Machanic: I don’t know much about standard time zone abbreviations.
Brent Ozar: I have no clue.
Adam Machanic: These are what I believe you can use. There’s probably some ISO standard or something but I’m not aware of that, I’m sorry.
By the way—I’ve been using literals but you can use variables, doesn’t matter. You can do whatever you want there. So best practice, please start storing your data in UTC. Instead of calling get date call this guy, sysutcdatetime, and eventually when your application succeeds beyond your wildest dreams and you have to internationalize it, it will just work. That’s my advice to you. Any more questions for now?
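That advice in code form—the dbo.Orders table and its OrderDate column are hypothetical stand-ins for your own schema:

```sql
-- Store UTC at write time...
INSERT INTO dbo.Orders (OrderDate)
VALUES (SYSUTCDATETIME());

-- ...and localize only at read/display time
SELECT OrderDate AT TIME ZONE 'UTC' AT TIME ZONE 'Eastern Standard Time'
FROM dbo.Orders;
```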
Brent Ozar: Colin asks an interesting one. If you know the answer to this, I’m going to be shocked. “How does the offset interact with leap seconds on historical data? Does it retroactively add the clock drift?” Exactly.
Adam Machanic: I’m not able to shock you, Brent. I’m sorry. I have no idea. Leap seconds, I couldn’t care less to be honest. Colin must do some kind of science or space thing or something because that’s well out of anything that I’ve ever had to work with.
Brent Ozar: All right, cool. Well, you’re clear.
Adam Machanic: If you have a blog, figure it out and blog it. That would be a very interesting blogpost.
Brent Ozar: Or ask it on Stack. Put together the answer in text. Ask it on dba.stackexchange and then post your own answer. You get reputation points for that as well, it’s considered good form to post good answers.
Adam Machanic: Let’s go on here. Let’s talk about security a little bit. Security was a pretty big theme in SQL Server 2016. There’s a few big features there.
The first one is Always Encrypted. Always Encrypted is basically an ADO.NET-integrated encryption feature. Unlike TDE, unlike transparent data encryption which keeps the keys in the database, with Always Encrypted the keys are kept elsewhere, like in the application. So the data is encrypted over the wire as it travels. It’s not just encrypted at rest. This means that the data is far, far, far safer than it ever could be with transparent data encryption.
Whereas transparent data encryption was designed just to meet regulatory requirements, Always Encrypted is actually designed to give you real security. So this is a pretty exciting feature. You’re going to have to change your application a little bit to use it and you’re going to change the way you think about data a little bit to use it but if you need real data security, this is huge. I’m so happy that we can finally get some real data security in SQL Server because we’ve needed it for quite some time.
Another big security feature is RLS, row-level security. This is something that’s been available in Oracle and some other database management systems for quite some time. It allows you to define a filter policy based on a user-defined function. You can bind that function to your table and at run time when the user runs a query it will automatically call into the function to find out on a row-by-row basis whether or not the user should be able to see that row. Pretty cool stuff, very useful, especially if you’re doing multi-tenant work. There’s a couple of little holes here and there but Microsoft has plugged most of them up. So I’d say it’s pretty secure. It’s not like the most secure thing on earth but it’s pretty good. So Always Encrypted, quite secure. Row-level security, pretty good, as long as you understand the limitations you’ll be fine there.
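A minimal sketch of what binding a filter function looks like; the table name, function name, and SESSION_CONTEXT key are all hypothetical, but the CREATE SECURITY POLICY shape is the standard 2016 syntax:

```sql
-- Hypothetical predicate function: each tenant sees only its own rows
CREATE FUNCTION dbo.fn_tenant_filter (@TenantId int)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS allowed
    WHERE @TenantId = CAST(SESSION_CONTEXT(N'TenantId') AS int);
GO

-- Bind the function to the table; the engine calls it for every row at run time
CREATE SECURITY POLICY dbo.TenantPolicy
    ADD FILTER PREDICATE dbo.fn_tenant_filter(TenantId) ON dbo.Invoices
    WITH (STATE = ON);
```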
Dynamic Data Masking
The final thing that Microsoft is calling a security feature in 2016 is this guy—dynamic data masking. What this lets you do—it’s a cool feature, in theory. Row-level security allows you to actually get rid of rows using your function that you write. Dynamic data masking, instead of getting rid of the row, hides the data behind a mask, a string whose format you provide. So you can turn phone numbers into first digit plus all zeros, for example, and hide the phone number. That’s pretty cool in theory. Unfortunately, this guy is absolutely full of holes. I actually defeated it, and I’m not even bragging because it wasn’t difficult. I defeated it about two minutes after I first turned it on, because that was the first thing I did.
With a security feature, I want to see if I can beat it. No way you’re keeping users out of the data with this thing. So I would use this only if you have an application that completely covers the database, that doesn’t give the users any access to ad hoc queries. In that kind of a situation, this maybe will be handy because you can define things nice and declaratively in the database. But until Microsoft makes this into an actual security feature, I would not trust your data on it. So just a word to the wise there. Even though marketing does call this a security feature.
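For reference, a masking rule looks roughly like this—dbo.Customers and ReportingRole are hypothetical names, and the partial() mask shown keeps only the first digit:

```sql
-- Mask the phone number: first digit kept, the rest padded with zeros
ALTER TABLE dbo.Customers
ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(1, "00-000-0000", 0)');

-- Unprivileged users see the mask; UNMASK permission reveals the real data
GRANT UNMASK TO ReportingRole;
```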
With that, there is a little hidden gem that I’m quite fond of. Let’s jump back out to Management Studio. This one is pretty simple. I’m going to go over to SQL Server 2014 and what I’m going to do is I’m going to hash some data. I’m going to hash it with SHA2. There are the bytes. That’s great. I just hashed—this is some data to hash—good stuff. Now, I’m going to go ahead and hash data to hash repeated 10,000 times. That’s not so nice. It turns out in SQL Server 2014 and below hashbytes has a limitation of 8,000 bytes and that’s it. You hit it, you’re out of luck. So if you need to hash a big document which is actually more common than you might think, you can’t do it. You have to write a CLR wrapper or something.
Going back to 2016 and the hidden gem is quite simple. They’ve magically removed the limitation. Hashbytes now just works for anything up to 2 gigs. So good stuff. Of course, as soon as I saw that hashbytes was fixed, the very next thing I decided to try was encryption, encryptbypassphrase. And, they didn’t fix it. So I don’t know why. But I’m ever hopeful that whatever nice person fixed hashbytes will go fix encryptbypassphrase. Until then, yeah, it’s still CLR for that one. So simple hidden gem that will help a little bit if you need to do some hashing work, which I’ve actually had to do quite a bit recently. So I’m pretty happy with this one. Any questions or comments on that, Brent?
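The demo in code form; the second statement errors out on 2014 and earlier because the input exceeds 8,000 bytes, while on 2016 it just works:

```sql
-- Works on any version: input under 8,000 bytes
SELECT HASHBYTES('SHA2_256', 'this is some data to hash');

-- Fails on SQL Server 2014 and earlier (8,000-byte input limit);
-- works on 2016, where HASHBYTES accepts LOB input up to 2 GB
SELECT HASHBYTES('SHA2_256',
    REPLICATE(CONVERT(varchar(max), 'this is some data to hash'), 10000));
```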
Brent Ozar: No, you’re good there.
Query Processor Insight
Adam Machanic: All right. Sounds good. Let’s go back here. Let’s talk about the query processor here a little bit. Some really nice stuff here in SQL Server 2016.
The huge one of course is Query Store. That would take like a full day, maybe not a full day, but half a day of session to cover. So I’m not going to talk about that too much right now but this is a big, big thing. If you are managing any kind of server at all, you need to look into Query Store. Start getting used to it. V1, it has some limits and maybe doesn’t do everything perfectly today, but it’s huge in terms of being a foundational element for capturing query information as it rolls through your server and allowing you to act on it. Very exciting, very big feature.
Live Query Statistics
Another one is live query statistics. This is a feature—and actually I’m going to do a quick demo of this one in just a moment. This feature allows you, instead of just looking at a static query plan, to actually look at the query plan live as it’s rolling. So cool to be able to see that and gain insight into it. There’s something I didn’t even know until recently, which is you can attach to live query statistics, not only on SQL Server 2016, but also back in SQL Server 2014, and you can attach to someone else’s session if you use Microsoft’s activity monitor to do it. You can right click and click view live plan information if they have turned on an actual query plan. That’s the big, big limitation, which is kind of tough to swallow with this.
In SQL Server 2016 SP1, there’s a trace flag that you can turn on now that enables a lightweight form of data collection so that this will work and so that you can attach to other people’s live plans as they’re in process. Why would you want to do that? Well if you have a runaway plan on your box and it usually runs in let’s say a second or two seconds or whatever and today this query is running for twelve hours, you might want to attach and see what’s happening before you kill it and deal with the problem. So good stuff there.
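The trace flag in question is, to the best of my knowledge, 7412 (the lightweight profiling infrastructure discussed in the Tiger Team post linked at the top); a sketch of enabling it and then peeking at another session’s in-flight plan, where session 58 is just an example:

```sql
-- Enable lightweight per-operator profiling instance-wide (2016 SP1+)
DBCC TRACEON (7412, -1);

-- With it on, per-operator progress for other sessions' running queries
-- is exposed through sys.dm_exec_query_profiles
SELECT session_id, physical_operator_name, row_count, estimate_row_count
FROM sys.dm_exec_query_profiles
WHERE session_id = 58;  -- the runaway session you want to inspect
```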
There’s a couple of very cool hidden gems. So let’s jump out and check those out. First of all, wait stats. Sys.dm_os_wait_stats, this is nothing new. Captures information as the query processor is processing your plan. It goes in and out of wait states. Wait states mean that it’s waiting on some resource that it needs in order to continue. Obviously, if it’s waiting, it’s not processing your plan data. Therefore, your query isn’t running as fast as it could be. So our goal as performance tuners is quite often to decrease the wait states. This is all good information to have, mostly; however, historically we only had it at the server level, at sys.dm_os_wait_stats. If you’ve consolidated 27 different applications onto your single big, huge server and you see a shared lock wait, what’s that coming from? Who’s doing it? How do you act on this? This is at this point not useful data.
We had to get a little lower level. There’s a couple of other DMVs that were created. There’s a session one, a request one. What we’re going to see is something even better that’s been added to SQL Server 2016. I’m going to run this script in another tab and what it’s going to do is start creating some really heavy I/O waits on my box. Here we are in sys.dm_os_wait_stats. You can see, I’m getting these I/O completion waits. Heavy, heavy waits. You can see that waiting tasks count is going up every single time and my wait times are going up.
There’s this other one called sys.dm_os_waiting_tasks. That’s supposed to show me what’s happening right now, who’s waiting on I/O completion. But when I run it, I don’t get anything. Now if I keep running it here, we’ll see what happens. Sometimes I get lucky and the issue is if these I/O completion waits are very, very brief, I’m getting lots of them, but they’re only lasting for—there we go. I finally caught one. They only last for about a millisecond each. So unless I get lucky, I’m not going to see who’s running it and I’m not going to have enough information to figure out how to go debug it.
In SQL Server 2016, there’s a new DMV, sys.dm_exec_session_wait_stats. This one collects information similar to the instance level wait stats DMV. It records the information over time but instead of doing it on an instance level, it does it on the session level. So this actually is not saying who is encountering I/O completion, this is who has encountered I/O completion. I see session 58 has encountered I/O completion. You can see the waiting tasks count is increasing. That data is kept as long as the session is going. So that’s huge from a monitoring perspective. Really excited about that DMV. At some point, that will be showing up in sp_whoisactive. I’m going to stop this before it fills my entire hard drive.
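Querying the new DMV is as simple as:

```sql
-- Cumulative waits per session since the session (re)connected;
-- the data goes away when the session is reset or closed
SELECT session_id, wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_exec_session_wait_stats
ORDER BY wait_time_ms DESC;
```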
We’ll come back over here. What I’m going to do is I’m going to take the data that I just inserted, I wrote this completely meaningless query. The reason I wrote this big, huge meaningless query is to have it do some work so that I can show you live query statistics. I’m going to go, query, I’m going to do include live query statistics. I’ve just turned that on. What that’s going to do is Management Studio is going to go get an estimated show plan. Then it’s going to run the query and it’s going to start collecting every 250 milliseconds it’s going to ask what’s going on in the plan. It’s going to update my plan on the screen for me so I’m actually going to be able to see the numbers flowing through.
There’s my plan. If I scroll over here, what we can see, notice the numbers increasing. Those are the actual rows, the row counts. As those rows are flowing through that hash match iterator. This is like one of these big twelve-hour plans. I could go attach to it. I could go look and I could say, “Where’s the bottleneck?” Then I can just see it right there on the screen. There’s no more clicking around, guessing, trying to figure out what’s actually happening. This is a huge feature from an insight perspective. Very, very exciting for those of us who do a lot of query tuning.
Another related thing they’ve done, I’m going to turn off live query statistics. I’m going to turn on actual execution plan. I’m going to hit F5 there and we’ll wait a couple seconds for this to complete. What we’re going to see in the actual plan is a bunch of new information that’s been added. Hopefully this will finish fairly quickly. What is now exposed on a per iterator basis is—so previously what was exposed on a per iterator basis were things like number of rows.
Let’s go find a good one here. Let’s go grab these I/O waits. Let’s hit F4 to get some information. Previously, what we had was actual number of rows, 1172769. It doesn’t really tell us much about how long it took to read those rows but what we can do now is come over to this, called actual time statistics. Open it up. It actually took 236 milliseconds to read and process those rows on behalf of this iterator. I can come over to actual I/O statistics. This iterator actually did this many logical reads on behalf of my query. So this gives me kind of unprecedented insight into what’s actually happening in the plan. If the plan takes an hour to run, I can come look in the actual plan, come down and see which iterators took the time and which didn’t. Now I can focus where I need to focus. So huge, huge insight for, again, query tuning. I’m very excited about these features.
Brent Ozar: Michael asks, “For the attaching the live stats, does that put significant extra load on the SQL Server?”
Adam Machanic: That’s a great question. The answer is—if you turn on actual query plans, yes. If you turned it on for everything that would have a huge, huge, huge impact. I believe Jonathan Kehayias actually tested that in extended events and it was like 25 percent or some ridiculous amount. So you don’t want to do that. But this new trace flag that they’ve introduced in SQL Server 2016 SP1, there’s a blogpost on the SQL—what are they calling themselves these days? SQLCAT still or something similar?
Brent Ozar: Yeah, it’s something else.
Adam Machanic: Anyway, whatever they’re called, there’s a blogpost on their blog and they’ve actually tested it. They said it’s about two percent overhead.
Brent Ozar: Tiger team.
Adam Machanic: Tiger team, that’s it. The Tiger Team blog. So go look for that post and they talk about how they tested and what they tested. Seems like they didn’t see much overhead at all. They said it was basically almost negligible given common workloads. Like two percent doesn’t really mean much for most workloads.
Brent Ozar: Hugo asks on that wait stats by session, he says, “If the session disconnects and a new connection comes in, does that reset?”
Adam Machanic: Yes, the data will be reset if a new session comes in. The idea is it’s just that session that’s right there. There’s no history or anything. There’s no retention.
Brent Ozar: Okay, perfect.
Adam Machanic: One more thing I want to talk about. Let’s go ahead and create this proc. I’m going to create this proc, it’s called myproc. What myproc does is it takes some value. We’re going to pretend that for different values, sometimes myproc runs for a long time and sometimes it doesn’t run for so long. So we’ll just pretend that it’s running for a long time. We’ll do wait for ten minutes. What we want to do, the problem that we have is that we want to figure out why it’s taking so long when I pass in a certain value. I’m going to go over here. I’m going to run it for 1, 2, 3, 4, 5, 6, 7, we’ll pretend like that’s the bad value. Why is it taking so long? I don’t know. How am I supposed to debug it?
Furthermore, if it’s taking a long time, I want to know which parameters cause it to take a long time and which parameters don’t cause it to take a long time. How do I get this information? Well, you can’t always get the information but sometimes you can. This one is running on session 58 so I’m going to throw a 58 into here. What I can ask for is sys.dm_exec_sql_text. What that does is it gives me this. It gives me the text of the proc that’s running. Every time I update Management Studio, it gets rid of the line breaks unfortunately, and it’s done it again. So that just gives me the text that’s running, that doesn’t tell me what argument I passed in. That’s kind of annoying.
To get the argument, you have to call DBCC INPUTBUFFER. That given the way that I called it here, shows me 1, 2, 3, 4, 5, 6, 7. So if I was trying to debug the situation, I could call DBCC INPUTBUFFER and I could get this value. That’s great, that works. But what if I want to write a script? What if I had a monitoring script called sp_whoisactive and I wanted to put this functionality in there? Well what I’d have to do, if you go look in sp_whoisactive, it’s annoying. I have to write a cursor. I hate cursors and I have to write a cursor. It has to go out and loop row after row because this DBCC INPUTBUFFER, I can’t join to it in a query or anything. It’s awful.
Anyway, that’s fixed now. There’s this new DMV, sys.dm_exec_input_buffer. If you write a monitoring script, put it in there. You pass in a session ID, pass in the request ID, and it gives you exactly what we wanted all along. It gives you access to the buffer. As simple as that. Very small feature but it is just going to make life so much easier for those of us who need to monitor servers. So good stuff. This is what hidden gems are all about is just making your life that much easier. Any questions on that, Brent?
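Unlike DBCC INPUTBUFFER, the new DMV is a table-valued function you can join against, which is exactly what a monitoring script needs:

```sql
-- Joinable replacement for DBCC INPUTBUFFER: the input buffer
-- for every currently executing request except our own
SELECT r.session_id, ib.event_info
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_input_buffer(r.session_id, r.request_id) AS ib
WHERE r.session_id <> @@SPID;
```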
Brent Ozar: No. Two people said that’s a great idea, very excited about it.
Adam Machanic: It would be even better if it worked for RPC but, anyway, that’s another topic entirely.
Brent Ozar: Folks, if you are interested in lightweight query progress metrics, look in the Q&A. One of the attendees posted it as a question. I just put it in for an answer there.
Adam Machanic: Great. Thank you.
Adam Machanic: Let’s go back here. Let’s talk about manageability a little bit. Some really nice manageability enhancements in 2016.
First of all, stretch database. This allows you to define a rule that will let your data automatically get archived off into Azure. Very cool from a technical perspective. Super cool feature. Archives it automatically but when you query it, it will automatically go out and get the history if needed. So if you have really old data that never gets queried, idea is that you put it on cheap storage. Pricing-wise they didn’t quite get the right message there. I think that’s a work in progress, but, I’m watching this really closely. As I said before, I think this is going to be a revolutionary feature. We’ll see what happens.
Decoupled SSMS Install
Decoupled SSMS install. This is a huge, huge, huge one. All of you who are—even if you’re not running SQL Server 2016 yet, doesn’t matter—you can go out, right now, today, download for free Management Studio and install it on your computer. The wonderful thing here is you can install just Management Studio and download just Management Studio without downloading the rest of SQL Server. I mean, I don’t know why that wasn’t there forever because every time you needed Management Studio on one system, you had to kind of share out the entire two gig SQL Server installer and have the user just install the management tools or whatever. You don’t have to do that anymore. Now it’s completely decoupled.
Not only is it better from an install perspective, the SSMS team is also releasing fixes and updates to Management Studio on a very regular basis. You can get them without waiting for the full package. It’s amazing. This is a wonderful enhancement.
Basic Availability Groups
Another one, for those of you on Standard Edition, there is finally a non-deprecated way for you to maintain high availability, basic availability groups. Mirroring has actually been deprecated for several releases now. Availability groups were only in Enterprise Edition. So now there’s this middle version of availability groups that you can get in Standard Edition which is another huge feature. Then, there is a little hidden gem that I absolutely love. Let’s go out there.
Brent Ozar: Raul points out that stretch db in the UK is a minimum of 1800 pounds per month for the cheapest tier. Ouch.
Adam Machanic: Well, in better news, there’s this view now, sys.database_scoped_configurations. Let’s select from it and see what it says. What it says is that for tempdb, that’s the database I’m in right now, or whatever database you’re in at the moment, I have these following configuration options that I can set at the database level. This feature is called database scoped configuration. I can set MAXDOP at the database level. I can set the cardinality estimator to use the legacy, the pre-2014 cardinality estimator, if that’s better for my workload. I can turn parameter sniffing on and off. I can enable various hotfixes in the query optimizer, which I do recommend turning on.
Now, why is this cool? Imagine that you’re consolidating a bunch of databases onto a single server. One size does not always fit all. For example, quite often, I like to use parallelism for my big queries but certain databases don’t allow me to have parallelism enabled. Brent, do you know of any?
Brent Ozar: SharePoint.
Adam Machanic: SharePoint, thank you. SharePoint is the famous one. SharePoint you have to set MAXDOP to 1. So if you consolidate everything to one server, you have to have MAXDOP 1. If you have another database for which MAXDOP greater than 1 might be a good idea, you are out of luck. No more. You can now set a scoped configuration. Here’s how you do it.
This says alter database scoped configuration—it works for whatever database you’re in right now. So this would be for tempdb. Here it would set MAXDOP to 4. Now my MAXDOP for tempdb is 4. Then there’s this really cool thing right here, this is what makes this truly amazing: for secondary. What this does is actually set a value that will be used on an availability group on the secondary side, the read-only side. Why is this so cool? Because what if I have like an OLTP app and some of my users want to run some really big reports against it?
On the OLTP side, maybe I want MAXDOP for example set to 1 or 2, something really small. But on the reporting side, I want it set to some big number so the queries will actually finish in a reasonable amount of time. Now I can do that. I can set one value locally and have the other value remotely. I can send my users off to the read-only side to do their queries. They don’t have to be constrained by the same rules. So love this feature. Very clever I think and very useful, especially as we bring more and more databases together. Any questions on that?
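The statements from the demo look like this (the MAXDOP values are illustrative; each statement applies to whichever database you run it in):

```sql
-- Primary / local value: keep OLTP queries serial-ish
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 2;

-- Different value for read-only secondaries in an availability group
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = 8;

-- Other per-database knobs mentioned above
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;
```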
Brent Ozar: Yeah, a couple things. Does that keep when you fail over? When you fail over from one node to another in an availability group or a cluster?
Adam Machanic: It will keep itself up to date as the nodes change.
Brent Ozar: One that is slightly related to one demo ago but it’s so good. “Can I get the list of current values for internal variables and input/output parameters inside a stored proc from DMVs dynamically?”
Adam Machanic: The answer is no. You cannot get any of that information, unfortunately; none of it is kept. My hope was that Query Store would have that but so far they haven’t. Actually, though, I will advertise that I will have a blogpost related to this in the next week or two, maybe two weeks.
Brent Ozar: I thought you were going to advertise that you had a connect item open for it. I was like…
Adam Machanic: I actually have a way to get the information that I’ve been working on for quite some time. So I will publish that out soon. I’ve been meaning to publish it for about three months. I’m actually trying, I’m saying now live on the air for the record that I’m actually going to blog again.
Brent Ozar: Oh my god. You put a webcam in front of him and all of a sudden he starts making promises.
Adam Machanic: Let’s talk about one other area which is data storage. Data storage is obviously a big area for us in SQL Server. There’s been some great things in 2016.
First of all, columnstore. Oh my god. Columnstore v3 is amazing. It really is. I’ll show a couple things about it too when we go out to the demo. Basically, no more of the limitations. The query processor support is great. It just works. It’s amazing. It’s great technology, I’m stoked. Well, in some ways I’m stoked. I’m a professional performance tuner and it’s going to put me out of business, but other than that, I’m really stoked about it. No, it really is an exciting feature.
Memory Optimized v2
Memory optimized v2: a huge number of limitations gone. This thing is finally something I would consider actually using in production without having the SQLCAT team onsite all the time. Good stuff here.
Then there’s a bunch of little gems so let’s jump out and talk about some of them. First of all, I’m going to jump back to my SQL Server 2014 instance. I’m going to go into tempdb and I’m going to create this table. Brent, you are a Microsoft Certified Master.
Brent Ozar: That’s what they say.
Adam Machanic: Can I create this table?
Brent Ozar: Primary key nonclustered—primary key on a 1700… you shouldn’t create that.
Adam Machanic: I shouldn’t, but can I?
Brent Ozar: I don’t know if you can or not. I would bet that you probably could, sadly.
Adam Machanic: I can’t.
Brent Ozar: Oh, thank god.
Adam Machanic: The reason is because J is a char 1700. It’s going to use 1700 bytes per row. In SQL Server 2014 and back, I can only use 900 bytes per row in a key column. But, in 2016…
Brent Ozar: How the hell is this a gem?
Adam Machanic: How is this a gem? That’s a good question. I don’t know actually. I don’t know why this is here. I have a link. This isn’t a bug or someone messed up. They did this on purpose. For some reason I don’t know, but it’s there. So anyway, if you need a 1700 byte key, but hold on, it gets better or worse, depending on your perspective. Let’s jump back to tempdb here. Now I’m back on my 2014 instance. Brent, here’s a table.
Brent Ozar: Oh, god. I know where you’re going.
Adam Machanic: It has 32 columns. Can I create this table?
Brent Ozar: You don’t have J on it, right? It shouldn’t still work. I don’t know what the max is but it can’t be 32. Please, for the love of god.
Adam Machanic: In 2014, it’s only 16.
Brent Ozar: Wow.
Adam Machanic: 16 Gs. Let me go back to tempdb here, in 2016, this works. So once again, I don’t know why this has changed or what the idea here is, but I know you can do it. If you need to, it’s there. So I’m going to just put that right…
Brent Ozar: Yeah, several people ask, “What on earth is the use case for that?”
Adam Machanic: I have occasionally, very rarely, had maybe the need for like 18 or something, like just above 16. I’ve actually hit that before in some really complex data warehouses where there were like a ton of dimensions. Even then, you kind of want to merge those dimensions and deal with it another way, which is exactly what we did and we probably got it down to about six. So there’s techniques for working around this but somewhere out there, obviously, there is someone who needs this. If you’re that person, congratulations. Go crazy and then call me for the performance tuning you’re undoubtedly going to need.
Brent Ozar: Louis Davidson is rolling over wherever he is right now.
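The two demos above can be reproduced with something like this (table names are made up; run it against 2014 to see the errors and 2016 to see both succeed):

```sql
-- Fails on SQL Server 2014 and earlier (900-byte key limit);
-- works on 2016, where nonclustered index keys can reach 1,700 bytes
CREATE TABLE dbo.wide_key
(
    i int NOT NULL,
    j char(1700) NOT NULL PRIMARY KEY NONCLUSTERED
);

-- Fails on 2014 and earlier (16 key columns max);
-- works on 2016, which allows up to 32 key columns per index
CREATE TABLE dbo.many_keys
(
    c01 int, c02 int, c03 int, c04 int, c05 int, c06 int, c07 int, c08 int,
    c09 int, c10 int, c11 int, c12 int, c13 int, c14 int, c15 int, c16 int,
    c17 int, c18 int, c19 int, c20 int, c21 int, c22 int, c23 int, c24 int,
    c25 int, c26 int, c27 int, c28 int, c29 int, c30 int, c31 int, c32 int
);

CREATE INDEX ix_32_keys ON dbo.many_keys
    (c01, c02, c03, c04, c05, c06, c07, c08,
     c09, c10, c11, c12, c13, c14, c15, c16,
     c17, c18, c19, c20, c21, c22, c23, c24,
     c25, c26, c27, c28, c29, c30, c31, c32);
```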
Adam Machanic: Next, let’s talk about compression for a second. Let me make sure I’m still on my 2016 instance. I am.
Brent Ozar: Darlene says it’s to match Oracle. Oh, I could believe that.
Adam Machanic: Okay. Oracle is an evil, evil product. I’m going to create a table called dbo.compress_me. It has an integer primary key. I guess there’s no key, doesn’t matter, though. It has an integer identity and it has a varchar max column called J. What I’m going to do is I’m going to insert 100,000 rows of the letter A replicated 2,500 times. So now I have 100,000 rows in compress_me. Let’s say my storage admin is kind of angry. They need me to get rid of some of that space utilization. So I decide to go ahead and use data compression and I’m going to use page compression. I’m going to run sp_spaceused, then compress it. What we can see is the table used to take up 266 megs, now it takes two megs. Data compression is good stuff. You’re never actually going to see real ratios like that because you won’t be compressing the letter A 10,000 times. But, you know, some of us do weird stuff.
Anyway, this has been around since SQL Server 2008 I believe, so nothing new here. I’m going to drop my table and we’re going to do something else with a slightly different setup. We’re going to recreate the table. Instead of inserting 100,000 rows that are 2,500 bytes each, I’m going to insert 25,000 rows that are 10,000 bytes each. So it’s the same data volume, just a slightly different setup. My storage admin is angry with me, so I’m going to do the same thing again. What happened? The data didn’t get compressed at all. As a matter of fact, it grew just a tiny bit. The reason for this is the page compression does not work for any values that aren’t on the core data page, which is 8192 bytes. My row size, the rows that are taking up all this space, are 10,000 bytes, so they don’t fit.
My storage admin is going to remain angry. What can I do about this? Well in SQL Server 2016, there’s actually a new feature that perhaps can help me. It is built-in gzip in the database. There’s a new function called compress. What I’m going to do here is I’m going to replicate xyz 100,000 times. I’m going to run it through gzip and then we’ll see what we come up with. It works quite nicely. If you have the string xyz, you’re going to get a really good compression ratio. In all seriousness, this is just gzip, just like you’d have in Windows or wherever. It’s in the database and you can use it there if you need to, which is good stuff.
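The new functions in miniature—COMPRESS returns varbinary(max), and DECOMPRESS reverses it, so you cast back to the original type yourself:

```sql
-- Compare raw size to gzip size for a highly repetitive string
SELECT DATALENGTH(REPLICATE(CONVERT(varchar(max), 'xyz'), 100000)) AS raw_bytes,
       DATALENGTH(COMPRESS(REPLICATE(CONVERT(varchar(max), 'xyz'), 100000))) AS gzip_bytes;

-- Round trip: DECOMPRESS returns varbinary, so convert it back explicitly
SELECT CONVERT(varchar(max), DECOMPRESS(COMPRESS('some document text')));
```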
Obviously, if you do use this in the database and for this to equate to the page compression that I mentioned before, it needs to be seamless. So what I’m going to show you quickly is how to take this compress function and actually put it into your database in such a way that your users won’t notice but your storage admin will notice and will be happy. Here’s what we’ll do. First, we’ll alter the table and I’m going to add a column called j_compressed which is going to be J, but compressed. I’m going to go ahead and compress J. That takes a moment.
Compression and decompression by the way take up a lot of CPU time, so make sure you have enough before you put this thing in place. I’m going to go ahead and drop uncompressed column J. Then I’m going to rebuild my index which is going to reclaim all the space. What we’re going to see is we’ve gone from 200 megs down to almost nothing because it’s just one letter. But in your case, you’ll probably still get pretty decent compression ratio for any kind of document you have. gzip is a pretty good algorithm.
The problem here, of course, is that it’s not seamless. The compressed data is compressed and I’d have to go change my application. I don’t want to do that. So let’s actually expose it. I’m going to rename compress_me to compress_me_internal. If you do this a lot, you might want to have a schema called internal or something like that. Either way, get it out of there. Now create a view that looks just like compress_me but calls the other function, which is, of course, DECOMPRESS. That’s what this does. Now I have something that looks just like the original table, but it’s decompressed.
I’ve hidden the data behind a view and the application can now talk to the view. Of course, we can’t write any data yet. So let’s fix that. A very underused feature in SQL Server, this has been around for a long time, it’s just not widely used, is the INSTEAD OF trigger. You can actually put an INSTEAD OF trigger on a view, which is very useful in this scenario. So here I’m creating a trigger on my view, INSTEAD OF INSERT. The application is going to go try to insert into the view. Instead of doing the insert directly, it’s going to call into the INSTEAD OF trigger. The trigger internally will insert my data into the internal table and compress it en route.
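Putting the view and trigger together, the pattern looks something like this (a sketch; object names follow the talk, but the exact definitions are assumptions):

```sql
-- Hide the compressed storage behind the original table name.
EXEC sp_rename 'dbo.compress_me', 'compress_me_internal';
GO

-- The view decompresses on the way out, so readers see the original shape.
CREATE VIEW dbo.compress_me
AS
SELECT
    i,
    CONVERT(varchar(max), DECOMPRESS(j_compressed)) AS j
FROM dbo.compress_me_internal;
GO

-- INSTEAD OF trigger: inserts against the view are rerouted to the
-- internal table, compressing the data en route.
CREATE TRIGGER dbo.compress_me_insert
ON dbo.compress_me
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT dbo.compress_me_internal (j_compressed)
    SELECT COMPRESS(j)
    FROM inserted;
END;
```

As noted in the Q&A that follows, a complete implementation would also need INSTEAD OF UPDATE and INSTEAD OF DELETE triggers.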
So now I can go ahead and write data. I can read data. As far as the application is concerned, nothing has changed but my storage admin is happy. So pretty good feature for those edge cases where you are storage constrained and you really need to reclaim some of that space without going and rewriting your applications. This should be really handy. All in all, I don’t recommend heavy use of this in your databases because it really is heavy from a CPU perspective. But I think it’s really useful stuff to have around.
Brent Ozar: Hugo asks, “Are you going to need an instead of update trigger as well?”
Adam Machanic: Yes. I’m going to need an instead of update and an instead of delete and I will leave that as an exercise for you, Hugo, to blog about yourself.
Brent Ozar: There we go.
Adam Machanic: All right. Final thing I want to talk about is columnstore. Let’s go over to 2014. As you probably know if you’re watching this, columnstore is this great new way to store data that is also hooked into this batch mode for data processing that allows everything to run a lot faster in most cases. Good stuff but in 2014 it was so limited.
Let’s go create a table and put an index on it, a non-clustered columnstore index. That was the big thing added in 2012, when columnstore was introduced: the non-clustered columnstore index, and you couldn’t write to the table at all after it was created. So it was almost completely unusable. In 2014, we can still create this non-clustered columnstore index, and then we still can’t write to it. So that’s not too good.
What I can do in 2014 though is I can create a clustered columnstore. Now I have a columnstore index that I can write to but now—I’m a data guy. I’m a data management guy. I really like data integrity. I know it’s crazy but data quality is kind of important. I want to add a primary key to my table so that I don’t have bad data and I can’t do it in 2014. It’s kind of awful. So, yeah, that’s 2014. Kind of rough limitations there.
What I’m going to do nonetheless is go over and run a little test here of some performance. I’m going to stick 20 million rows into my table. I should have probably run that before the talk, shouldn’t I? We’re going to let that run in the background for a minute. While that’s running in the background, we’re going to go to 2016. There we go. That just finished but we’ll go back to that in a second.
I’m going to go ahead and create my table, and this is 2016 now. I’m going to create my non-clustered columnstore. I just created the index and I can write to it. I can write to it all day long. Writable everywhere. I’m going to create my clustered columnstore table, my clustered columnstore index, and I can write to that all day long. I can even enforce data integrity. Beautiful. So now I get all the benefits. What are those benefits? Let’s go ahead and run this same thing on 2016 that I just ran on 2014. That will take a moment to run.
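A sketch of what the 2016 version enables, combining the two points just made (table and index names are made up for illustration):

```sql
-- SQL Server 2016: a clustered columnstore index can coexist with
-- a primary key, and the table stays fully writable.
CREATE TABLE dbo.cs_test
(
    id     int            NOT NULL,
    amount decimal(18, 2) NOT NULL,
    CONSTRAINT pk_cs_test PRIMARY KEY NONCLUSTERED (id),
    INDEX cci_cs_test CLUSTERED COLUMNSTORE
);

-- Inserts work immediately -- no read-only limitation as in 2012/2014.
INSERT dbo.cs_test (id, amount)
VALUES (1, 10.00), (2, 20.00);
```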
While that’s running, let’s go over to 2014 and run this guy. I’m going to actually read 20 million rows here. My laptop is not the fastest thing in the world but it’s pretty much instantaneous against columnstore. That’s kind of the benefit of columnstore, aggregations are just lightning fast. Now over on 2016, aggregations are also lightning fast. Good stuff. Now let’s go back to 2014 and I’m going to just put a row number into my query. When I run this on 2014, we kind of sit around and wait for a while. Suddenly columnstore is not so nice. We can wait and wait. I’m not going to wait. Well, actually, it finished. So I didn’t have to wait too long. It took 9 seconds to run.
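The row-number variant being compared is roughly this kind of query (a sketch with assumed names); in 2014 the window function forced the plan out of batch mode, while 2016 added batch-mode window aggregates, which is why the same query is fast there:

```sql
-- Aggregation plus a window function over a columnstore table.
SELECT
    id,
    amount,
    ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM dbo.cs_test;
```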
When I go over here on 2016, that query flies. So on 2016, columnstore, writable all the time. You can constrain your data the way you actually need to in order to enforce data quality. And, it’s lightning fast, even for cases where in 2014 it wasn’t. That’s why I said earlier I’m super excited about this feature and a little bit scared about this feature. Luckily, they’ll always need data architects, so I’m still kind of good there. Anyway, any questions on that?
Brent Ozar: No, you’re good there.
Adam Machanic: Okay. Let’s go and just finish up here. Looks like we’re right on time. Hopefully I’ve shown you that 2016 is finally the major release we have been waiting for all this time. Lots and lots of good features in there. Some of them are a little v1, but they’re good foundations for the future. A lot of the smaller features, in my opinion, are just as valuable if not more, because you can start using them right away and they make life easier, and I love that. So hopefully we’ll see lots more hidden gems in future releases of SQL Server.
I want to thank you all for joining me today. If you want to get the deck and the demos for this session, you can grab those from the URL right there, tinyurl.com/2016gems. If you want to contact me, if you have any questions or anything like that, email@example.com. The session’s link on GroupBy is at the bottom. So thank you so much.
Brent Ozar: Thank you for volunteering to teach everybody here today. That is totally awesome. Big round of applause there. Several people have put in the Q&A “great session.” “Adam, you are fantastic,” George says. Samuel says, “Clap, clap, clap, clap.” Alexander says he thinks we need another 30 days of sp_whoisactive feature…
Adam Machanic: I said I would blog, I didn’t say I’d blog like a madman.
Brent Ozar: Chrissy asks a good one. She says, “Out of the features, what do you consider version 1-ish in nature?”
Adam Machanic: Like temporal tables, for example has some limitations. Query Store, although very cool, has some pretty major limitations. Dynamic data masking.
Brent Ozar: Yeah, yeah.
Adam Machanic: That was the one I was able to defeat in like two minutes. So there’s various things.
Brent Ozar: Raul says, “What’s your opinion on new features coming out in service packs?”
Adam Machanic: I love it. Absolutely love it. I don’t know why they had that prohibition on new features in service packs previously. It’s like, in my opinion, people want new stuff. Get it to them as quickly as they can get it. As long as that new feature doesn’t break something, and if it’s truly a feature, why should it break anything? If it’s like a new command that you can run, as long as your old code doesn’t run it, your old code isn’t going to break. So it seems like a very low risk, high reward kind of thing to me.
Brent Ozar: If you’re changing data file formats, then I kind of get it, that’s kind of riskier, but as long as you’re not doing that, it’s new features for free.
Adam Machanic: Yeah, I love it.
Brent Ozar: Josh says, “First time seeing you live. You present very well, thank you.”
Adam Machanic: Thank you.
Brent Ozar: Yes. I sat through the MCM rotation in Redmond and Adam was one of our teachers. I was like, “Good job, maestro.” Smart guy.
Adam Machanic: Thanks a lot, Brent. And thanks, everyone, for joining.
Brent Ozar: Thank you.