Photo by Jakub Jankiewicz
Enjoy the Podcast?
This session is for users who are new to writing T-SQL.
DBAs and developers who want to work with JSON efficiently in SQL Server 2016.
The war between DBAs and developers has been raging since the dawn of relational databases. One reason for disagreement comes from developers who want to store their data in JSON because it is fast, standard, and flexible. DBAs cringe when they hear of long text strings being stored in their SQL databases; they cry with concern, “No data validation? No schema binding?”. Is there any hope for these two warring factions to see eye-to-eye?
This session will explore the new JSON functionality introduced in SQL Server 2016. We will use T-SQL examples to learn how these functions can be used to parse, create, and modify JSON data. More importantly, we will discuss how to optimize performance when using these functions.
By the end of this session DBAs and developers will know how to efficiently work with JSON in SQL Server 2016. It will also usher in an era of peace between DBAs and developers…
… at least until someone brings up the topics of cursors, NOLOCKs, or Entity Framework.
Why I Want to Present This Session:
As a developer, I love JSON. However, I started my career in the world of SQL’s take-advantage-of-relational-databases-instead-of-row-by-agonizing-row-processing and I’m aware of the terrible SQL queries that developers are capable of writing. This session is exciting to me because with SQL Server 2016 it is now possible for developers to use JSON in SQL while DBAs can be happy that those queries aren’t killing the server.
For additional resources, please see my blog post series on the topic of JSON in SQL Server 2016:
When Is It Appropriate To Store JSON in SQL Server?: https://blog.bertwagner.com/when-is-it-appropriate-to-store-json-in-sql-server-8ed1eed1520d
Part 1 – Parsing JSON: https://blog.bertwagner.com/json-support-is-the-best-new-developer-feature-in-sql-2016-bbb3ced89ce8#.s7zd2uo1u
Part 3 – Updating, Adding, and Deleting JSON: https://blog.bertwagner.com/json-support-is-the-best-new-developer-feature-in-sql-2016-part-3-updating-adding-and-deleting-24cea5fc0ab5#.6uedy635g
Part 4 – JSON Performance Comparisons: https://blog.bertwagner.com/json-support-is-the-best-new-developer-feature-in-sql-2016-part-4-performance-comparisons-bd04dc7635f3#.xtqnqjpam
Part 5 – JSON DateTimes: https://blog.bertwagner.com/who-stuck-these-letters-in-my-datetimes-b42cbd6e987d
Brent Ozar: So in the first session of this morning’s GroupBy, Bert Wagner is returning to GroupBy, and this time he’s going to be talking about DBAs versus Developers, starring JSON in SQL Server 2016. So take it away Bert.
Bert Wagner: Yes, great. Well I am excited to be here again, and I’m excited to have so many SQL professionals online this early in the morning, who want to talk about storing large strings of data and their databases, and the title for this comes from DBAs versus Developers. There’s this war going on, everyone’s always a pain in the butt to everyone else. As a DBA, the way I look at DBAs is kind of the heroes maintaining the servers, they just want to keep things running. They don’t want to get these firefighting calls early in the morning. Then you have developers who want to use the latest features and make their jobs easier, be able to develop new features and deliver to the business as soon as possible, and the problem with that becomes – when you have a feature like JSON and you have a developer who just wants to use it right, suddenly they come in and they just want to go into SQL Server and start building tables like this everywhere, right. JSON’s a great thing, let’s convert all our tables to JSON. And I just want to be upfront – that’s not the discussion we’re going to be having today. I think JSON’s a great feature in SQL Server 2016 and there are great uses for it, especially in a hybrid kind of environment where we’re utilizing the relational database functionality that’s existed for many years now, as well as the new JSON functionality and there’s a happy medium to kind of be able to satisfy both DBAs and developers, and that’s what we’re going to be talking about today.
Brent Ozar: I just have to say this. So I muted my thing and I was just going to watch and then as soon as I saw this slide I actually freaked out so much that I spilled coffee all over my desk. I’m like oh my god – alright, so now carry on.
Bert Wagner: Yes, I’m sure everyone’s kind of known some developer I’ll say, that’s kind of like this, that just go to the extremes. But we’re not – this isn’t going to be some NoSQL is taking over the world kind of presentation. We’re going to be talking about SQL Server and how to make JSON fit into that world and make everyone happy.
And so I just kind of want to start out here with an overview, what JSON – what it is, in case people are unfamiliar, when is it appropriate to use in SQL Server, when is it not appropriate? Like that first slide I showed, and then we’re just going to dive right into the demos, take a look at some real life kind of examples of how I use JSON in SQL Server and probably what a lot of people are interested in, what does the performance of that look like, both using the JSON functions and JSON in SQL Server, as well as how does it compare to the existing technology. So we’ll take a look at XML and JSON .NET to see does it make sense for us to even bother with JSON in SQL Server or should we keep all that in our application layer.
So Volkswagen’s a string, we can also have other values that are non-strings, like 2003 for the model year of my car. So we have strings and integers, and we can also have more complex objects. So here for the model property, we’re going to – instead of having just a simple scalar value, we’re actually going to model another JSON object. It’s going to be a child object here. So our model has maybe multiple properties, one of which is BASE, so it’s a Volkswagen Golf, and then the trend is GL, which is – it’s only the baseline but to kind of make myself feel better about it I tell myself it stands for grand luxury. It’s way better that way. And so we close that with another closing curly brace there, and so we have strings, integers, we have complex JSON objects that we can store as values in JSON. We can also have a RAISE. So for colors, obviously my car is white, but you might see if you look closer, some auto store spray paint touchup I needed to do. I wanted to give the car a little bit of a glisten, so we’ve got pearl on it as well. And if you are from some of the more northern snow affected states, like Chicago in winter or Cleveland in winter, you might also get some rust with all the salt on the roads. And don’t mind the giant hole in my back window there. We’re focusing on the rust, that was from a totally different incident, but basically here are the three colors of my car, and we can store those just as a string array. You can also have number arrays and anything else, JSON object arrays, it’s very flexible.
Brent Ozar: What – I have to know – what happened?
Bert Wagner: Yes, so we’re driving – this was two, three years ago. It was whenever that big polar vortex was in the region and I don’t know, it was like -20 that day and we’re driving down the highway, my co-workers and I were filled, we’re in this two-door Volkswagen right, so there’s four or five of us crammed in there, we’re driving to our volleyball match and we’re driving down the highway and it just sounds like a gunshot. This window just explodes.
Brent Ozar: Out of nowhere.
Bert Wagner: Yes, out of nowhere. There are no rocks flying, it’s not like we were going under a bridge or something, it just explodes. My best guess is the rear defroster was on and maybe the temperature difference caused something to happen, but yes, that was not a fun day because I couldn’t get it repaired for two days. So even the next day it wasn’t -20 degrees but it was you know -15 degrees and that was not enjoyable to drive around in with.
Brent Ozar: Wow. Alright.
Bert Wagner: But it’s all fixed now so no worries. So my purchase date here, I bought it in 2006 and we have that stored as a date time in a little different format than maybe what you’re used to if we’re just used to looking at date time in SQL, and we’ll explain what all those extra letters there mean. But the key thing to kind of be aware of is that date time is just a string. There’s no official date time type with JSON data. It’s a string so we’ll have to handle it, convert it to a date time if we want to do anything with it in SQL. And then finally, our closing curly brace. So that’s how you kind of model JSON data, that’s what it looks like. It’s not really difficult, it’s very flexible. I will be using my car here as an example throughout the demo, so you’ll be seeing this very often.
I mean, additionally, like I mentioned before, APIs – I don’t think there’s any de facto standard of you know, what the response type for APIs are, but basically every API I’ve looked at uses JSON nowadays. XML is definitely falling out of fashion and if you want to be able to use these different social media APIs or if you’re working with vendors who are using APIs, so probably in JSON, so if you haven’t started encountering JSON yet, you probably will some point in the future. It’s good to be able to – you don’t need to deal with this JSON data coming from an API in an application necessarily. You can handle it now in SQL Server.
Storage size, so just looking at this – my Volkswagen data here, comparing just what it looks like in XML versus JSON, you can see that JSON’s way shorter. It doesn’t have all those repetitive tags of opening and closing element tags that XML has, so if you’re transferring around data over a network, the JSON data’s way more compact, and that’s the network size. We’re talking about the XML data type in SQL Server, it’s not always the case. It can actually be smaller sometimes. I should have added an asterisk here, but we’ll take a look at how that compares in SQL Server later. But in terms of just size for storing this data in applications and transferring it, bandwidth usage and speed, JSON wins hands down. It will always be shorter than the XML equivalent.
So what are some appropriate uses of JSON? We saw our developer example in the beginning. We can all agree that’s an inappropriate usage of just converting everything to JSON, so when would you want to use this in SQL Server? So if you are working with an API or if you’re working with some vendor who’s sending you data and it’s in JSON format, why not just store that JSON data directly in your SQL Server table, right? Especially if you’re just staging it, it’s going to be used by some process later on. Could’ve always done this in SQL Server, even before 2016, like there’s nothing stopping you from dumping JSON data into nvarchar column, and that’s actually a really cool feature. So in SQL Server 2016, there’s no JSON data type. It’s just nvarchar is what you use. So even if you’re not on 2016 yet, you can kind of prepare for it by – if you are using JSON, don’t store it in varchar. Store it in nvarchar. Then when you do upgrade your server, you’ll get all these functions working right from day one without any kind of conversions having to happen.
But yes, if you’re receiving JSON data from somewhere, why not store it in the database? And then you can either send it straight to the application that way if you want to start doing some validation or transformation on the data, you can do it right there in the database before you then send it across the network to the next step in the process.
One of my favorite uses of JSON in SQL Server is for error logging. So if I have an application, lots of different moving parts and there’s an exception that occurs somewhere and I want to log the – maybe I want to log the data that was being passed around at that time, I can easily store you know, however many properties of my model data that I was passing around into a JSON column. Then in another part of the application, if there’s another error or something, I can save a totally different format of data. Now, you could have always done this, you can just build a very normalized SQL table and you know, put our – have a property type column and a property value column and you have 10 or 20 rows of data represent one exception message. But that’s not always the easiest thing to query, especially if you’re just using it to look up error messages. That’s where JSON really shines. You’re not bound to any particular schema and you can store all that data, easily access it to look at it at a later date.
Alright, so another great use is non-analytical data. So I have a web developer background, so a lot of times I’m writing session data to – I need to save it, if I need to persist it on another database, I’m going to sort in JSON because that data is never going to be used in SQL Server for anything. It’s not going to be used anywhere else. It’s just going straight back out to the web app. So that’s a great time to be just keeping that data in JSON instead of having to transform it back and forth, map it to some table or use some ROM to map it for you. Other types of data just like user preferences right, if your application lets the user choose the background, the color in the application, do you really need to know if it’s red or blue or are you going to be running analytics on that? Probably not. So just keep it in a format that’s easier to use. Same thing with just variables like administrator emails or static content with the flexibility of not being bound to a certain schema, it’s easy to maintain that kind of data and then change it later on without having to change the structure of your SQL table.
So those are some of the appropriate usages, and now let’s talk about what’s inappropriate. So obviously high-performance right, I don’t think this comes as a surprise to anybody. Anytime you have to run a function to parse into string data, it’s going to be a row by row, painful experience. There are ways around that, which we’ll talk about, but if you have an application that you know is just going to be getting hits thousands of times an hour, don’t have queries that are going to have to parse JSON data for you. Just break it down into individual tables, columns for that ultimate performance. You’re not going to get it, you’ll never get it if you’re just keeping everything in JSON.
Also, there are some ways to validate data in JSON format in SQL Server, but it’s definitely not – there are not as many options as if you were specifying different constraints and data types on the SQL data table. So if you’re able to do all your validation client side and then the back server side, a validation doesn’t matter that much. JSON’s okay, but if there is a need to really keep your data consistent and to follow certain constraints, JSON is not the tool. There are ways around it, you can manually code lots of checks and making sure that the data is valid, but why go through all that work if SQL Server will do it for you automatically. It will tell you if the data is a date time without you having to do any extra work.
And so finally, the last inappropriate usage and it’s probably the biggest inappropriate usage that I am guilty of, too, is just being lazy. This is my cat Charlie, he is very lazy, he likes getting in the way of everything. He’s actually laying down on my keyboard, I don’t think you can even see it underneath him there. He doesn’t like me working, but that translates to me also being lazy sometimes, so especially in the morning if I haven’t had enough coffee, I start developing something, immediately my like lizard brain reaction is just to – let me take this data and just dump it into a JSON column in SQL. I’m not going to parse it out, I’m not going to do anything with it, forgetting about all the requirements of does performance matter, am I going to need to be running analysis on this later on. Just dumping your data into a table and nvarchar MAX column is not a great idea. You’re not thinking it through right if you have requirements to make it perform fast. That’s just being lazy, that’s just adding technical debt to your project that you’re going to pay for down the road at some time. So don’t be lazy.
James Anderson: Cool.
Bert Wagner: Cool, alright. So for the rest of the presentation, we’re just going to go through some real life demos here of how you can use the different JSON functions, how they work and how to get really great performance out of them, because they are amazingly quick I think. I was surprised by how great they are, so I’m just going to jump screens here. There we go, and so the first demo I want to kind of go over is the example of – we’re getting data from an API, a vendor, and we want to just get that data in SQL, maybe we want to parse it out a little bit into different columns so that it’s then ready to be processed by a report or something else. So we’re going to take that data that we saw earlier, and just insert it there.
And so like I mentioned, there’s not JSON data type in SQL. If you’re inserting JSON data and you want to be able to use these JSON functions, you’re just going to put it into a nvarchar column. Here I’m doing nvarchar MAX, and so there’s nothing special. It’s not like you’re used to using the XML data type in SQL where you need to create special columns, there are limitations with those columns, none of that, just nvarchar. And so when we insert that, we get our data right here inserted into SQL Server. And now since for the rest of these examples, it’s going to be kind of useful to refer back to this. I have this app here that will – can you guys see that? Can you see the app on top with the code there? I just want to make sure everyone is able to see it. I’ll take no response as a positive response, but we’ll keep this in the top right corner here so everyone can keep in mind what our data looks like as we go through. So the first thing we’re getting – you can or cannot?
James Anderson: Can, that’s fine.
Bert Wagner: Excellent. And that software, by the way, it’s called On Top Replica. I mention it because when the first time I presented this at a user group, a local user group, everyone flipped out about wow, this is such a cool tool, you get to take a screenshot and keep it on top of your data, and it is really cool, it’s really nice for presentations. So I have an appendix in my slides where there’s a link to that, it’s – used to be on Codeplex, now it’s on GitHub, so if you want to use it for your own presentations, just go ahead and download it.
But with this data, we have – the first thing we want to do if it’s coming on to SQL Server is just check that it’s valid. JSON syntax right, so there’s this IS JSON function, and that’s all it does. It returns a bullion, 10, is this in a valid format. That’s the only specific kind of validation JSON function that’s available. It’s not going to tell you if your JSON data is bound to a certain format or a schema. It’s just telling you whether there’s a syntax error or not. So it’s always good as a first step. There are some other validation things that we can do which we’ll take a look at in a minute, but the next JSON function we’re going to look at is JSON_VALUE. So JSON_VALUE is just going to return a scalar value from your parse JSON_QUERY, and from JSON object, excuse me. So we’re just passing in our JSON field here, just called Garage, and then there’s this JSON path selector, similar to the X query path selector if you’re used to the XML. This just has a little more dollar signs and dots, and so we can use that to parse out different parts of our data. So for something like Make and Year, does exactly what you would expect to do, it’s able to just take that string data and parse out the values that are representing by those keys, really straightforward and simple. Now if you try to pass in something like model, if you remember, model is this complex JSON object which has two sub-properties, child properties, if you try to run JSON_VALUE on that, it’s not going to work, it returns a NULL because it’s not a scalar. It’s a complex object. So the way to kind of handle that, one way is to use JSON_VALUE and to add some additional dots and property names in your JSON path expression there, and if we do that, we’ll get down to the data that we need.
So using these JSON path expressions, you can kind of navigate down the JSON tree as far down as you need to go and get the data you need. Now, that’s maybe not the most efficient way, or maybe you don’t know – maybe you don’t want to parse out that data, you just want to return the JSON object itself that’s stored as the value, so that’s where JSON query comes in. It returns an actual JSON object, so selecting the model property, it returns that whole JSON object that has both the base and the trim properties in it.
Combining JSON_VALUE with JSON_QUERY together, we can create queries like this where we are returning the JSON object in a sub-query and then we’re parsing out the base property from that query. So we’re nesting functions. It’s not the most efficient way to do it, we’ll take a look at a more efficient way on the next line, or a couple lines down, but you will run into this scenario sometimes where you need to maybe extract a fragment of JSON complex, JSON object from a bigger JSON object and then you want to get specific values, so it’s definitely possible to just nest these different functions together to get the data that you need.
So continuing on, we’ve got our colors array in our data. White, pearl and rust, and like many languages, you can refer to array values just by their index number. So if we use JSON_VALUE with those index numbers and the JSON expression, we get the different values out of an array, and that’s really good if you know how many array values you’re supposed to get out of that data. If you don’t know how many values are in your JSON array when your data’s coming in, you can use OPENJSON. So OPENJSON is a function that’s going to return results that’s on your data. So if we pass in our Garage data, and we say give us our colors array, this will actually return three rows of data to us, which is really nice. It’s kind of like a split function in SQL. It’s taking that element of array and returning everything on it on a separate row, and there’s some additional information that makes it really nice. We know the index of our values, so we know the order that they came in from and the array. Sometimes that matters. We also have this type, and so all three of these colors, they’re all strings, they’re all type one. But this will tell us if it’s an integer, if it’s something else.
And so we finally get to the date. So I have purchase date, and if we do our selection there, it parsed that data out, no surprise because it’s a string. Like I said, it doesn’t know that it’s a date time. It’s just parsing it out as a string, and so JSON dates – this is following ISO standard, I forget the number, but that’s where this T and the Z comes in. The T indicates that the data coming after that is the time stamp, so if you want to just date in JSON you can just pass in a string. The T indicates a time stamp and the Z is actually really useful notation to indicate that this time is in UTC. And JSON data is totally flexible. If you wanted to write your data in some other format, since it’s all a string, you can store it however you want. This is just the standard. So if you’re getting data from APIs, if those APIs from big companies, they typically like to follow standards, that’s what you’ll be used to seeing.
So right now, just using JSON_VALUE on purchase date, we get a string that’s representing a date time. So we want to convert that to SQL. Fortunately for us, with the convert function, if we want to convert to date time two, there is a display value property of 127 which will correctly convert that string that’s in ISO JSON format into a SQL date time, like getting rid of those T’s and Z’s very effortlessly. The only thing to be aware of is – since we specified date time two, it goes out to seven decimal places precision. Original data only had three. Maybe that’s a problem, maybe not, it just depends on your application and what you’re doing, but that’s easily solved by just specifying the precision for date time and it’ll correctly go out to only those number of decimal places.
One other thing, and this is just a SQL Server 2016 feature that I think is really cool, so because JSON store is that Z indicator that this is in UTC, we can use this new at time zone function in SQL that actually make working with our dates a lot easier, and JSON is the perfect use for it. So we know that if our data comes in with Z at the end of it, that date time is in UTC format. So what we can do is we can say okay, our data coming in is in UTC, and we’re going to want to convert it to, let’s say, Eastern Standard Time. So if we do that, you’ll see we get our SQL date time, convert it, so it shows 11 am, our original was three in the afternoon, so it’s four hours back from UTC, which is Eastern Standard Time, and we get this little indicator here saying it’s four hours back. So this isn’t really a JSON function, it’s just a nice tie in with JSON since with JSON most of the time you’re going to know if people are following that standard whether your data is in UTC or not. And so if it is, it makes it really easy to convert your data. No longer do you have to guess around with what time zone was my data in, was it daylights savings time or not, so that’s just really cool.
And so finally, what if we do something like purchase price, right? So our original JSON data here doesn’t have a purchase price in it. So what happens if we try to run that? By default, it’s not a problem. You’re just going to get a NULL, doesn’t exist. So if you’re okay with that, that’s fine. There’s this modifier that you can add, lax, which does the same thing. So by default JSON value has this lax modifier on the front of it, which means if it doesn’t find certain property, your JSON expression, it’s just going to return as NULL. It’s not going to break your SQL query. It’s not going to error out.
If you want to be able to validate a little bit better, and you know, hey, I’m getting this data in from a vendor and it needs to have this field and I want my process to continue if it’s missing, you can add this strict modifier to the front of your JSON expression, and that will then give you an error message saying that property isn’t found. So even though validation in JSON isn’t elaborate, you do have some options to make sure that the data you intend to get is actually there.
And so we’ve looked at these JSON_VALUE and JSON_QUERY functions and they’re really great when you need to just kind of select out individual pieces of data. Most of the time though when I’m working with this stuff, I want to take all that data and transform it at once, and that’s where OPENJSON comes in, so we saw that earlier with the arrays a little bit. If we run that on our whole query string, OPENJSON will return scalar values to us, like the Volkswagen and the 2003, along with the key names that they came from, as well as their data types. So this is one is a string and two is our 2003 integer. It will also parse just like JSON_QUERY. Our complex JSON value, so it will return a JSON object, which then we could do something else with if we want to, as well as arrays and our date times which are really strings. So OPENJSON is probably the most efficient of the functions for just parsing data out into a result set. That’s the one you want to use, that’s what you’re wanting to do.
It also has this option of adding a WITH clause, which will actually transform this data into a set of rows, and so we can define all the different … to a set of columns. We can define all the column names and their data types, so once again, a little bit of validation, as well as using the strict relax modifiers to be able to create our data into one row. So this is far and large the most used function that I use with the JSON data, because most of the time I’m getting something in JSON and then I want to convert it into a more regular table format, and this will do it just in one step, very quickly. So we get that and we can have NULLs and all that, so that’s how I work with JSON when it’s coming into the server and I want to get into a more useful format in a SQL data table.
I’m going to minimize this now because I don’t think we’ll need it anymore. We’ll jump ahead to this second example. So now imagine you already have this data in SQL Server. We’re trying to do some maintenance to it, maybe we’re updating historical data, and so we want to modify all the values, all the JSON values in the column in our table or maybe we’re getting rid of something we don’t want that data in our history anymore, how do we delete that data. So in this example, we’re going to take a look at how we make modifications to JSON that is already in SQL Server. So let me create this test table here and we’ll take a look at it.
Alright, so first row here, this is my garage here at home, right out back. We’ve got my 2003 Volkswagen Golf, lots of rust, 15 years old almost today, and then we have my wife’s nice new Subaru. No surprise there, she gets the nice new car, I get my old piece of junk, which I do love. I do like working on it. I know some people probably don’t like old junky cars because they’re just full of problems, they keep needing to be taken to the mechanic. I kind of look at it as a fun challenge. Gives me something to do – sometimes it’s inconvenient if I need to drive to work the next day and you know, crank shaft sensor decided to die on me so my car is in limp mode and it’s you know, maxing out at 35 miles an hour, but I do like working on it. But we have here as a second car, the white Subaru Impreza.
And so let’s just add a JSON column to every row of data. So we have a table, and we want to now update it with a JSON column, so we could use this for JSON path function, very similar to FOR XML path, that will simply create a JSON string from all of those rows of data. And what we can then do – so the downside to this one function, sorry let me step back, is if you see our colors here, it looks a little different from our previous example, right? Instead of just listing an array of color values, of white, pearl and rust, we’re actually seeing this for JSON path function automatically added these value properties. Just kind of annoying, it’s unnecessary. JSON allows just storing values in an array, so why add all this extra information to it. So by default, that’s – you can’t create value only arrays. But there’s a work around Microsoft provided in Adventureworks database, there’s this function, ToRawJsonArray, so I’ve included it here in case you don’t have AdventureWorks database installed. But basically, if you use that, it’ll take those values – there we go. And so using that function ToRawJsonArray, we – it gets rid of those extra value properties that we don’t need.
So we can just store a true JSON array of string data. And so let’s add a table for this JSON data – or let’s add a column for this JSON data to our table, so I’m going to add a nvarchar MAX column because I don’t know the size of my JSON data coming in. If I’m always working with small JSON data, obviously I would choose something that’s not MAX. Choose 4000 or less, but in this case I don’t know, so I want to leave it flexible, and we’re just going to update our table, setting that new JSON data column that we added to all of that data there. So if we run that, we should see our new JSON data column that is added there. That’s not going to get updated automatically or anything, I’m envisioning some kind of process where maybe there’s an app that’s going to start populating this data going forward with JSON data so this is like a one-time run that we wanted to do here.
So let’s take a look at modifying this data right. So I like tinkering with my car, I’ve made lots of updates to it, so even though it’s a GL model, I’ve really improved it to the point that it deserves the GLS marking, which is one level up in terms of trim. So I think it’s good enough to be marked as that so I’m going to actually update my garage and I’m just going to call this thing a GLS model Volkswagen Golf. And so to update that JSON string here, we use the JSON modify function, and the easiest way I remember the syntax for this is it works just like a SQL replace. We’re passing in our string or our JSON object here, and then we’re saying go ahead and find this property and then replace it with GLS. So we’re going to take our model trim property, which is previously GL, and update it with GLS. So if we go ahead and do that, we will see right, our trim got updated to GLS.
So very easy to update JSON code in the database. Now, let’s pretend I’m ready to sell my car and I want to sell it, so I’m going to list it, I’m going to add a whole new JSON property that never existed before, it’s going to be list date. We’re going to do the same thing as we did before with JSON modify. JSON modify is going to be the function we’re using for any kind of JSON manipulation in SQL Server. We’re just going to update our Garage table, we’re going to add a list date column with the date right now and we’re going to update our JSON data column, adding that list date property. Once again, find replace, so if it doesn’t find list date, instead of updating it, it’s going to create a brand new property and then we’re going to put our date time into there.
So if we run that – it would help if I didn’t skip the line where I add the list date column. So there we go. If we go to the end here, we’ll see we now have a new JSON key value pair for list date. So my car’s for sale, everyone watching, if anyone wants to buy it, I’ll make you a great offer on it. Just email me afterward, be happy to sell it to you. I can tell you exactly what’s wrong with it and that might take a whole hour demo presentation in itself to go through everything.
So let’s say I list this car, I list it as a GLS, even though the vin is really showing it as a basic GL model. I don’t think I’m getting lots of buyers interested in it because it doesn’t really match – the description doesn’t match what Carfax is saying it should be. So I want to go back – let’s just get rid of the trim because I don’t want to list it as a GL, I’m not going to get the price that I want for it because I have made so many improvements to it. Let’s just hide the trim. Let’s not say, it’s just a Volkswagen Golf, we’re not going to include the trim. So if we want NULL out this trim value, if we want to delete this trim value, I ruined the surprise, all we have to do is NULL it out.
So we’re going to find the model trim property, we’re just going to set it equal to NULL and the JSON function in SQL is going to be smart enough to get rid of it. So now we have our model properties and trim is no longer there. That only works with properties. If we want to do something like get rid of a value and an array, let’s say before I sell this car obviously I clean it, I vacuum it, I also decide to go to the auto store, get another color can with color pearl to paint up, touch up all that rust so it’s not visible, increase the sale value here. So I’m going to get rid of that rust value from our colors array.
I opened up a connect item at one point, Microsoft said that’s an intended feature there because it would have to shift the array around and it didn’t want to do that. So how do we get around this? So I’m just going to reset our data here so we get our rust value back in the array, and so setting the value to NULL doesn’t delete it. Unfortunately, we want to get it to delete properly. The only way to do this that I know of, or at least the most easy way to do it is to go back to our good buddy FOR XML path and – I’m not going to waste everyone’s time going through all this code, it’s here if you need it, but basically we’re just doing a lot of string manipulation to get that rust color out of there completely.
With 2017 it might become a little easier, at least we don’t need the FOR XML path. There’s a stream aggregate which will kind of do the same thing, so it’ll make it a little simpler, but ultimately I don’t think this is a great way of having to delete the data. It drives me nuts, so there is another connect item I have out there to add a flag to that modify statement, specifying whether you want to delete a value in an array or not. Please if you’re using this stuff and that drives you nuts like it does me, then go vote on that item and hopefully we’ll get it fixed here in the future. And I just want to put out there right, I saw the 2017 release, there were some JSON improvements made to it, so I’m very happy to see it’s not like a feature that Microsoft added and just stopped supporting or stopped developing new things. Think you can use JSON in cross applies in 2017 and there’s some check constraints that you can use with JSON so it’s cool. Cool to see that they’re working on it, makes me hopeful that maybe we’ll get some kind of JSON modify function argument that we can pass in to truly delete a value from an array, instead of having to do this mess with FOR XML path.
Alright, so time for some performance demos here. Right, this is what everyone usually wants to know, is how fast is this stuff, how do we get this stuff running fast. Especially if you’re a DBA, right? You don’t want this new JSON functionality to kill your servers. Maybe it would be good for developers, but you like sleeping through the night and not being woken up at four in the morning to solve performance problems.
So let’s take a look – I’m just going to run this, make sure my stats are on. I don’t know how accurate these stats will be right now because I’ve got a bunch of other applications running but let’s take a look at this test table I’ve already created. The script for this I’m including in – all the scripts that I’ve uploaded on my website if you want to recreate it for yourself, but it’s basically almost 20,000 rows of year, make and model data from 2001 to 2015 or something like that.
So we have all this JSON data in this table called cars, and let’s say we want to filter this data, and only return the rows where the model is Golf. So this is how we would write the query, we just use our JSON_VALUE function in our WHERE clause. We run it, by default this will be pretty slow – 295. So obviously I’m running a lot of stuff on my computer right now. My RAM is before too – it’s about 130 milliseconds. Not bad for a 20,000-row table I guess, especially because there’s a lot of writes doing a table scan, then it’s actually parsing out the JSON values for each of those rows and then filtering to only the ones that are Golf. But it is painful. If you’re going to run that at scale, it’s going to be painful, it’s going to hurt.
So the official solution for this is to add a computed column index. So let’s do that, and this is really cool how this works. We’ll get into a little bit of internals real quick for a more detailed explanation. But if we add a non-persisted computed column to our table, what does that mean? So a computed column is just – we’re deriving the value that’s displayed in this car model column that we just added based on the other values that are in this table, and the non-persisted part means this column isn’t being saved to disk.
So every time this query runs now, select from cars, we’re going to get these three columns, but this car model column is being derived on the fly. So right now, there’s not benefit there because it’s probably actually – it’s the same performance. I’m sure if we look at the execution plan, it’s doing the same thing. It’s still doing table scan, it’s having to calculate all those – all these values to parse out what the model is. But there are some cool things we can do with it, and if we filter on it right now, I just want to show you that nothing has changed yet. We run that execution plan and it doesn’t matter if we use the JSON value function in the WHERE statement or if we’re using our computed column name in the statement. It’s going to do the same exact thing behind the scenes. So just use whichever one is easier, but it’s still doing this painful table scan and then parsing out all the values and filtering them out.
So the solution is to create an index on that computed column, so even though the computed column itself isn’t saved to disk, as soon as we create an index on it, something amazing happens, and if we run our queries again, you’ll see we are now getting these beautiful clustered index seeks instead of those table scans, and there’s no more of that filtering happening, of those computations of parsing out that data. So how does that work? These computed column indexes are the official ends for making your JSON queries run really fast, and we’ll look at the performance test in a minute. It truly is really fast. In past presentations, this kind of confused people how it works, and it’s cheating because the computed column, even though it’s not persisted to disk, when you create the index on it, those values do become persisted to the index page. So what SQL is doing behind the scenes then when you run this query and you’re trying to filter on Golf for example, it’s going into the index where it has all those car models already parsed out and then it’s – that’s how it’s able to quickly do a seek to find that data.
So I know that’s a little confusing, so I just want o quickly take a look at how that – let’s actually see what that’s doing. So I think this will then open up opportunities for people to get a really good understanding of how they can make their JSON queries run fast. So I’m going to create this new table, deal inventory, it’s a little easier to work with than a 20,000-row table full of cars. But it’s the same idea, there are makes and models, and we’re going to dive into some database internals here and take a look at page data. So if I just turn on this trace flag, this will enable us to use these two DBCC functions here, and if we’re going to look at DBCC IMD, if you’ve never looked at this data before, basically what this is telling us is where all the data – which pages this data is stored in. SQL Server stores all its data in 8-kilobyte pages and we want to see how that data actually looks like in those pages.
So this page type of one is our page data, so I’m going grab this page ID and we’re going to toss this into DBCC page so we can actually see what that data on the page looks like, and I’m going to – this isn’t supposed to be a detailed how to use these page – DBCC page commands, I just want to show you how this indexing works because I think it’s really cool. So if we’re looking at our page data, we see our three columns here. We have an identity column, we have our 2017 year, we have our plain old JSON data. We haven’t added our computed column yet, we haven’t added any indexes to this new dealer inventory table. So this is all the data that’s saved to SQL Server. JSON still stays in its raw format. We go ahead and add our non-persisted computed column here, call it Make. We look at our table, we’ll see okay there’s our non-persisted computed column, and to verify it’s non-persisted, we’ll just go grab that page ID again – hopefully, it hasn’t changed, it shouldn’t have.
We’ll type that in, and if we go take a look at our page data, our data looks the same. It’s still only those three columns, our computed column that we created up here for Make is not here. It’s not persisted to the page data itself. Now here is where the really interesting part happens. If we create a non-clustered index on our computed column Make, there we go, and if we look at DBCC IN now, you’ll see we have a little bit more data, we have this throw here with page type of two, so that’s our index page, we grab that page ID, let me copy this guy from up here too.
So if we look at this first DBCC page call that we’re making, this is the page data once again. I know it’s getting a little repetitive but I just really want to prove to you there’s nothing changing in the data on the page. It’s still just these three values with our JSON totally unparsed. If we look at the page that actually has the index, it’s really interesting because now down here you’ll see our car makes are actually parsed out into individual rows, which then explains how when you’re doing a query with these JSON functions and they are indexed, you’re getting index seeks instead of scans because it’s not having to parse those. It’s not having to run a JSON_VALUE function behind the scenes to parse out your JSON string. All those values are already pre-parsed and saved in the index. Even if our computed column is not saved to the page, the data page, these parsed out values are getting saved to the index page.
So with that in mind, our JSON queries in SQL Server can run basically as fast as any other types of queries that you correctly write in index, because they work the same way behind the scenes. There’s nothing magical, there’s nothing special, there’s no special index types you need like with XML data. These are just plain old clustered, non-clustered indexes that you can use and get excellent performance with.
So the last thing I want to go over are just performance comparisons with you know – so now we know how to write these really efficient JSON queries. How does that compare with XML and with running JSON.net in C#, because that would be the other alternative, is doing this JSON manipulation in our application layer. I have these two files here of all these different performance tests, I’ve blogged about these a lot. We’re not going to go through these line by line, we’ll just take an overview, high-level overview of what’s happening, but I’m included them there and all the code so you can go download and go through them one by one if you’d like. But let’s take a look at how do these SQL JSON functions compare to XML.
So let’s start with the good. In almost all cases, if we’re just selecting data, manipulating JSON data versus manipulating XML data in SQL Server, JSON wins hands down. Way faster than XML, so just based on that alone, I would say going forward, if you’re trying to decide, you’re building something new, should I be using XML on my database, should I be using JSON, I’d definitely go with JSON, it’s way faster.
There are two small caveats, but they’re not as bad as they look. So if you have an XML data type in SQL, it is actually faster to insert XML data into that XML data type than JSON, which is a little counterintuitive, because the XML data is much bigger. We’re looking at physical size comparisons of the XML versus JSON, the XML is bigger. It’s got all those extra tags, but the XML data type in SQL Server doesn’t store it as an XML string. Parses it out to – takes all the elements, parses it out to like a separate internal hierarchy that stores, so the actual data size inside an XML data type column in SQL Server is smaller than the JSON equivalent. So that is true there, but that’s only if you’re using the XML data type.
If you’re storing your XML in varchar, in varchar you’re not getting that benefit. And the insert speeds are then faster too, because it’s writing less data to the disk, because the physical data size of that hierarchal XML is smaller. However – so that’s only on SQL Server. In the real world though, most of the time you are interacting with applications that are using this data, so if you consider the environment as a whole, think about an application that’s generating some data, let’s say it’s a website, it’s generating some data, some user input data, now we need to save that to a database. For saving it to XML, or inserting an XML column in our database, our application needs to serialize that data first, then send it to XML so that – to SQL Server so it can insert it to the XML column. Even though SQL can insert that XML faster once it gets to SQL Server, you have to remember the network speed will be slower. The transfer rate will be slower because that XML data, before it gets to SQL Server, is larger in JSON.
Additionally, if you run a comparison of like an XML serializer and this is in all those extra C# tests that I’ve included if you want to take a look at it, I use the default XML serializer from Microsoft and I compare it to the new install JSON .NET. It is much faster to serialize like a C# model into JSON data than it is into XML. So not only are you dealing with a larger physical size transferring over the network, you’re also dealing with a slower serializer in your applications, even though XML might be faster in SQL to insert, and the size might be better. You consider the entire application environment, it’s actually most of the time – right, case by case basis, but it’s probably faster to use JSON. Especially if you consider that if you’re dealing with web apps, you’re not even having to serialize your data in the application layer. It’s probably already serialized in JSON format, so all you have to do is send it to SQL Server. So when you add up all those times and performance test it considering the whole environment, JSON is definitely the way to go.
And let’s take a look at .NET now. So this really impressed me. Normally I’m used to working with JSON data in .NET using the JSON .NET library, which is super fast. It’s kind of – it’s the standard, I think Microsoft ships it now with MBC .NET 5 and it’s just really quick at serializing JSON, so I thought there’s no way SQL Server is going to be able to do even compare to that. So I ran all these different tests, so de-serializing large amounts of data, de-serializing small, smaller sets of data, and the amazing thing is that SQL Server’s JSON functions are basically the same as the .NET serializers and de-serializers. They’re just as fast. Here if I – the SQL Server I’m showing is a little slower, tiny bit slower, but none of this stuff is statistically significant. I can run this a thousand times to figure out who knows what actually comes out if I ran it, you know, enough times, but it is very comparable in terms of speed. If you’re filtering a predicate without a computed column index, so something that we would never want to do in like the production scenario, at least for high loads of data, SQL Server is still competitive to .NET. It’s almost just as fast.
If we do use those computed column indexes and we take that as an advantage, you know, hands down your SQL Server JSON parsing manipulation performance is going to be way faster than anything else you can do in an application. And that make sense right, because that data is already kind of pre-parsed, it’s just sitting there, it’s indexed, so it’s very fast query and use and so we have these other examples here.
Serializing data is a little slower, but usually, I’m doing the opposite. So that doesn’t faze me. Overall, JSON performance is excellent in SQL Server, and especially if you consider – if you’re getting data and you’re – SQL Server’s just one step in the process. If you’re then sending that data to some ETL, or just an application, if you’re able to reduce your data, your vendor is sending you these huge files and then before you send that out to your ETL, if you can use a JSON function to reduce the amount of data that needs to travel across the network, you’re just improving performance everywhere. And because of the SQL Server JSON functions are just as fast as like the .NET functions, you have no reason to do that in your SQL Server code because you’ll get better performance in your environments.
And so that’s it. Just as a quick recap, there’s a lot of good uses and there’s some bad uses too, like our developer friend at the very beginning of this presentation. For JSON and SQL Server, I don’t recommend that you just create tables of only JSON, but there’s a happy balance of creating tables with regular SQL columns and JSON data in them. You can fully manipulate that JSON data, all those JSON functions, open JSON, JSON_VALUE, JSON_QUERY, you can basically do anything you need to in SQL Server in terms of JSON manipulation.
Also, if you’re considering – if you’re building a new project, you for sure JSON is the way to go. Don’t even think about XML anymore, JSON’s way faster, I think it’s easier to write, I think the functions are simpler, there’s no special indexes or anything like that that you need to kind of be aware of, like with XML. It’s the way to go. And finally, JSON performance is comparable to that of .NET. It’s just as fast, if not faster, especially if you are using those computed column indexes and, you know, if you’re able to reduce some of the data that’s leaving your SQL Server to your applications, why not? That’s going to make all your applications run faster. So thank you very much for watching, I appreciate your time. Check me out on Twitter, email me with any questions you have, go to my website, download the slides and demo code and play around with JSON for yourself. That’s it, thank you very much.
Aaron Bertrand: Bert, there’s a question from the audience, any guidance on when to use JSON and SQL Server, versus a dedicated store or like Cosmos or Mongo?
Bert Wagner: Yes, you know, I think it depends. So I’m not a big NoSQL only kind of guy where I want to use Mongo or anything like that. I mean, some may have a biased answer I would say, I keep it – keep everything in SQL Server, it’s going to run faster, it’s going to be better, and if you’re using a lot of unstructured data, I guess that’s where those other databases come in, but I’m not an expert in that, I definitely prefer using SQL Server so.
Aaron Bertrand: Okay, any other questions before I post to the – in the GoToWebinar question area or in the Slack channel?
Brent Ozar: Let’s see here – I was going to ask you, Bert, how did you get started working with JSON? Like what made you decide to start working with this? Was it a business requirement? And if you can talk about it, like what was the business requirement like?
Brent Ozar: E Cub says, “Just want to say this was really good, as a DBA moving to 2016 I’m getting ready to start teaching my devs about the new JSON feature, this was really helpful for me.” Aaron, did you want to say something? Looked like you did, no?
Aaron Bertrand: Nope.
Brent Ozar: Shane says, “Quick question, how much does Bert hate X query and did that help with your change over to JSON?”
Bert Wagner: Yes, I hate X query. I don’t know why, I mean if you look at them side by side with the JSON path expressions and the X query, I mean, I shouldn’t say they’re the same because once you get into like attributes and stuff with XML it gets a little crazy. JSON path expressions are very easy, I did not use a lot of XML in SQL Server before. I rather – in those cases, I was not converting my web app data to XML to save to a server most of the time if I could avoid it. I would use some kind of ORM, like any framework or whatever and have that manage the data mapping. I stayed away from XML.
Brent Ozar: We have sp_Blitz cache and every time I open sp_Blitz cache I’m like oh god, the pain. Yes, Eugene asks, “Question for Bert, any opinions on using JSON for allowing dynamic columns on a table?” wow, and I’m going to ask the same thing to Aaron too when you finish your answer.
Bert Wagner: So for dynamic columns on a table, I guess I don’t even understand…
Brent Ozar: Like say that you want to let developers add columns any time they want, like say – yes, go ahead.
Bert Wagner: Yes, right, so depends on your application. As a developer, I want to be able to say, yes, let me add all the columns I want any time I want and whatever right, wild west, but I realize that’s not a great environment to be in long term. It’s going to create a lot of problems. I think if you’re working on small projects where you’re trying to iterate really quickly or like a proof of concept, JSON is awesome for that because if you don’t know what your database structure is going to look like on day one, using these JSON columns is really easy because the developer can just change their data structure instantly. There’s no DBA involvement required. It’s really quick. But long term, like I said, especially if you want great performance or if your application is going to be hitting – maybe you have some reporting on your data, just parse it out into columns, that’s what SQL is good at. Don’t be forcing it to read that JSON, and even if you are using the computed column indexes, it just feels dirty to me to keep it that way.
Brent Ozar: Aaron, how about you?
Aaron Bertrand: So I like EAB in a relational sense where you have – you just have rows basically instead of columns, so you have properties like let’s say for shirts you have shirt size and color and for shoes you have size and color and width maybe. So you have different sets of properties for different things and sometimes they’re strings, and sometimes they’re numbers. And I think it could be really flexible to build a table that actually supports that and you can create filtered indexes for certain things. With JSON I think it’s really hard to do because to pull an EAB value out of a relational table, you only have to know the name of the property, right? With JSON, the code around JSON to pull a specific value out of JSON string is a little more complex, and it’s not as easy to just plug in a bunch of different things into that and make that structure grow to support different sets of properties for different types of objects. I have a blog post that I posted earlier in the thread because Eugene was asking me about this earlier. And I’ve always been a fan on EAB, not as something you implement all the time, but when you have a need, when you have customers. We have customers that we’re every day saying well we need to add a new column, we need to add a new column, and different customers want different columns and how do you – that’s really hard to manage schema that way, so having it as fixed columns doesn’t really work, and switching to NoSQL or some other unstructured store is not always an easy thing to get SQL Server DBAs to do. So I just find that a proper EAB relational structure is a good middle ground.
Brent Ozar: The software vendor one is so good because inside one database you have to manage so many different schemas and that’s so hard. Yes, James, what about you, any opinions on that one?
James Anderson: Well, I’ve used a few in some of the projects I’m working on at the moment we’ve got some talk of using JSON, which can be quite scary, especially if you want to add any kind of constraints to that data. I think you really want to bring that out into columns, but like the guys say, it does give you that flexibility and logging is a great example of that. So we create a feature for the application that logs different types of data, we don’t have to change the schema of our log table, we can actually just dump it out into JSON. So yes, there are some really good use cases for it. I do wish there was a JSON type in SQL Server though, that would be quite nice, but I’m not confident on that arriving.
Brent Ozar: Adam Machanic says he really hopes that that design pattern catches on. Gene linked to – Microsoft did a demo of a product catalog using JSON, and the thing – this is why I always get in trouble with Microsoft. The thing you have to remember when you read Microsoft blogs is that they – someone is given a feature like here you go, here’s JSON, you need to store – you need to show us a demo of this, and they’ll come up with a use case and they’ll blog about it. That’s not necessarily the best use case. It’s just one that they came up with. After the feature goes live then you hear from people like Bert, Aaron, and James who actually went through and used this type of stuff in production, where it does make sense. Like logging is such a great example.
Product catalogues are just so much more sketchy because you are constantly going to query for where size equals large, and if people don’t know to create the index computed columns like Bert does then their performance is in the toilet, which is why Adam Machanic says in Slack, “Wow I hope Microsoft sign pattern there catches on, my consulting business could really use a fresh ten year infusion of new customers.” I would actually disagree with that, I don’t think Adam needs a fresh ten-year infusion of customers. I think Adam is probably set for life given his reputation, he’s probably okay.
Well, thanks so much Bert for doing the presentation, good feedback from the folks in Slack, people are wrapped in paying attention, you can go through and read those comments afterwards.
Latest posts by Bert Wagner (see all)
- SQL Injection Attacks: Is Your Data Secure? - November 27, 2017
- DBAs vs Developers: JSON in SQL Server 2016 - June 9, 2017
- High-Performance SQL Server Reporting Services: Lessons Learned - January 13, 2017