Working with Dynamic Data Masking & Row Level Security

Audio podcast:

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS
Leave us a review in iTunes

Target Audience:

Data Platform Engineer (Dev/DBA) who are looking for additional options to secure data.


Microsoft has introduced native capability to the database engine around masking sensitive data and restricting access at the row level. Allowing us new options for building more secure Data Platform solutions. However, it is important to understand the differences from our previous options in order to be able to gain the most benefit from these new technologies.

Join me as we look in more detail at how we can engineer these features into our Data Platform solutions. Starting with identifying the problems they are trying to solve, through their core architecture and on to potential design patterns for their use. As with any security technology solution, there are a number of ways to use these features. However, as with all security features there are gaps in coverage, using them effectively in a layered approach is vital.

After this session, you will be in a position to start looking at whether your systems can benefit from these features. Along with how you can potentially start building functionality into your applications.

Why I Want to Present This Session:

Far too often security is seen as something that can be added in later. I’m a firm believer in secure by design, and I would like to show that this is something that is achievable with minimal overhead.

Additional Resources:

Session Transcript:

Brent Ozar: In this session at GroupBy, John Martin will be talking about using row level security and dynamic data masking in SQL Server. Take it away, John.

John Martin: Okay, thank you very much. So just find everything here. So what we’re going to be looking at today is row level security, dynamic data masking, two features that were added with SQL Server 2016 and are in Azure SQL database as well, but we’ll cover that more as we go on.

Okay, quick thing about me, so my name is John Martin, I am currently the product manager at SentryOne, managing the SQL Sentry monitoring toolset. So if anyone feedback, get in touch with me, but otherwise we’re done. I’m also a Microsoft data platform MVP. Overall, been working with SQL Server for a long time – Dev, DBA, BI. Proud of working for Sentry, I used to work for Microsoft as a field engineer based over here in the UK. Contact details are there on the slide. So please feel free to stalk on Twitter, send me emails about SQL Server stuff, read things I blog or get in touch on LinkedIn. Happy to do that.

So going to talk about some of the things that we’re going to see on the agenda for this session, so whilst it’s quite obvious the title is row level security and dynamic data masking, one of the key things that we want to have a look at and understand first and foremost is some of the key security principles. If we don’t get the key security principles and the fundamentals in place then no security feature is going to save your bacon. You can add as many as you want afterwards, but as – I’m sure you’ve known in the news today, I don’t know if it’s come on – how widely known it is, but obviously Equifax had a little breach through earlier this year which they announced today, which is kind of big considering the things they do.

Then we’ll talk about the features themselves. We’ll start talking about row-level security and we’ll start talking about dynamic data masking. Both in the engine, both very interesting features, is the way to put it, both V1. Make of that what you will, but we’ll discuss that as we go through as well.

So starting with security principles. I am a very, very firm believer in security by design. My background is – the industries I’ve worked in have ranged from payment processing through local and central government organizations, large corporates, all the way down to retailers and things like that as well, and one of the overriding themes there is how do we protect the data because the data is of value.

You can build a system nice and quick. Very, very easy these days just to knock something together with the various technologies that are out there, and then think, “We’ll stick it behind a web application firewall, that will protect us, or we’ll just put it over there on that server and we’ll restrict who can have access to it and that will solve all the problems.” No, it doesn’t.

You have to think about all of the security components. You have to think about okay, how are we going to be accessing this? What’s the medium? All that sort of stuff. If you don’t understand all of the components that go into building a solution, you can’t understand the different layers of security that you’re going to need to put in place in order to make sure it’s a secure system. If it’s not a secure system then you run at greater risk. Like I say, designing that core security in place is more than just a permission set, it’s more than just views and stored procedures. It’s basically all the way from where the data’s stored all the way through to where someone is consuming it. And if you don’t get that right, it’s going to hurt.

Touch wood, never been involved in a company that’s had a breach, so the big thing to understand there as well, and the mindset that you need to really adopt is when, not if. Something’s going to happen at some point in your career, and in order to understand that and why I say it’s a case of when, not if, it’s about understanding the different threats that exist and the different risks that exist for your data environments, and that can be the external threats, that can be state actors, it can be criminals looking to try and get that, it can be corporate espionage, depending on the type of industry you’re in and the type of work that you’re doing and the type of data you’re managing and the systems you’re managing.

It can be internal threats as well, so people trying to become employed in order to take information out, how you’d heck that sort of exfiltration of information. It can just simply be people – someone’s got wind they’re going to get fired and they start going through and deleting things, you know? Drop database, that type of stuff, just go into the application, delete all the data. It happens.

I was, say, next to a consultant who’s customer once had that problem and they rang him and they said, “Help, we need to recover.” Again, it’s something to think about, it’s not just people doing it for criminal gain or monetary gain. It can be people just – can be vindictive if they’re pushed beyond a certain point. But then also internal threats also take into account accidental disclosure. So it could be a DBA on another team that’s got access to a server he shouldn’t have access to, and then getting that information. It could be the BI developer that has access to an amalgamation of systems that allow you to piece that information together, so the sum of the parts is sort of broken down in such a way that okay, well we’ll store personal information here, we’ll store financial information here, we’ll store sensitive information over there. But one BI platform that pools it all together then has access to all that information, you can join it all together to give you that full picture. Again, that’s again, thinking about those risks that exist, what’s the options there?

The accidental disclosure thing is probably the nice end of the spectrum in all honesty, but understanding those threats, understanding the mindset to have will help you build more secure systems. Now, when it comes to security principles, security is basically like Swiss cheese. There are holes in every solution. There’s no one piece of software, one hardware appliance, one virtual appliance, one company out there that will protect you completely. Anyone who turns up in a shiny suit and says otherwise is lying. There are no two ways about it. It’s all about understanding where the gaps are in the different technology layers and then coming up with solutions that complement one another in order to try and resolve those issues, which is why it’s like making a sandwich essentially. You get your bread, you put your piece of cheese on it and it’s like, “Yay, cheese, but I can still see bread. I need more cheese.” So you take it, you get it another piece, you turn it 90 degrees, put it down. Still see bread, clearly, I need more cheese. Pick it up, another slice. In we go, so four, five slices later you’ve got a nice big cheese sandwich, whether or not you’re going to put anything else on there, all good. But you don’t want to be able to see that bread. That’s the way to do it really. You want to eliminate those gaps. And that’s why as I say, when you understand what the threats are, you understand the end-to-end solution, you understand the security options that exist for all of those technologies and how you can build them together to complement one another so that one security solution will fill the gaps that exist in another one potentially, or a configuration, things like that. That’s where you end up with secure systems.

So, those are the sort of fundamental security principles we need to understand. Once we’ve got that to hand, we understand that, that’s when we can start looking at technology features like row level security, dynamic data masking. Don’t believe the marketing hype, one of these will not fix all your problems and the other one again, will not fix all of these problems for you. Maybe using them together in combination with other technologies you’ll end up with a solution that works for you if it’s appropriate for the system you’re building.

So then we move on and we start thinking about row-level security itself. We’re going to start here with this one. Now, row level security at its most basic level is about restricting who can gain access to specific data rows. Nice and easy we say, that’s what where clauses are for and will hide things behind. Yes, you can get away with that. Row-level security here is built into the engine for a change. Historically, it’s been a case of needing to roll your own. There are various different mechanisms that I’ve seen systems using, so prior to SQL Server 2016 and Azure SQL database where it was introduced, it can be the use for example, which if you’ve ever used dynamic CRM, you’ll understand some of the pitfalls with that when it comes to really scaling for performance on that side of things. When you got a lot of data, nested views over nested views over nested views, which is one of the implementation mechanisms falls flat on its backside. Not great. You can use functions and things like that, we’ll discuss those in a bit more as we go through.

Like I said, it’s SQL Server 2016 and above, and Azure SQL database, so those are where they stand at the moment. And it’s a good implementation, simply because you have to do a lot of work, which means you’ve got a lot more control over it. It’s nearly there, there’s still some gaps in it, and we’ll think about some of those as we move through the slide deck in a moment. We’ll think about some of the options we have, but fundamentally, the key element here is that the engine is helping you to build a more secure solution. It’s not a case of it’s not a tick box you turn it on, you’ve actually got to do a lot of work for this one.

Now, row level security is essentially designed in. Now, in order to look at doing this, you’ve got to basically write the function that’s going to classify the data that you feed into it as to whether or not it’s going to be applicable. Now, what that means is obviously we’re going to be writing a function that’s going to be running quite often. If that function has any inefficiencies in it, then that’s going to materialize in your query performance, so for anyone who’s used resource governor, you’ll know the importance of making sure that you’ve got a very, very efficient classifier function because it hits at every connect that comes in.

Similar sort of fundamental principle when it comes to the function that you use when it comes to writing and securing using row-level security. It’s going to basically run for every query that hits the table, so if your function has three second query response time, and querying the table takes half a second, then it’s going to be three and a half seconds aggregated because that overhead’s going to come from your function. Very, very important to make that as efficient as possible.

Now, because it’s designed in in such a way and we’re feeding data into a function, it comes down to understanding your data model, and you’ve got to do data modeling correct in order to really leverage this function and this capability. What we need to look at and understand is okay, well, what data are we looking to protect? What are we actually going to use to sort of then basically say these people can see this data or they cannot? These users, whether or not they’re going to see it. And that comes down to we need to essentially tag every row with something that we can then do a true or false on to say yes, this passes a check, or no it doesn’t pass check, which again, means potentially you’re going to have to duplicate data in certain areas or what have you and then have that data management overhead built within the application itself, can be quite painful.

Now, it can be very simple. It can be very, very complex. Depends on what you’ve got to do. I got a work example that’s on the more simple end of the spectrum, and we’ll have a look at it. But fundamentally, the key thing to understand is when we’re doing this function, we’ve passed data in, whereas before if we’re building it ourselves, like I mentioned earlier, one of the options we have is we can do functions, we can do views, we can do stored procedures, we can do it all on the app tier. The merit – each sort of option has some merit, that’s for certain, but the one thing is where you’re writing all of the code that’s going to handle all of this and then you’re basically going to make the – when you’ve done the queries such like, you need to remember to call the appropriate functions or call the appropriate security functions be then the data tier or be then up in the application tier to make sure that you’re actually going to do that restriction.

I’ve worked with some systems where it was all hand-rolled if you will, and did some dev work and then oops, we actually forgot to put that function call in that’s going to do the row level security for this particular part of the application. So go back and do it all again.

Again, you end up going all the way to test and then coming back and doing it again. This is where row-level security can help you out. You write the function, you build the data model and then SQL Server’s going to help you sort of not have to remember to go and put that function in and do every stored procedure or every other function call when you want to do that data restriction.

And it’s all about having a big sort of tick list. You need to have a plan. If you haven’t got a plan, it’s going to go wrong. Now, from what I’ve said so far, it should be very, very apparent that this is not going to work for ISV apps. You’re not going to be able to get share point or anything like that and go, “Well, use row level security and that will solve my problems.” Nope. You need to have control. So if you’re doing development in-house, this is where you’re going to start looking at potentially being able to make use of this feature. If you’re just running third party off the shelf apps, forget about it I’m afraid, don’t even go there. It’s not going to help out.

If you’re rolling your own reports on top of things, potentially, but then you’ve got to think about third-party vendor support, contracts, things like that, whether or not you’ll invalidate any of that support after we’ve got all that, if you have a problem then go back to your support vendors. Just be aware of all of those issues.

Now, one of the important things to understand here is because we’re working with the database schema itself, those traditional security model for SQL Server of logins and users doesn’t really apply because it’s data-driven, it’s not built in to the security model there. So again, when you’re looking at developing the application to try and leverage this capability within the SQL Server engine, we need to think, “Okay, what’s my security protocol? Are my authentication – am I authenticating through the application using OAuth and things like that, I get a token that I can then pass in that’s then allows me to tag the data and say okay, well is this particular user allowed to do it from the application side of things?” If you’re using Windows authentication, it is possible, but it becomes a little bit more convoluted in the way that you need to start looking and saying “Okay, well who’s the user that’s coming in from Windows?” Doing the evaluation, and then again, it’s all going to fall down potentially if you’re coming in through a website that’s then say behind an application pool account that’s generic and every connection hits with the same Windows authentication, then that goes out the window. So again, these are all of the considerations you need to think about when it comes to whether or not this is actually going to be a viable option for you.

So now that I’ve finished sort of giving a bit of a beat down, we’ll think about a potential scenario. So anything where there’s a hierarchy involved, where you’ve got a salesperson or a manager, the salesperson should be able to see all their data rows essentially in the application, the manager should be able to see all theirs plus all of those for their minions. Common enough scenario – the hierarchy type thing here that we’re looking at, so again, that’s what we’re going to look at and we’re going to work through now.

So before we jump in and have a look at the demo, we’ll just have a look at the lovely schema that AdventureWorks gives us because AdventureWorks is a really nice schema for real-world scenarios because there’s a lot of problems with it. So we’ll start off with the employee table. What we’ve got here is basically a list of employees and managers. Now, one of the reasons that I like the AdventureWorks scheme for this one is because they’ve got the organization load as a column there and the table itself. Essentially, this is using the hierarchy ID date type as opposed to doing something like a self-join adjacency list mechanism for building the hierarchies where you may need to run things like recursive CTEs and such like to try and build them out. It’s painful, it’s not fun.

Hierarchy ID can work, maybe not great at very high sort of scales for number of rows you’re going to be passing in depending on the functions that you’re going to be using with it, but for the purposes of illustrating this demo, it’s going to work reasonably well for me.

We then have that links out to the salesperson ID. So these are the actual salespeople. So we’ve got our manager and employee hierarchy, then we actually know who’s going to be selling things. So this is our salesperson. This is where we’re essentially going to be doing the restriction. So we need to make sure, okay, we need to take this salesperson ID and we need to start tagging the data that we want.

Now we start looking at the data itself. So we’ve got our sales order header table, our sales, all the details table. Now, the relationship that exists here is between sales person and sales order header. What that means is in order to restrict things like the rows that we’re going to see for the actual granular level of the sales order, we’re going to rely on relational integrity here as the sort of primary key, foreign key joins that we’re going to be putting in place. If we’re going to permit people to see sales order rows, sales order details table without having to go through the sales order header table, this is the scenario where we would have to look at taking the information that’s currently held on say, salesperson ID, and actually have to tag all of the rows as well as the header table.

So that’s where we’re looking at the potential duplication of data within the system there, and obviously, you’re going to have the primary key, foreign key links in place, but again, all of those bits and bytes add up when you start building out systems that are quite large.

Now, the sales data is what I refer to as the scope. This is what’s going to drive it. When you understand the scope of what you’re looking to try and protect, that puts you in a much stronger position when we go back to what we were speaking about earlier when it comes to security by design. We design it securely from the outset when we understand what we’re looking to protect.

So, we’ve also got product information out there which is just ancillary information that’s going to add a bit of fluff to the queries that we’re going to be querying a bit. So that’s the database schema. The way we do this, we have our security classifier function. Now, build the function in isolation, you can have as many functions within the database as you want, as you need, that type of thing, but only really sort of comes into play when we create our security policy. The policy essentially then links our classifier function to our table object.

So what I can do is I put the policy in place, the policy then is active, which means every query that then hits that, that function is going to fire as well. That’s very, very important, and like I said earlier, if your function is inefficient, then kiss your query performance goodbye because it’s going to run for every single statement. Not nice.

With that, we’re going to start looking at a demo unless I’ll just check in with the guys; have we got any questions yet or…

Erik Darling: No questions yet.

James Anderson: The security function, because it’s a function, would that affects the parallelism of our queries?

John Martin: I haven’t tested it with big scale reporting queries that should go parallel if I’m honest with you. The viewpoint I’ve taken with this is it’s more in and around the transactional side of things that I’ve seen. I’ll show you – it might become more apparent why I haven’t bothered testing it with large-scale reporting queries when we get into the demo because of one of the things that we have to do with that one.

Erik Darling: So just quickly, is the function that you use to set up row-level security, is that scalar, multi-statement table value, what is it?

John Martin: It’s an inline table value function.

Erik Darling: Okay. So it shouldn’t prevent parallelism out of hand?

John Martin: Yes.

Erik Darling: I know it adds a bunch of junk to your query plans but it actually threw a warning for that in Blitz cache.

John Martin: Yes, exactly. It causes all those problems, but it’s also because developers are inherently lazy creatures. Having been one at one point in time, I will use all the features I can to avoid writing code. That’s one of the potential pitfalls of this.

So let’s fire up Management Studio, and here we go. So here we go. So I’m going to use AdventureWorks 2014 on this one. I’m running on my local SQL Server 2016 machine here, mainly because given the little blip we had earlier, maybe playing with Azure is not going to be the best thing for me.

So first and foremost, because we’re building it into our data model, I’m going to create a schema. I’m going to create a schema called security, you don’t have to call it security. I’m a simple creature, and for the purpose of the demo and preventing me from being confused, it’s called security. And that’s going to keep all of my objects, which are going to be functions and policies and things like that in place.

Okay, now, first things first, the type of application I’m going to simulate for this demo is that we’re doing some form of application-level authentication, the application is going to know who you are, and we could be coming through an application pool account, anything like that, but essentially, when I ask the application to do something, what it’s going to do is open the connection, and the first thing it’s going to have to do is tell the database engine, okay, well this is the data essentially I want to use for the restriction in one form or another.

So in order to facilitate that, I’m going to use the session context. So there’s a procedure here called sp_set_session_context, and essentially, it allows you to build something that’s akin to a key value pair in the SQL Server session. Now, the key here is going to be employee ID, the value is going to be the app user ID, I’m going to be passing that into a stored procedure that essentially when I open the connection the first thing the application does is calls this procedure to set this so it says I am this person, so that the rest of the database session that we’re going to be running the query for knows who I am.

The important thing that we want to have a look here at as well is this little value here, number one. That essentially there means okay, this is going to be read-only, so if I run this again, I will get an error basically saying nope, we’ve set this particular value for this particular key, go away. Just a little bit belt and braces approach to if some bizarre reason that someone managed to do some form of injection or what have you, runs this again, and says actually – they figured out you’re using this and then say actually, we’ll switch the value and we’ll be all good. No. Just helps you protect it for the terms of that session.

So I’m going to create that stored procedure to start with. Now I’m going to create my function. The first thing to watch for here is it’s schema bound. So a few people seem to be quite happy with it at the moment, the whole sort of dev ops CI thing; one thing to be aware of here is if I want to make a change to anything that I’m obviously referencing this function as part of an automated bill process or anything like that, it’s going to mean that my process becomes essentially offline because I’m going to need to shut my security system down. This function is – basically, I want to make those changes, I need to make this function go away. Prevent the schema so I can make the changes, and then I’ll bring it all back. Not a fun scenario really. Again, this is another one of those little gaps may exist there in the potential adoption of this one. Anyway, moving on from there.

I’m essentially just returning a value to say, yes we’re going to go for this one or not, when it comes to the data we’re passing in. Now, the query itself, just for the purposes of this one, because it’s a hierarchy ID, essentially what I’m doing is saying okay, well find me the employee record that’s equal to the person that my application says I am, and then am I a descendant of essentially the same query again? So let’s see. It’s essentially saying – this is a fun one to try and explain.

Do I have any minions or not is basically what this query is looking for, so if I don’t have any minions, I’m just going to get my row back. If I have got any particular minions in place or employees or whatever, then I’m going to get my record plus all of theirs essentially to allow me to start filtering the information that’s coming in from the source table. So I pass in employee ID from the table, because that’s what the function is going to be passing, and then I’m doing the lookup based around the content – session context, where the employee ID – so this is my value, everything that matches me or my subordinates, that’s the values that I want to say approved for being returned. I hate trying to explain that query. Here we go.

Erik Darling: [crosstalk]

John Martin: Yes, then you’ve got the parent of – that’s one of the reasons I like employee – I like the hierarchy ID when it comes to hierarchies that are not huge. Well, when I say huge, I haven’t got a huge number of rows that you’re evaluating. It just makes it easier rather than doing self-join and then that’s recursive, oops, I’ve run out of stuff or I’m not going to set max MAXRECURSION to zero and then watch my server fall over. It just helps abstract away that particular part of the pain. Whether or not there’s more efficient ways to do things depends on the type of environment that you’re working with.

So I’ve created my function. Then I now need to stick it all together with the glue and that glue is the security policy. So create security policy, again, it’s an object in the database. Going to say, okay, here’s the filter predicate, which is my function, pass in the salesperson ID column from this particular table and we’re going to turn it on. So nice and quick and easy, on it goes. Execute, and done.

Right now we’ve got it set up and running. So every query that I hit the sales order header table with is going to then run this function and then do the evaluation and then say are we going to return any rows or not. And that’s going to go back to the client. So the reason this is in my view a better implementation than writing all of the code my hand is that it’s nice and easy to find all of these little objects in your database and then try to tie them all together. Relatively simple, it’s all in source control, that makes everybody happy, security is in the code, it’s in source control so if there’s any changes to it again, you can go back and look at the history. So source control is good.

So now that we’ve got all that set up, let’s have a look and see what’s going on. Now, before I go in and start querying things, I’m going to do it with some stored procedures, so I’m going to create two. So I’m going to create a stored procedure which is going to get me some sales order data from the row level security side of things, and I’m a young, naïve developer and I don’t really care about where clauses. I’ve got plenty of processing power, I’ve got plenty of CPU, storage is fast, why do I need to filter my data? Then I’ve also got a procedure that I’m going to create where I’m a little bit of a more mature developer and I understand the value of a where clause. So create those two stored procedures for the rest of this demo.

Now, what we’re going to do now is my Visual Studio side of things – my Visual Studio “Fu” is not strong, so rather than going there and show you some terrible, terrible c#, we’ll simulate the type of activity here and it gives me a bit more versatility to play with this. We’re going to do it through Management Studio instead. So we’re going to simulate it. The application connection opens, the first thing that does when we – essentially when we’re going to write the application itself, we basically write a wrapper around this so that when we call that connection we know we’re always going to fire this stored procedure, and I’m going to say I’m employee 274, which means when I start looking at them with content session context, I’m employee 274.

So I’m employee 274. Now, if I try and do this again, or if I want to change that particular value, I want to say, actually no, I’d try and get in – pretend to be my manager. Like I said earlier, you get an error because it’s set as read-only for that particular session. When you change it, it’s dump the session and then come back again. So now I’m interested in getting some sales data, so my application then goes and says, go and fetch me that sales data. So off it goes and there we are. Nice and happy. I’ve got database, salesperson ID is 274, there we go. I can see that information, if we scroll down we’ve got 1987 rows, which is nice. There we go. And all of them are 274 it says. Scrolling all the way down, nice and easy.

Now, that was with the procedure that has nowhere clause. So it’s done all the filtering for us, we’re all happy. However, if I come to here, let’s include the actual execution plan, shall we? Just a little bit of this, what we’re going to do is run the one with the where clause and one without the where clause. Off we go. So that one’s run, there we go, got two execution plans back and 4% the one with the where clause is potentially considerable cheaper than the one without the where clause. Because of the way we need to understand how this is going to work, what it’s doing is even though – essentially what we’re doing is saying, okay, well do the restriction, then pass that restricted data set into the function. So that’s great. In it goes, done. Nice and happy. Whereas for this one, we’re saying go and grab everything and then feed the whole damn table into the function. That’s where it’s going to hurt. That’s why I’m saying – that’s why I’ve not really sort of done much in the way of tests with reporting functions because it depends on the type of thing you’re running. I mean, you’re going to run a sum for an entire year for whatever, you’re potentially going to be feeding a lot of data into this function and then obviously it’s going to – it’s something on table value functions are in theory – it’s going to be the more efficient end of the spectrum, but not great.

Now, if I just grab this and bring it over, give us something that we can actually read the damn plans in a bit better. Here we go, so estimated costs, so the one with the where clause, 4.3 versus 95.7. And then we come across here and we say okay, well we can see the row estimates now. Where I’m doing the where clause I’ve got a stats issue potentially that means I’m shooting low on my estimates, not great. The actual row is there versus estimated rows of 134,000 and we’re coming in low there. Again, because we’re going to go and read all of that data, pass it through into the function itself, again, that’s going to drive a bit more I/O, it’s going to drive a bit more – it’s essentially going to drive more I/O, more compute, generally makes things that little bit more unhappy. And if we look at here as well, you can see here we’ve got the index scans going on.

One of the important things and as Erik eluded to sort of few minutes ago is even though my stored procedure ran and I have no information in there to say go and talk to the employee table, you can see we’ve got this branch here in the execution plan going off and talking to employee. That’s the function taking place. So even though it’s a function, we’re still going to see it here actually in the query execution, which means that we can look for those inefficiencies and we can start to look at maybe doing the optimizations for them. So yes, we’ve gone in and we’ve scanned the entire employee table, all nearly 5000 rows of it. Well, that’s on my laptop and my laptop is laughing at it. All it’s not laughing at is coming down and having to sort of go through and say, well, we’ll pull in 120,000 rows instead. That’s why making it more efficient can make your life a lot easier. You make the query as efficient as you can and the function, but then also good development practice when it comes to writing your queries anyway that you’re going to be passing into row level security again, can look to improve the overall performance. Don’t rely on it to give you that filtering without any pain.

So we can build some index, I’m going to skip past that bit, it’s just basically getting rid of the actual potential key lookup that’s in there. Now, one of the things we want to have a look at, okay, we’ve got row level security. Let’s go and talk about some statistics. So this is where we can come in and say, actually, I want to create a stats object on my totals due with full scan. Execute that, that’s done that quite happily.

Now, SQL Server 2012 – now, prior to SQL Server 2012 SP1, we required sys.admin.dbo and a ddl.admin role permissions to actually do show statistics. 2012 SP1 and above, we only need select permission and access to all of the columns and things like that to be able to view the actual stats objects, which when we think about row-level security and the way the histogram gets built, how do we make sure that we’re actually not then sort of able to potentially go and see some rows on values of data that are associated with rows that we’re not permitted to see?

So I’m going to create a select user here without login, just in the database, I’m going to switch my execution context, just makes life a little bit easier for me rather than having to open multiple query windows, and I’m going to grant select on sales order header table and then the customer table just so I can illustrate this. So I’m going to execute a select user, I’m going to show stats for the customer table which I’ve got select statement on, but no row-level security configured. And then I’m going to run this one here, show stats as sales order header table where I have got row level security configured on the total due stats object that I just created.

So one of them works, one of them fails. The reason being is that essentially, the select user should not have permission on select stats to see potentially values that are associated with rows that they may or may not be allowed to see. So there are some protections in there about some of the inference attacks, and that’s an interesting one to look at.

So that’s been a quick run through on row level security. Have we got any questions in the chat window guys before…

Erik Darling: None so far.

John Martin: Awesome.

Erik Darling: Let me check WebX chat really quick, I’m just been paying attention to…

John Martin: That’s alright. Make sure I haven’t put everybody to sleep.

Erik Darling: [crosstalk] … Slack. No, nothing in WebX either.

John Martin: Not a problem. In which case then, we’re going to go off and…

Erik Darling: We do have a question, I’m sorry. One just came in. Do we have any special considerations if we’re also using partitioning with this? I guess the gist of the question might be one of us could mess with partition elimination of some sort?

John Martin: I don’t see how it can because essentially, you’re running the base query and then feeding that into the – so the restriction on the where clause is the one that’s going to do the partitioning elimination and then that will feed that data into the function in which case the restriction beyond there, so that shouldn’t really have any negative impacts on using in conjunction with partitioning tables.

Erik Darling: Cool.

John Martin: Not from what I can think and the way the execution plans are showing me the data.

Erik Darling: Carry on then, sir.

John Martin: Cool. So, moving on to dynamic data masking. So dynamic data masking is again, something that we’ve been trying to do in one form or another for many, many years, and essentially what we want is to be able to prevent certain people from seeing data values that are correct. So we want to basically say actually no, you’re not allowed to see the salary values the national sort of security IDs or social security identifiers, things like that for the permissions you have within our application, and you are, so that’s the sort of thing we try and do. And again, we can use views, we can roll our own sort of scalar value functions or table value functions, that type of thing, we can do it up in the application layer. Many, many options. There’s always many, many ways to do things. But essentially, it’s all about protecting certain specific sensitive values that we have contained within our databases.

Now, again, Microsoft has now taken that and they said okay, well let’s bake this capability into the storage engine, let’s bake it into the actual SQL Server engine itself. So let’s see what we can do to help out. It’s going to leverage functions. Now, I say functions, these are not your traditional scalar table value functions, anything like that. They’re built in functions to SQL Server itself. There’s four of them, and we have varying levels of control over what we can do with them in order to prevent people from seeing the original data values and the values that should or should not be masked.

One important difference between this and the actual row level security implementation is that this does leverage the server security authentication model. So think logins and users. That’s where it’s going to have the permissions and essentially they grant permissions to accounts to say masked or unmasked data, which one they’re going to see. So again, we’ve got two technologies that are fundamentally trying to protect the data, but have been implemented in two different ways. I’m guessing two different teams kind of worked on it.

So we think about some of the data masking itself, and what we’re looking for here, and this one’s courtesy of XKCD, but essentially, what we’re looking to try and do is mask that data. It’s the data at rest however, is going to be plain text. Once it’s sat there in database, the DBA is going to be able to see that, which means if you’re looking for a solution where you want the DBA to manage the database but not see the data, this is not the solution for you. Go look at always encrypted. Go and look at something you rolled your own. This is not going to help you in that scenario.

So again, when it comes to speaking with the security guys, you’re going to turn around and say, “Well, yes, we can protect it from the application users and that type of thing”, but the DBA can just go in and go, “What’s the data?” Get the data and off they go. DBA is all powerful. Joys of sys.admin.

So fundamentally, the way it’s going to work is we have a database, we have a user. In that database, who’s going to then ask that database a question? It’s going to say, give me the name and the national ID numbers, or social security numbers if you’re in the US, of all of the people. And then it’s going to get something back. Now, if that user had been configured so they can only see masked data, then what they’re going to get back is a table with names are not blanked and then the national ID number where it has been blanked, and that’s what we see here. We see masked data for user one. I have the same database. I have a different user this time, user number two. They’re going to ask the same question to the database. But because they’ve been set up in such a way that they’ve got permissions to see the unmasked data, then they’re going to see everything that’s in there. Names and the raw data around the national ID number.

So again, thinking about the way we do the authentication with Windows and things like that, that’s going to be wonderful. So when it comes to thinking about the potential use cases for this scenario, this is when we think about, okay, well if I’m letting people do direct queries and I’ve not got – they’ve not got the level of permissions that allows them to either grant – basically grant themselves the unmasked permission or they’re not sys.admin.dbo, things like that, then okay, we can give people Management Studio access, direct query access, that type of thing, and off we go. Works for things like certain BI reporting solutions where you’ve got that pass through authentication of actually – of the user who’s running the report then passes through to the underlying data source, that’s when we get it.

Again, however, this is likely to fall down if you’ve got some intermediary in the data source connection. So for example, if you’re running reporting services and you’ve got a data source configured and that data source configured with an identity that it will then connect to the actual database with, then that’s going to fall down. That’s when you need to start building in extra logic potentially, which can get very complex in the reporting services, where I’ve seen this before where they actually detect who the user is, pass that through as a parameter to the underlying query and then actually switch the execution context to be able to go and get the data or not.

So again, you end up still having to build in a lot, lot more logic into the application itself from end to end to try and actually sort of make it work the way you want it to in order to handle this single account that’s going to be doing the authentication to the database engine. Not fun. Again, great potential feature, but again, we’re starting to hit a few sort of stumbling blocks here.

Now, when we think about those functions that I mentioned when it comes to the actual masks themselves, we’ve got four. First and foremost is the random function. A random function is really, really useful for numeric data types, so you’re thinking your integers, your decimals, your money, whatever you’re going to be using. Essentially, every time you execute it you’re going to get a different numeric value back at any given point in time. It’s not a case of it’s got a value of 10 that it’s always going to come back with a random number of whatever because it’s set seed value or anything. You ask it, it’s going to go today you’re getting number 22, now you’re getting number 23.

One thing to bear in mind is that you have got the ability to set the lower and upper bounds of that range of random numbers that’s going to be generated. So there is the potential to really sort of mess it up, so if you say I’ve got values one to ten in here so I’ll start at a range of random numbers from 11 to 100 because no one’s ever going to notice that values one to ten are missing if they execute it over and over and over again and the human is looking at it is very, very good at looking for patterns and saying, “I’ve not seen number nine.” Okay, then they go and be inquisitive as people are and start saying, “Well, one to ten never shows up. Wonder why that is. Wonder what’s there.” That’s when we start looking for those sorts of things.

So again, also don’t restrict your range to one to ten and well, I’ll give them a random one back from this pick. That’s not going to work necessarily. Just make sure you’ve got a good size random range that you’re going to be giving people so that it’s very, very difficult to start looking for inference around what that data may or may not be when it comes back to the client itself and is displayed to the end user.

The next function, default. It’s a nice easy one here, it works for pretty much every single data point that you’ve got and it will provide you with a default value. That default value for string data is going to be four x’s. Funny story, back during the early sort of CTPs, if your string was less than four x’s long you would get a corresponding x for each of those one, two, or three characters up to four, which was pretty rapidly fixed, thank goodness. So yes, string data, you’re going to get four x’s back. For a date, you’re going to get first of January 1900. For numbers you’re going to get zeros, so on so forth. Basically, it’s just wiping that data out. You’ve got no control about what comes back, it just goes, nope, you’re not having anything.

Then there’s the email function. The email function is great fun, you don’t have to apply it to an email string. You can put it on any string, so you can turn anybody’s name into an email address blank, which is great, not. Just think about how you’re going to use them. But essentially what it does is it goes first character x’s @ x’ So if you’ve got, .org, anything like that where you might want your end user to actually have some form of inferred context about the data that they’re seeing. When it comes to email addresses it’s not going to give you that I’m afraid. It’s basically just one size fits all, there’s your data gone, that’s what you’re getting back into your client when you do the call.

And then the last one is partial. Partial is technically the one we have the most control over whereby we can pick the leading number of characters, we can specify the pad string pattern and then we can say okay, on the end I want the last three characters for example, so that’s where we’ve got some control. Again, coming back to the previous thing I was talking about there when it comes to say, okay, well what about email addresses? Can we use some form of funky sort of string work to say okay, well let’s find the – is it a .com, is it a, or is it a .org or anything like that. Nope. It’s straight value for the start, here’s some padding, give me that number of characters on the end, done.

There is no intelligence there, it’s pretty dumb. So whether or not it’s got a huge amount of value to you will depend again on what you’re looking at. A good scenario that it may be for would be something like that national ID where you want to bring the first number and the last two and then wipe out everything in the middle. I know that’s something you may see on websites for credit card information where you’ve got the sort of six last four, don’t plan on using dynamic data masking for credit card numbers in your databases because as I said earlier, the data’s in clear text in the database. Storing pan data in the database in clear text is a big no-no for PCIDSS so make that one go out of your mind. Go and think about kittens playing with balls of yarn, not storing credit card data and masking it with dynamic data masking because that is bad. Cool.

So now we have another demo. Let’s go and grab Management Studio again.

Erik Darling: Before you get started on that demo, there was one question. It was a little unclear if it was about what RLS or dynamic data masking based on when it came in, but can we restrict data on a column or cell level? I’m still waiting for clarification, I asked for it.

John Martin: So for row level security it’s at the row level, not the cell level. So it’s not like cell level – it’s not like the cell level security you have in something like analysis services, for example. In like that. As for column, that’s where your dynamic data masking comes in, and that’s what we think about, which is what we’re going to look at now.

So dynamic data masking, demo, here we go. So again, I’m going to use AdventureWorks 2014 for this one and what I’ve done is I’ve created a little function just to help illustrate a point on this one, which is a terrible function. Essentially what it’s doing is stringing concatenation for an address because we should always not do this in the database but it makes a point. If you need to do this do this in the application, please. That’s where you should be formatting string and things.

So again, I’m going to create a user, this time without a login. Again, because I’m just going to switch the execution context because it’s easier than breaking this down into three or four different windows. So I have my database user, I’m going to grant some permissions to them. I’m going to grant them permissions to see the employee table, the personal table, email address, business entity address table, and to select the function. There we go. So what I do now is if I come down to here, I’m going to switch my execution text over to masked user and I’m going to run this query, which is going to show me the first name, last name, personal details, address details. So the formatted address using the function, nice and happy. So let’s run that query.

And as you can see here, I have a lovely pile of clear text data from AdventureWorks. First name, last name, hire date, sick leave hours, national ID number, email addresses, and a nice clear text address there, so all the lovely sort of sensitive data we may or may not want to share with our employees, depending on the type of role they’re doing within the organization.

So coming back from there, one thing I will quickly do actually is just copy all of that out and stick it in here. Just so that I’ve got something to go back to just to prove that certain values are as they should be later if I need to. So I want to add these masks to my data. This is where it comes in and we can actually specify them as part of the code, so essentially they’re part of the database schema, which is great. It’s basically a column property saying add masked and then with whichever of those lovely functions we spoke about earlier. So here I’m going to put a mask on the national ID number on the employee table and I’m going to use default. I’m going to put another one on the sick leave hours and I’m going to say random, between one and 100. Then on the email address, I’m going to use the email function because I can.

So those go off, and again, because that’s part of the schema, it’s going to be in source control, which means again, getting changes, we can go back and we can track those nice and easy. So what I’m going to do now is run exactly the same query as my masked user again and this time as you can see, I’ve got four x’s for national ID number, I’ve got a blanked out email address here with the first letter and then some x’s and some more x’, and then I’ve got the sick leave hours, which is altered. Now, I know that the first one there was 47 earlier, now it’s 21. Just to prove that if I execute that bit of T-SQL again, there we go, sick leave hours, just to prove that it properly random, again, it’s now 18 instead of 20-something that it was a moment ago. So nice and easy.

Now, again, okay, what I want to – basically I want to think about, okay, maybe I want to see the first one and the last two of the national ID number so that if someone rings in that we can ask that information. So, I can just…

Erik Darling: [crosstalk] number for that?

John Martin: Yes, if you could, that would be great. I’ll just go and – that’s alright, I’ll just check up on Equifax for you. So there we go, we’re in, we’re going to switch it over to partial. So again, it’s just an alter column statement, which is nice and easy, so if I come and run this again, now you can see – there we go. We’re using partial national ID number, first one, some padded with x’s and again, last two numbers that we got coming out of there.

Erik Darling: That is my national ID number so…

John Martin: So, now we want to think about addresses. Addresses, again, sensitive information. Anyone in Europe who is dealing with Data Protection Act or GDPR as it’s going to be coming in will be very aware of what is not and what is considered sensitive personal data these days. Anything that allows people to be personally identified is, so let’s get rid of the first line of the address and the postcodes that people know. Rough geography where someone is rather than being able to go and knock on the door. So I’m going to use default again here, and then what I’m going to do is I’m going to come down here and I’m going to execute statement by executing again a moment ago.

Now, let’s have a quick look and say, okay, well the formatted address now coming back as just four Xs. What this means is that when we’re calling the function and the function is doing the concatenation, what it’s doing is getting basically the most restricted type come back, the most restrictive format. So we see nothing. Not fun. So that throws a spanner in that plan of using a function. What a pity. I know, I’ll use a CTE instead because a CTE always works.

Okay, country region, person, because all of that was masked by the function I now need to give more expansive permissions. Not fun. Not a huge fan of expansive permissions for people being able to see everything they should. So we come back here and then, nope, we’re still seeing our little four Xs. Essentially, the long and short of it is as soon as you start doing any concatenation against masked columns, you’re just going to get a masked value back for the default in this case. So it’s going to be painful. So we want to go away, just rerun it again, essentially, what we have to do here is then come and do a query against it where we’re actually pulling back the individual rows. And then as I said earlier, take it up to the application and do it properly, in which case, you may want to mask it in the application. So if you’re going to do it in reporting services, mask it here, take it up, concatenate it, using the reporting services you’re doing it in a .net application, do the query, take it up, concatenate all the stuff together, format it nicely, spit that out to the user. Not going to work here.

So that is – dynamic data masking is quite easy to put in place. As you can see, you just need to alter the database schema, which again, we’re coming back to it, do we have control over the database schema? Is it ours? Do we own that particular code base? Yes. Then it’s potentially a candidate for you to use. Is it a third party ISV application? That’s where it gets a bit dicey. Talk nicely to them. Maybe it will work, maybe it won’t, depending on what you need to do. Because it’s in with the server and database authentication mechanism, in theory, yes, you’ll be able to create a server level login, map it to a user that has read access to the table and then you can mask the columns, and then if you’ve written your own reports or whatever, anything like that, then potentially it could work. But anything that’s slightly going to interfere the application, your ISV app, the vendor is going to go, not support it. So just bear that in mind.

Erik Darling: With something like dynamic data masking, if it’s like a known set of values, how easy is it to unmask data?

John Martin: We’ll get there in a minute. I’m not going to go into too much depth because someone has – there’s a guy there who’s written a very good and comprehensive blog post that goes into how to crack through it. I will reference that and have a quick look at that a little bit later rather than me try to sort of parrot that information. He tells it better than I can. So I’d say go there, but a couple of things to cover off.

You remember we talked about show stats earlier. So if I wanted to execute this as the – so we’ve got our dynamic data masking in place on the employee table on the national ID number. So because I’ve got select permissions and I’ve got access to the columns, I can then come in here and then execute this and say show stats and then there we go. There’s unmasked values in the histogram. Not fun. However, there is good news. I actually logged connect item for this, and Microsoft said, thanks, we’ll fix it. And they fixed it in Microsoft – they fixed it in Azure SQL database, and they said it will be fixed in the next version of SQL. Whether or not it will be backported I don’t know. Since the connect item was marked as closed as fixed, there have been two cumulative updates for 16 in which that fix has not appeared. So who knows? If we make enough noise it may get backported. I don’t know.

Erik Darling: 2016 is just so irrelevant at this point.

John Martin: It’s so passé.

Erik Darling: What year is this?

John Martin: This is last year, so 2016.

Erik Darling: It’s like short shorts.

With SQL Server 2016 SP1 CU2, quoting builds that you hear, we’ve now got the dm.db stats histogram DMV, similar functionality in that it will show you stuff unless you’ve got Azure SQL database 2017, in which case it shows you nothing, which is nice. It will execute and just give you a blank table back for the histogram, which is nice. It doesn’t tell you that you don’t have permissions, it just says nobody here but us chickens.

Erik Darling: That was my connect item.

John Martin: Nice. Makes life a little bit easier, that’s for certain, rather than this. One thing as well, if you do want to revert to the earlier permissions set where someone has to be a member of sys.admin or the dbo or ddl admin database roles, trace flag 9485 will revert that behavior for you. So if that makes you a little bit twitchy, there we go.

So done all of that. Data is masked, that’s all good. So potentially how do we work around some of these things? This is where I said, and we go back to the start of the session where I was saying it’s important to think about building things up in layers. It’s all about understanding complimentary technologies and complementary ways of doing things. And this is where I’m coming back to say who remembers views and who remembers stored procedures? They’re wonderful, wonderful things, and they can help. So if I want to prevent the stats thing or what have you and people seeing extra information, I’d come in, I’ll just build some views, create some stored procedures, using though, basically the traditional way that we would do a lot of this with regards to securing the information behind essentially an abstraction layer. I revoke all of those direct permissions on the underlying objects, I can grant execute on the stored procedure, select on the views, and then I can come in and I can execute – this is masked user and I can have a look and see what is going on. Here we go, and you can see here we’ve got the masked data, we’ve got all of the information, it’s all set up, it’s nice and relatively secure because people can’t get to those underlying tables. So if you are going to use 2016 and you want to play with this and you want to take this technology into production, then again, you’re able to circumvent it because if I come back to now try and do show stats, this one here, I don’t have the permissions on the underlying table to be able to go and run that stats object.

So that worked for 2016, 2017 it becomes a non-issue for us because essentially it’s been fixed, and then we go from there. Now…

Erik Darling: So, just to be clear real quick, you can’t just pass the masked portion as a string and have that show up, it’ll still be masked. Like if you did substring whatever in the middle it wouldn’t…

John Martin: It shouldn’t do anymore. There was a point at which it did. I believe that got fixed during the very late stage CTPs. So that’s alright. Let’s go a little bit off piece, we’ve got a bit of time. I’m going to write something in the middle of a demo because that’s always fun.

Erik Darling: Don’t do it.

John Martin: Doing it, he says.

Erik Darling: It’s going to crash.

John Martin: Right, so let’s take that one. So national ID number – doesn’t want to know. Come on. So we get something that’s even more restrictive than the type that we were looking for, which is nice. Now, one thing as well that I don’t know if anyone’s noticed is that I’ve got a where clause here now. Where clause on postcode. Where clause on that particular zip code.

Erik Darling: Dangerous.

John Martin: Yes, very dangerous. And this is something that Ronan goes into in his blog post, but going to run this – postcode. Wonder what’s behind that window. Behind wooden door number one is that particular value of 98052 because it’s an equality predicate, which means I know that all 21 rows have that particular value behind that mask.

That sucks. So yes, one of the things you can do is you’re quite happy to write a where clause and off you go. There’s no restriction around that. I did raise a connect item saying, can we look at this, and got closed by design on that connect, along with the comment of, set up SQL order and you’ll find out when people are doing this, and I’ll also discover after the case that my data is halfway round the world on somebody else’s system. So I’m sorry Microsoft, but you suck with that response.

It will quite a while to compose the right email response there. There were three or four emails that got – maybe not, let’s delete that one, and…

Erik Darling: Smash keyboards, mouse through a monitor…

John Martin: Big facepalm. Yes, but SQL order is a great little feature but it’s not a security feature. It’s a what’s happened feature after the case of let’s go back and find out where it all went wrong sort of thing, which I’m kind of a bit twitchy about when it comes to data, especially if it’s personal sensitive data that we want to really be holding on to and we want to be securing.

So back to the slide deck. So I know I had about an hour and a half-ish, we’re going to roll in a little bit early on this one unless people have questions, but essentially, high-level summary. Row-level security, you design it into the schema. You have to build it into your application, it’s no good for third-party ISVs, it is very customizable, however, which makes life a lot easier, and the engine is actually going to take that function and apply it to everything. So even if you’re – one of the most common ways of doing this at the moment I have seen is using functions, and people will build up their restrictive dataset and then they’ll do a cross apply or what have you, or a join under the function and say actually, let’s restrict it based around what I’m building here in my tables or whatever. But you’ve got to remember to write that code yourself.

The advantage here is you’ve still got to do all of that legwork building that logic into the procedure, you’ve got to design your data model accordingly, make sure all the data’s in place, build it into your application, but Microsoft takes away the fact that you then need to remember to write the damn code to make it happen. So that’s good because if you’re onboarding new developers and they may not necessarily be completely familiar with your code base, it means you don’t get all the way to QA and a bit of UAT testing and then having to go all the way back and saying, you go and make sure you rewrite your queries so that it includes this restriction that we built into the system. You’ve got oops, sorry, there we go. It’s just part and parcel of people becoming more familiar with the code base, and if you’ve got a code base of several million lines of code, it becomes difficult to manage In-line table value functions, so in theory, it’s not truly terribly like scalar function would be.

Dynamic data masking, use the server and database authentication mechanism. So again, just think about how it’s going to be applied. If you’re sat behind a single account that’s doing the access to the data tier, that’s going to really throw a spanner in the works and you start having to do a lot more work in there to come up with saying, okay, well I am this person, which means when I connect as that, I actually want to masquerade as this person in order to see the data that should be masked, and that’s what my permission set is. You end up having to build a lot of metadata in and around it in order to handle that.

It’s all masked at individual column level. I mean, it’s all part of the code of the database itself so you’ve all got it drilled in that schema, which means it’s in source control, happy days. It can be circumvented easily by the DBA because I’ve got sys.admin permissions, which means then what I can do is I can come back in here and say, okay, well, actually if I want to say – well actually, I want to see that data, so yes, I’m going to grant myself unmasked permissions, and there we go. The problem is that there’s no granularity to the unmasked functionality either, so it’s not like you can say for this user I want to grant them the ability to see unmasked data for this object and these columns, or anything like that. It’s a case of they see nothing or they see everything. So again, that’s just another potential gap in that potential framework.

So just bear that one in mind. Again, it’s yes, no, but the DBA can see everything anyway. But also certain query access patterns will allow you to circumvent that. Now, that information is Ronan Ariely, he’s got a great blog post that goes into a lot more depth. I’ve the link to his blog post so that it’s easy for people to see right now rather than seeing the big long URL. Please take that one away, but if we quickly – let’s see if I can follow that one. Is my web browser going to wake up?

Erik Darling: One of these days.

John Martin: Okay, don’t.

Erik Darling: SQL Server stole all its memory so…

John Martin: It’s laughing at me. It is actually. My browser is not happy.

James Anderson: What’s going on, John? It’s like your laptop on Wednesday at the user group.

John Martin: Yes, the laptop is – it could do with a little bit of love I think, and a rebuild. So here we go. Either that or I’m using carrier pigeons to get my web traffic. Anyway, I’ll talk about the other ones.

Dynamic data masking, Microsoft there has got some interesting documentation that you can use to understand the syntaxes, how to apply it, how to turn it on, how to turn it off, all that sort of stuff. Don’t really provide any real context in how to apply it and how to apply it well.

The Robert Sheldon’s done a blog post as well, at a simple talk, that goes in and sort of gives you an illustration, kind of like the demo I’ve done today and the session I’ve done, but in a more sort of concise blog post with instructions. You can go follow this, this, this and this, and go from there.

Here’s Ronan’s blog post, definitely recommend reading this. He’s an MVP as we can see, but he goes through a whole sort of mechanisms for how to potentially circumvent it, so sets up all the masks and then exposing the masked data step by step, he’s got a guide there. Definitely go and check that out. If you are thinking of using this particular technology option, it’s well worth reading that mainly – not so that you can understand how to circumvent it in order to do it, but more to how – what are the patterns that will allow circumvention and can I put anything in place that will help make that more difficult. So when you understand what the vectors are, then it allows you to try and combat them effectively.

As I mentioned, connect items, show stats issue fixed in Azure and 2017. Whether or not that will come back to 16 I don’t know. Enhancing unmasked, to make it more granular, I have actually got that connect item and it’s still open. They’re thinking about it. Votes help. If you think that this could be useful to you and by having a more granular unmask ability, so you say, this database user can see these columns in the table – if I’ve got a database with ten – sorry, table with ten columns in it and I’ve got eight of them masked and user one can see everything unmasked, user two can see four of the masked column but is allowed to see the actual data in the other ones, things like that. So having that ability to say grant unmask to this users for this table, for this column, and this column, and this column. If you think that will be useful functionality, please go and vote on the connect item. The more votes we have, the more chance we have of not getting it closed by design.

And then there’s the filtering unmasked columns where I got closed by design and got cranky about it. If you think it needs to be reopened please leave comments. I think they still get emails when we leave comments on those close connect items. But I think again, looking at Ronan’s blog and some of the things that he’s doing there to try and circumvent it, again, it’s potentially a chance for us to try and help Microsoft improve the product.

And with that, we’re going to roll in and say there we go, done. If anyone’s got any more questions…

Erik Darling: No, no other questions about that stuff. I do have a couple though. So when you were mentioning that there’s access patterns that still kind of expose data even accidentally, are there any overall access patterns that aren’t a good fit for either of these features, or like, who would you recommend them to?

John Martin: So one of the things I’d say is if you’ve not – when it comes to dynamic data masking, if – to try and prevent those sort of access patterns where I can put in a postcode that’s on a masked column and then see that result set come back and say, well, I know that I used that as a equality predicate and therefore all of those x’s are actually this value, it comes back to thinking, okay, well if people are going to be doing that sort of querying, you’re going to sort of struggle. If it’s behind something like reporting services or some sort of BI platform where people are not connecting in with interactive query potentially it might work. So reporting services is where I’ve looked at using dynamic data masking a couple of times just simply because historically I’ve had to do it in the report layer itself, which – and based around doing Windows lookups about who the Windows user was that was querying the thing and what the report was coming back for. Dynamic data masking is probably the less useful of the two at this point in time in my view. It’s got a lot of potential, but at the moment I don’t think it’s really there.

Erik Darling: So are there any third party tools or vendor tools out there that would take the place of it or do a better job of it then dynamic data masking?

John Martin: Not that I’ve looked into in any real depth. I fall back to my patterns that I know rather than spending the money I could write some code around that and make it work while saving myself a few dollars sort of thing.

The row level security I’m actually evaluating for our product suite because we’ve got roles and rights-based security in the suite of tools that we have, and at the moment we use functions and I’m looking at whether or not obviously this may or may not help. But it’s one of those things that I don’t think that it’s quite really there for ISVs yet because it’s 2016 and above. That rules out a large chunk of the old sort of security – the SQL Server market for you if you want to run…

Erik Darling: 90-some odd percentage.

John Martin: Precisely. I really like the function but I mean, we’ve already got that capability and it works really well the way that we’ve written it, so at the moment I don’t think there’s any real merit in transitioning the code base or anything like that. So we’ll stick with what we’re doing at the moment because it works quite well. But yes, I mean, I think for anyone developing applications, whether they want to use row level security, whether they may have historically had to use things like nested views, the way the dynamic used to do it, or pull lots of data up to the application and then filter it there, it may be a candidate for use. But again, it depends on the authentication mechanism for the application, or if you’re willing to put the leg work in around building the metadata into the database to handle if you want to use Windows authentication and then handle it that way.

Erik Darling: There was a question, Martin wants to know if the demo scripts and stuff are all going to be available anywhere.

John Martin: Yes, Brent, do you publish the decks and the demos and stuff? If not, I’ll stick them all up on my blog.

Brent Ozar: I was going to say well put them on your blog and then that way we’ll link from this session to your blog and people can go grab it. And speaking of which, you should put your blog up there too, so where do you blog at these days?

John Martin: So I’m a corporate man.

Brent Ozar: And there’s a lot of bloggers there too, I mean, there’s a lot of sharp people writing blogs over there.

John Martin: Yes, so I think we’re up to nine or ten MVPs now on staff. I lose track.

Brent Ozar: Basically the guys in Microsoft hoovering them all up.

John Martin: Yes, you know. Yes, obviously we’ve got Aaron, who’s a wealth of knowledge and information there. We’ve got Melissa Connors who’s one of our special projects team here at SentryOne. She’s done some really, really cool stuff that she’ll be blogging about soon. Got Melissa Coates, who’s – we’re doing some stuff with Azure at the moment, she’s publishing information there as well around the sort of patterns and practices we’re doing. So yes, there’s a wealth of information there. I blog up there as well. I’ll put this up there along with all the demo scripts and let me just quickly grab my – where’s it gone? There we go, that’s the one I want. So there we go, found it. So my contact information, feel free to email me. So if for some bizarre reason someone thinks of a question tomorrow, just email me. Send me a message on Twitter, anything like that. Like I say, I’ll get this probably up on the blog over the weekend and I’ll send you a link Brent, and we’ll roll with that.

Brent Ozar: Cool, perfect.

John Martin: Thanks very much.

Brent Ozar: You bet, nice job sir, very good round of virtual applause there.

The following two tabs change content below.

John Martin

Product Manager at SentryOne
John (@SQLDiplomat) is the Product Manager at SentryOne, looking after SQL Sentry and Plan Explorer. John is also a Microsoft Data Platform MVP, with over a decade of experience with SQL Server and the Microsoft Data Platform. Having worked with SQL Server for the last decade, John has gained a level of experience as a DBA, Developer, and former Microsoft Premier Field Engineer. Gaining a broad understanding of how you can use, and misuse, SQL Server.

Latest posts by John Martin (see all)

Previous Post
Why Defragmenting Your Indexes Isn’t Helping
Next Post
How to Use Parameters Like a Pro and Boost Performance

4 Comments. Leave new

Dumb question: why put Azure SQL DB in the title? Don’t these features work on-premises too?

If there’s no feature quality differences, I just wouldn’t mention Azure SQL DB exclusively in the title because it’ll restrict your audience. (I think this one easily has the potential to get voted in on its first try, although I’m notoriously bad at predicting winners, hahaha.)


Great Abstract. You clearly identify the audience, the core topics and what the audience will walk away with.

My only criticism would be to turn that middle paragraph into a bulleted list. Just a personal preference, but when sentences are 20 or more words, they can become hard to follow. A bulleted list could be a great way to tighten up that prose a bit.

Bonus points if you can add a bit of a teaser. You talk in general terms about the two technologies and what we’ll learn. However, I’d love to see an acknowledgment that data masking has SEVERE security flaws. Give us a cliffhanger, so we’ll be desperate to find out why data masking will burn down our house and say mean things to our puppy.


Leave a Reply to Brent Ozar Cancel reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.