Developers and Development DBAs interested in unit testing Transact-SQL code.
There are tools available from Microsoft, as well as from a number of third party vendors, to ease the process of integrating database unit testing into your development environments and your continuous delivery pipeline.
This session isn’t about any of those tools, but is an attempt to get back to basics – with the open source tSQLt framework – and address questions such as:
- Why is database unit testing important?
- Why is database unit testing difficult?
- What do I need to test?
- What do I not need to test?
- What does a “good” unit test look like?
No particular experience of automated testing is required, we’ll be talking about database testing “from the ground up”. A good grasp of Transact-SQL constructs will be helpful in understanding the examples.
Enjoy the Podcast?
Why I Want to Present This Session:
This session was born of years of frustration attempting to create effective automated tests for SQL Server, both inside and outside of the database.
Brent Ozar: So welcome to this session at GroupBy. Gavin Campbell will be talking about effective unit testing for SQL Server. You can take it away, Gavin.
Gavin Campbell: Okay, cool, are you seeing effective unit testing for SQL Server on the screen?
Brent Ozar: We are. I was going to make a joke about seeing something for Oracle, but no, yes.
Gavin Campbell: Great. So guys, thank you very much for the introduction, and to everyone else out there, good evening, good afternoon, or good morning. And welcome to the last GroupBy presentation of 2017. And well, what a privilege it is to be here sharing this little bit of time with all of you out there. So as it says on the slide, my name is Gavin Campbell. That’s my website at the bottom there, arapaima.uk, and we’re going to be spending the next 45 minutes or hour or so talking about effective unit testing for SQL Server. If you want to get in touch after the talk, this is probably the best way. I kind of do Twitter, I do the SQL Slack community, but the only thing that really has an SLA is email or its equivalent, like filling in this page on the website. So a bit about this talk. As you can see from here, I do quite a few different things at the day job, and whilst I am talking about you know, how we write unit tests, obviously that fits in to this much bigger picture about continuous integration and continuous delivery and all of that good stuff, even the dreaded dev ops. So whilst I don’t really have any presentation stuff about that, I think comments and questions about all that stuff are definitely very much on topic in the Slack channel or afterwards, because it would be foolish to pretend that this testing thing kind of exists in isolation. Also here on the site, there is a list of talks that I’ve done and here is GroupBy here, and if I click on this link, that’s a link to a hosted version of the presentation. It’s the same as this one I’m reading from. I’m not going to click on it now to avoid any kind of like presentation inception.
So having said all of that, why are we here? What’s so great about unit testing anyway? Well, it might be sensible to start by considering what do we mean by the single unit testing? And what do we mean by this thing we call a unit? Well, there’s one definition that I kind of like, and it came from Microsoft’s website, and it reads like this. “The primary goal of unit testing is to take the smallest piece of testable software in the application.” Now, it’s possible to get really bogged down in thinking what do we mean by unit, when we do unit testing. And I heard this story once – no, I was on a website, on Martin Fowler’s website, about a guy who had been to a course about testing and before lunch on the first day, the instructor had had some like, 25 or 30 or some ridiculous number of definitions of what a unit was. But I think the real takeaway is that it’s actually not important what the unit is. The unit is a small thing that you can test independently. We isolate this unit from the rest of the code. Isolation is really important because that’s what guarantees that we’re only testing the thing that we intended to test. So we’re not accidentally testing a load of other stuff that we don’t really care about in the context of our unit test. And finally, we determine whether this piece of code, the unit, behaves exactly as we expect. This is important too, because this helps us draw a distinction between so – there’s all this stuff that our code is supposed to do, like do the things it says in the spare, or run you know, in a reasonably fast matter. But all of that is not really our concern here. This is about – this piece of code here that we are inspecting, or writing code to inspect doesn’t do what we thought it was going to do when we wrote it. So this is what unit testing is all about. So this is actually from the Visual Studio 2003 documentation. The Visual Studio 2017 documentation is much less concise on this particular point, but you can still download the old docs from that link there in the slide.
So what makes a good unit test? Well, there’s a couple of things. We want our unit test to be fast. This is because you want to be able to run them over and over again. We want to be able to run them on the developer’s desktop, we want to be able to run them when we check in or push or merge our code with the old developer’s code. We want to be able to run them by a CI server, and because we want to do this so often, we want all of our unit tests to be able to run very quickly. Secondly, we want our unit tests to be independent. This is independent of one another. So this thing about having a coded test that depends on the value of another coded test, that kind of trumps up sometimes in like a UI testing type scenario, but really that doesn’t concern us here at the unit testing level. We want our tests to be reliable, and I kind of have a special technical meaning for reliable, where reliable is the opposite of fragile. So our test that is fragile is a test that can fail for some reason other than the code that you’re testing is wrong. So that can be something like it depends on some external service that’s absent, it depends on like a user doing something. We want our test to be reliable so that if our test fails, we know that we have to go and fix something, we don’t have to go looking around for other causes. And finally, kind of an extension of that, we want our test to be specific. By specific, we mean really, we want each test to test only one thing. That means that we know if our test for that one thing fails, we need to go and fix that one thing. If we have a test that tests a bunch of things, then we have to go and consider, well, which of those things do we need to go and test first? Well, to go and investigate first. We want our test to be as specific as possible.
So all this fits into the notion of a test pyramid. This is originally from a book about agile software, but there are a few different versions of it on the internet, most of them drawn slightly better than this. And what this is intended to illustrate is the – even depending on your interpretation, even on the number of tests or the amount of effort that we put into each of these types of testing. Now, I think Google did some work on this on their own internal projects and figured out that they wanted the base of the triangle, for unit tests to be 70% of their effort in testing. The integration tests to be 20, and these acceptance tests only to be 10. So why is that? Why do we want to have so much more effort expended on unit tests? Well, this speaks to the characteristic of unit testing. As we move up the pyramid, the tests at the bottom are very fast, and we want to be able to run them over and over again. As soon as we get into integration testing, it doesn’t matter how we define these terms precisely. Integration testing, we’re incorporating some external dependencies, some code that we didn’t write, maybe a file server, all that stuff makes our tests slower. Finally, the acceptance tests. Well, these are generally concerned with driving of the UI, or even getting a human user tester to drive the UI, and obviously, all of that stuff takes forever compared to running our short specific unit tests.
Also as you move up the pyramid, we go from reliable, where if a unit test fails we know we have to go fix the thing that it was testing, but an integration test is by definition more fragile. The FTP server might be down, we might have some problem with active directory, all of those things, some external servers we depend on is absent. And at the top of the tree, well, acceptance testing in any software system tends to involve a couple of things. It tends to involve browser automation, which is like a nightmare for things failing for reasons that are nothing to do with your code, and browser not automation where there’s a human driving the browser, and there are many things that can go wrong there that can cause your test to be marked as a failure, that don’t really have anything to do with the code that you wrote.
Another way I kind of like to think about this is in terms of this notion here, and this is something called the circles of influence and the circles of concern. And you know, I have no idea where this was from, so I ended up searching for it, and it turns out to be from this book called The Seven Habits of Highly Effective People. As I just showed, some of that stuff penetrates our culture. So anyway, the kind of idea of the picture here is that the stuff in the light circle in the center, which is our code and our tests, are the only things that we’re actually empowered to fix. And all these other things in the blue circle, the circle of concern, are kind of all the things that can go wrong. Right, and so the idea in this chapter of this book is that effective people, and in our case, effective developers try to maximize the size of this circle of influence, the things we can actually fix, compared to the things we have no control over. So these are things like the behaviour of the users, the behavior of the SQL Server, the performance, the volume of transactions, of third-party services. All these are things that if we worry about them, makes us less effective at writing our code because we can’t really do anything about those things. The things we can fix is the code that we’re writing and the tests that we’re writing to exercise it.
So how this fits in to the pyramid model is that at the base of the pyramid, the size of the circle of influence is indeed maximized relative to the size of the circle of concern. All the things now, units here, tends to be things that we can fix. As we move up the pyramid, there are loads of things coming in to our tests that we don’t really have any control over. Third party services, browsers, and even users at the top of the pyramid. So the effective tester is going to be concentrating their effort at the base of this pyramid where they have the most impact and the most control over their environment.
So what does a test look like? Well, I’ve written a short procedure that we can write a test for. This is the Chinook database, it was on code collect, I think there’s a version on GitHub, and it’s kind of like an iTunes-y database where you can buy tracks and albums and all the rest of it. So this procedure is called add track to album, and it kind of does what it says. It adds a track to an album. It takes two parameters, both INT, and sets the album ID of the track in the table. So how do we go about writing a test for something like this? Well, everyone’s written scripts like this. This is how you write a stored procedure. You think, “Right, I’m going to see if that works.” So what we do is we set up a bit of test data and then we run the procedure to see if it, you know, see if it runs you know, without crashing or trying an exception, and then we do a select at the end to see what it did. And everyone has written scripts like this. And so this is kind of what a test looks like, and even this, let’s say naïve example, has a bit of internal structure. So it’s broken down into three distinct phases. The first one is here, where we’re setting up the test data that we need to run our test. In this example, we set up all the data we need in all three tables involved in the test. In testing speak, we call this the arrange phase of the test. Next we do the thing that does the thing that we want to test. This is called the act. This is what we call the procedure or call the function or call the method that we’re actually testing. And finally, we have the assert. So assert is where we say this is what I was expecting that code to do when I ran it. So this arrange act assert thing is going to come back again and again, and I’ve added it in the comments here. Now, I think when you’re writing unit tests in real life, do you always have those comments? No. But I think the rule of thumb is even if those comments aren’t there, you should be able to see where they would be if they were there. You want the test structured in such a way that you can look at it and see, “Right, here’s the set up phase, the arrange, here’s the act where we do the thing, and here’s the assert where we make sure we do the right thing.”
So having gone through the effort of typing all that in, what do we then do with it? Well, we say we’ve done a file, and I’m sure this is a familiar site to many people, the SQL Server Management Studio my documents. But you know, sometimes I wonder, could you be dead clever and write some PowerShell script that might parse all this stuff here and went back and replayed it? We’re not doing that today. The point about this presentation is that we can leverage that effort we’ve gone to, to type in that test, and save it in a file and look for it again next time. The testing framework that’s going to let us take that same effort but convert it into a test that can be automated and can be run again and again repeatedly and in a consistent fashion. So if you’re writing a testing framework, you need to be implementing these things here. All testing frameworks provide these tools for creating tests, attributes which are ways of saying, “Yes, this is a test”, and assertions, which are ways of saying, “Right, this is what I expect to happen.” There are tools for running tests, these can be graphical test runners that run maybe in an IDE, or in their own GUI, or command line test runners which are obviously for these CI, CD scenarios where we run stuff from the server. And finally, there are tools for parsing the output. They can be machine readable for use in CI and CD servers or build servers, or human readable. That might be some visual thing in the IDE, it might be something on the dashboard. So if you’re inspired to go away and write your own test framework, if you don’t implement kind of these three things, your buddies at the golf club are writing their own framework, it’s going to be like laughing about your framework when you’re not around.
So there have been a whole load of these written over the years, and it kind of started – certainly, this type of test framework was something called sUnit, which was a test framework for a language called Smalltalk. Now, Smalltalk I guess never really took off in the way that it was envisaged back in the day, and jUnit then came along, which is a test framework for Java that implemented much of the same things. After that, there was early nUnit, a test framework for .NET, and then starting in 2005, Microsoft had a reinterpretation of what nUnit had done, and that was called MSTest, and shipped with Visual Studio back then. So out of nUnit, there was also a framework called tSQLUnit, there was a load of stored procedures that could help with test automation, and finally, out of the two of those, the framework we’re talking about today, called tSQLt. And before I move on from this slide, there is something to call out. You’ll notice that these test frameworks kind of map one to one with the language or the framework that they’re intended to test, and there’s a reason for that. We want the developers to be able to use the same tools and environments that they use for writing the code to write the tests. This is all about reducing that contact shift between like, it’s kind of less useful to be saying, “So I’m going to write a bunch of stored procedures in my application here, and then I’m going to change to let’s say, Visual Studio or something, and write a load of tests in nUnit over here for my SQL Server to code.” So that thing about having the same language for the tests that we have for the code, it’s about contact switching and maybe to a lesser extent about skills. You might not necessarily want to learn another language, e.g. Ruby, to write the tests for your .NET or SQL Server application.
So tSQLt, you can download from this website here, tSQLt.org, and I’m just going to scroll a little bit so we can see some features. So what does tSQLt do? It’s a unit testing framework for SQL Server, compatible with SQL Server 2005 and onwards. Every test is automatically run in a transaction. Now, this is actually a good feature, and you know, most I guess, OO frameworks can’t do this, right? It creates a transaction at the start of a test, and does the test, and then rolls it back up the end. So that means whatever happens in the test, the state of the database is the same at the end as it was at the start. And that’s really powerful, because that means that all the tests can be independent of one another and can be run in any order. An output can be generated as text or as XML. Well, that’s like the human readable and the machine readable versions of the output. And it provides the ability to fake tables and views. This is the really important feature. In fact, right, if it wasn’t for this, I think tSQLt would be not interesting at all, and we’re going to have quite a few examples of that further down. So in terms of ticking off the list, tSQLt has tools for creating tests, which are test classes, and this is way of saying, “Yes, this thing here is a test, it’s not really for my application.” It has assertions. This is what I expect to happen. It has tools for running tests, which are all stored procedures. tSQLt.Run, tSQLt.RunTestClass, and there are others for running different subsets of the tests you have. There are tools for parsing output, either as text or as XML for consumption by a build server. But it does more than that, and tools for isolating dependencies. Now, if you’ve done unit testing in other languages, let’s say with .NET testing frameworks, you kind of have two decisions to make. Let’s say you’ve decided you’re going to use nUnit or xUnit or whatever to do your tests, we have this other thing. How are we going to do mocks? How are we going to do fakes? So then we have this other discussion about are we going to use n sub, are we going to use a mock, and there are others, the Microsoft one molds. But here with tSQLt, all that stuff is kind of in the can, so with tSQLt you get the test runners and the output formatters, but also you get all the mocking tools built in that you can mock objects in SQL Server, as the name suggests.
So when you download tSQLt, you end up with this SQL script here. Where you run this is in the database that contains the objects you’re going to test. So this has a couple of implications. The first is you’re not doing this in production. This is in some other environment where it doesn’t matter what you do with the objects in the database. The second thing is, you can see further down in this script, this requires CLR to be enabled for this server. Now, I’m not one of these people that subscribes kind of the conspiracy that enabling CLR is going to cause your server to be taken over by man hackers at any moment, but having CLR enabled is a let’s say, a code smell, isn’t it, In any SQL Server application. And often, what CLR enabled means is this application is doing something that it shouldn’t be doing in SQL Server, it should be doing it somewhere else in the application stack. And that indeed is true of tSQLt. If you think about what you have to do to write a test framework, there’s lots of non-database-y, non-set oriented stuff you have to do. You have to be able to do iteration over these collection of tests, you have to be able to run them one after the other. You have to do all this text processing stuff, which again, is no fun at all in T-SQL. So kind of that’s why tSQLt has this requirement for CLR to be enabled because it needs that to do some stuff in kind of a convenient way that would just be acres of code in T-SQL to do. So when you run this script, kind of this happens, thank you for using tSQLt. If that doesn’t happen, I guess it’s back to the website to read the instructions again.
So like any framework, tSQLt has a couple of rules, and there aren’t many. So in tSQLt, a test class is a SQL Server schema that has some extra extended properties, and it can be created with this stored procedure called tSQLt.NewTestClass. It can also be created with create schema, but then you have to know what the extended properties are that you need and know how to set those manually, and there are some scenarios for doing that. The second thing is a text case or a text fixture is just a stored procedure that belongs to one of these schemas that’s been identified as being a test class. And the name of the stored procedure has to start with the word ‘test’, and that’s about it. Those are all the rules. The reason that the stored procedure has to start with the word ‘test’ is that in other languages we have these things called attributes that enable us to tell that yes, this here is a test. SQL Server has no equivalent. There is no way of decorating a stored procedure to say that this one is just a test, so it has to go in the name. So if we revisit our earlier test, written for tSQLt, we can see that the arrange, the act, and the assert are all still present and identifiable, but not commented. But there’s some extra stuff here that we’ll step into.
So first we say, create a new test class, and the class name is add track to album. This is kind of significant actually. The name of the test class is the name of the stored procedure that we’re creating, and it’s going to create a schema with that name, and this is kind of a weird thing for SQL Server people, because then it’s like, doesn’t that mean that we’re going to end up with like, hundreds and hundreds of schemas in our database? Like, one per stored procedure that we’re writing tests for? And the answer to that is yes. And the reason for that is that this schema mechanism is being hijacked to do something else because we have no way of implementing like class in T-SQL. We create it with the same name as the stored procedure so that we keep all the stored procedures, all the tests that go with this stored procedure together, in the same schema. And for instance, they can all be run together independently with the test of some other procedure.
Then we go on to creating the test itself, and this is just a stored procedure. So we create procedure in that schema, add track to album test, which is required, existing track, add it to existing album, it succeeds. Now, the name is kind of important here. We’re saying the name of the thing we’re testing, add track to album, that’s why we use that as the schema name, the conditions in which we’re testing it, existing track, add it to an existing album, and what we expect to happen, and all of that is in the name of the test. At the start, we’re starting the arrange phase. So we do fake table a couple of times. Fake table is the real like big win with tSQLt, so what this is going to do is rename the existing dbo.Album table to something else, and create a new table called dbo.Album that is empty, has the same columns, the same column names, the same column types, as everything in the real dbo.Album table, but with no constraints. Everything is going to be nullable, there are going to be no primary keys, no foreign keys, no check constraints, defaults, identities, all that stuff. And what that means is that when we’re setting up our test data, we only need to insert that columns that we care about in the context of this test. So all the other ones that are needed to satisfy, let’s say, foreign key constraints, like the old test even had setting up stuff in artist because that was a foreign key to track. But now we don’t care about that, right? All we’re actually interested in is what does this stored procedure do? So this is isolating this stored procedure from its dependencies by saying, “Yes, don’t use the real tables, we’re just going to use these fake ones instead.” So insert the bad data, now in the act, we call the stored procedure, and now we move on to the assert. Now, we have a bit of set up to do here. we’re using select into to create some new tables in the database. Alright, we’re going to select two things we’re expecting, the album ID and the track ID into a table that’s called expected, and that’s kind of a permanent table. And then we’re going to have a look in the track table that that procedure was supposed to update, which is just a fake, and we’ll select that into a table called actual. Those names are not important actually, it’s just that’s kind of a convention that I use and it’s suggested by the Intellisense for tSQLt. And then finally we do the assert, so we’re saying having done all that, the expected table that I have defined should be the same as my actual table, meaning those two values should have been inserted into the table. Then when we run that, we don’t run the stored procedure directly. Sorry, I’ve got ahead of myself. We never run the stored procedure directly. All that stuff I talked about with transactions isn’t in here. So the rolling – the beginning of the transaction at the beginning and rolling back at the end is handled in the test runner. So when we call tSQLt run all, or tSQLt some other subset, that’s what’s going to take care of begin transaction at the end – begin transaction at the beginning and throw away the transaction at the end. So we don’t need to worry about all those changes we’ve made to the database in the scope of the test, they’re all going to go away at the end.
So when we run the test, something like this should happen. And it says we’ve run one test, we’ve tested that adding an existing track to an existing album succeeded and it passed. Now, we’re often told that in test-driven development or whatever, we should be running failing tests, and the way we make a test fail is by making a change in the code we’re testing. So we’re going to say, right, instead of setting the – instead of using the parameter we’ve parsed in to set this album ID, we’re just going to set it to null. This is going to make our test fail. And indeed, when we run the test, that’s what happens, right? So it’s telling us here, it was expecting a 456 and a 345 album ID and track, but actually it got a null. This is kind of important. To generate a failing test, we didn’t change the test to make it fail. We changed the code we were testing so that the test failed for the right reason. Imagine things had been a little different and that procedure actually had a bug in it. So instead of inserting – instead of using the album ID we parsed, it had like an off by one error, and it was inserting one more than that. But we didn’t bother to change the code, we decided, right, we’re just going to check – we’re just going to decide to make a change in the test so we’re going to expect the wrong thing, we’re going to expect null. And indeed, that test is going to fail but it will fail whether or not the bug in the procedure was present. So by doctoring our expectations in the test, we’re making our test fail for the wrong reason, and we’re not – it’s not helping us to identify the fault in the procedure. So the takeaway is, when adding tests to legacies, to legacy code, we change the code and not the tests to generate failures.
There is another feature here in tSQLt, and this is a feature that loads of test frameworks have, these custom assertion methods. So we can say exec assert equals table between the expected and the actual, and if it fails, give me this message, fail to add track to existing album. So when we run that we get a failure message and it says, add track to album test existing track added to existing album succeeds failed because fail to add track to existing album. So what’s wrong with that? What’s wrong with that is the message isn’t telling us anything that isn’t in the name of the test and it’s because we chose this sensible name for the test, and we don’t actually care about custom failure messages. So the way we name our tests is the name of the thing we’re testing, not the name of the schema. What we did, what the initial set up was and what we expected to happen. That seems like a silly thing, but it’s really important. It’s actually – it’s on a Post-it note here stuck to my monitor, and not really – I don’t forget that when I’m doing tSQLt tests, I do forget that when I’m doing tests in other languages to get that correct instruction, the thing we are testing, the conditions we are testing it in and what we expect it to happen. So in general, if you find yourself reaching for custom error messages, you should probably have another look at the name of the test that you have.
Let’s move on to another feature. So this is a table value function, and this select some stuff from a few tables, joins it together, and returns the table, and it’s the list of all tracks purchased by a customer. And the reason I have this example is to demonstrate a pattern or a heuristic that we kind of use for testing data intensive things, and this is known as testing 0, 1, or an N, or 0, 1 and many. So what this – the idea here is that we’re going to write a set of three tests for this stored procedure. One in which it won’t have any rows to act on, one in which we’ll have one row to act on, and one in which it will have many, or in this case, two; the important thing is more than one rows to act on. And it’s going to enable us to drive out different aspects of the behavior of this function.
So this is the first test here, and this is the zero case. So the test is called get tracks purchased by customer, test no purchases returns on empty result set. So you can see here, we’re setting up an invoice with a customer ID, and that is not the same as the customer ID that we’re parsing to the test. So the customer ID we’re parsing to the test doesn’t have any purchases. It’s a different type of assertion here, there are loads of them in tSQLt, you can go and look them up. So we don’t need to bother setting up expected and actual, we can just – or expected, we can just say here we expect the actual results to be empty. And this may seem like kind of a pointless test. What this is actually testing is the error handling in our function. So what we’re saying here is when we call this function, get tracks purchased by customer, and there are no rows, what we expect to get back is an empty table. What we don’t expect to get back is like an exception or something like that. So this – the zero case, this is how you find out like the gaps in your error handling.
After that is the one case, so this is a single purchase. This is really the happy path, it’s the easiest test to write. So this is where we set up a single purchase or a single track by this customer, and we’re expecting to get just one trackback for this customer, one where you select from the table value function. And so here we’re back to assert equals table, the expected, and the actual. So this is you know, often the first test we write, right? Because it’s the easiest. Right, it’s like we’re going to bunk one row into every table, run the function, see what we get.
Finally, the many case. Get tracks purchased by customer tests separate purchases, returns separate rows. There’s a bit more set up here. So we have two different tracks that have been purchased, on two different invoices, but they’re for the same customer. And that’s the customer we’re parsing to the function. And when we run the test, we expect to get back two rows from table value function. Now, why is this important, right? This is kind of a simple example but imagine there were a multicolumn join involved in that query, and in the test you’ve missed off a join criteria. So what happens in those scenarios, in like a complicated stored procedure is you end up multiplying the number of rows and the output. You end up with an accidental Cartesian join, or something like it. And if you only ever test with one row in each of the tables, you never see those, because one times one times one is one. So this is how you find – this type of test here is how you find missing join conditions, often in these like reporting or – yes, reporting type of functions in stored procedures. But there is something here, right? So there’s quite a bit of repetitive code in these tests, right? We’re setting up the same stuff at the start of all of them, and we can imagine if this was more complex, the amount of this set up will dry when we’re copying and pasting it between tests, so is there another way? Well yes, there is. In tSQLt, there’s this feature called test set up and again, a lot of test frameworks have this, especially older ones, and I’ll come to that. So there was some extra rules for test set up. It has to be in the schema where the tests are, so there has to be a test class, and it has to be called set up. What tSQLt will do with this is at the start of every test in this schema, get tracks purchased by customer, they will be given the transaction, call this set up procedure, then do the test, then roll everything back. So that means that in every test in this test class in this schema, we know that all this stuff here is already going to have happened before we get into our test code. So we abstract that common code into this set up procedure, so we fake those three tables, and we set up let’s say, three rows of test data that we can use for like a whole bunch of different tests that we might want to write. So what does the test look like now? So we’re creating the procedure, this is the single – no, this is the separate rows case. Right, so we’ve got act, we’re going to call this function, this is what we’re expecting to happen. But there’s something missing, right? There’s no arrange, and that’s because arrange has been like hidden in that other procedure, and that test set up business, that’s a feature from like old nUnit really, and even the founder or inventor or whatever of nUnit I think has said, not long after inventing nUnit, but if he was doing it again, there would be no common set up routines. And the reason for this is that the price you pay in let’s say, diminishing the readability or the comprehensibility of this test isn’t worth the convenience of having that stuff hidden in another method. This is contact switching again, right? So if I look at this test and want to think about, “What does that do?” Well then I have to go and you know, find the other file, and say, “Here’s the test, that’s up here, and then we have to have two windows so I can see the rows that we’re setting up and what we’re doing in the test.”
Fortunately, there’s another way. We’re going to create some helper procedures. So again, I’m going to put these in the same schema as the tests, but that doesn’t matter really. I just kind of put them there because they kind of go with these tests. So in this schema, the name doesn’t start with the word ‘test’, that means that the test framework isn’t going to detect it and automatically run it, and that the procedure is called fake track invoice and invoice line. It has a long name that says what it does. And it does just that, fakes those three tables and we’re ready to insert the data. That’s done by a longer procedure called a test data builder. So again, I’ve used the same naming convention, it’s in the same schema as the tests, but it has a name that doesn’t start with ‘test’, so we’re going to have to call this manually. Test data builders are characterized by having a whole lot of optional parameters. What we can do with these optional – and what we do with these optional parameters is we define one for every column for every table we’re interested in for this test. All of them are optional because they all have a default, and the default is always null. There’s a bonus example here of how to use an output parameter in a test data builder, so this is maybe – maybe in real life that’s done by a sequence or by an identity, and we just want to make sure that the calling code knows that it can never know ahead of time what that value is. That’s going to come from scope identity or something in the real application. Sorry, there we are, so it links to a random value. So because we’ve set default of null for every single one of those, when we actually insert the data into the fake tables, almost every column will be inserted as null because we haven’t parsed it in a call to the stored procedure, but that’s fine, we can parse only the parameters we care about for the scope of this test. That’s kind of abstract, I have an example of using it.
So here, we have a revised test using this test data builder, and instead of all that set up stuff, we’re just going to call our two helper procs, but now you can see what this test does. Now it says at the start of the test, we’re going to fake those three tables, and then we’re going to do this thing, create an invoice for one track, and we can see in the test what the data was we were parsing in. Now, there’s a whole load of other parameters to that create invoice for one track procedure, but we don’t need to care about them. They’re just going to be inserted as null, and their not relevant to our test. Finally, the rest of the test is just the same, but we can see from this test that when we set up a track like that on its invoice, at the end, that’s what we expect to get back from calling it a procedure. So there we use helper procedures and data builders, not set up procedures.
There are a couple more examples to go here. So this is maybe a common scenario, this is playlist track in our iTunes like database. So this is a many to many table with two foreign keys, and we also have a date time two column that we’re going to use to log whenever we make updates or inserts to this table. This is a fairly common kind of thing. Then we have our procedure that merges into this table, and puts in the two values that we’ve parsed and sets the column to sys.utcdatetime, which is the millennial get date. So here, when we write a test, we’re saying tests that are valid track and a playlist was inserted, but what are we expecting? When we select from that playlist track table, when we’re expecting our two keys that we parsed in to come back, 66 and 67, but what date are we expecting? We don’t know because we were calling sys.utcdate time inside the procedure to give us a date to put into the table.
So how do we go about writing a test for something like this? Right, method one, widely used, including by me, because sometimes method one is the only method. In the test, we cast what we get – we cast the current date as a date, and we store that in the expected table. Then when we select from the table that was added on by the procedure, we cast that to date as well. So that means that’s kind of safe to compare the date that we’re expecting, the date portion of the date we’re expecting with the date portion that we inserted. And yes, this is kind of okay. It’s obvious when this is going to fail. It’s going to fail if two bits of that test happen at either side of midnight. That’s normally alright. But this test is more fragile because we’ve introduced a reason that this test can fail that isn’t because there’s something wrong with the code. So is there another way? Well, yes, but it involves changing the code itself. This is a really common thing to do, I’ve done this this week at work. So instead of like, casting the results of the date, we’re adding an optional parameter to the test, and that means that current users of this stored procedure don’t need to know about my optional parameter date time two, it’s just going to be parsed as null. If it is parsed as null, we’ll just set it to the current date and time, so that means for existing clients, it will just work. So finally, when we do the thing, when we do the merge, we don’t call sys.utcdatetime, we just insert the contents of that variable. Now, what that means is that when we write the test, or we can create our own time and date outside of the test and parse that into the procedure when we call it, and that gives us a predictable result that will work including at midnight, this test will pass as long as everything in the code is alright.
And so this technique kind of has a name, like so in .NET land, this is called dependency injection. So this is where instead of like, you know, letting the code go away and find all the things it needs to do its thing, including the time, and time is so often the problem, that its more like we tell you what time it is and then we’re going to see what you do based on that information. So here, when you can, it’s okay to change the code to make it easier to test, and one of the ways we do this is this like fake dependency injection by parsing dependencies as optional parameters wherever we can.
Right, in this next example, I kind of have a strict rule about like not stealing stuff from clients to put in public presentations, but this line here, if the client’s watching they know who they are, so this is a stored procedure that runs every night to perform various tasks. And yes, I have seen that comment, and this is not so uncommon a thing. A stored procedure that does like everything that runs on some schedule, and I think if I scroll a bit we can see some of the stuff. So if it’s Monday it does this thing in the genres table, on Wednesday it does something to tracks, on Thursdays it updates some derived measure, on Friday we send the birthday email to every employee who’s birthday is in the next week, and you get the idea. And yes, this is far less uncommon than you might think. So where do you start with running a test like this? Well, we know about isolating dependencies, so we add a bunch of mocks. Now actually, this is kind of tame. There are about seven or eight in there. You can easily imagine in like, you know, your own company’s version of the epic procedure, there could be like, 20 tables, some functions, some views, some procedures that all of those things have to be mocked before you can write a test to do all the things. And that itself is a [unintelligible], right? This is testing telling you something about your code. If you have to mock like the whole universe to be able to write a test for your procedure, this is a sign that there’s something wrong with the procedure, it’s doing too much. So again, where we can, we refactor the procedure and what we do is we make it more modular. So we encapsulate all those things that does on the various days of the week into these independent stored procedures, which we can then fake when we’re testing the outer procedure. So in the context of testing do all the things, we don’t really care what those inner procedures do. We’re just going to fake them, and we can create specific tests like this one. So we can say, do all the things, on Thursday we update artist rankings, but in the context of testing the outer procedure, the megaproc, we don’t care about what that does. We’re going to use spy procedure to create a fake of it, which is a copy of the stored procedure but doesn’t do anything except for log that it was called and with what parameters. So now, when we say, right, we’re going to call that stored procedure and pass it Thursday, the only time I’m going to test is, so on Thursdays we call that procedure called update artist rankings. We don’t care about anything else.
We can do a negative test like this too. So we can say on Thursday we don’t do update the postcode, so again, we create a spy of the procedure and we call do all the things. Again, we look in this table, updatemissingpostcodes_spyprocedurelog, so that was created by spy procedure, and that’s going to record one line for every time we hit that procedure, for every execution. And that row will contain like an ID, and it will contain whatever parameters got passed. In this case, there aren’t any, so we only care about the count. So having got started with that, we can save loads of time by testing more than one thing at once. So here’s a test. On Mondays, we delete the empty genres and update the artist rankings but don’t increase prices or update missing postcodes. Great. We can test all that stuff at once. Right, there are a couple of alarm bells here. And you know what are, right? They’re in the name of the test. So here, there are some conjunctions. The name of this test has an and an a but in it, so all this by definition, we don’t even have to look to see that there are four asserts. We can tell from the name of the test that something’s amiss here, and indeed there is. So what’s really the problem of doing this kind of thing? Well, we’re going to run this test and imagine like, you know, it passes, great, so now imagine it fails, right? So then we see in like our report of tests, this test that tests on Mondays we delete empty genres and update artist rankings but don’t increase prices or update missing postcodes, that’s failed. Well good, so which one of those things didn’t it do? Which one of those things made it fail? So that’s kind of why we try to keep our tests down to like one assertion per test. And this isn’t the answer either. So we can convert our four assertions into like a compound assertion to say, like on Monday all of those things happen, but yes, this isn also not really the way ahead. The way ahead is with one assertion per test.
Some things we can’t mock. This is a systems stored procedure. It’s not the reason we can’t mock it. It can’t be mocked because it’s in a different database, and tSQLt can only act on the things that are in the same database where it is. So this is xp_logevent, our system thing, but it will equally apply to something we’re calling to our link server or something like that. So instead of mocking it, we wrap it. So we create our own procedure called do pointless logging, and it does the pointless logging. And you know, you can probably use this in a hundred different applications afterwards, couldn’t you? And then we can write a test that says on Thursday we don’t do pointless logging, so now because we’ve got a procedure that we own, we can create a spy for it and say, mock this procedure, do pointless logging, call do all the things, and assert that on Thursday we didn’t hit that procedure because we only do pointless logging on Fridays. So the takeaways here, mocking the whole world is a sign that the procedure is doing too much, and if you have difficulty saying what the test condition is, you’re going to have difficulty maintaining it.
Final set of examples are about exceptions. Always a fun topic. So here we have a stored procedure that adds an invoice like to an existing invoice. Common thing to do. Take some parameters, those are all the columns from the invoice line table, except for the key. Let’s just imagine that’s an identity and we’re going to insert those values into the invoice line table. So obviously there’s a foreign key between invoice line and invoice. That’s kind of implicit here. So let’s write a test for that. So we’re going to say create a test, add line to existing invoice, for this procedure, test non-existent invoice as an exception. So right, we’re going to fake all our tables, invoice and invoice line, but here’s another feature of tSQLt, apply constraint. So having made a fake copy of invoice line, we can say, “No, not completely fake, I want to bring back this one constraint, this foreign key from invoice line to invoice.” And so now we have kind of a partial – there’s still no data, but now there’s a constraint between that and the invoice table. And now we can say, tSQLt, expect exception. So what we’re saying here is when I run line after this, or a line after this, I’m expecting an exception to be thrown. We’re expecting an exception to be thrown because that foreign key is going to be violated, the invoice table is empty, so anything we insert into invoice line is not going to match it. And this test passes. Great. So what have we just tested? We’ve just tested that foreign key works. Now, foreign key has been in SQL Server since version six, and we kind of hope that Microsoft, all the haves and tests for whether or not foreign key works, and often if you’re using these additional features, there are scenarios for them. In particular, triggers, right? When we want to fake a table and then bring back some – bring back a trigger to say just test the stuff in that trigger, obviously, the end goal is getting rid of the trigger, but maybe one of the steps towards that goal is having some tests for it. So there are some scenarios for bringing back constraints, but it’s easy to stray into testing things that you’re kind of hoping Microsoft already tested when you’re doing that kind of thing.
And so a slightly better example of this is this. So this is certainly an older code. You see this kind of error handling, don’t you? I’ve got a procedure that takes a parameter and it’s going to validate that that parameter matches something in the other table, and if it doesn’t, it’s going to call a procedure called log error that’s going to log an error, and it’s going to throw an exception. But if none of that happens and the artist ID is valid, then it’s going to insert the data into the table. So we can use that expected exception to test this procedure as well. So here, I’m going to test a missing artist throws an exception writes log record, so again, we isolate our dependencies, we call fake table a couple of times, we fake out that log error stored procedure as well, and we say, right, expect an exception, something after this is going to throw an exception. And then also we’re testing the logging so we’re going to set up our log message into this expected table and use assert equals table to compare that to the actual that we select from the spy procedure log. So this test is going to pass, right? This test is going to pass too. And it actually doesn’t matter what we put into the expected for the log message. The test is going to pass every time, and that’s because of expect exception. And when we say expect exception after this, we’re kind of telling tSQLt, get ready, we’re about to throw an exception. This line here, the stored procedure call is what throws the exception and then it gets caught and nothing below that ever gets hit. So this test is going to pass every time, but we’re never testing that the logging thing actually happens. We’re only testing that the exception gets thrown. But there’s actually a clue in the name, right? There’s a conjunction in the name of the test. And the name of the test is test this procedure here, in this error condition, throws an exception and writes something to the log. This is the thing with exceptions, isn’t it really? So you can have a function or a method or something that returns some value, but that’s not really the whole signature of the procedure. It could be. This stored procedure will you know, return some value or produce this result set, or it might throw an exception. So that’s kind of some implicit additional piece of information about the procedure that is kind of difficult to see when you just type create procedure.
So the answer is that we create a separate test. Those are really two separate assertions, they’re two separate things the procedure can do. It can write a message or it can try an exception. So to test writing the message, we need to catch that exception ourselves. This is a good tSQLt top tip here. So we’re going to call the stored procedure, but we’re going to do that in our own try and catch block, but our catch block doesn’t do anything. All it does it stop the exception from being surfaced back to the calling code. That means the exception is caught and we’re free to go and inspect what’s in the log table to make sure that’s what we’re expecting. Then separately to that, yes, we can have a test that it throws an exception and there’s no and in the name of this test any more. All we have to do is set up our fakes, call the procedure and say, yes, we got an exception and the test is going to pass. So yes, we treat exceptions in separate tests and also we don’t really need to test that like, SQL Server is working.
So in summary, arrange, act, assert, we need to be looking for that in all the sets. We need to see that pattern in every kind of tSQLt test we write. We mock our dependencies out, so isolate the code that we’re trying to test. We only test the code that we own; again, coming back to that thing, but not just what Microsoft have written, what your like, third-party partners have written really. You really want to be putting it on them to write the tests, and you just test your touch points. We change code, not tests, to generate a failure. That ensures that when we do generate a failure, we’re failing for the right reason. We observe our test naming conventions because that means when we see in our log of you know, 500 tests, that this test has failed, we can see without drilling in any further what it is we need to go and fix. We use data builders rather than set up stored procedures. We’re not afraid to refactor our codes and make it more modular and easier to test, or indeed, to inject dependencies wherever we can. We need to be careful with exceptions, and that’s about all. Thanks for listening.
Brent Ozar: Very cool. Nice session sir, very nice introduction to testing there. Had a nice lively discussion in Slack along the way.
Gavin Campbell: [crosstalk]
Brent Ozar: [crosstalk] was like, “It’s neat to have a test where you expect failures to happen and you have to have that in order to pass the test. That’s pretty cool.” Where would you go to learn more about testing moving forward for the folks who want to learn more?
Gavin Campbell: There’s a couple of things there, so tSQLt itself is pretty well documented on the tSQLt website, right? Like, there’s whole load of stuff I haven’t shown there, and it kind of says how to use all the bits of tSQLt. I think one of the things that’s lacking maybe in a SQL Server literature is like the great book about testing. What you can find is that exist for Java and it does exist for C#, so actually a lot of this stuff for about you know, solid, which is what we talk about .NET programming or OO programming, a lot of being more effective in writing SQL Server tests is applying stuff that is good practice and other programming language in SQL Server. So while the original extreme programming book is something worth looking into, like, Michael Feather’s book, Growing – is that Michael Feathers? Have I just got that wrong? Growing Object-Orientated Software: Guided by Tests, who wrote that?
Brent Ozar: No clue.
Gavin Campbell: No clue. So Michael Feathers wrote about legacy code, and where legacy code is defined as code that doesn’t have tests, right, there’s something to think about.
Brent Ozar: Every piece of code I’ve ever written.
Gavin Campbell: Yes, so really a lot of this stuff comes from like, applying things from other programming languages and I think there are some people who have written a few blog posts about – Ed Elliot I’ll call out, I’m not sure if he’s in the chat or not, it’s kind of late here in England for him. But he’s often in the SQL Server Slack channel and has stuff on his site about it. I don’t know if James wants to help me with any others about tSQLt. [crosstalk] possible Redgate have some information about – and Redgate’s front end is just a [inaudible] for calling those procs, right? So possibly Redgate will have some kind of insight in what you actually do once you’re using their front end to call the procs.
Brent Ozar: Got you. Alright, well thanks a lot for giving the presentation today.