People who wish to manage and test T-SQL changes more reliably and efficiently, regardless of role.
This isn’t the dark ages anymore. You’ve learned that you need to put your database in source control and you’re competent with source control systems like TFS or Git. You’ve also learned how to express your database in script form using a tool like SSDT, DbUp or Redgate.
However, you still haven’t written as many automated tests as you know you should and you aren’t convinced you’re going about it the right way. You haven’t looked at the build functionality in VSTS yet or gotten to grips with build servers like TeamCity or Jenkins, and even if you have you aren’t sure how the process should work for SQL Server builds and tests.
In this session I’ll explain how to use tSQLt to build a suite of automated tests to give you confidence in the quality of your code. Then I’ll talk through various ways to automate your database builds and run those tests on some schedule or trigger. I’ll also discuss the pros and cons of various different approaches so that you can understand which approach would suit your teams and projects.
Why I Want to Present This Session:
I see so many organisations managing and testing their T-SQL changes poorly. This creates cultural problems, finger pointing, blame. I makes it very difficult for us to deliver updates with the pace and quality our businesses desire. It makes it very difficult to adopt DevOps.
I love SQL Server. But given that we are so poor at managing relational database changes, many people see the database as a bottleneck. They move data logic into the application because sprocs are seen as evil, they drop relational databases for NoSQL alternatives and use DevOps or Continuous Delivery as the argument, rather than the suitability of the particular NoSQL database for storing the particular data in question. This means there are less opportunities to work with SQL Server – and some people see me as a dinosaur.
I want to do my part to bust the myth that SQL Server is impossible to test properly. I want to stop databases being the bottleneck in releasing software reliably.
Brent Ozar: Alright, so at this next session at GroupBy, Alex Yates, returning presenter, will be talking about getting continuous integration right for SQL Server, so take it away Alex.
Alex Yates: Hi there, thanks very much for having me back. I clearly didn’t screw up last time quite badly enough and you decided to have me back, thank you very, very, very much. Before we get started, just a bit about me, my name’s Alex, I’m a database DevOps and Continuous Delivery consultant based in the UK, although I work internationally. I’m a keen community member, so I co-organize the SQL Relay conferences in the UK, they’re a lot of fun, they’re in October. So if any of you can get to the UK in October, it’s five conferences in five days in five different cities in the UK. So we kind of put up one conference, we’ll put it down, all get into a minibus, drive two hours, do the next one, so that’s quite a lot of fun. Five SQL sachets in five days effectively.
I also run the London Continuous Delivery meet up with a guy called Matt Skelton. So the pair of us run that in London, that’s kind of a monthly meet up type thing. Also earlier today, I got an email, which went into my junk folder, which I very nearly deleted. Actually, I got this yesterday and only spotted this today because I’m traveling and I just turned on my emails.
Adam Machanic: Congratulations.
Alex Yates: As of today, I can add a new badge to my photo, so that’s still settling in, and I’m busy saying thank you to a lot of people on Twitter who are congratulating me. So I still got a massive inferiority complex about that because I feel like there are a lot of people – I do not feel worthy. Thank you very much to Microsoft for completely humbling me like that.
Adam Machanic: Good job.
Alex Yates: Thank you very much. Thank you. So the hashtag for the [enters] is #GroupBy, and my Twitter handle is @alexyates. I’m going to do the same thing that I did last time, which is every time I mention a reference or a blog post or something, rather than just showing the link on the slide and expecting everyone to screenshot it and copy it into their browser, I’m just going to tweet it on the hashtag GroupBy. So follow the hashtag or follow my Twitter handle and all of the links and references will be there.
So I think I showed this slide last time, maybe I didn’t. One of the principle things that I’ve witnessed in the last seven years or so helping people out with releasing databases is there does seem to be some conflict between developers and DBAs. You may have noticed that, because essentially developers are trying to move as quickly as they can, get their stuff out to market as quickly as possible, and DBAs are kind of coming from the other end, trying to avoid stuff breaking, avoid stuff going wrong, and that can cause some amount of disruption. Because developers and DBAs have very, very different processes, very different ways of thinking about problems, they don’t necessarily have the same view of the world and sometimes stuff works on one machine and not on another.
So one of the movements is trying to bring Dev folk and Ops, or DBA folk together is the DevOps movement, and one critical part of that is continuous integration. Now, this was the picture that I used as the header picture for my session, which got a few people making a few comments on the interwebs, and that’s because – that’s kind of a response that somebody – and I’ll share who it was a bit later. I can’t take credit for it – that’s response that somebody gave to how a lot of people are interpreting continuous integration. So a lot of people are interpreting as automated builds, plugging up your source control system to Jenkins or to TeamCity or TFS or something to run automated deployments for example, and before I started this session, I just wanted to come back to what the real definition of what continuous integration was.
Now, we can argue all day about what the genuine definition for continuous integration is, maybe there is one canonical definition, maybe there’s isn’t. Maybe there is one particular group that are responsible continuous integration and maybe not. I think ThoughtWorks would claim that – would like to lay a claim that they have been very influential in the development of our thoughts and our ideas about continuous integration; they can probably take some credit for that. This is their definition: “continuous integration is a development practice that requires developers to integrate code into a shared repository several times a day. Each check in is then verified by an automated build, allowing teams to detect problems early.” Okay, so the thing I like about this definition is it very much chimes with what the word actually is. Continuous integration – that is literally what continuous integration is. Continuously integrating my code.
So the whole idea is as teams, how many of us have had a project where we’ve had various phases of the project and one of those phases has been integration? It’s been schedules for about three months out and it’s about two, three, four weeks’ worth of work. Probably many of us have been in that scenario, and that integration phase never really goes well. When we actually try and put all the bits and pieces together, it never quite fits quite as smoothly as we hope it will. Certain assumptions are made by certain developers or certain teams, and there are conflict in code and it’s a very painful experience to get that all to work together. So one of the ideas here is that integration phase, we want to shift it left, we want to do it as early as we can in the pipeline, and the idea is that every single time I make a change, I’m going to integrate that with a shared repository of code. That’s interesting wording as well, a shared repository.
It is a fundamental requirement of continuous integration that you are using source control; a source control repository of some sort. You cannot do continuous integration without doing source control first. So that’s the theory, and just to premise what I’m going to talk about in this session. Those of you who looked at the agenda may have noticed that there were two continuous integration sessions that have been accepted to GroupBy this week, this week and next week.
I’m doing this one, and James Anderson has got a really great session about continuous integration that’s going to be shown next week. So we wanted to make sure that we didn’t tread on each other’s toes, so what this session is going to be is going to drill into this kind of what is CI in a bit more detail, kind of practically how do we put that together, but I’m going to be very, very high level. So I’m not going to get too deep into any particular one technology, any one particular way of working. I’ll talk about a couple of different ways that we can do it, and I will have a bit of demo using a couple of different tools. I’m going to try and use one or two different ones from what’s normally demoed. But I’m very much going to talk about the concept, and then James, next week, is going to show you a much more deep, technical session where he’s actually going to get into – he’s going to use ReadyRoll, tSQLt, he’s going to use docker containers and he’s going to do some clever stuff and actually get into the technology a lot more than I am. That way, if what you’re looking for is a deep technical dive into ReadyRoll and containers and tSQLt, go and watch James’ session. If what you’re looking for is more of an introduction, then this session might be more appropriate to you. So I’ll tell you that now and I won’t be offended if you want to tune out now and tune into James’ next week. That’s fine, I don’t want to have you here under false pretenses.
So coming back to this definition, continuous integration is development practice that requires developers to integrate code into a shared repository several times a day. There’s nothing in there that specifies exactly what tool that we use, build servers, CI servers, and there’s this idea that you have to spend a lot of money on tooling for continuous integration, you have to buy expensive CI servers, and a post that I really enjoy, one of my favorite blog posts is this one – Continuous Integration on a Dollar a Day by James Shaw. I’ve tweeted the link to it, or roughly about now the link should be going out there, and in this blog post, James Shaw says that look, the process of continuous integration, you don’t need fancy technology to do that. You can do that with a one-dollar chicken and an old PC that’s lying around. So if you want a bit of a funny – yes, slightly deliberately controversial blog post, do read that, because it does kind of hammer home a lot of the things that I believe in, that it’s much more about the idea of continuous integration than any one particular implementation.
And I would like to demonstrate that by – and I know I talked about this company Farm Credit before, but I would just like to try and put into a real context why that’s important. So John Morehouse, the guy on the left here, was the production DBA of Farm Credit services of America, and they had an IT team of about a hundred different people, split into 14 different development teams. All of those different development teams were doing their own work for their own stuff, using different processes. Some of them were doing source control, some of them weren’t, and those that were using source control were using different source control systems and when it came time for deployment, they would all send scripts and things to John Morehouse, who was the person who owned the production databases. And when they were all deployed, we heard a lot of it works on my machine, it works on my machine. It works on my machine because in my environment, I’ve only got my set of changes, I’ve not got the stuff that other people are adding in. Possibly my development environment isn’t quite up to date with the production version.
So often there’s a high failure rate. There are a lot of people – there are a lot of times these deployments didn’t work, so what we did there is we – if you can see the screen behind on the whiteboard behind, we basically drew together kind of a rudimentary deployment pipeline, put together a couple of proof of concepts. And they rolled out over the course of the next couple of years, a really great process where all the developers were committing into source control in the standard format, there’s a shared repository, and that was every time a change was made, they were automatically verified builds and tests happening there to make sure that it hadn’t broken.
And that’s the other important point, we don’t just integrate. We also validate that what we’ve integrated works, and the first part of that is compiling your code or building your code. So that often translates as taking your source code and seeing if you can build a database from it, and secondly, we want to do some sort of logical testing. Does it do the things I want it to do? What sort of unit tests have been applied to actually validate that my stored procedure does the thing that I want it to do? I mean, it’s one thing to say you can deploy it, but it’s another thing to say whether it does what you expect it to do. So – yes, go ahead.
Adam Machanic: A lot of what you just described sounds to me like branching methodology and where does branching methodology end and continuous integration begin or vice versa?
Alex Yates: That is a hotly debated topic, so I’m not going to fall into – I’m going to give a classic consultant answer of it depends. So some things that I would say, a lot of the time, a lot of people talk about how do I manage these different workflows of development, I’ve got some bits of development that are ready to release, some bits that aren’t, the different team works on different bits and pieces. Fundamentally, whether you’re doing it in source control or not, what you’re already doing is branching your code. You already have different development teams doing different things. Now, one of the pieces of advice that I give is your branching process should reflect what’s happening in reality, and if you’ve got different development teams doing different bits of work, in reality in isolation, then you should use your source control system as it’s being designed to work to facilitate that, which means having different branches. Often that leads to a realization that this branching strategy doesn’t make sense because it’s a complicated branching strategy, I’ve got branches all over the place, they’ve not been merged back in together, that creates pain.
And, for me, the correct response there is not to say well let’s not use source control branching; the correct response to that is to say well actually, source control branching isn’t working for us because we’re fundamentally doing something that’s a little bit silly with the way that we manage our projects. So we need to think about the way that we manage our projects first and try and simplify that and then we can have a branching pattern that matches the real world. That’s my view on that.
Adam Machanic: KIS principle.
Alex Yates: Yes, keep it simple. Absolutely. The other thing that I would say is every change is merged with main line several times a day. Now, if you believe in the true definition of continuous integration, what that means is, whenever you’re working on a branch, that branch needs to be merged back in several times a day. So that works when your development process is you branch from main line, you make some change and you commit it back in. When you have people working for a sprint or two on a feature, and then merging it back in at the end of the sprint, some people argue they’re doing continuous integration on that because that team is integrating their code and running builds off that branch, but strictly speaking, if you’re not merging that back in until a couple of weeks later, then those few branches haven’t been continuously integrated, which means that you still have that merge pain in a couple of weeks when you merge back in.
So if you’re purest fan of continuous integration, I would argue that that’s not continuous integration. However, practically getting to a point where you can have several flow of development going on at the same time and merge them all back in to main line several times a day is actually a very, very difficult thing to do. I mean, you can try stuff like feature flag or branching by obstruction or releasing in very, very small increments or making sure that it’s always deployable, releasing, breaking down bigger tasks into smaller tasks. We can talk about all of these strategies to try and get round that, but fundamentally it’s quite difficult. So there is an argument for why you might want to do that, but there’s also an argument against it as well and I’m going to try and avoid getting drawn into giving opinions on that.
I am in favor in general of breaking things down into as small as you can, the shorter your distance from main line, the better, is my opinion. So this picture behind – I’m not quite sure how visible that’s going to be for everyone, so this is a slightly cleaner version of it, and this is what we’re going to be talking about today, the source control part and the continuous integration part. We’re not going to get into any detail about the release management part because that’s a whole different discussion, I’ve only got one hour.
To get through it, this is an agenda, so we’re going to start talking a bit about version control, then we’ll talk about model and migrations. I did talk a bit about this in my last GroupBy session, so I’m not going to go into any detail, but it is an important concept, so I’m just going to go over that again. I’m then going to talk about automated builds and kind of build servers and how we can set that up. I’m going to talk a bit about unit testing, and then finally I’ll just finish with a few words on database drift and some general advice for you. So that’s what the structure of this session is going to be, it will feature a demo of a couple of different tools, so I’m going to do – I wanted to use a couple of different tools. People know me for doing Redgate demos, so I was going to try and avoid doing a Redgate demo, so for migration space I was going to show you Flyway, which is an open source migrations framework, which I quite like. I was going to show you SSDT instead of the Redgate SQL compare thing, but then I got made MVP earlier today, and that was kind of a bit distracting so I apologize. So we do still have my regular Redgate SQL source control demo to show you the model-based way of working, and then…
Adam Machanic: You know that as MVP you have to show the Microsoft product now, right?
Alex Yates: I’m showing SQL Server and I’m showing you how you use it. And then we’re going to talk a little bit about tSQLt. I’m not going to go into detail about tSQLt because I think James will speak in a bit more detail than I will next week, but consider it a bit of an introduction. So let’s get started. Let’s talk about version control. This first bit, let me Google it for you, done. It’s 2017, and that’s as much time as I have for telling people what a source control system is nowadays. If you’ve not started using a source control system, and I don’t care that much whether you’re using some version or TFS or Git, except use Git. By now we should all understand what a source control system is and how it works.
So I’m going to kind of take that as the foundation level of understanding. If you don’t understand that, pause this video, go watch a video about source control, come back. What is more interesting for me is the code that we put into source control because how we define the database in scripts is much, much more interesting and a much more hotly debated topic. So that’s this model versus migrations debate, and I did talk about this last time, so I’m going to go through this very quickly.
Broadly speaking, there are a couple of different ways that we can approach it, my wife made me fix that slide because she felt sad. So the fundamental problem is we have a version that we start with, a change that occurs, and then a version that we end up in. So model based solutions define the desired state, so something like Visual Studio Database Project, SSDT, Redgate SQL source control, those sorts of tools define a desired end state and generally we let a software work out the upgrade script. A migrations based solution is basically just a series of upgrade scripts that we run in sequence to get the final version. Typically they’re accompanied with a migration history or migration log table that you add to the database, and that will have information about where each time you run a deployment, when it was run, what version you got up to. So for example, when I want to run my deployment, I just query that table, I say what version are you up to right now, I’m up to version seven, good, I’ve got a scripts eight, nine, and ten. So we’ll run eight, nine and ten.
People have strong opinions about which way is right, so Paul Stovell, who’s the guy that wrote Octopus Deploy, who I did a webinar with on Monday, so if you’re interested in a much more detailed webinar about all of this, go to OctopusDeploy.com/webinars and look for the Youtube link. But what he says is there is nothing more reliable than keeping track of exactly the scripts you intend to run and running them without trying to compare model and guess.
What he’s saying here is that letting software work out our upgrade script is a scary thing, and when I deploy to production, I’m not comfortable with just hitting the deploy button and letting Microsoft or Redgate or somebody else work out the upgrade script and deploy it, because frankly, it works 90% of the time, and who wants a deployment process that works 90% of the time.
Adam Machanic: Can I just say thank you for saying that. Thank you so much. Thank you from the bottom of my heart.
Alex Yates: So as I said, people have strong opinions about which way is right, and a lot of people feel die hard I’m a migrations fan, I’m a model or a state way of working fan. Some people say that SSDT is the way forward, some people say that migrations frameworks is the way forward. And this is the argument that a lot of the migrations folk give, and is a perfectly valid argument, it’s a very good, clear argument about why using software to generate our upgrade scripts doesn’t always work. It works a lot of the time and it makes our job 90% easier 90% of the time, but we do need to cater for the time when it doesn’t, and a migrations framework works a lot better in that scenario.
On the other side of the argument, somebody else said that as soon as you have multiple changes to an object, on a single aspect of an object, ordering and the ability to detect which change needs to be made gets very complicated. That was Gert Drapers from Microsoft. He’s the guy that built DataDude, which is what later became Visual Studio Database Projects and SSDT. What he’s saying is that have you ever been in a scenario where you’ve got a whole loads of scripts from all sorts of different sources, and they’ve all got to be deployed against production, trying to get the ordering right of those scripts get complicated. If you have things like stored procedures, use programmable objects, that’s just code, right? And alter statement’s a really rubbish way of dealing with that because no matter – let’s say Adam’s written a script and I’ve written a script, and we’ve both changed the same stored procedure, but we’ve not taken into account each other’s change, there’s no way in which you can run both of our scripts and end up in the correct state, because the last one is always going to win and overwrite what the first person wrote. So that can be complicated to manage.
The only way to spot a merge conflict, if you’ve got a list of scripts, script one, script two, script three, script four, is to go through all of them and check, and that’s a frustrating, difficult process, especially if you have a lot of scripts. I wrote a blog post about the difference between the two and as I said, it’s a hotly contested topic. A lot of people felt it was appropriate to tell me why I’m right, or more often, wrong. But in summary, generally teams that look more like the stuff on the left, if that looks more like the work that you do, generally model based solution is going to be better for you because it solves a lot of the problems associated with development and managing all of these upgrade scripts, but it does have the drawbacks of the deployment that I mentioned before, so you need to think about that.
If you use – if you look a lot more like the organization on the right, then you’re not so worried about managing so many upgrade scripts frankly because you have less of them, and there’s less chance that they’re going to contradict with each other and you have fewer programmable objects. So those problems don’t feel so painful to you, but actually, this does solve the problems of automated deployment an awful lot better. So in a sense, model feels more agile and migrations feels more DevOps, which is a bit strange to think about things that way because we often think of those two things going together, in my opinion it does anyway.
Adam Machanic: Can I ask you a quick question? What are your thoughts on application level migration tools such as full migrator? Have you ever seen that?
Alex Yates: So that tool’s from within the application that basically just run the upgrade scripts one by one.
Adam Machanic: Well that one, in particular, you can actually model your database in c#, and you can actually version and in theory you can go conversion zero all the way up to like version ten thousand, and in theory, you can roll it all back – I say in theory.
Alex Yates: Well that’s [inaudible] So is this like to do with ORM sort of tools?
Adam Machanic: Yes.
Alex Yates: So my view about ORMs is whatever the entity framework or otherwise, is that you can write perfectly good SQL code using them. What they do is extract a lot of the problems to do with writing SQL and make it very, very easy and natural for a .NET c# developer that isn’t necessarily – doesn’t enjoy spending their time in the database, which helps them, but fundamentally, it is a skill to be able to write well-tuned – to do database development. That’s why we have database specialists, and if we’re using an ORM to get around the fact that we’re not a database specialist, we still don’t know how to write a well-tuned database. But now it’s even harder because there’s an obstruction layer for us to even realize and see that, which is why ORMs have a reputation among hardcore database DBA folk for creating really cruddy, poor performing SQL code, why DBAs don’t like application developers doing that.
So I don’t think that it’s necessarily the ORM that is a problem, I think it’s the fact that we’re using the ORM to shortcut the fact that we don’t understand SQL and it gives us a nice environment where we can hide away from that, which for small stuff, if you’re just a couple of .NET developers and you want to get a website up nice and quick, it can be really useful. It can help you get to market an awful lot more quickly, but it tends to not scale very well. That’s my experience of it. What’s your experience, Adam?
Adam Machanic: I agree 100%, I’m just nodding my head yes, just keep it up. As far as I’m concerned, you’re completely on point.
Alex Yates: Fantastic. I realize I am talking to a SQL heavy crowd right now. If this was a .NET conference…
Adam Machanic: You’ll be getting lynched right now.
Alex Yates: I would be. I’m not going to talk any more about model versus migrations, at least in slides anyway. I am about to do a demo, but before I do that, I just wanted to show, if you want to see the full one hour version of this talk where I walk through an example and I show you in more detail exactly how both the model and the migrations approach are going to burn you, then go to DLMConsultants.com/modelversusmig.
In the meantime, what I’m going to do is I’m going to show you a demonstration. As I said, I wanted to show you SSDT versus Flyway, because they’re two tools that people don’t often see me demoing. I’ve got the Flyway demonstration working and then I got made MVP and earlier today I kind of lost a few hours of my day, so I apologize for that. So we are going to be looking at a couple of tools, Flyway and the Redgate tools.
So let’s start in Management Studio, I’ll show you my databases. Adam, can you let me know, is that screen coming through nice and clearly? Do you want me to be using zoom in and stuff, or do you think it’s fairly clear?
Adam Machanic: I can see the text in the edit panel just fine, the left-hand side is a little small for me, but…
Alex Yates: Actually, let me just come in here and show you the databases that I’m going to be using. So I have a Flyway trainers Dev and Flyway trainers test that I use for my migrations demo, and I have a trainers SOC Dev and trainers SOC Test that I’ll be using for my model based demo. SOC for source control, I used to work at Redgate, internal shortcuts for things at Redgate, SQL Source Control, SC, already taken that SQL Compare, SSC, that’s already taken that SQL Server Central, so it went – so we at Redgate, all Redgaters talk about SQL Source Control as SOC. So we’ve got trainers Flyway Dev, and Test and trainers Source Control Dev and Test.
Now, the scenario here is I have a kind of findatrader.com type website, you want a plumber, you put in your details and it’ll show you the plumbers near you and who’s the best price and who’s the best review and so on, and what I want – there’s a particular class of tradesmen which is a kitten trainer, someone who can come in and train up my kittens. And I have a particular client who wants me to add the column whether they accept tiger cubs because they’ve got a little tiger kitten and they want to know whether this tradesmen is going to come along and – this trainer is going to come along and teach their little tiger cub to do tricks.
So I need to add an accept tiger cubs column to the table and I also need to – there’s a view, kitten trainers near London, because that’s the most important city, and I need to add the kitten trainers, the accept tiger cubs column to the view as well. Apologies after delivering six sessions this week, my throat is going a bit, so I apologize if it’s a bit tight. So what I’m going to do is I’m going to execute this against the – let’s run it against the Source Control database, and Redgate SQL Source Control has a nice QI that allows me to go onto the Source Control database and it will allow me to take a look at the changes that have occurred, I can write some comment, please write better commit messages than that, and I can commit that to Source Control.
What you then see in Source Control, so this is my Source Control repository for SQL Source Control, I have a table in Source Control called the kitten trainer’s table, and the key thing here is that I have a CREATE statement to represent that table. So I’m representing that table as the desired state. This is how it should look in its end state. I’ve not represented this as a migration script. If I want to represent it as a migrations script, there are a bunch of options. Let’s just push that to the right…
Adam Machanic: Can I just quickly ask you a question before you continue, so you just altered the table and then used the tool to modify your desired state script, right?
Alex Yates: So one of the nice things about Redgate Source Control is you can plug it into Management Studio, so you can just work in Management Studio if that’s where you like working, make your changes and you can just hit commit and it’s going to update that script folder in your local Git repository or your TFS repository or your sub verge repository or whatever you use, it’s coming through that…
Adam Machanic: Do you see any pros and cons of doing it that way versus just editing the desired state script or is that kind of the migrations based methodology?
Alex Yates: It’s still a model-based methodology because it’s defining the state. So you can work directly on the scripts and some people like doing that because it’s quicker. I think it comes down to personal preference as a developer. A lot of developers like the experience of Management Studio because you get the feedback from the local development database, you get the confidence that you’ve written legal syntax. Like if I create a script in Management Studio and execute it, and I execute successfully, then I know I’ve not screwed up and put a typo in there. I’m always – I’ve got butter fingers. If I had to work purely on scripts personally, I would be screwing up all the time because I wouldn’t have that validation that I’ve written legal syntax and I wouldn’t have that validation that the script that I’ve written, like the stored procedure, works successfully. I want to be able to check that locally before I commit it into Source Control, because personally, I know that frankly, I don’t think I’m a talented enough developer to work on the local files. I think that takes a little bit more skill and precision that personally I don’t know, but each to their own. There are definitely people that like working – like there is a class of people that likes working purely from the command prompt and purely from script files. I’m not one of those people, but each to their own.
Right, so let’s take a look at the Flyway Dev database. Now, before I do anything, I just want to show you – I’ve not run this script against Flyway database yet, and I can show you look – wrong table, let’s put that table away, and kitten trainers, and also let’s do this. So first of all, this is a schema version table that Flyway adds to your database. Flyway is an open source tool, it’s cross-platform, it works on SQL Server, it also works on other databases that I shall not be named, lest I lose my MVP qualification – nomination. Every time I run a deployment against a database, it adds a script here. So you can see right now in the kitten trainer’s table, there is no accept tiger cubs column. What I’m going to do is I’m going to take that script that I ran before here and I’m going to go into my Source Control repository for Flyway, so this is just a local Git repository for Flyway. And inside it, this is what Flyway looks like on the inside.
So there’s a .CMD application here that I can just run from the command prompt, and what that’s going to do is it’s going to look into the SQL directory here, and it’s going to look for any SQL scripts, and it’s just going to run them in sequence. So there’s also a conf directory here, where we need to add configuration about what the target database is and stuff, basically just a connection string, and we also need to add the SQL Server driver in there. So let’s come out of – so let’s take a look in the SQL directory. You can see at the moment I have one script, version 001__baseline. This is just a naming convention in Flyway, v number, double underscore, command. So what I’m going to do, the quickest and easiest way for me to do this is I’m just going to copy – just going to copy that and I’m going to change it to a version two and I’m going to change the command to tiger cubs or something similar, I’m going to go inside it and I’m going to take the initial baseline script that created the initial version of my database, and I’m going to replace it with the script that I ran on the database.
I’ve not run it on this data – I’ve run it on the Source Control, on the SQL Source Control version of that database rather than the Flyway version of the database. So now that’s out of the Source Control, and I’m just going to open up the command prompt so I can run the Flyway migrate. Now all I need to do is if I go back to the file directory just to show you, now all I need to do, now that those SQL files live in the SQL directory, all I need to do is run SQL.CMD with a command, so I need to run Flyway migrate, so if I just run that from the command prompt, Flyway migrate, it’s going to ask me to authenticate myself, let me just log in as the user. I need to update the user saved on this standard template VM that I still have from my days at Redgate.
Adam Machanic: There’s a request that you zoom in a bit on the command window, and I agree, it’s pretty hard to read. Thank you.
Alex Yates: So what’s happening here is it’s spotted there are a couple of migration scripts, it says it’s currently up to version one, so we’re going to run script two, accept tiger cubs, it’s successfully run that and the deployment is finished. So it did that thing, it checked the schema version table, it saw it was up to version one, it ran the remaining upgrade scripts, so if I now take a refresh of this database, you’ll see that now the columns, we should see accept tiger cubs has now been deployed to that database.
So it’s a very, very simple, very straightforward, open source tool. Just Google Flyway, you’ll find it, and I like it just because it’s pure simplicity. If you’ve got a little thorny bit of your deployment process that doesn’t quite work, then you can just add this in, just to deploy – you don’t have to add the whole database in here, just the migration script, that really complicated bit. And you can just run this before the rest of your deployment with SSDT or something, which you can let to do the nice and easy stuff, just to update in stored procedures or whatever. But you can handle your complicated data migrations this way, which is quite nice. So that’s just one solution, there are many solutions out there, I’m not trying to promote one over the other, but I’m trying to show you something a bit different from what we normally see at these sorts of events. Just to help us think about different ways that we can approach the same set of problems.
So let’s come back to my slide deck. Doing this from where I am, I don’t have my nice four monitors set up, so I need to make do with what I have. Right, so that’s the model versus migrations. The first thing that you need to do is get the stuff in Source Control and the first thing that you need to think about there is use an appropriate approach for you. If you feel like on this slide here, you feel more like the migrations sort of approach, use a migrations tool. If you feel more like a model based approach, use a model based tool. One of the really nice things about ReadyRoll, which James is going to show you next week, it’s kind of a hybrid tool that takes a best of both. So it kind of does a bit of model, a bit of migrations and tries to use it appropriately for your database. So I hope James will get into that programmable objects feature that they have, so I’m excited about seeing that next week, so do, if you want to learn more about that, come back next week and see James’ session.
What I want to do now is talk about the automated build. So this is the bit that a lot of Continuous Integration folk – this is the bit that a lot of us see as being kind of Continuous Integration… …this tool product category, so they’ve got a few years on them, a bit of maturity on them. Basically, at their simplest, what they do is based on some trigger, they perform some job. They’re fancy PowerShell schedule is that’s all they are.
Typically, that trigger is a commit to Source Control, so when I commit to Source Control, do some action, the action is build my database, run my tests. Something a bit like this. How do they work under the hood? Typically you have a server component, and you have agents. So the server is where you configure the whole thing, then the jobs get run on agents, and that way the thing can parallelize, you can say I want to set up these agents for my database deployments, I want to set up these agents for my .NET deployments for my .NET builds, you might have different requirements. So for example, for a SQL Server build, you might need to install SQL Server, or at least local DB on the agent just so you can actually build the database. And you might want lots and lots of them for example, if you have a team of a thousand people in who are committing every five seconds and you got a lot of work to churn through, you might need lots and lots of agents.
So now let’s go back to my demo scenario, and take a look at the example I’m using. Now, I’m using TeamCity. TeamCity is an example of a Continuous Integration or a build server. It is not a – I’m not necessarily promoting it over the others, but I had to pick one for this demonstration. It’s a commercial tool by JetBrains. So what’s happening here is I’ve got various different projects for the various different projects, and I’ve got to the various different bits of codes that I’ve got set up, so I’ve got a Redgate tool belt project, and I’ve got a Flyway project, so let’s take a look at the Flyway one first. What’s going to happen here, if I go to the settings – actually no, edit configuration settings, that’s what I’m after.
If you look on the left hand side here, you’ll see the three most important things of any build server. Where is my source code, what am I going to do with it, and when am I going to do it? That’s fundamentally what every CI server is going to do for you. So my version control settings, I’ve just given it the link to my Git repository, my build steps are the actions that I’m going to carry out, so that’s what I’m going to look at because that’s a lot more interesting, and triggers, I mean, that’s normally dead simple, you just go into triggers and you just tell it add a new trigger based on whatever. I’ve just added a VCS trigger because every time I commit to Source Control I want to run a build. So that’s just reminded me, I need to do a Git push on that. So the code I added, I didn’t actually commit that to Source Control, I just added it to the directory, so let me just navigate up. Git add all, Git commit, some comment and now I need to escape from Vim – let’s see if I can remember how to do that, and Git push.
Alright, so those of you who know Git, well done, you probably followed that. Those of you that don’t know how to use Git, I refer you to my previous slide about Source Control systems. So now I’ve committed that to Source Control, what you should see is a pending change and some action is about to happen. I’ve not shown you what that action is yet, so let’s just refresh this, let’s see this start working. Any moment it should start working, so we should see this turning over any moment. There we are, it started doing a job, so let’s go and take a look at what’s actually happening there. So in the build steps, first of all I’m doing a new get pack, so there are various different build steps that we can add that do various different things for example, run a PowerShell script, do a Redgate thing, do a command line thing, do MS build, so I’m just running new get pack, I’m pointing it at the new spec file in my Flyway repository, which if I come back to here, you’ll see there’s a new spec file here.
So I’m just packing up the whole thing, and giving the new get package a version number. A new get package for those that don’t know, is basically a fancy zip file with a manifest, which allows you to – if I just download it quickly, open, run – so I’m just opening it up with a tool called new get package explorer, which just is also open source and free download, and you can see that basically the zip file with a version number 1.9, because it was the ninth build, it’s got information about who owns it. You can add, dependency information in there, you can add all sorts of good mesh data, licensing information, so that’s all my code. That’s all I need to run.
Of course if I was going to improve on this I probably wouldn’t actually put the executable in there, I would probably have the executable saved somewhere else, because that makes my packages really big and it’s probably not a good idea to store my executables in Git, but it works a very quick hack to get the demo working. So that’s all packaged together. Now what does it do next? If I go to edit the settings again, we are then just going to run a command line step, so I’m just going to run that Flyway migrate command with the migrate and I’m just going to point it. This time I’m going to use the test database, Flyway test on the Dev instance, username and password, you could use Windows Auth but I was putting this together in a hurry, and a result of that is my code will be successfully deployed to the test environment.
So if I go to tables and look in the – first of all let’s look in schema version, let’s do a select up to a thousand rows you’ll notice that this now thinks it’s up to version two, and if I take a look at the kitten trainer’s table, you’ll see that we now have accept tiger cubs inside. So all we’ve done, I’ve committed to Source Control, we’ve compiled the code, we’ve created a nice package that I can now pass over to release management process, and we’ve build the database. We’ve deployed it on some test or integration database, so every time I commit to Source Control, that’s going to happen. If it should fail, I’ll be notified about it, so I can go back to the dashboard and if I just click on this project for example, I can see a list of whether it succeeded or passed or failed every single time. I can see the changed associated with that, apparently I typo-ed when I tried to write the word sum.
So this now means that every time I’m making a change, I’m getting feedback, not just on whether the commit worked, but on whether the actual source code, whether that can actually be used to deploy a database, whether my database is deployable. That’s a really good first step. Now I’m not necessarily saying that I’m happy to send this to production, all I’m saying is I can deploy it. I’m not saying the stored procedure that I added for example, had I added a stored procedure, I’m not saying that that stored procedure works. What I’m saying is I can deploy it, and that’s the first step. For completion, let’s also take a look at what happens with the Redgate SQL Source Control build. Now, this was set to run automatically from Source Control as well, so that ran 16 minutes ago because that’s when I did the first demo using Redgate SQL Source Control. And if I take a look at the build steps here, you’ll notice that Redgate have been quite kind, they’ve provided various plugins that we can use to provide the information about where we want to build this database and I’m not going to get into too much detail here because as I said, I don’t want to get into too much detail, but I just want to show you that there are multiple ways in which you can do this. You can also do the same thing from PowerShell, because a lot of the Redgate stuff is all just PowerShell command, that’s what you have access to, and the build steps are just built on top of those PowerShell command lines. If I was building an SSDT project, I can just run that off MS build and I can deploy it using a command line took called SQL Package. If I was using ReadyRoll, I do actually have a ReadyRoll demo here, that packages stuff is a new get package as well, and ReadyRoll does a kind of combination of kind of migrations and deploy my stored procedures again at the end. So, I’m not trying to promote one tool over another here, I’m just trying to show you lots of different options, but the same principle applies. I commit my stuff to Source Control, and then automatically it’s going to build and deploy that database somewhere, and compile it.
A lot of the time people like to build a brand new database, deploy all their stuff and throw it away, other people like to have a known integration database that they deploy their code to. Both are valuable tests, sometimes it’s good to do both of those things just to make sure that you can build your database from scratch and you can also deploy your database to an existing database, because they’re two slightly different tests. Both of those things are valuable to do.
Let’s jump back to the slides. So we’ve talked about Source Control briefly, we’ve talked about the code we put into Source Control, either a model or a migrations format. We’ve talked about automating those builds, but what we’ve not talked about is actually testing whether they work. Again, this is something that James is going to go into in a bit more detail, so consider this more of a primer session. I just want to give you an idea of what unit testing is and why we should do it.
So, if anybody has every worked on a project where it’s a big legacy system for example and you change something over here and something over there breaks unexpectedly, maybe you know, just maybe you didn’t. If anybody has ever worked on a project a bit like that, they’ll understand the value in unit testing, so what we’re doing with unit testing is we’re trying to take the smallest unit of code that we possibly can and check that it does what it says on the tin. If I run this stored procedure with this input, do I get the expected output that I’m expecting? And based on that kind of input output, we can provide rules, is the output the same as x, is it the same as the expected thing that I was looking for or is it different? If it’s the same, good, it’s a pass, if it’s not then we’re going to fail. So let me show you an example.
So tSQLt is an example of a unit testing framework, as with CI servers, as with Source Control tools, for databases, there are many different CI systems – there are many different unit testing systems for databases that are available. There’s DBFit, there’s DBTST, there’s the stuff that comes from Visual Studio, there’s tSQLt, which is open source. In my experience, tSQLt is the most commonly used and most popular. It’s not without its flaws, it’s open source, there are some bits and pieces that aren’t perfect about it. My response to that generally is the reason we don’t have as wide and as mature a set of testing tools c# and Java developers do is because c# and Java developers care about it, and in my experience, most database people don’t care about it as much as they should. So how tSQLt works is you add a test as a stored procedure on the database, actually, the entire unit testing framework is a whole bunch of objects that get added to the database as well. That’s something that is a bit controversial, some people don’t mind it, some people hate it. Obviously you would not deploy these to production, they would only live on your development machines, so when you’re in deployment process, you need to find some way to strip those out to make sure that the tSQLt objects exist in Source Control but do not get deployed to production. So applying some sort of filter would be important here.
So the makeup of a unit test typically exists in three parts, so it’s called assemble, act, assert. Assemble is we’re going to get the database into the scenario that we want to get it into, and there’s a really cool function in tSQLt called tSQLt fake table, which helps with that. Before I explain it, let me explain that when we’re running our tests, what we’re going to do is we’re going to run them inside a transaction, persist the results of running that test, did it work or did it not work, and then at the end we’re going to rollback that transaction. So whatever happens inside this transaction does not get persisted and we end up back in our original state, so we don’t break stuff.
So what happens with tSQLt fake table is we run SP_rename on a given table and give it a random GUID name. So we have the contacts table in this example, and give it a random name. We then create a brand new table in its place with all the same properties the original table had. There are one or two things that are actually missed off so it’s not quite perfect, but it’s pretty close. So we end up with a new contacts table that looks, to all intents and purposes, like the old one, but this time it’s empty of data. Now, why is that important? Well, if I write a test, I want to know exactly what input is going into that and that includes the data inside my tables. That’s because if the results of my test are different, based on different data existing in that table and when I run this test I have different data in this table in reality, what I end up with is a flickering test. A test that passes sometimes and doesn’t pass other times, purely because of the circumstances of what was in that table to begin with, and if my test sometimes passes and sometimes fails, it tells me nothing, so there’s no point having it. What I need is a test that when it passes, it passes, and when it fails, it fails, and that tells me something.
So we’re going to run fake table to make sure we have a clean amount of data in this table. We’re then going to execute the object under test, so in this case it’s the add contact function. What that does is we pass it a couple of parameters and it squirts some data into the contacts table. So if I add full name Alex Yates, email, my email address into this function, what I should end up with is that data existing in the table afterwards. We’ve had assemble, act and finally assert. We’re going to make sure that the data that actually exists in that table is what we expected. So that’s the makeup of a unit test, and I suspect that James will talk you through a couple of examples.
What I wanted to do is show you a very basic example that you can get out of the box with tSQLt, and for simplicity here, I’m going to use a Redgate plugin Redgate SQL test, just because it helps guide new people through SQL tests a little bit more clearly. Everything that I’m about be done can be done in raw SQL and using plain open source tools, so you do not need Redgate, but it does drive a nice UI for it, which just helps us explain what’s happening. So first of all, we need to add a database SQL test, I’m just going to use one of the ones that I’ve not used before, trainers SSDT Dev, so I’m going to add that database. What I’m also going to do is I’m going to add the SQL cop static analysis test. Now, these are really cool. There’s a blogger, Lessthandot, and what he does is he blogs about all sorts of SQL Server worst practices. All sorts of bad things that you can do, all sorts of cardinal sins that you can commit in tSQLt, and what he’s done is he’s added a unit test for each one of those bad practices. So it’s kind of like a bunch of static analysis tests, so this isn’t testing the functionality of your database, what this is doing is testing whether or not you’re guilty of any particular bad practices.
So that’s run successfully, I now have – I’ve now added – if I just go into the SSDT database, then go for example, I look for stored procedures, and I probably need to refresh that, I’m also on the wrong database. Let me just do a refresh and now go to the correct database. You’ll now see a whole bunch of these SQL cop stored procedures that are all unit tests for various bad practices and if I scroll down here, here’s all the tSQLt stuff, which is the actual unit test framework itself. So there’s a lot of stuff that gets added to the database, but it’s all useful stuff in a development domain, not production, because you need to set various things like trustworthy to be on and enable CLRs that you don’t want in production, but it’s perfectly fine in Dev.
So I can now run through this list of tests, and it’s going to run through each of those tests that have been written by Lessthandot and you’ll see a bunch of them fail. So let’s pick one for example, it’s one that’s really easy to demo is stored procedures named SP_. There are no stored procedures on my database called SP_x at the moment, so let’s add one. So I’m going to run a new query, create a procedure, SP_Oops – and I’m going to call it select one. So I’ve now added a stored procedure to the wrong database, let me add it to the correct database. Execute. So now I’ve added that to the database. I’m just going to run this test once more, run, and this time it fails, I get some information up here on the top right hand side, and the procedure failed because there’s a stored procedure called SP_Oops and if you want to know more about why that’s a bad idea, go to this link and I’ll tell you why it’s a bad idea and what you can do to fix it. So you can go to that link and it’ll take you to the blog post all about that.
So if you want to get started with unit testing, this is a really good way to just get started, and I promise you, I’ve never seen anybody pass all of these tests the first time they’ve added them to their database. Not once. So if you think you’re an expert developer or DBA, you’ve got everything set up perfectly, install this and you’ll find out you’re wrong, because your database is not perfect, something will fail, I almost guarantee it. If anybody does run this on their database and they get a full suite of tests, tweet me with a screenshot and let me know all about it, but that’s my challenge to you.
Anyway, I digress. So that’s your overview of kind of what unit testing is. Obviously those static analysis tests that I did, they’re general tests for health and what you should really be doing is testing your actual stored procedures and functions and things to set that they actually do what you think they do. I am tired of people not testing, and this is a blog post that I wrote, it’s gone out on the Twitter feed, if you’re interested, read it. It might be a little bit ranty, I apologize for that. finally, I realize I’m going over time now, so I’m just going to very quickly talk a bit about database drift, and…
Adam Machanic: [inaudible] bit more time.
Alex Yates: How long do I have? I thought it was a one-hour session or is…
Brent Ozar: You have up to 90 minutes if you want it.
Alex Yates: Fine, I planned for 90 minutes and I was over running but it turns out I have 90 minutes. I apologize for that, I probably should have known that. So what is database drift? Database drift is when somebody changes your system outside of your normal processes. The whole idea here is that we’re integrating our changes multiple times per day in Source Control so I make a change in Source Control, so drift is when somebody makes a change outside your process and I’ll remind you of our definition of continuous integration. Every change needs to be integrated in Source Control. So somebody is making a change outside of your process, it’s a problem. I don’t know if any of you work with this guy, but I certainly worked with him before, or her, or whoever it happened to be on that particular occasion. It always makes quotes sound really smart when you put them in quotes and reveal who said it afterwards.
So one of the things about database drift is when making changes directly on production, you’re making the decision that delayed due to poor cycle time is more expensive than the risk of making a mistake. Let me break that down a bit. Poor cycle time. What does cycle time mean? Well, cycle time is kind of a DevOps-y continuous delivery phrase, and the idea is it’s – how long does it take to go from an idea to Dev code to tested code to production to telemetry and monitoring and feedback on whether or not that idea worked. So if I want to add some new feature for my customer, what’s the smallest amount of work I can do, or maybe it’s a fix? If I want to implement a quick fix, how quickly can I make the fix, add it to Source Control, run my tests on it, deploy it to production and then find out from production, get some telemetry about whether or not that’s worked, how long does that take. And if going through my normal processes would take longer than a few minutes, I need to start thinking about how urgent is this fix.
If my cycle time – typically when I ask this at a SQL Saturday or something, I’ll get answers that vary from two weeks to a couple of months. That’s probably – 90% of people fit in that category. There’s always one smug person in the room that says five minutes, but the rule normally in my experience is between two weeks and a couple of months. Some people are a lot longer. So if let’s say it takes you a couple of months to go from development to production, if it takes you a couple of months and you’ve got an issue on production that’s costing you a zillion dollars a day, then I’m sorry, you’re going to make the change on production, because otherwise, the company goes bust. Otherwise, a customer is really upset with you, and you’ve lost face. You’ve possibly lost a customer. That’s a big deal. The other thing to say about this is who’s good at making this calculation in their head? Who is good at working out what the actual risk of making a mistake is when I execute this query? I don’t know about you, but I’m rubbish. I always think it’s going to work and then I’m always surprised when it doesn’t, and then I realize why it’s not worked, I’m like, oh yes, should have thought of that.
Adam Machanic: I just don’t make mistakes, that’s the simple solution to this so…
Alex Yates: In that case, you are a far better DBA than I am, and I personally…
Adam Machanic: You just admitted what no one else would ever admit, come on.
Alex Yates: I personally make mistakes, I make them all the time, and that’s why I want to implement processes like this to guard me from myself. So first of all, you need to know, understand the cost of the risk that you’re going to make a mistake and what the result of that is going to be. You also need to be good at estimating the cost the business is actually facing as a result of this bug. We all have a tendency to – whenever we see a problem on production, we all have a tendency to get very, very panicky, even if it’s not actually that serious, but we have to fix this right now. So it’s very difficult to carry out this calculation in our head, especially if we’re the person that’s responsible for looking after production and making sure that it works perfectly all the time. We have a bias towards wanting to fix it as quickly as we can.
The other thing to say is what we’re missing out here is the inefficiencies in our process, the fact that we’re basically screwing our continuous integration process by making the change quickly. Because the risk of making a mistake isn’t just me executing the code right now, it’s those developers that have a particular view of the world. They see their development database and they’re testing against their development database, they’re not testing against this hot fix change that I’ve made on production, so they’re not testing against it, which means it’s untested code, which means when you deploy it to production, good luck. You’ve not tested it, you’ve got no real way to know what the developers have done is going to match what you’ve done, and all of a sudden we end up back where we started, it works on my machine; which is what we want to avoid. By the way, that was me that said that. So the bad stuff that can happen, let’s assume that you made your hot fix throughout your own production and let’s assume that it’s worked, and you save the day; well done you. But there are things that could go wrong. First of all, there is a risk that if you’re using a model based approach, if you compare what you have in Source Control to what’s on production, it’s going to make production look like what’s in Source Control.
So if your hot fix is not in Source Control, goodbye hot fix. If it was important enough to go outside your normal processes to fix it, you probably don’t want it to be blown away. The other mistake is a failed deployment. Let’s say is the deployment ready? And then somebody has added it to Source Control and you deploy it again, that can potentially fail, so you end up with a failed deployment. That’s not a fun day in the office. Or in many ways even worse one is the deployment might work, and then you end up – because you’ve not got the code in Source Control, the migration script runs anyway, we’re still fine, but we now have environment inconsistency. Our production environment is different from what’s in Source Control, and sooner or later there’s going to be a mistake, sooner or later there’s going to be a problem, and this is when we end up with development environments that feel like the wild west. They’re miles away from anything that’s on production and once we lose confidence in our development environment and our production environment actually are in sync with each other, then we have problems.
So what can we do? Well, the first step I think is to monitor for it. You can do it for free by – this is a great blog post by Phil Factor about how to use DDL figures and things to fire off alerts whenever something changes on the database, and there’s also a free tool from Redgate that you can use as well. So my point about drift is monitor for it, be aware of it, and rather than trying to battle down – lock down all the hatches and stop anybody getting in to production, which you probably should do, but rather than that being the primary focus, the primary focus should be setting up proper Source Control, build, and deployment techniques we’re not really talked about today. Because if you can get to a point where you can commit a code to Source Control, run your build, run your tests on it and then deploy, first of all, you’re less likely to make a mistake, but second of all, you’re less likely to need to make the change throughout the production anyway. If your cycle time is measured in minutes, why do you need to make the change throughout the production. So that’s my piece of advice.
So I’m going to finish with some final pieces of advice. It’s teamwork above tools and processes? So I have shown you some demos today of some particular tools, but I’ve mainly been trying to show you the different ways of working and the different patterns. So remember what we’re trying to do. We’re trying to continuously integrate our code, we’re trying to avoid two different developers or two different teams or two different silos working in different ways and going down different paths and then having to integrate it back together later. We’re trying to continuously integrate our code. Every time I make a change, I commit it to my central Source Control repository, add all the tests that run on it and I know as soon as I’ve committed, whether or not I’ve made a mistake. Because it’s a lot cheaper to fix a problem if I find out about it immediately after I’ve written it, while I still have the problem up here, before I context switch to something else or before it gets forgotten and three years later some customer submits a bug report, some new person on the support desk writes a ticket badly, because they’re always written badly. And then it ends up getting triaged on some backlog when some project manager has to decide how important it is to fix and then it gets given to a tester who – the person who wrote the code left the company three years ago so it takes them forever to actually understand what’s going on, and in the end it just never gets fixed and your customer’s annoyed. So it’s much quicker to find that problem immediately after you’ve written the code and immediately after you’ve introduced a bug.
So do focus on the overall guiding principle, the overall idea as your number one thing. The exact implementation comes second. Don’t try and do all the things at once. I mean, I’ve talked about a lot of things here, unit testing, Source Control, releasing, automating your builds, drift detection. Start with Source Control. You have to start with Source Control. If you don’t have a workable Source Control process, start with that first, because there’s no point trying to set up any automated builds until you can say with confidence what version x means and what version x looks like.
Adam Machanic: Hey Alex, we get a lot of customers who either for some reason decide either our stored procedures don’t need to be in Source Control but our tables do, or the other way around. Can you comment on that, please?
Alex Yates: They’re wrong. I’m sorry to say it, they’re wrong. Everything needs to be in Source Control. The stored procedures don’t mean anything without the tables, and the stored procedures can’t be tested if they’re not in Source Control. Everything should be in Source Control. You should be able to reproduce your database and your environment from Source Control. If you cannot produce a copy of your database entirely from Source Control, you’re doing it wrong, because it’s by putting things in Source Control that we can start putting proper tests against them, and we can start managing our database life cycle much, much more effectively. So please, everything goes in Source Control and that’s the number one…
Adam Machanic: Everything? 100%
Alex Yates: Absolutely everything. If you can script it, it goes in Source Control. Including server configuration as well, stick that in Source Control. Have a script that spins up the server from scratch, because then it makes it much easier for you to spin up entire environments on demand.
Adam Machanic: How about migration scripts?
Alex Yates: Well, that kind of comes into the model versus migration thing, and there is an interesting debate about do you do both, is both an option. And that’s an interesting one, because in a sense, your Source Control has to be your source of truth. If you have two different versions of it, a model and a migration, which one is the truth? If you end up making a mistake at some point, which is the truth? So things can get a bit muddy there. If you use a model based approach – obviously if you’re doing migration, your migration scripts go in Source Control. If you’re using a model based approach and by default you use software to work out your upgrade script, let’s say the software hasn’t worked, so you need to actually manually intervene at some point. At that point, that becomes kind of a mini development exercise in its own right. How can I – obviously the auto-generated code hasn’t worked because let’s say I’ve been splitting a column. Let’s say I’ve got a full name column, I need to split that into first name, last name and as an example, the process there that needs to happen is I need to create the new column, first name and last name, I need to look at the data that I have and work out how to split it. The obvious answer is find a space and take what’s before and stick that in first name and take what’s after and stick it into last name, and it would be wonderful if our data was that consistent.
In actual fact, you might have middle names in there, you might have prefixes in there, you might have names that don’t conform to Western standards, you might have double barreled names, so it takes a human being to understand the data, and work out the most appropriate way to move that data across, and it might not be quite that simple. So there might be several iterations that are trying to create a script. For example, just split a full name column into a first name, last name column. And that in itself might be a mini development exercise where you write it once, then you test it and then you write it again and you test it again. So maybe yes, if you’ve got something complicated like that, maybe that is a Source Control process that you need to go through and I’m not necessarily going to recommend a particular implementation for it right now because I probably need to know the specifics about your organization before I did that and exactly what you’re trying to achieve and – but in principle, keeping – if you’ve got a migration script like that that’s going to take several attempts to get it right and tested, then yes, why not stick that in Source Control.
Adam Machanic: And would you put reference data in that same category?
Alex Yates: Yes, so reference data – there are various different shades of reference data. So there are some organizations who have look up tables. Less than a thousand rows, product ID codes, country codes, things like that. Yes, that should go in Source Control as well. Now, all the different database tools have their own different ways of doing it, so with SSDT, you’re typically using merge statements and pre and post deployment scripts. With the Redgate tools typically it’s kind of right click add this table to Source Control thing, with migrations tools, you just need to add a migration script every time you modify the data in one of your reference tables. So there are different approaches depending on the tooling that you use, but those tables, it should be treated as schema in a sense. The tables where the data is maintained by developers, as opposed to data that just lives in the wild, yes that should go into Source Control as well, and you should find a way that is appropriate for your process to get that there.
There are also more complicated scenarios such as seed data, where you have the first x number of rows the same, and then after that it’s different. That gets a bit more complicated, there aren’t very many tools out there that are very good at handling that, so often that tends to be handled with migrations. People might just have a small migrations project using Flyway or DBR for local base or one of the other ones. They might just have that just for the data, if they’ve got something complicated like that.
Another case to talk about is where these reference tables are massive. You don’t really want to put a million rows of data into Source Control because that’s going to blow up your Source Control system, so there’s kind of a sensible limit as the amount of code you want to stick into Source Control. If you need a full on SQL Server database to hold that much data, then, probably, sticking it in Git is a bad idea. But as I say, if you’re talking about less than a thousand rows, it should be fine.
So got a bit side tracked there, but general piece of advice, it’s teamwork and the concept over tools and the exact implementation. Don’t try and do everything at once. Start with Source Control and looking for drift and just being aware of when it’s happening and dealing with it appropriately. By the way, when you do get drift, when a production database does drift or maybe it’s a test database or something else, you should immediately do one of two things. If you want to keep it, stick it in Source Control. If you don’t want to keep it, roll it back. You should immediately do one of those two things as soon as you see some drift has occurred.
Once you have got this whole process up and running, don’t forget testing. There are not enough people out there who are properly investing in writing unit tests for their database. TDD absolutely applies to databases, it’s a cultural shift because it’s one of these move slow to move quickly things, hare and tortoise type logic going on. You need to write the test first and then do your code and then you run your tests after. Go and read a book on it, but testing is really, really important and we don’t do it well enough by and large in the SQL Server community. [inaudible] databases from other systems. So one of the things that’s really hard about continuous integration is when you’ve got lots of databases that depend on each other with circular references and complicated web dependencies. One of the reasons that’s hard is because from a Source Control point of view, you want to be able to spin up a database of this version or that version, especially from a QA point of view, that should be a fairly easy task to do. Take this databases, build it. Definitely from a continuous integration task, I want to be taking this database and building it. And if in order to do that you need to build a server with a hundred different databases with a completely complicated web of dependencies, that makes the whole ordering and the provisioning of that server really, really, really complicated and difficult. So do try and avoid it.
If you do already have a complicated web of dependencies, can you draw a diagram for which database depends on which one and understand the order? If so, print it out, stick it up everywhere and make sure nobody makes that more complicated or worse because at that point, the only way to really do continuous integration is to have an integration server where all of the dependencies exist. At that point, you can just build databases there and all the dependencies are already there, but it does make things more complicated.
Another thing, people see the benefits of continuous integration, testing, validation, the agility that it allows us to work with. People see the benefits of that and they think, that would be really great for my legacy complicated database project. If I’ve got a really complicated legacy database that’s 25 years old, that the people wrote it originally don’t even exist anymore, when you’ve got one of those really hard systems where you blow on it in the wrong way and it breaks, a lot of people see the benefits of continuous integration there. The problem is, if they’re simultaneously going to learn how to use Source Control systems and build servers and to build their database and scripts, if they’re going to do that at the same time as dealing with all the legacy crud on their actual database, very quickly that project falls down and it doesn’t work because you’re trying to do too much at once.
What’s far more effective is next time you have a Greenfield project come along, even though you don’t feel you need it there because you’ve not deployed to production yet and so it’s easy to make changes quickly because you wrote the code yourself and you’re the best developer, everybody else is rubbish – even in that scenario, start on the easy stuff first because it’s going to be a learning process to get your head around things like Git and Jenkins and whatever else you decide to use. So start with an easy project, get your head around how the process works, how it looks, sensible ways to apply it within the context of your organization, and once you’ve got your head around that, then go and tackle the legacy beast. Because by that point you’ll already have got your head around how continuous integration works before you start applying these challenges to the big legacy database. And finally, use the right processes for your database and your team. What I’m really saying there is pick the right model or migrations approach or hybrid, as James will show you next week.
Finally, yes, ask me for help if you want any of my help. The guys at Farm Credit made a big difference when they rolled us out there. It was a long project, they had some big complicated databases and some big complicated teams to work with, so this wasn’t an overnight job, it took them some time, but the people in this photo have kind of got a reputation now for being the ones that helped to turn that around. They went from a time when their rate of failure was very, very high and now it’s drastically lower. The number of failed deployments is significantly lower now because they know by the time they get to deployment time, it’s been built, it’s been tested, they know which builds are the good ones and which are the bad ones, and so their rate of failure is much, much lower.
So that has been very good for all of the people in that photo. All of them have gone on to great things. It was also the inspiration behind, partly, the company that I set up, DML Consultants, and also the Kickstarter package that we now offer. So I’d like to sum up by just talking about a few of the things that we’ve learned through this whole ordeal. Siloed sparrows such at DLM, heated hippos are close minded, clever people consider options, use appropriate tools and only hire Devs who write tests. Finally, they make some pain on that smug face, and this stuff does make a really big difference. All of my references are available at DMLConsultants.com/CI. If you’d like to work with me please do get in touch and I’ve got a few moments now for any questions.
Brent Ozar: Nicely done sir, nicely done.
Adam Machanic: I actually have one question. Is there such a thing as a purely model-based approach and what I’m thinking about really is changes that require data to migrate, which is always a difficult area with something like SSDT. So like I want to add a new column and in some cases I want to set it to one and then in some other cases I want to set it to two and a few other cases I want to set it to three and I have to write a script for that, and I can’t really model that.
Alex Yates: So exactly. So the problem with model is there are some scenarios that it just doesn’t work with, because model does not handle the transition, it relies on software to do that, and so all of the model-based tools, whether it’s Redgate, whether it’s SSDT, whether it’s something else, if they’re viable, they do have some story about how to handle that. So in SSDT, you’ve got pre and post deployment scripts, you’ve got the refactor log, with Redgate you’ve got migrations scripts, with ReadyRoll you’ve kind of got a hybrid approach.
Adam Machanic: [crosstalk] … pre and post migration script even help with that though, because what you need is – you need to add the column, then make some changes – let’s say you want to set it at NOTNULL. That’s an even better scenario, so you want to add it at null, make some changes then set it to NOTNULL, you can’t do that pre-deployment, you can’t do that post-deployment, you’re in a weird…
Alex Yates: What you would do in that scenario is your pre-deployment script adds a new column as NULLable, mince out the data, your pre-deploy script, you then make it NOTNULLable, and then you let the software work out the rest of the other stuff you’ve got going on. But of course if that’s in a pre-deployment script in SSDT, which is the same pre-deployment script that gets round every time, so you need to wrap it up in guard clauses, if statements, if this column – if this column does not exist, then do this, and you end up with this massive, massive, massive thing, and eventually you want to go and clean it up and delete some things from it, but then it depends on how many target environments you have. Because let’s say for example you build a database that is then deployed out to – is packaged up into some software that your customers download, is executable or something, then how do you know what version all of your customers are on, so you kind of have to keep it there indefinitely. So over time your pre-deployment script just grows and grows and grows and grows.
So yes, the model-based approaches don’t solve that problem well. On the other hand, they do solve a lot of the problems associated with database development and dedicated large teams very, very neatly. The migrations tools do not solve those problems very well the moment you start to scale. If you’re a small team, if you’re one developer, migrations is fantastic. The moment you start having multiple teams in multiple locations working on different streams of work, then that becomes really bad. So I guess in one sense what I’m saying is, both of them suck. You just have to pick the approach that sucks less, and the one that is more appropriate to the problems that you have and more appropriate to the problems that you’re better able to mitigate. And do take a look at the hybrid options because the hybrid options do – well, they’re a double-edged sword as well, because on the one hand, they solve a lot of problems, but on the other hand they’re more complicated scenarios, so it’s not very in keeping with keep it simple stupid. So you end up with different problems for that reason, so yes, pick your poison.
Brent Ozar: Peter Schatts asks, “Is there such a thing as writing tSQL unit test when you don’t use stored procs?”
Alex Yates: So I heard someone make the case that if you have a WHERE clause in the view, you should have a unit test for it, so it is an open debate at the moment. I’m not pretending I have all the answers, I’m not pretending to be such an expert that I know all the answers at all. I don’t have all the answers, but I do know that applying the principles in this presentation and doing what’s right for your organization makes a lot of sense. If you don’t have any stored procedures do you have any functions, is there any other logic in your database anywhere? Primarily it is those logical parts of the database that we’re talking about. Primarily when we’re talking about unit testing it’s kind of what code exists in your database beyond just the columns and the tables.
Brent Ozar: Andy Leonard says, “When you say pick the one that sucks less, are you talking about database DevOps versus SSIS DevOps or…?”
Alex Yates: I think possibly Andy is a better person to [inaudible] than I am. I am not an expert in the BI world. I know that it’s a lot more painful than the SQL Server world – than the relational database part of SQL Server world, and so I would not want to comment on that. I would ask Andy.
Brent Ozar: Switched over just to show Slack, just since there were a few people talking about it SSIS is hard, difficult. Peter follows up with – he says, “We do have views but they’re not pretty, at least they’re not too deeply nested.” Alright, well nice session today Alex, thank you very much for hanging out and presenting from Germany today and go enjoy yourself a few beers and thank you, Adam, for co-hosting too as well.
Adam Machanic: Thank you, Brent.
Alex Yates: I felt you were quite kind to me today, thank you very much.
Latest posts by Alex Yates (see all)
- Getting CI right for SQL Server - June 2, 2017
- DevOps 101 for data professionals – how your jobs will change - January 20, 2017