Target Audience: SQL Server Reporting Services developers who are unable to change the structures of their SQL source data but still want their SSRS reports to run blazing fast.
SQL Server Reporting Services (SSRS) is an easy-to-use tool for automating reports and creating highly visual dashboards. Although SSRS is easy to learn, there are many tips and tricks that can improve your report building experience, not to mention make your reports run blazing fast!
This rapid-fire session goes over my learnings from the past six years of developing high-performance SSRS reports, including topics like multivalue parameter efficiencies, how to best utilize subreports, and performing SQL CRUD operations with SSRS.
Each rapid-fire topic includes sample data and an SSRS reporting example that users will be able to try out for themselves.
Why I Want to Present This Session:
For years, SSRS was the only tool I had available for presenting automated reporting to customers. I quickly hit the limitations of the software and needed to push SSRS to its limits so it would do what I needed it to do.
I have presented this topic many times before and I am excited to be able to share this knowledge now with a wider audience.
Slides, demos, scripts can be found at: https://bertwagner.com/presentations/
Brent Ozar: In this session at GroupBy, Bert Wagner is going to be talking about high performance SSRS. So Bert, take it away.
Bert Wagner: Great. Thanks, Brent. I’m excited to present on high-performance SSRS. In summary, this is what we’re dealing with in the presentation today. It’s this. Hopefully, many of you are familiar with it.
If you’re not, maybe you’re on an older version and more familiar with that. But these guys here are our enemies. So when talking about performance, if you’re not familiar with these screens, these are the SSRS loading spinners that you see as you wait forever for a slow report to load. As for why we want to focus on performance, I like to break it down in terms of who your audience is.
For me as a developer, performance is huge because I want to build something that’s going to work today, and I want to build something that’s still going to work in the future. Otherwise, I take on technical debt: as soon as we get more users or something like that, the product breaks down and then I have to fix it again. That’s not fun. From a DBA standpoint, I’m not a DBA, but I sure know DBAs who hate getting tickets cut saying that CPU or memory usage is high, or tempdb is filling up, because you’re doing all types of crazy sorting and other actions that you’re not supposed to. So, part of performance tuning will be that.
Then, I guess most importantly, if you think the customer is always right, think about your users. I’m sure many of us can sympathize: if we go to a website and it’s taking too long to load, I have a tendency to just exit out of it. Sometimes you can’t, if you need to access something, but having to wait a long time really stinks. And I think earlier today—who was it, the Stack Overflow?
Brent Ozar: Greg.
Bert Wagner: Greg, yeah. He’s talking about 10, 20 milliseconds. That’s crazy. Full disclaimer: we’re not going to get these reports down to 10 or 20 milliseconds, but we’re going to do our best.
Just a brief background. I’m a business intelligence developer, like I mentioned, at Progressive Insurance. I’m mostly familiar with SSRS on SQL Server 2008 and 2012, and more recently I’m getting into 2016. The good news is I’ve tried all these examples out in all the different environments and they all work just the same. The concepts are the same, so it doesn’t really matter what version you’re on.
The data that we’re going to be using today is actually the Stack Overflow data dump, which is from you, Brent, from your website there. I’ve linked to that in case you want to download it. I did also include scripts to actually reduce the size of the data because I think all uncompressed it’s around 100 gigs, and my laptop cannot spare that much space. So if you are like me on a small machine, you’ll be able to use those scripts that reduce the data a little bit as well.
If anyone is not familiar with the Stack Overflow database, if you missed the presentation earlier, or if you’ve just never used Stack Overflow, it’s a question-answer forum. Essentially, users ask questions. Other users, like probably many of you watching, respond to those questions as answers, and then the best answers get voted on. You gain reputation and points and things like that. Hopefully, that’ll make sense in the demos. All the code, I’ve already posted it, both on the GroupBy session website and on my personal website. So all the slides, the demos, the scripts that we’re looking at, they’re all on there, so you’ll be able to download and play with those yourself.
Just a brief look at what we’re going to be talking about today. We’ll start more generally at a high level, and then as we go through the presentation, we’ll look at more specific use cases that I run into quite a bit. There’s some really interesting things there. Anything with an underline is going to be a demo, so we’re going to be jumping around between Visual Studio, SQL Server Management Studio, a lot today. And that’s it, so let’s just dive right in.
The very first thing I want to talk about is SSRS usage data. SSRS does data logging for you. This is important to be aware of because if you’re trying to tune performance, you need a baseline. You need to know: what are you measuring? Are the changes you’re making improving performance? Are they hurting it? The only way to know is by taking a look at statistics on how long the reports run. Fortunately for us, that’s built into SSRS. In 2016 it’s nice: there’s a view that just shows you the relevant data. In older versions, sometimes there’s an SSIS package you have to run to create the reporting structure for that data. The key metrics we’re going to look at today are these time columns, in the screenshot at the top of the screen there. Time data retrieval refers to how long it takes for the data to get from your data source, in most cases, and especially today, SQL Server, to the reporting server.
Time processing is taking a look at how long is your SSRS report processing that data. So if there’s anything happening in the report where it’s running expressions, or if it’s sorting your data, aggregating your data, that’s where all the time processing goes. That’s how that gets counted. Then finally, time rendering is how long it takes to output that data onto your screen, whether it’s via the HTML Report Viewer, or it’s an Excel file, CSV, whatever you’re outputting your data as.
So, with those three metrics, we’ll be able to get a pretty good idea of how our reports run. Those by no means are the only metrics in this SSRS usage data. There’s plenty others. I’ve included a link to the Microsoft site that kind of explains them all. But if you’ve never looked at them, there’s some really good ones in there, with memory usage and report size. It’s really nice to be able to go look and measure how poorly your reports are running.
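As a quick sketch of how you might pull those three metrics yourself, a query like the following works against the ExecutionLog3 view in the ReportServer catalog database (the view name and columns follow Microsoft's documentation; the ten-minute filter matches the demo later in this session):

```sql
-- Recent report executions with the three timing metrics, slowest first.
-- Run this against the ReportServer catalog database.
SELECT TOP (50)
    ItemPath,              -- which report ran
    TimeStart,
    TimeDataRetrieval,     -- ms spent getting data from the source
    TimeProcessing,        -- ms spent filtering/sorting/aggregating inside SSRS
    TimeRendering,         -- ms spent outputting HTML, Excel, CSV, etc.
    ByteCount,             -- report size in bytes
    [RowCount]             -- rows sent to the report
FROM dbo.ExecutionLog3
WHERE TimeStart >= DATEADD(MINUTE, -10, GETDATE())
ORDER BY TimeDataRetrieval + TimeProcessing + TimeRendering DESC;
```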
We’ll take a look at that table in just a second in our first example. I won’t be going through the usage data for every single report, because we would just run out of time. But feel free afterwards to go through this report, the demo reports, and take a look at that data for yourself, and definitely use it in your own reporting environment. So, briefly covered, and this is where I’m saying I’m probably in Gianluca’s circle of hell.
Sometimes you’ll see a SELECT * in one of my queries. But we want to avoid doing that, and it’s for the same reason Gianluca said: you’re bringing back way more data than you need. If you’re building a report that’s only displaying five columns of data, and your table’s got 40 and you’re saying SELECT *, that’s a lot of extra data getting sent across the network (assuming you’re on different servers) that you don’t need. There’s also performance on the SQL Server itself: your indexing will suffer and you’ll do more scanning, because your tables probably aren’t built to bring back all the columns all the time.
Then, from a maintainability aspect, if you say SELECT * and your table has 40 columns today, and in the future you add five more, things might break because you’re now bringing back more columns than the report expected. That’s true of everything, not just Reporting Services: if you’re writing queries for any other kind of application, just don’t use SELECT *; it’s a bad idea. It also doesn’t really communicate your intentions. If someone inherits this report from you later down the line, or even if you look at the report three years down the road, you’re not helping yourself by withholding information and context about what data is important and why you’re selecting it. SELECT * on its own doesn’t do a great job of explaining that. So we don’t want to use it, and avoiding it will definitely help performance.
Traffic and specialization. What this boils down to is using the right tool for the job. So we’ll jump into Visual Studio here in a second for the demo, but essentially let’s say, in this example we’re going to be creating a report to help us get better at answering questions on Stack Overflow. Like I said, if you answer questions you build reputation points. So we want to be able to answer questions that are posted that don’t have answers for them yet. So we’re going to try to build a report of questions that we might be able to answer that don’t have good answers yet, so that we can get all the points. So let’s just go ahead and jump into the demo here.
By the way, here is my SSRS solution. All of the report names have our prefix with a number, those correspond to the numbers and the slides. Pretty obvious, but I figured I’ll mention it. If we take a look at this report, and we’ll run it in a second here. But let’s look at the dataset first. It’s really basic, select some columns from this post table. There’s no filtering going on, nothing happening in the query, we’re just doing our select from. We’ll take a look at another example of why maybe there’s a better way of doing this. But our dataset here, our filtering is actually happening in this filter’s section in the dataset.
You can see we’re filtering on things like accepted answer ID equals zero, so there’s no accepted answer yet. Answer count is less than or equal to three, so there’s not a lot of answers yet for the particular question, and in the tags we’re saying, okay, there should be a SQL server tag because that’s what I’m familiar with. Those are the kinds of questions I want to answer. So by putting all this information here at the dataset level, our query will run on a SQL Server, but then the filtering is going to happen in the dataset in SSRS.
Also, just for fun, since we want to sort the questions by timestamp, I decided to do the sorting here in SSRS, in my tablix properties. Some of you have used SSRS in the past and are familiar with this. If we go to run this now, you might already know what’s going to happen. But if I run this report, you’ll see, there’s our enemy, the loading icon. This should take a little bit of time here. What’s happening is SQL Server is just running that basic SELECT FROM, selecting, I think, a million rows of data from the table and moving it to the SSRS server. Then SSRS is actually filtering and sorting all that data based on those predicates we provided.
Now, SSRS is great at building reports, it’s great at making it easy to build reports. I would argue there’s no easier way of building reports as quickly and with such flexibility as SSRS. But it is not good at manipulating large amounts of data. That’s not what it’s built for. SSRS is built for reporting and displaying data. SQL Server on the other hand, relational database management system, is excellent at handling large amounts of data. It has plenty of efficiencies that are built into it to manage that data and to do those large operations like filtering a million rows of data down to only the ones that are relevant, especially using other things like indexes to help it perform even faster.
So in this report, we’re really not using any of those advantages that SQL Server offers us. We’re just saying SELECT FROM, give me everything, give me a million rows from that table and bring it back into SSRS which will do the filtering and finally render our report. So, the report finally opened, I kind of killed some time by telling you what was happening, but if I was sitting there waiting, I’d be really bummed out about that performance.
If we actually go to the execution log view here and run a query, I’m just filtering it on the last ten minutes of data because we don’t need to see all the junk that was in there before. We’ll see here is our report that ran, and our three columns of interest are right here. These are in milliseconds. We can see it took about two and a half seconds to get the data from our SQL Server to our SSRS server.
Not too bad, I guess, but our time processing, this is where SSRS is filtering on those three predicates that we supplied and sorting the data. That took over 20 seconds, 22 seconds, and then to finally render the report it took another five seconds. So pretty slow; we can see most of our time is spent here in time processing, and that’s on the SSRS server. By far, in terms of SSRS performance, this time processing column is where you can make the biggest impact. If you’re smart about how you design and build your reports, you can trim this down significantly, as we’ll see shortly.
And there are some other columns like I mentioned, such as the report size in bytes and how many rows came back. This is interesting to note because SSRS brought back all million rows before it filtered them out, so not a great idea.
Our alternative is another report, part two here, where we use the right tool for the job. If we open up this dataset, same as before, it’s still doing the same exact select from. But you’ll see I’ve added WHERE and ORDER BY clauses. We’re doing our filtering on these columns, our accepted answer ID, our answer count, and tags, and we’re also ordering by create date descending. That’s all happening on the SQL Server. So before the data even gets sent to the SSRS server, it’s all going to be trimmed down. I think there are only 4,000 questions that get brought back, and 4,000 rows of data coming back to SSRS is much faster, as we’ll see in a second, than bringing back a million rows.
You can see I took the filters out; there’s no more filtering on the dataset in SSRS, because that’s what SSRS is really bad at. I’ve also removed the sorting from the tablix. So if we go back to our report server and take a look here, let’s open this guy. Still a loading screen. If I had access to the SQL Server, I could obviously add some indexes; I didn’t for this demo, but it could obviously be way faster. But just by removing all that processing from SSRS, you can see this came back significantly faster, same number of results and everything. If we jump to our usage data table, we’ll now see another row. Looking at this top entry for comparison, our SQL query took a little bit longer; it was no longer just doing a SELECT FROM, there were some WHERE clauses it had to process, so it took about five seconds. But our time processing in SSRS is basically a quarter of a second, and the time rendering is significantly faster as well.
And you can see, even though the report size is about the same, the row count is drastically lower. So there’s way less rows going to the SSRS server. So the moral of this story is don’t use the wrong tool for the job, use the right tool for the job. In this case, SSRS is great for displaying data and SQL Server is great for processing it and doing all the heavy lifting.
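To make the before-and-after concrete, here is a sketch of the two dataset queries side by side. The table and column names follow the Stack Overflow schema used in the demo, and the filter values are the ones described above; treat it as an illustration rather than the exact demo code:

```sql
-- Slow version: SSRS receives every row, then filters and sorts it itself.
SELECT Id, Title, CreationDate, Tags, AnswerCount
FROM dbo.Posts;

-- Fast version: the filtering and sorting happen on SQL Server, so only
-- the roughly 4,000 relevant rows ever travel to the report server.
SELECT Id, Title, CreationDate, Tags, AnswerCount
FROM dbo.Posts
WHERE AcceptedAnswerId = 0          -- no accepted answer yet
  AND AnswerCount <= 3              -- not many answers yet
  AND Tags LIKE '%sql-server%'      -- questions I know something about
ORDER BY CreationDate DESC;
```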
All right, multivalue parameters. There are a few different ways you can handle multivalue parameters in SSRS. There’s good ways and bad ways, once again. I just want to go over those different ways because they all have their pros and cons and multivalue parameters are something that I use a lot of with my SSRS reports.
If we go ahead and jump back here and open up this report. Let’s just preview this in here. So when you preview it in here, this doesn’t capture the usage statistics but going forward, we won’t be checking back to that table every single time. This report, pretty basic. We have some post types that are coming back from the Stack Overflow database and this is a multivalue parameter. If we just select some more, this report will show same data. It’s just showing us what got filtered from those parameters. Each column here in the report is showing a different method of getting that data.
So, if we go and look at how the report is actually designed, we’ll see we have some datasets. This first one is just for the parameter, the drop-down parameter to get those values from the database. We don’t have to take time to look at that, but our first column here is filtering that data using an expression filter. This will look a little bit familiar to the last demo where we’re doing a select from our post types tables here. Instead of filtering here at the dataset level, we’re actually doing it here on the tablix, which to my knowledge, doesn’t make a huge difference. They’re both just bad. We don’t typically want to do them but here is our expression saying, okay, let’s filter our ID column on this, whatever the parameters that are passed in from the top of the report.
In this case, you saw when we ran the report, it ran fine. It ran fast. I still would rather people didn’t do it this way, but there are some advantages where if you want to use certain expressions to do certain things to that data, it’s a little bit more flexible to filter this way. But if you’re bringing in more than 10 rows, something to really reconsider is, is this what you really want to do? Because performance is really bad as we saw on the last demo.
So, the second method is the query text method. So if we open up this dataset and we take a look at the query text window here, same query, select from, but now we have a where, where we’re filtering on our parameters from our multivalue parameter dropdown. So cool, we took the concept that we learned in the last demo and we’re doing all the heavy lifting on the SQL Server.
Brent Ozar: Kenneth has a good question. He says, “When is it appropriate to have SSRS do the filtering itself instead of passing it to the database?”
Bert Wagner: I mean, it’s really oddball cases. 99.9% of the time, I’d say it’s not worth it. If there’s certain things you want to do maybe with expressions, like if you dive into the tablix, there are some things that you might be able to do a little bit easier if you’re getting to have all the data available to you, it just depends. Sometimes you’ll be constrained with—I guess it depends on your environment. If you are constrained by what you are able to do in your SQL Server, then SSRS might be a place to do it. But really, you shouldn’t do it. It’s not efficient at all.
Brent Ozar: George has one, he must have a bad or a particular environment. George says, “Only when the dataset is derived from a stored procedure and the BI dev can’t touch the stored proc.” So it sounds like he feels that particular pain.
Bert Wagner: Yeah, it happens. Ideally, that’s a great example. If you can’t modify the stored procedure and you’re stuck with the data coming back, then not much you can do.
Brent Ozar: In Slack, Andy Mallen adds one. He says, “A shared dataset with cached results is a great scenario for filters on the reports.” Oh, okay, that makes sense.
Bert Wagner: Cool. Method two then, if we just jump back to that, was the query in the query text editor. So, this is great. Our heavy lifting, so to speak, is being done on the SQL Server, so it should run much faster as we saw in the previous example. But, it kind of stinks that you’re using this query editor window here. I hate this thing, I’m sure many people do too. It’s because it kills your formatting. If you want to add comments, if you want to format your queries, it’s super ugly.
You can format it, you could spend all the time in the world formatting your queries in Management Studio and paste them in here, and at some point it’s just going to look like garbage again. So from a maintainability standpoint, I really hate using this window. But I also have to say, sometimes I’m really lazy, and if you know a query is quick and easy and simple, it’s much easier just to write your select statement in here, go with it, and pretend that you’ll refactor it later and change it to a stored procedure. But that’s not a good practice. That belongs in the previous presentation of worst practices.
The last dataset here is using our stored procedure. This is doing the same thing as before, the same query, but now we’ve put in a stored procedure. So we’re going to get lots of benefits where in the future we might have some flexibility in terms of what we want to add to that procedure. If we want to add commenting and different things like that, it’s much easier to do in the stored procedure.
If we want to, for example, reuse a stored procedure, if we have a dataset that we want to use on a couple different reports, it’s much easier to centralize that code in a stored procedure than have to copy and paste a query from dataset to dataset among reports. It’s definitely the best option, a little bit more work because you have to do the very laborious task of going to Management Studio and writing ‘create procedure as’ but definitely what you want to strive for.
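As a hedged sketch of what that stored procedure might look like: SSRS can’t hand a multivalue parameter straight to a stored procedure parameter, so a common pattern is to pass the values as one comma-delimited string (via a Join expression on the parameter) and split them apart inside the procedure. The procedure and parameter names here are my own, and STRING_SPLIT requires SQL Server 2016 or later; on older versions you’d need a hand-rolled split function:

```sql
-- Hypothetical sketch: the report passes its multivalue parameter as a
-- single comma-delimited string, and the procedure splits it back apart.
CREATE PROCEDURE dbo.GetPostTypes
    @PostTypeIds VARCHAR(1000)   -- e.g. '1,2,5' built by a Join expression
AS
BEGIN
    SET NOCOUNT ON;

    SELECT pt.Id, pt.[Type]
    FROM dbo.PostTypes AS pt
    WHERE pt.Id IN (SELECT TRY_CAST(s.value AS INT)
                    FROM STRING_SPLIT(@PostTypeIds, ',') AS s);
END;
```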
All right, parameter sniffing. Once again, since there are so many parameters that I could use in SSRS reports, multivalue parameters, I experience parameter sniffing a lot. I know parameter sniffing is a topic that had lots of coverage in other people’s presentations. People do whole presentations on parameter sniffing, so we’re just going to do a very brief overview of it in case someone has never heard of it before. But you basically, you probably have a parameter sniffing problem if you are running a report, and it runs for you in five seconds and it runs great, and then someone else is running that report, and you get an email from them saying, “Hey, why is this taking five minutes to run?” That could be a sign that you’re dealing with a parameter sniffing issue.
Basically, what parameter sniffing is: SQL Server’s query engine caches query plans the first time they run. So if you have a parameter, it will cache a query plan for the first value that ever gets passed into that stored procedure or query; it doesn’t have to be a stored procedure, but if we’re using stored procedures, like we should, then it’ll be a stored procedure. What then happens is, if a different parameter value gets passed in the future, that same initial query plan is what’s going to get run, even though it might not be optimal for the new parameter value. So there are a couple of ways to fix it.
We won’t go through every single possible way out there, because, one, I don’t know them all, there are so many. But at least this will give people an idea of how to recognize this problem and how to deal with it. So, if we look at this demo script, I’ve included the actual execution plan here. In this case, let’s say we’re going to be planning a conference in the future, and we’re going to use Stack Overflow user locations to determine where there’s a big user group where we’d want to hold a physical conference.
So I have two queries here. Really basic, doing the same exact thing: pulling from the Users table where location equals something. My first query looks at users that have China set as their location, and the second looks at users with India as their location. If we run this, we see that for our China result set we have 383 rows, and for our India result set we have 3,000. So the India value is a whole order of magnitude larger; those rows make up a much more significant portion of the table. If we look at the execution plans for both of these queries, we’ll see that for the China query SQL Server is doing an index seek with a key lookup, and for the India query it decides, okay, since this is a much bigger portion of the table, we’ll use a clustered index scan to bring back that data, it’ll be more efficient.
These are what we’ll now refer to as the most efficient execution plans for these queries, or at least what SQL thinks off the bat. Now, if we parameterize this query as you see here, and run China and India again, we’ll see we get the same exact result sets, right? Nothing changes, but since we’re now using this location parameter, parameter sniffing kicks in. We see that for both the China and India queries, we now get index seeks, even though we know from above that the India query is better off doing a clustered index scan.
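A sketch of that demo, assuming the Stack Overflow Users table: the literal queries each compile their own plan, while the parameterized version reuses whatever plan was compiled for the first sniffed value:

```sql
-- Literal values: each query compiles its own plan.
-- China (a few hundred rows): index seek + key lookup.
-- India (thousands of rows): clustered index scan.
SELECT Id, DisplayName FROM dbo.Users WHERE Location = N'China';
SELECT Id, DisplayName FROM dbo.Users WHERE Location = N'India';

-- Parameterized: the plan compiled for the FIRST sniffed value is cached
-- and reused for every later value, optimal or not.
EXEC sp_executesql
    N'SELECT Id, DisplayName FROM dbo.Users WHERE Location = @Location',
    N'@Location NVARCHAR(100)',
    @Location = N'China';
```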
But because this China query ran first right now, SQL cached the plan, and it cached it as an index seek. If we put this in a stored procedure, it doesn’t matter; whether you’re using a query or a stored procedure, nothing is immune. So here we have, once again, index seeks for both. But now let’s clear out the query plans. And obviously, don’t run this stuff in production; especially from here on, we’re going to start doing things that could make your DBAs really unhappy, or if you are a DBA, I guess it’ll still make you really unhappy.
If we get rid of all the query plans and now run the India parameter first, we’ll see, at least we should see, you never can be sure with a live demo, that the clustered index scan is the plan that got cached. Since we ran the India parameter this time, the clustered index scan got cached by the query engine, and so when we ran the China query below, it’s also using the clustered index scan, even though we know that’s not the most efficient plan for that parameter. So, what can we do about this? Here are just a few things to start with, not an exhaustive list by any means, and there are lots of caveats with all of these.
So, let’s start with the first one. I see a lot of just, “Oh, let’s throw a WITH RECOMPILE on everything.” Right? Fixes the problem. Awesome. There we go, we got our efficient execution plans once again, our index seek for China and our clustered index scan for India. The problem is, WITH RECOMPILE forces the query engine to rerun and determine the optimal execution plan every single time, and determining that execution plan is a really expensive process. That’s why plans get cached in the first place, so that SQL Server doesn’t have to constantly do it. With RECOMPILE, you lose all those savings, and SQL has to rerun that expensive operation to calculate the execution plan. So even though it may look like you’re getting the more efficient plan here, you’re not using the benefits of the server’s plan caching.
Another option to be aware of is this OPTION (RECOMPILE) statement in here. It does the same thing as the WITH RECOMPILE outside the execute statement, but if we put it inside the stored procedure, it’s more centralized. So if we’re calling the stored procedure from all over the place, a shared stored procedure used in many different reports, this is an easier way of fixing it in one place and having the fix apply everywhere. “Fix” once again meaning that you’re losing all ability for SQL to cache those execution plans, and it has to recalculate them every single time.
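The two recompile variants might look like this. The procedure name is hypothetical, and CREATE OR ALTER needs SQL Server 2016 SP1 or later (use separate CREATE and ALTER statements on older versions):

```sql
-- Option 1: recompile at the call site. Every caller has to remember it.
EXEC dbo.GetUsersByLocation @Location = N'India' WITH RECOMPILE;

-- Option 2: recompile inside the procedure, so the fix lives in one place.
-- Either way, the plan is recalculated on every execution and this
-- statement no longer benefits from SQL Server's plan cache.
CREATE OR ALTER PROCEDURE dbo.GetUsersByLocation
    @Location NVARCHAR(100)
AS
BEGIN
    SELECT Id, DisplayName
    FROM dbo.Users
    WHERE Location = @Location
    OPTION (RECOMPILE);
END;
```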
A third option, which I think is slightly better, is this OPTIMIZE FOR hint; let’s alter our stored procedure to use it. What OPTIMIZE FOR does is say: we’re going to create the execution plan, and cache it, for the value we pass in as part of this hint.
So in this case, I’ve passed China in, which, if you remember, gives the index seek query plan. So if we run this, even though India is being run first, we’ll see we have both of our index seek query plans. The way I made this decision is I tried a bunch of different values in here and did some research. At the top of this query document, I turned statistics time and I/O on. I figured that even though this index seek query plan isn’t the most efficient for the India value, it’s way more efficient than doing a clustered index scan for those few hundred China rows.
So I decided that that was the best option in this case and you can do lots of different things. If you know most of your results that are going to come back are, let’s say, around 10 rows and then there’s one parameter that brings back 100 rows and has a different execution plan, you can optimize for a parameter that you’re going to hit the majority of the time. So at least you’re benefiting from SQL’s caching of that execution plan and only compromising when you’re passing in that occasional variable that is going to get a less efficient, less than optimal execution plan. So, that’s another great way.
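The OPTIMIZE FOR version of that same hypothetical procedure might look like this, pinning the cached plan to the value you settled on after testing:

```sql
-- Always cache the plan built for 'China' (the index seek), regardless of
-- which value happens to run first; testing showed it to be the better
-- compromise across the values this report actually receives.
CREATE OR ALTER PROCEDURE dbo.GetUsersByLocation
    @Location NVARCHAR(100)
AS
BEGIN
    SELECT Id, DisplayName
    FROM dbo.Users
    WHERE Location = @Location
    OPTION (OPTIMIZE FOR (@Location = N'China'));
END;
```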
There are a lot of other options out there too. You could use if-then logic in your SQL to call different stored procedures, each optimized for different parameter values. You can copy your incoming parameters into local variables, which accomplishes something similar. Like I said, people have done whole sessions on parameter sniffing. If it’s a problem you’re having, hopefully now you’ll be able to recognize it; do your research and test out different solutions before you implement any of them. But if you can fix it, your reports will definitely perform better.
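For completeness, the local-variable approach mentioned above could be sketched like this (same hypothetical procedure as before):

```sql
-- The optimizer can't see the value of @Loc at compile time, so it builds
-- a plan from the Location column's overall density statistics: a
-- middle-of-the-road plan for every caller, rather than a great plan for
-- one sniffed value and a bad plan for the rest.
CREATE OR ALTER PROCEDURE dbo.GetUsersByLocation
    @Location NVARCHAR(100)
AS
BEGIN
    DECLARE @Loc NVARCHAR(100) = @Location;

    SELECT Id, DisplayName
    FROM dbo.Users
    WHERE Location = @Loc;
END;
```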
So, explicitly defining property values. In all of our SSRS reports, if we look at one of these, any time we click on a text box or anything like that, we have all these properties down here, and a lot of them are like “can grow” and “can shrink.” If you’re dealing with images, there’s image auto size. If you want ultimate performance, you should not be using those auto properties; you should set explicit values. Basically, any property that needs to be determined at runtime causes the SSRS server to perform additional calculations. If you can specify, let’s say, an image width of 500 pixels, that’s one less thing the reporting engine needs to figure out while it’s trying to display the report on a user’s screen.
If you know your text box can be limited to a certain size, you can set that and not allow it to auto grow or auto shrink, and that’s one less decision, one less step, that the reporting engine has to take when rendering the report to the user’s screen. On setting these properties, I included a link to Microsoft’s website where they go through many, maybe all, of these properties and their effects on rendering time. This isn’t going to save you minutes, but if you have a report that’s running thousands of times a day for hundreds or thousands of users, every second counts. If you can shave a second or two off your report time, your server is going to be under much less load because of it.
So, this is where we start getting into really fun processes. So, stored procedure CRUD operations. CRUD stands for create, read, update, delete. Most people think SSRS can only render data to the screen, and that makes sense, it’s a reporting tool. It’s meant for reporting on data, but there are many instances where getting some kind of user feedback, some user input, would be very beneficial, especially from a performance standpoint.
For example, if you have different user groups using the same report, they might be using that report in different ways. User group A might always want the data sorted a certain way; user group B might not even care about one of the subreports that are part of your report because they never use that data; user group C might be doing something else. But for maintainability, you just have one report, and all these different user groups use it. It would be nice if you could have some user preferences, and if you don't have access to some other way of getting input data from users, like SharePoint or InfoPath, or building a website to capture that data, then all you're left with is Reporting Services. The good news is that you can capture input data, so we'll take a look at that now.
For this example, let’s actually start with the final product because I think it will make a little bit more sense here. In this example, what I’m doing is I’m saying okay, from Stack Overflow I have some favorite posts. I created a favorites table in my database and I’m just basically storing the post IDs for whatever post I mark as a favorite. So you could see I’ve marked some SSRS-related posts as favorites. I have the option to un-favorite them if I want to.
I also have just another query running down here that is showing related posts and the SSRS posts as well, maybe I want to favorite them, so let’s try that. If I have this post ID here, so it’s 653 and I press the favorite button, it will actually go, and behind the scenes it is inserting that post ID into a table. You can see there it is, 653. Now my query which is querying from that favorites table is pulling in that new entry because I marked it as a favorite. If let’s say I accidentally did that, I don’t actually like that post, I can click un-favorite, and behind the scenes SQL just deleted that entry from my favorites table and so now it no longer appears as a favorite.
How do we do that? A lot of people I talk to have no idea that you can do this with SSRS. Admittedly, I don't know if this is an intended feature or not, but it is possible, and it has been possible for many versions. This should by no means be your first preference. Like I said, SharePoint, InfoPath, or a website is definitely much better, but if you are stuck like I have been in the past, then this may be viable.
Let's take a look at what this report looks like. There are a couple of key things to be aware of, and the first is our data. Some of our datasets have columns: our favorite posts dataset, which is returning our list of favorite posts here, has columns that we're loading into our tablix, and our related posts dataset also has columns that we're loading into our tablix. But you'll see we have two datasets here which are not returning any columns at all, and those are stored procedures. The set un-favorite stored procedure is deleting from our database, and the set favorite stored procedure is inserting into our database.
The key thing to know with SSRS is that as long as a dataset has all of its parameter values set, and in the case of datasets that don't have any parameters at all, those datasets will always run. So datasets in SSRS always execute, and that's really important as a side note too. If you have datasets in SSRS that you created because you were going to use them in a report, and then you're not using them at all, get rid of them, because every time someone opens the report that dataset is running unnecessarily; you're not doing anything with that data. So get rid of unused datasets. Datasets always execute as long as all their parameter values have been filled.
The second key thing to be aware of is in our data source. If we open up its properties, there's this check box down here called "use single transaction when processing the queries." By default, this is always off, and I highly recommend that it stays that way most of the time, because when that box is not checked, your datasets will, if they are able to, execute in parallel, bringing your data back faster. When we check this box, the datasets execute in sequence. In this particular case, that is exactly what we want to do. We don't want the datasets here running in parallel, and you'll see why in a second; we want them to run in sequence.
The order of these datasets does matter. By the way, there's no way to change the order of these datasets in the GUI, which is very frustrating. The way to do it, in case no one's ever done it: if you right click on your report and go to view code, you can find the names of your datasets there, and if you just reorder the XML and reopen the report, it will reorder the datasets. But yeah, not very fun to do.
So, moving on to the body of the report now. If we take a look at these links that I created, and we go to the properties and look at the actions on them, you can see that when we click on the link, we're just going to reload this report. We're not doing anything else; we're just going to run this report again, passing in a couple of parameters. The first one is the post ID from the row. If we're looking at the report here, there are going to be multiple rows, so let's pass the post ID of the row that we clicked so we know which post to un-favorite in this case. The other thing we're passing is this action string, and in here the value is just "delete." You'll see why that's important in a minute, but we're telling it that this is the delete action.
Then similarly, in the favorite link here, if we go to the action section and look at what's in there, we're running the same report, passing the post ID, but our action here is "insert," since we're going to mark something as a favorite by inserting a post ID into a table. So if we go look at our scripts for all of this, here's our favorites table, super basic. If we look at the data in it, right now there are two posts in there. Here are our stored procedures. Our select user's favorites, right? Those are your favorite posts; it's just a basic select query joining the favorites table that we created with the actual posts table from Stack Overflow, and we're limiting it to the user ID that's being passed in.
So these are my favorite posts. If Brent went to this report and marked his favorites, he would have his own favorite posts, and that's where these preferences can make reports run faster, because now you have additional data that lets you write more efficient queries based on the particular user who is accessing the report. Then this is the related posts query. Once again, nothing really special here; I just hardcoded some related posts. It's not doing any kind of magical technique of looking at post tags or anything like that. The interesting stored procedures are our insert and delete ones. You'll see our insert is getting three parameters passed in: user ID, post ID, action. Then it's running an insert statement, which is basic too.
The key thing here is that action parameter, which, if you remember, is the parameter we had on those favorite and un-favorite links. It's checking to see, "Okay, let's only run this insert if insert is the value of our action parameter." Likewise, for the delete stored procedure, we're only going to delete if the action that's passed in is delete. So we finally have all of our pieces put together to explain what's happening.
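Pieced together, the two stored procedures look roughly like this. This is a hedged sketch with illustrative names, not the exact demo scripts; the point is that both procedures accept the same three parameters, but each one guards its statement with the action value:

```sql
-- Illustrative sketch of the action-guarded CRUD procedures.
CREATE PROCEDURE dbo.SetFavorite
    @UserId INT,
    @PostId INT,
    @Action VARCHAR(10)
AS
BEGIN
    -- Only do work when the link passed action = 'insert'
    IF @Action = 'insert'
        INSERT INTO dbo.Favorites (UserId, PostId)
        VALUES (@UserId, @PostId);
END;
GO

CREATE PROCEDURE dbo.SetUnFavorite
    @UserId INT,
    @PostId INT,
    @Action VARCHAR(10)
AS
BEGIN
    -- Only do work when the link passed action = 'delete'
    IF @Action = 'delete'
        DELETE FROM dbo.Favorites
        WHERE UserId = @UserId
          AND PostId = @PostId;
END;
```

Because the report's datasets run sequentially in one transaction, both procedures execute on every load, but the IF guard means only the one matching the clicked link actually modifies data.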
When we click favorite, this report reloads. And since our datasets are running sequentially, we first run the set un-favorite stored procedure but since our action for our favorite link says insert, nothing happens with the set un-favorite stored procedure. It hits that if condition, doesn’t come back as true, so it doesn’t do any deleting. On our set favorite, it runs that stored procedure and since the action parameter in our favorite link is insert, we insert that row into the database.
Brent Ozar: That’s really cool.
Bert Wagner: Then, since we've now either inserted or deleted the data, depending on which one we clicked, we rerun our queries for our favorite posts and related posts, and if we added something, it gets displayed properly on screen, right? A lot of people don't believe me, but if we run the select statement, we only have two IDs in here, two rows. If we now press this favorite button, go back to the database, and rerun our query, we can see that it truly did insert. I am not faking it or anything like that; it is truly inserting into the database. So yeah, you can do inserts, updates, deletes; you can really do anything on a SQL Server that you can do from a stored procedure.
So, not just preference screens; you can save other additional data, like user history or the parameters they run a report with. Even though the SSRS usage data captures that by default, those parameters are concatenated together with, I forget what, ampersands or something, so it doesn't make it very easy to search. So if for whatever reason you needed better logging, you could implement that yourself. And if you need to run some kind of queries manually, not regularly enough to schedule them, you could create a dashboard with a bunch of links that allows you to run those queries in a very easy way when you need to.
Brent Ozar: I love it. That makes me want to write reports. That’s awesome.
Bert Wagner: There you go. So obviously it's a little complex to set up. There are a lot of things that can go wrong, so definitely test. But if you need to capture user input, you can definitely do it, and that's the way to do it. We use that a lot where I work. It's awesome, it's great, and it's really fun to build those reports too.
Brent Ozar: Several people have asked questions. Anne says, “Have you ever done it with one stored procedure for both cases instead of breaking it up into two stored procedures?”
Bert Wagner: Yeah, yeah. So there are a lot of different ways you can do it. In terms of needing an action parameter, there are really multiple variations on accomplishing the same thing. I just chose what I thought would be easiest to understand in this demo for the largest number of people. The key thing to remember, though, is to check that box saying the datasets are going to run in sequence, and then order your datasets correctly, so that if they are running as a sequential process, like in an ETL, your final results will look like you intend them to. But yeah, there's definitely more than one way of doing it.
Brent Ozar: Faig says, “There’s a nice stored procedure from Pavel Pawlowski to generate four CRUD procedures for a table. We use that as a starting point to expand tools to 13 different kinds of objects.” Holy mackerel, wow.
Bert Wagner: Wow.
Brent Ozar: I am guessing that’s complex.
Bert Wagner: Yeah, that sounds interesting.
Brent Ozar: Yeah.
Bert Wagner: Okay. So, the next level of SQL hell here, er, sorry. [Laughter] I mean, Dynamic SQL for performance. Yes, so before we even talk about this, I know Dynamic SQL has a very bad rap, and rightfully so. There are a lot of stupid things you can do in it, and a lot of bad code you can write. From a security standpoint, we have SQL injection, and there are a lot of problems you can run into. But if we are talking about writing the most efficient reports possible, then Dynamic SQL can help us in certain scenarios.
For example, let's go look at a report. In this report, what I'm doing is I have a list of users from Stack Overflow. Let's just select a few of them, and we are getting kind of like a baseball-card stat sheet on them, right? In the columns running across the top we have each user, and we want to see different attributes. So we have years of wisdom, I couldn't bring myself to write age there, and we have when they last accessed the site, how many upvotes and downvotes they have, and what their reputation is. Then we have some coloring going on for the max value if you are looking across all our different users. So this is an easy way to compare statistics on different users.
Now if we select additional users, like there, this report gets wider. Let’s select some more so we get some better colors. There we go. Report gets really wide, but as humans, it’s much easier for us to visually compare data side-by-side like this than it is if we were to store it vertically. It would be really hard to jump around from one user and then skip five rows of attributes down to the next user to compare who has the bigger reputation score.
So, even though storing data in our database more normalized, more vertically, makes it easier for us to write queries, in the end, if we are building these reports for users, we need to do what's best for the user. In this case, building these kinds of really wide, dynamic-width tables in SSRS is very painful unless you do it with Dynamic SQL. In order to get really great performance, the only way to do it is with Dynamic SQL. So let's go take a look at why, because there are a few elements here in action together.
Okay, here is our report. You can see our tablix is super basic on the page; it's just these labeled columns C0, C1, C2. Here is our parameter query for our dropdown menu, which is just getting a list of users. And here is our dataset, which is bringing back our users. The only caveat here is that you need to know the maximum number of columns that you will get back. In this case, I limited my user parameter query to 30 users, so I added 30 data columns to my dataset. You can see in here, all this is doing is calling the stored procedure. If we go look at that stored procedure, here it is. This is definitely not a tutorial on Dynamic SQL, so we are only going to go over this briefly, but the idea with Dynamic SQL, in case you have never used it before, is that you are programmatically building a query string, and that allows you to do lots of fun stuff that you can't do with regular SQL syntax.
In this case, building a table like this, we could have used pivots on pivots, but eventually, to make the width work the way we need, you still need Dynamic SQL, because you won't know the maximum number of columns, or the maximum could change. So all of this here is where we are gathering our data and building up the different variables where we are storing query strings, and then we are finally putting all our query strings together. Once again, I can't stress this enough: don't let user input parameter values come in here without checking or sanitizing them, or you are going to have some serious problems when someone drops a table in your database.
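As a rough idea of what a stored procedure like this might look like, here is a hedged sketch with made-up table and column names. It builds one C-column per selected user and executes the assembled string; STRING_SPLIT assumes SQL Server 2016+, and joining the split values back to the Users table is one way to make sure only real integer IDs ever reach the query string:

```sql
-- Hypothetical sketch of a dynamic-column stored procedure.
CREATE PROCEDURE dbo.GetUserStatsWide
    @UserIds NVARCHAR(500)  -- e.g. N'1,2,3' from the SSRS multi-value parameter
AS
BEGIN
    DECLARE @Sql NVARCHAR(MAX) = N'SELECT AttributeName AS C0';

    -- Append one MAX(CASE ...) column per user; the generated names C1..Cn
    -- match the fixed column list defined in the SSRS dataset.
    SELECT @Sql += N', MAX(CASE WHEN UserId = ' + CAST(u.Id AS NVARCHAR(20))
                 + N' THEN AttributeValue END) AS C' + CAST(u.rn AS NVARCHAR(10))
    FROM (
        SELECT usr.Id, ROW_NUMBER() OVER (ORDER BY usr.Id) AS rn
        FROM STRING_SPLIT(@UserIds, ',') AS s
        JOIN dbo.Users AS usr ON usr.Id = TRY_CAST(s.value AS INT)
    ) AS u;

    SET @Sql += N' FROM dbo.UserStats GROUP BY AttributeName;';

    EXEC sys.sp_executesql @Sql;
END;
```

Only values that survive the integer cast and the join against dbo.Users are concatenated, which is the sanitization step the paragraph above is warning about.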
But anyway, that's our stored procedure. If we just look at the output here, this is the data coming back for the four user IDs that I have. There are two columns here which are interesting. We have a column count column, which tells us how many of these C-number columns we have. So, four users, sorry, one, two, three, four. Our C0 column is the different attributes or properties for each of these users, their stats, I guess. Then we also calculate their max value and put that in a column. There are a couple of things going on here. One, you can tell we are formatting our data in SQL Server. This kind of goes against my first or second slide back there, where I said SQL Server shouldn't be used for formatting and displaying data. But in this case, if we want ultimate performance, this is what you have to do. You format your data to the point where you can just drop it into SSRS, so SSRS only has to display it; it doesn't have to do any monkeying around to get the data into this format. And that's why we have our column names in the first row and our row attributes in here.
Another thing we do is the max value column. If you remember, in the report we highlight some of the cells green to indicate which ones had the highest value for vote counts and things like that. In SQL Server, especially with larger datasets, when you get out of the demo realm and into the real world of data, calculating aggregate functions is much faster than it is in SSRS. So we can pre-calculate what the max values are, store them in this column, and then use those to color the correct cells green.
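Assuming the same long-and-narrow layout sketched above (an illustrative dbo.UserStats table, not the demo's exact schema), that pre-calculated maximum is just a window function away, which is exactly the kind of work SQL Server is better at than an SSRS expression:

```sql
-- Sketch: compute each attribute's maximum once in SQL so SSRS only has to
-- compare a cell value against MaxValue instead of aggregating at render time.
SELECT AttributeName,
       UserId,
       AttributeValue,
       MAX(AttributeValue) OVER (PARTITION BY AttributeName) AS MaxValue
FROM dbo.UserStats;
```

The report's background-color expression then reduces to a simple equality check between the cell's value and MaxValue.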
So this is our stored procedure, and if we add more parameters to it, we'll see that it brings back a wider result set. Then if we hop back over to SSRS, we have, once again, our dataset with the max number of columns defined, and we have our tablix. Our tablix also has all of the columns of data inputted into it, which is a little tedious. Once again, I hate doing tedious things, but for performance you have to do it. The easier way of building tables like this, because 30 isn't too bad, but if you have to do 100 or something it gets really bad, is just going to the XML view.
Going here, you can copy and paste fields. If we go find them, these are the fields of our dataset, and here are our tablix columns. Somewhere in here you'll be able to find, so here we have some expressions, which we'll get to, you can find the fields in there, and you can copy and paste them to make the job of creating so many columns a little easier. Once you do that, there are two last things we have to do. Let's say we're bringing back only one user of data; we don't want to show the other 29 columns on our page.
So on each of these columns, and this is the one little bit of SSRS inefficiency that we're doing, we go to the column visibility for that column, and we have an expression here. It uses that column count field coming from our stored procedure, and it says, okay, if enough users were selected to fill this column, let's show it; otherwise, let's hide it.
Which, by the way, is another huge pet peeve of mine: I hate that this is "set expression for Hidden." My mind doesn't work that way; it doesn't work in double negatives. I always want this to be "set expression for Show," because I always reverse these two. The point here is we're hiding the columns that are unused. We don't want the user seeing empty table cells because there's no data coming back. So you have to go through and add this expression to every single column, which, once again, gets very tedious. We're doing this in the name of performance. To help you out, I just recommend going into the backend XML code and copying and pasting; it's a lot easier.
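A sketch of that visibility expression in SSRS's expression language, assuming the stored procedure returns a field named ColumnCount (here column three is shown):

```vb
' Set expression for Hidden on column three: True hides, False shows.
' Show this column only when the dataset says at least three user columns exist.
=IIf(Fields!ColumnCount.Value >= 3, False, True)
```

Each column gets the same expression with its own column number, which is why copying and pasting in the XML view saves so much time.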
Finally, to get our green colors working, instead of having to calculate maxes and things like that in SSRS itself, on the cell where we want to change the color, you can see we have an expression for the background color. If we open that up, you'll see all it's doing is checking that max value column coming back from our dataset to see if the current value of the cell matches whatever max was calculated in SQL. It's a much simpler operation, and if it's true, it changes the color to green; otherwise it keeps it white.
The final result is this nice table, which dynamically adds as many columns as you want and colors the backgrounds without having to do any kind of heavy calculation in SSRS. So once again, Dynamic SQL is not the best practice by any means, but if you need the ultimate performance, there is no faster way of doing this than formatting and preparing your data as much as possible in SQL Server and then letting SSRS just display the data as simply as possible, as it's doing in this case.
We just have one demo left. Like I mentioned, the one performance downside of that last report was adding all of the visibility expressions to those columns, because if you're bringing back one column of data and having to hide 29 other columns, that takes some time for SSRS to process. So one solution to that is subreport switching.
If you know that you’re going to have variable numbers of expressions, you can use multiple subreports to lessen the burden on SSRS. So let’s take a look at that. If we open up this example—let’s look at the finished product first. The finished product is exactly the same as our previous example. But if you run performance statistics on it, you will see that this one runs much faster. The reason for that is because we are using multiple subreports to divvy out how many expressions we want to run.
If we take a look at this report here, you’ll see that we have a subreport switching parent report, so we’ll open that first. We have this subreport switching parent report, and we have three child reports: child one, child two, and child three. So our parent, as you can see, has a subreport here. But we only see one subreport. That’s because all of the subreports are stacked on top of each other, that’s our little secret about this.
We take all of the subreports and put them one on top of another, even though there are three individual reports. So what we're looking at here: this report just captures the user parameter and sends that data to one of these subreports. The only thing different about each of the child subreports is that, while they're all running our dynamic stored procedure, child report one has only five columns defined, child report two has 10 columns, and child report three has all 30. You can divide these reports up however many times you want; you don't need three of them, you could have two, you could have 20. With 20, though, I hope you have lots of patience, or real need for that kind of performance.
What this accomplishes is that if we get a report where a user only selects three Stack Overflow users to compare, we're going to run this first subreport. It's going to fill up columns one, two, three with data, and then an expression only has to run on columns four and five. Hiding two columns in a tablix is much faster than having to hide 27 columns if we ran our original report that has 30 of them. So that's what we're basically doing: we're sending the data to the report that is best equipped to display it with the least amount of additional expression running possible.
So a couple of key things to know about how we do that. If we look at these subreport properties, the first thing is the visibility. If we don't do anything with the visibility, all of these subreports are going to show up on the screen all the time. So if we're looking at the visibility for child one, which can handle five users, we check the visibility expression. We say, okay, if there are more than five users, we return true, we hide this subreport (drives me nuts), versus false: if there are five or fewer, we don't hide this report, we show it. The other two subreports have similar visibility expressions. For child two, which if you remember has 10 columns in it, here's our great nested if: if there are more than 10 users we hide it, but if there are more than five we don't hide it.
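That nested visibility expression for child two might look something like this (the parameter name UserID is illustrative; remember that True means hide):

```vb
' Hidden expression for child two, which handles 6-10 users:
' hide when more than 10 users, show when 6-10, hide when 5 or fewer
' (child one covers that range).
=IIf(Parameters!UserID.Count > 10, True,
     IIf(Parameters!UserID.Count > 5, False, True))
```

Each child report's expression carves out its own range, so exactly one subreport is visible for any given number of selected users.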
This definitely took me a couple of tries to make sure it was working the right way; it was messing with my mind. But yeah, what's happening there is that only the subreport that can handle all of the data you need is showing. Now, the only other thing you need to know is that even if you set those visibility options, all three subreports still run, all the time, which really defeats the purpose of what we're doing here, where we're trying to gain efficiency by only running one of the reports, in particular the one report which minimizes the number of visibility expressions we need to run.
In order to prevent that, on each of these subreports, in addition to the visibility, we have a parameter that we're passing: our user dropdown value parameter, but with an expression on it. Let me choose a simpler one; I picked the hardest one. If we go look at that expression, this is the expression for the child one subreport, which only has five columns. We're saying only send our user parameter if there are fewer than six users selected.
So if fewer than six users are selected, we send our user parameter to it. Otherwise, we don't send anything as that parameter. What that does is, like a few slides ago when we were talking about the CRUD operations, I was saying every dataset always runs except when one of the parameter values for that dataset hasn't been set. So by sending nothing, our datasets in our subreports, well, let's take a look at one of those.
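Sketched with an assumed parameter name, the parameter expression on the child-one subreport would be along these lines; passing Nothing leaves the child dataset's parameter unset, so the dataset never executes:

```vb
' Subreport parameter value for child one (handles up to 5 users):
' pass the selected users through, or pass Nothing so the dataset won't run.
=IIf(Parameters!UserID.Count <= 5, Parameters!UserID.Value, Nothing)
```

The other child reports get the mirror-image condition for their own ranges, so only one child's dataset ever runs per render.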
Each of these datasets, rather, the parameter here, gets set by its parent report; we don't have any available or default values. So if our dataset in here requires a user parameter value, but we don't send it from the parent report, then this subreport will still run, but the dataset doesn't execute, which means the report will not render. That essentially accomplishes our goal of not rendering the data when it's not necessary. So the key thing to remember if you decide to split up your data this way is to make sure you set that visibility expression on each of these subreports so only one of them shows at a time.
Additionally, and even more importantly I think, make sure you pass nothing as the parameter value to the reports that you don't want to run. Otherwise, all three of your child subreports are going to execute, and you're losing any performance benefit that you gained by creating all these subreports in the first place. So divvying up the work into subreports like that is a really great way to reduce the amount of time SSRS spends running expressions unnecessarily.
Brent Ozar: Yeah, Carnegie is excited about that in the Slack room. He likes that a lot.
Bert Wagner: Oh, great. I'm glad to hear it. So, that's it. Thank you, everybody, so much. I appreciate you listening in. I hope you learned some cool new tricks that you can use. Please go to my session page there at the GroupBy link and rate the session; I love feedback. Also, the presentation, slides, demos, everything are linked there on my website. You can go there if you have any questions, you can email me, or send me something on Twitter; I'm happy to answer them. So once again, thank you, and thank you, Brent.
Brent Ozar: Oh my pleasure. No, thanks for presenting. It’s been fantastic. Several people are commenting in. James says, “Good stuff.” Kathy says, “Very clever tips.” J.D. says, “This is pretty awesome.” George says, “Yeah, you did such a good job for a young guy.” Like whoa, okay, all right. Anne says, “Terrific session. Extremely well planned and cleanly presented.” Vincent asks, “What’s your favorite new feature of SSRS 2016?”
Bert Wagner: Oh, man. He’s got me there. Actually, I haven’t used 2016 outside of my regular laptop. There are some interesting features I know with like mobile, that I want to play around with a little bit more, to see how those work. But in all honesty, I haven’t used SSRS 2016 to its full potential yet.
Brent Ozar: Chintan says, “I wish my SSRS developer saw this.” Well, it will be on YouTube too. So watch groupby.org and we’ll have it up in like a week or two. Lee says, “In a reporting environment, is using with recompile really all that painful?” I’m guessing he means if his reports only run once a minute, twice a minute.
Bert Wagner: Yeah, it definitely depends on your usage. If you have reports that run a couple times a day, you’re probably not going to feel much pain, which is why then that might be a great option for you. If you have a report that’s running way more frequently with many different values, with recompile might not be the best choice. So it just depends, that famous SQL answer. Right?
Brent Ozar: Yeah, right. Well, thanks a lot for presenting here today.
Bert Wagner: Thanks.
Brent Ozar: Thank you.