Successful production deployments with columnstore index in SQL Server 2016



Target Audience:

Architects and DBAs

Abstract:

Imagine your Data Warehouse is growing exponentially to multi-terabytes and you have been tasked to make analytics queries even faster while keeping the cost of storage low. Industry-leading SQL Server 2016 columnstore technology can help you solve these challenges by reducing the storage footprint by 10x (on average) and speeding up analytics queries by up to 100x. The SQL Server Tiger team has leveraged columnstore technology with many Tier-1 workloads to achieve these goals. This session will cover three different types of workloads. For each workload, we will cover the application architecture, the challenges, and how the SQL Tiger team used columnstore index technology to address them.

Why I Want to Present This Session:

First, I want to raise awareness that columnstore technology can manage large amounts of data efficiently while significantly speeding up analytics queries. Second, columnstore technology has been successfully applied to workloads other than DW, such as running analytics on a transactional workload without the need to move the data to a dedicated DW for analytics.

Additional Resources:

https://blogs.msdn.microsoft.com/sqlserverstorageengine/tag/columnstore-index/

Session Transcript:

Sunil Agarwal: So Brent and the GroupBy organization, thank you for this opportunity. So I’ve been involved with the Columnstore feature for 2016. It used to be run by the … if you recall, the appliance which was renamed to APS; for them Columnstore was a very important feature, so they drove the 2014 development of Columnstore, and then it was given back to us. So the Columnstore work I’ve been involved with is learning from customers to understand what challenges they were facing with Columnstore technology in adopting it for their workloads, and bringing those solutions into 16. And I must say that in 16, the Columnstore is quite complete. I mean, there is of course some missing functionality, but it’s in a very good place right now, and because of this, as you will see, we got a lot more interest from customers to adopt this technology to improve the performance of their workloads. And what I wanted to cover – everybody knows, I should not say everybody, but most people know the Columnstore technology is for analytics workloads, data warehouse, you do aggregates, stuff like that, that’s Columnstore. No brainer. If you have a data warehouse, let’s use Columnstore and life is good.

So what I wanted to do was pick a few customer cases where you may not think of Columnstore as the solution, and what amazes me is that even I tried Columnstore for those solutions with some reservation. I said let’s try it, and it turns out it was very successful. So if I have one message for the audience here, it is: don’t think of Columnstore as a very narrow solution for analytics. Just be open-minded and see the scenarios where it can be useful, and hopefully the slides that I’m going to present here will give you some ideas about that.

So I assume that people know what Columnstore is, and if you in the audience don’t know what Columnstore is, I’m going to give you a very, very high-level description of what it is, and my recommendation is to watch a couple of the online presentations that have been done by Niko or myself or anybody else, get the basic understanding, and then I think you can watch this video after that. So I’ll cover what the benefit of Columnstore is, I want to give a very brief summary of what we have done for Columnstore in 2016, and then the success stories that I’ve chosen. These are customers who are in production with Columnstore on 2016: Daman is a health insurance company in UAE, First American Title is a financial company in Los Angeles, M-Files, I think they’re from Finland, FIS is a finance company in, I believe, New York City, and there’s an insurance company, but I’m going to skip that part. So four customer stories.

So the first thing I want to showcase, in case you don’t know much about Columnstore, is why Columnstore is such a hot commodity, and I say hot – I mean, if you look at a lot of the migrations that are happening to 2016, there are a few big reasons. One is Columnstore, because it is giving amazing performance. The second one is R, the R integration inside SQL Server, and of course a lot of supportability and availability things we have done. So Columnstore is a very, very powerful feature and we are seeing a lot of migration to 16 because of that.

So let me quickly show you a demo. This demo I have actually shown many times, but I think it shows the value of Columnstore in a pretty big way. So we have this database called AdventureWorksDW. You must have seen DW 2014, 2012, and this is 2016, and one funny thing about this one is it says CTP3. Now, ideally it should not say CTP3, this is actually the final version, and there was some contention between multiple teams to decide whether we should keep it at CTP3, because – you know, we have WideWorldImporters as a new database – so don’t think of CTP3 as something half-cooked. This is pretty much a fully baked database, and it’s available for download. So I’m going to use this database, and I’m running behind the scenes SQL Server, I think, 2016 SP1, and if you want to see, I can just do a SELECT @@VERSION. Brent, one question I have: how much time have I got?

Brent Ozar: As much as you want, they go between – anywhere between an hour to two hours.

Sunil Agarwal: Alright, so I’m just going to go easy on this one, I’m going to take my time here. But I promise to finish before an hour and a half, and if people have questions, how do they ask questions?

Brent Ozar: They have both Slack and GoToWebinar and then whenever they pop up I’ll interrupt you if it works.

Sunil Agarwal: Right, so as you can see, I’m running SQL Server 2016 SP1 with CU2, and it was released pretty much on March 6th, so that’s the version I’m running, and I think it is available to anybody who wants to try it out. So now what I want to do is run a semi-complex query – I mean, I’m sure in any production environment you guys have much more complex queries – but this database has a Fact table, which you see here, the Reseller Fact table, and there are three dimension tables, right, so we are basically joining four tables, with a restriction that I want to look at the data for four years, 2006 to January 2010, and the global sales by territory, region, stuff like that.

So this is a common query that you will see running in a data warehouse scenario, and for this query that I’m running, the Fact table is a row store. So this is the situation if you have not adopted Columnstore technology in your data warehouse: you’re on row store, and in many cases – I mean, I remember from 2008 onwards we recommended people use page compression on their row store to minimize the I/O needed to run a warehouse kind of query. In this example I actually do not have page compression, I just wanted to show the extreme of what can be achieved, because on my laptop I cannot really run big workloads. So basically, this is the row store table, and it is moody. Sometimes it takes 30 seconds, sometimes like one hour – not an hour, like over a minute. So here it’s saying it’s taking 53, 54 seconds, so what I’m going to do is stop it, because I don’t want to waste anybody’s time here; basically it is definitely taking more than 59 seconds, right?
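
For reference, here is a minimal sketch of the kind of star-join query being described. The table and column names follow the AdventureWorksDW naming pattern but are illustrative, not the exact demo script:

    -- Aggregate reseller sales by territory for roughly the "2006 to January 2010" window
    SELECT t.SalesTerritoryRegion,
           SUM(f.SalesAmount) AS TotalSales
    FROM dbo.FactResellerSalesXL AS f            -- fact table (rowstore in the first run)
    JOIN dbo.DimDate           AS d ON f.OrderDateKey      = d.DateKey
    JOIN dbo.DimReseller       AS r ON f.ResellerKey       = r.ResellerKey
    JOIN dbo.DimSalesTerritory AS t ON f.SalesTerritoryKey = t.SalesTerritoryKey
    WHERE d.FullDateAlternateKey >= '20060101'
      AND d.FullDateAlternateKey <  '20100201'
    GROUP BY t.SalesTerritoryRegion
    ORDER BY TotalSales DESC;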

So what you can do – let’s say you had a data warehouse workload where some queries are taking a long time, and why this is so important now is because the data is growing. So a query that was running in, say, two minutes yesterday – now your data has doubled over the last year, and the same query may take four minutes, because at some level the query performance will be proportional to the size of the data it has to manage, right? So what you can do is take this Fact table and just say create a clustered Columnstore index. It is a clustered index in the sense that this will be the master copy of the data; you’re just saying create a clustered Columnstore index on that Fact table that I had as a row store.

Now, of course, if you had a clustered index – a row store clustered index – on that Fact table, you would say drop existing, right, so you just create that index. Now, this index is a bit more expensive to create, because not only do we have to create the index, we have to compress the data. So what we find is the clustered Columnstore index takes, I think, about 1.5x of the time it would take you to create a clustered index. Something like that. So you create that index, and now what I’m going to do is run the same query, and notice what I changed here: my Fact table has changed; instead of the row store, I am saying it is the CCI. I’m going to run this query now, and the query is exactly the same, all I did was create a Columnstore index on it – okay, so this query finished here in 11 seconds. So at least we can say that the query ran six times faster than the previous one. Remember, the previous one I had shut down after almost 59 seconds. So here is the benefit. You are able to improve the performance of your query six times, on my laptop, without changing your application. And I think this is the biggest deal for us, because a customer who wants to upgrade, say from 2014 to 2016, or wants to use Columnstore, they don’t have to change the application. They have to create the index, of course, but no other changes, and that’s why it becomes so much easier to adopt.
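
A minimal sketch of the index change being described, using an illustrative fact table name; if the table already has a rowstore clustered index, it would be replaced (for example with the DROP_EXISTING option) rather than created fresh:

    -- Convert the fact table's master copy to columnstore; no application changes needed.
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactResellerSalesXL
        ON dbo.FactResellerSalesXL;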

And what we have seen is – and this is an unbelievable number – some queries are running 300 times faster. I’m not saying every query, some queries. Some queries run two times faster, some queries are ten times faster. So on average what we are seeing is 10x, in some cases even larger; I’m going to show you some examples from our customers which are even 100x. So that kind of performance gain, and the funny part is no change to the application. So that is one part, and remember we were talking about compression. So now if I show you the space used, these are the two tables, one with Columnstore, one without; the top one is without Columnstore, right, that’s the row store. And what you see here is that the size taken by the row store is around 2.5GB and here it is 700MB, so it’s like three to four times the space savings. And again, this is generated data, so it does not compress as well, but in many production workloads we are seeing approximately 10x or more data compression.
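
To compare the footprint of the two copies the way the demo does, something like the following works; the two table names are illustrative stand-ins for the rowstore and the columnstore versions:

    -- Reserved and data sizes for the rowstore copy vs. the columnstore copy
    EXEC sp_spaceused N'dbo.FactResellerSalesXL';      -- rowstore
    EXEC sp_spaceused N'dbo.FactResellerSalesXL_CCI';  -- clustered columnstore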

So what you are getting here is a reduction of your storage by 10 times, and a speedup of query performance of 10x, 30x, 40x, whatever it is – in a big way. So you win on both sides: you’re not only reducing your storage, you’re improving your query performance in a big way, and without changing the application. So this indeed is the power of the Columnstore index.

So on that note, let me go to the presentation. So this is the value proposition. Now, if I want to show you the numbers, this is the TPC-H benchmark, an industry-standard benchmark; a lot of our hardware partners, you know, when they come out with new hardware, they want to showcase the meanest workload they can run and show what performance they can get. Now, notice something. This is a 10TB data warehouse workload. 10TB, right? Very common. I mean, I have not seen many, many data warehouses above 10TB – not that they are not there. At 10TB, SQL Server holds the world record across the last seven results, with SQL 16 and so on, so we are at that level, the market leader for data warehouse on an SMP box.

I’m not talking about a scale-out model – an SMP box – and then if you look at the bottom number, which is 30TB, again we are the leader with 2016. So this technology that we have is an industry-leading technology. Now, you can say, you know, a Columnstore is a Columnstore, right? But actually, if you think about it – Oracle has a Columnstore, HANA has a Columnstore, DB2 has a Columnstore – it’s just that the implementations of those Columnstores are different across organizations, and clearly our implementation is superior, and one key reason for that superior implementation is something called batch mode execution. I will not go into batch mode here, but we have done some pretty fundamental improvements to the Columnstore implementation that help us lead the world market with these phenomenal numbers. So that’s where we are with the TPC-H workload with Columnstore.

Now, I was telling you about a lot of improvements we have done with Columnstore; we made improvements both on the performance side as well as the functionality side. So on the performance side, what we have done is push string predicates down to the scan node. What it means is – the reason I call out string predicates is because in the traditional way, what SQL Server used to do is get the row out of the scan node into a filter node, which is where you apply the predicate. And it will apply the string predicate – string comparison is expensive, as you know – and then if the row qualifies it goes up, right? So what we are doing with 2016, instead of applying the string predicate at the filter node, we push the string predicate to the scan node, and by doing that – it depends on the data, but we see a significant improvement in the query performance, and the magic is this: if you know about Columnstore, the strings are stored in a dictionary. So when we do the string comparison, we compare against the dictionary entry rather than the actual row entry. And if the string is repeated ten times, you only compare once instead of ten times, right; so that is the kind of gain you get.

Aggregate pushdown, same kind of thing. Again, these things happen behind the scenes; you don’t have to worry about it if you are moving from 2014 to 2016, it happens automatically. So aggregate pushdown, new batch mode operators – sort is a new batch mode operator, we have done window aggregates as new batch mode operators – and then we have done SIMD, which is Single Instruction Multiple Data, so basically with a single instruction you can process multiple data values. So all those things are built into 2016 to give you amazing performance.

In terms of functionality, what we have done is we allow a non-clustered – so I’m going to go slow on this one – we allow a non-clustered index on a clustered Columnstore index. Now, it may sound like a mouthful, but think of it this way. You have your traditional row store table, right, on which you have a clustered index, and then you have multiple non-clustered indexes. Before 2016, if you created a clustered Columnstore index, we did not allow you to create a non-clustered index on it. Now, you can say what’s the big deal – in many workloads, yes, there is no big deal, because many queries are going across millions of rows, and you don’t use a non-clustered index to scan millions of rows, right, you’d be dead. So it works fine, but what actually is interesting is that many workloads are sometimes, you know, not a pure data warehouse workload.

Sometimes they need to do equality searches, they have to do short range queries, and without a non-clustered index it would have been impossible to run those queries efficiently. With non-clustered indexes, you can even have primary key and foreign key constraints on a clustered Columnstore index, which was not possible before. So that we allow; this is actually a huge improvement for us, and because we have this non-clustered index, we can leverage the index as an access path to enforce fine-grained locking. Previously that was not possible. The locking was at a much, much coarser granularity, which we call row group – I can’t say that word – row group granularity. It was at a much coarser level. Think of it like a row lock versus a page lock; in the context of Columnstore the data is in row groups, there are no pages, per se. So we were taking a lock at the row group level, which is a million rows together. So you can think of that as a very coarse level of lock, and this NCI allows us to do better.
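
A minimal sketch of what SQL Server 2016 newly allows here: a table whose master copy is a clustered columnstore index, with a primary key and a nonclustered B-tree index on top of it. All object names are illustrative:

    CREATE TABLE dbo.Sales
    (
        SalesID    bigint NOT NULL,
        CustomerID int    NOT NULL,
        SalesDate  date   NOT NULL,
        Amount     money  NOT NULL,
        -- PK enforced by a nonclustered B-tree, leaving the clustered slot free for columnstore
        CONSTRAINT PK_Sales PRIMARY KEY NONCLUSTERED (SalesID)
    );

    -- The master copy of the data is the clustered columnstore index
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_Sales ON dbo.Sales;

    -- B-tree index for equality and short-range lookups (new on top of a CCI in 2016)
    CREATE NONCLUSTERED INDEX IX_Sales_CustomerID ON dbo.Sales (CustomerID);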

And the third one – this will be a non-obvious thing to a lot of people – AlwaysOn, we have readable secondaries. This is something that came out in 2012. If you had a clustered Columnstore index on a table, you could not run a query on it on the secondary replica. With 2016, we removed that restriction, and the reason we were able to do that was – I don’t know how many of you know about readable secondaries, but the queries that run on a readable secondary run under snapshot isolation, and since snapshot isolation was not supported on Columnstore indexes until 2016, you could not do that. Now, because it’s supported, you can do it. So that’s what we have done.

Brent Ozar: There’s a question from Anna, Anna asks, “It looked like on that last slide that a lot of it is moving towards OLTP support for Columnstore indexes. is that true?”

Sunil Agarwal: Anna, this is a great question. So we believe, right – you know, what is a data warehouse workload? A data warehouse workload is inserts and queries, right? Because you insert the data on a nightly basis or an hourly basis, and then you do the queries; that is a traditional data warehouse workload. But what we find is there are workloads, which you could call OLTP workloads, which are predominantly insert-only – and queries of course – but they do 5%, 10% updates. Now, those updates were quite expensive when we did not have the NCI, because we did not know how to get to the row that you want to update other than a whole table scan, right. So yes, what we are recommending is that CCI, the clustered Columnstore index, can be used for OLTP workloads which are not heavy on updates.

Then, we came out with a third thing which we call real-time operational analytics, and what exactly is it? Think of it this way – you know what people traditionally do today: they run a transactional workload in their OLTP system, right, like an order entry system, and on a nightly basis they do an ETL from the operational system into a data warehouse to run the queries. Now, the reason it is done this way is because the analytic queries are expensive. You don’t want to run those queries on your transactional workload, otherwise your transactional workload will slow down because of locking, blocking, whatnot, right; so people have traditionally segregated the transactional workload and the analytics workload.

Now, think of this: the Columnstore index can run the query much, much, much faster. So there is a category of workloads where you can in fact run the analytics on top of the transactional workload. That means for those you do not have to move data through ETL to a separate data warehouse. So what we came out with was an updatable non-clustered Columnstore index, and the reason we say it is updatable is because the non-clustered Columnstore index was available in SQL Server 2012, but it was not updatable. Now, imagine if you create a non-updatable index on a transactional workload – you’d be dead, right? Nothing can be inserted, so it was not useful. But now we have it, and I have a clear example – two examples. We have customers who have used NCCI on their transactional system to get rid of the ETL, get rid of the data warehouse, so it’s a very powerful thing.
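
A minimal sketch of the real-time operational analytics setup being described: the base table stays a rowstore for the transactional workload, and an updatable nonclustered columnstore index serves the analytics queries. Table and column names are illustrative:

    -- The OLTP table stays as-is; analytics queries scan the columnstore copy the engine maintains.
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
        ON dbo.Orders (OrderDate, CustomerID, ProductID, Quantity, UnitPrice);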

And finally, one thing which I wanted to point out is that Columnstore index creation or rebuild is an offline operation. In fact, we fixed the non-clustered part in SQL Server 2017, but it is an offline operation, and you can imagine, if I have a billion rows in my Fact table, rebuilding that index could be expensive, and the reason you rebuild indexes is because an index can get fragmented. Now, what we are providing with 2016 is that you can defragment a Columnstore index online. Now, what does it mean for a Columnstore index to be fragmented? It is quite different from your traditional row store index. In the lingo of Columnstore, the index is fragmented if it has a lot of deleted rows, because in Columnstore, once you compress the data, you can’t really change it much – imagine this, right: you have compressed data, and somebody comes and says I want to delete row number one thousand from this compressed portion. You have to decompress, remove that row, and then recompress, right? It’s a very expensive operation. So what we do is just mark the row as deleted, and when you do the query, we subtract that row from the results. Over time, you can imagine there are a lot of deleted rows – dead rows – and they are taking storage on disk, they are taking memory, and they’re adding to the filtering overhead, right; so we actually provide an automatic way to defragment the data, so that can happen.

I should take the word “automatic” back, because we took that back. You need to run an ALTER INDEX command, something like that. So that will take care of it, but this is, at a very, very high level, the set of features that we have added in 2016. I told you about the performance, so here is a very interesting chart. The TPC-H number that we have – what we did was run the same workload, TPC-H, on the same physical hardware – notice this hardware here, same hardware, no changes to the workload – and we got 40% better performance. That means if you upgrade from 14 to 16, you get that much better performance, and why is this magic happening? Because of string predicate pushdown, NCIs, aggregate pushdown, many, many things, right. So this is what we have been telling customers: number one, if you are running on 2014 and you are happy, you’ll be happier when you run 16, because on the same hardware you get 40% better performance. I mean, nobody will complain about that, right? That kind of stuff.
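
A minimal sketch of the online defragmentation just described, using illustrative object names: check how many deleted rows each compressed row group is carrying, then reorganize to remove them.

    -- Deleted rows per compressed row group (SQL Server 2016 DMV)
    SELECT OBJECT_NAME(object_id) AS table_name,
           row_group_id, state_desc, total_rows, deleted_rows
    FROM sys.dm_db_column_store_row_group_physical_stats
    WHERE object_id = OBJECT_ID('dbo.FactResellerSalesXL');

    -- Online defragmentation: drops deleted rows and merges small row groups
    ALTER INDEX CCI_FactResellerSalesXL ON dbo.FactResellerSalesXL REORGANIZE;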

Okay, so this is, at a very, very high level, what we have done for Columnstore. Now, let me go through these four workloads that I have chosen for you guys; hopefully they will be compelling enough for you to think about Columnstore. So the first one that I chose is a traditional data warehouse, where you can close your eyes, you can say I want to use Columnstore, and nobody will complain. So this is that workload. In this case, it’s a health insurance company in UAE called Daman, and they have a traditional data warehouse with a hundred target tables, and they get data from multiple places, and these tables have billions of rows. A very, very large database system, and it is like a Kimball architecture – you have the star schemas, but in an organically grown data warehouse the schemas become very jumbled, right – but you can say it is like a star schema, with a lot of stuff. You can imagine the tables, multiple joins, and this company, Daman, was running their data warehouse until last year on Sybase IQ.

They were running on Sybase, and I would say they had done a lot of engineering – they had installed Sybase three or four years before that – and they were getting good numbers, but they had to put a lot of time into tuning those queries. So this is what their data warehouse looks like: they had this – I mean, I won’t even guess what these are, but pretty much data is coming from multiple sources, right, it goes through whatever ETL chain, it lands in this database, and this is where they run the reporting queries, and these are the users. So the focus of this application migration was this guy. The UDM was running on Sybase, as I said before, and this is the query. They have claims and members and whatnot, and these are big tables, and you can say these are the dimension tables. So here you can see that a typical query had three or four joins. This is a typical query, and this is a more complex query. This is what I was talking about with an organically grown schema – it gets complicated. So these are the kinds of queries, and what this customer found was their data was growing and their queries were running slower and slower and slower, and they just did not know how to tune them.

So they contacted this company, I think it’s called – a consulting company – and asked them to help design their system for the next five years, because Sybase was not scaling for them, and their requirement was, you know, I want better performance, because if the performance is good, I can do more queries. Because imagine this: if you run a query and it takes five minutes, chances are you will not be as agile in your analytics. You will say, well, this is good enough. But if I can see the result in ten seconds, you may be tempted.

Brent Ozar: Yes, you ask a lot more questions.

Sunil Agarwal: Right, so they believe that the more insight, the more benefit to the business, and they wanted response times in seconds for the common queries. Remember I was talking about the common queries at the top right, and the bottom ones are more like reporting queries that are done every night. So a response time in seconds, and I think the challenge, as I was saying, was they could not scale the technology for the volume of data they were expecting from now through the next five years. So what they did was hire this company, and they had six contenders – they had Oracle, other big vendors you can think of, they were in the running – and around the beginning of 2016 they contacted us and we worked with them very closely on 2016. They were also trying 2014, because it was in production and it was within their timeframe, so they were not thinking of 2016, but somehow we contacted them, or they contacted us, and they said okay, we’ll do a POC with 2016, and between March and June they did the POC, and in June or July they went into production because the numbers were so good, and I’m going to show you the numbers.

So this is the solution they came up with: they had multiple Fact tables, they used CCI, which stands for clustered Columnstore index, and they also needed to create a non-clustered index on top of it because some queries could benefit from a non-clustered index. Okay, and even the dimension tables – some of them were so big they used a clustered Columnstore index, and the rule of thumb for a Columnstore index is that the number of rows should be at least a million, because we compress in units of a million rows. If you have a hundred thousand rows, it makes no sense to compress them because it’s not big enough. So anyway, they did that, and this is the most interesting slide that I want to talk about. Remember they had five or six contenders, and they ran those queries; the left chart that you see here, the one at the bottom that says 2.8 minutes, these are the reporting queries that are more complex, and on the right side it is 1.1 seconds or 25 seconds, these are the typical user queries that they wanted to run in seconds.

Now, the thing I want to point out in the left picture is that 14 is this blue guy. It was running that query in 26 minutes – this is not seconds, 26 minutes, right? Because these were very complex queries, billions of rows, and 2016, with no changes, ran the same thing in under 2.8 minutes. So SQL Server 16 gave them a performance gain of almost 10x. Now, this competitor – I can’t tell you the name because legally we cannot – the competitor was running it at 3.2 minutes. So imagine this: when they were evaluating 2014, this competitor came out good here and not here, right. So they were in a fix – what do I do, what do I do – then 2016 came, and notice it is leading the pack. 2.8 minutes for very complex queries, and 1.1 seconds for the typical user query for which they wanted query performance in seconds. So based on these numbers, they decided that 2016 was the right product for them, and they migrated the load to 2016. And I would say this: they are primarily an Oracle and Sybase shop, and they were so impressed with these numbers that they are in fact looking at other applications in their organization which they can move to SQL Server 2016. So that has been the power of this technology.

Brent Ozar: Nice, and I guess it probably saved them a ton of money because they probably didn’t have to rewrite a lot of queries either. I mean, if they’re coming from Sybase.

Sunil Agarwal: That’s right, that’s right. Sybase is a cousin of SQL Server, right? Exactly, so yes, this is what it is, and then they ran the numbers – that was the POC. Then they ran the numbers in production, and for the ad hoc queries you can see they don’t have the numbers for the daily runs, okay, but they have numbers for things like weekly reports: 60% improvement – notice SQL 16 takes ten hours, Sybase 28 hours – and the monthly business report, ten hours here, two days on Sybase, so it would only run on the weekend.

Brent Ozar: Nice.

Sunil Agarwal: So this is a huge amount of gain. And again, as I showed you in my simple demo, all they did was change the table to use a clustered Columnstore index rather than a row store. Now, what happened was this person was here in Redmond to give us a live demo of their queries, so this is a typical query that they had – notice it has a join on three or four tables, this alert table which was a fact table with 2.75 billion rows – and they ran the query live, and I want to show you – I can show you that it ran in three seconds at the bottom, because, trust me, it ran in three seconds, and they couldn’t believe it could run so fast. These are the numbers they showed on the production system, so I just wanted to capture the screenshot because I cannot show the demo myself. So that’s the power, and the learning for us was – as you can imagine, this is an insurance company, right, so you file a claim, there could be some change, you need to update those rows, and updates are done as delete plus insert. So when they were doing these nightly changes, we did some tricks about how to handle that kind of thing. What we did was the deletes in a first pass and the inserts in a second pass, because if you do updates row by row it’s more expensive, so we did some tricks like that and it worked out well for them. And in the aftermath – and this is something, Brent, you might have seen as well – our old cardinality estimator gives better query plans for many workloads, and so what they did was use the old cardinality estimator with SQL 16 to get these numbers.
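
A rough sketch of the two tricks mentioned here, with illustrative table names: apply the nightly change set as a set-based delete pass followed by an insert pass, instead of row-by-row updates against the columnstore, and keep the legacy cardinality estimator on SQL Server 2016 at the database level.

    -- Pass 1: delete the changed rows from the columnstore fact table
    DELETE f
    FROM dbo.FactClaims AS f
    JOIN staging.ClaimChanges AS c ON c.ClaimKey = f.ClaimKey;

    -- Pass 2: insert the new versions of those rows
    INSERT INTO dbo.FactClaims (ClaimKey, MemberKey, ClaimDate, Amount)
    SELECT ClaimKey, MemberKey, ClaimDate, Amount
    FROM staging.ClaimChanges;

    -- Use the pre-2014 cardinality estimator for this database (what this customer chose)
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;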

And I want to show – this is a case study, it’s live, you can check it out, and this is what their vice president – in this case, an architect – said: they got a dramatic improvement in performance over Sybase, right, this is one example they quoted, and instant answers to their queries. Remember it was 1.1 seconds in the POC, so they actually are able to use this performance to their advantage to increase the impact on their business. So this is actually a very, very good case study for us, because it not only showed what Columnstore can do for a real workload, it also helped us compare Columnstore with our competition. So that was one example; now, any questions on this slide, or shall I go to the next example?

Brent Ozar: No, you’re – there was one. Joe says, “Do you expect 2016 SP1 to improve these benchmark numbers even further?” I’m sure he’s seen like Pedro Lopes talk a lot about the improvements in SP1 for example.

Sunil Agarwal: Right, so no. We have not done any improvement in Columnstore that I recall right now. A lot of the improvements that we have done that Pedro talked about were on supportability, like making the query plans much richer in terms of diagnostic information, or the clone database, stuff like that, but not for Columnstore per se. A lot of the performance improvements that we are doing will come in 2017, and the question that gets asked is, will you backport those enhancements to 16 or 14? Chances are low, because in many cases they require a lot of changes to the code, and bringing that code to a previous release is risky, and also we want people to move to the newer version of the release.

Brent Ozar: Cool.

Sunil Agarwal: Okay, alright. So this was one scenario. With this next customer, the story is this: they were working on this problem, which I’m going to describe to you, with Microsoft; they tried different technologies, and I don’t know how, but we got contacted. They said okay, see if Columnstore can help, so they reached out to us, and I looked at the workload – I will show you what the workload is – and it was not obvious to me that Columnstore would be helpful, and I’m going to show you why.

So, First American Title company – most of us, or some of us, have dealt with real estate at least once in our lifetime; you have to file for a title, right, get a title for the property. So this is that business, and what they do is provide this service to realtors in the US, so they have a database of 150 million properties in the USA, and it is just coincidental that they have 150 attributes for each of those properties. So 150 million properties, and 150 columns. The columns could be the price of the house, square footage, units sold, number of bedrooms, stuff like that. So those are the properties, and then, you know, if you want to search for a house, you will say I want to search in this geographic region, this kind of accommodation, price, whatever, right? So people will run those kinds of queries. You are not looking at aggregates. You are saying show me the properties that meet my criteria, so there is no aggregation. Now, in many cases, people are interested in the year it was sold, what the tax is – on average, they were querying with ten-plus columns in terms of the attributes they want to see for a given property, and the results were returning approximately a hundred rows or more.

Okay, so this is what it is. Now, when you think about a data warehouse, this does not look like a data warehouse, right? Because you’re not doing any aggregates, you are just doing selections with predicates, right? The other thing was their daily ETL was one million rows. Now, you may wonder, what is changing in a property, right? What is changing? Well, think about this: yearly, the tax is increasing. If you have 150 million properties, there are 150 million updates at least per year. And then people are improving their properties, like adding a room, whatever it is, so on average they were doing an ETL of one million rows a day. Most of them were updates; they were not new properties – of course, new properties were there, but most of them were updates, and as you can imagine, updates are expensive. So we were not thinking of Columnstore. And this is what they said: well, this is not a Columnstore scenario, so what are the common queries people run? Okay, people are querying on these 20 attributes of the properties, like the square footage – let’s create an index on each of those attributes, and when I run a query, if it matches those attributes, I will be able to run it faster – which I think is true.

But if somebody comes from left field and uses attributes that nobody queries on much, it’s going to run slow, because that will mean a full table scan, 150 million rows, right? Clearly, they could create an index on each of those attributes, but that would be suicide, because if you create 150 indexes, the ETL will – who knows what will happen, right? So that was the problem they had, and when they contacted us, they were actually running this solution on SQL Server 2014 – believe me, 2014 – and they were not using Columnstore because they did not think Columnstore was useful here, right? And they were also saying, this is not working for us, let’s move to a competitive platform which has whatever, right? So somehow we got involved, and their success criterion was that 95% of queries should take less than five seconds, which makes sense. I mean, you are searching for a property; if it is going to take 20 minutes to get the answer back, you will not – you’ll forget about it, right?

Okay, no query should take more than 10 seconds, and they should support up to 400 concurrent users, actually more. So this is an example of the SQL – you have the select list, you search for a property, where you can say the home name is LIKE something with percent wildcards, whatever it is, mortgage this, mortgage that, and it was taking eight minutes to run on their system with SQL Server 2014. This is another example with some other predicates – maybe here there was an index available – and the query ran in 10 seconds. So you can see the numbers they were getting were way, way slower than their success criteria.

Okay, this is where they were, right, so we reached out to them, and what happened was they had this traditional row store, right – here you can see the yellow guy is a row store, a relational table, disk-based, with a clustered index; they had B-tree indexes on it, they had a full-text index on top of it because they were doing text searches – you see what I’m saying? Something like that. They also had a spatial index. Why? Because they were also saying, I want to find a property, and they had a graphical interface where you can draw a circle in an area, right? So they had a spatial index. So this was the solution they had on SQL Server 2014, which they were in production with. So what we did was ask them to create, guess what, a non-clustered Columnstore index, NCCI. Okay?

Actually, Anna was asking about OLTP workloads; operational analytics on an OLTP workload can be achieved through NCCI, so we created an NCCI. What that means is, everything that was working before works like before, because we did not remove those structures. Now, any query that benefits from the Columnstore will automatically run there much more efficiently. This is what they did. No changes to the application. Now, clearly, in this case we did not support spatial columns in a non-clustered Columnstore index, so we left those columns out. And so how did the magic happen? Why did the queries run so fast? I’m going to show you the numbers.

So this is where some background on Columnstore is useful. In Columnstore, imagine we have 150 million rows, right; what we do when we create a Columnstore index – in this case an NCCI – is group the rows in units of one million rows. That is the goal; we can’t get that unit every time. So let’s assume we had 150 units: 150 million rows, 150 units. Now, for each attribute of the property, except for strings, we keep the min value and the max value. This is common metadata that we have for Columnstore, so we have a min value and a max value. So now, when people put search conditions on the property – say the number of bedrooms is three, the square footage is so-and-so – we were able to look at that metadata that we keep for each of those 150 units, and we were able to eliminate a lot of them, and then we only have to search a very small number of rows for the qualifying properties. So that means we did not have to create non-clustered indexes to speed up those searches, because Columnstore maintains that kind of information automatically, which means they removed all of the B-tree indexes they had for query performance, and in fact their ETL performance improved, because, as you can imagine, the more B-tree indexes, the harder it is to do ETL, right?
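
The per-segment min/max metadata being described can be inspected directly; here is a sketch against an illustrative table name. Note that min_data_id and max_data_id are stored as encoded data ids, which the engine compares against the encoded predicate to skip entire units:

    SELECT p.partition_number, s.column_id, s.segment_id,
           s.row_count, s.min_data_id, s.max_data_id
    FROM sys.column_store_segments AS s
    JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
    WHERE p.object_id = OBJECT_ID('dbo.Properties')
    ORDER BY s.column_id, s.segment_id;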

So this is what they did, but they did not want to go to 2016 because 2016 was not out yet, and First American had this policy that they will not go to a new release of SQL Server until SP1, right? So you’ve heard about that, right – I mean, a lot of customers – yes, okay. By the way, that has changed in a very dramatic way because of the Cloud offering, but anyway, they had this corporate policy, so we said okay, we need to do this on 2014. But 2014 did not have an updatable NCCI, so what did we do? This is what I actually wanted to impress upon the audience here: you have to think about solutions a little bit out of the box. So what we did was – in their case, storage was not an issue for them. So we essentially created two tables: one was a clustered Columnstore index on the left side, and one was whatever they had before, right. Okay, and they put a layer in between, of course, which figures out whether the query should go to the Columnstore or the row store. And this is exactly what they did, and I asked them how expensive it was for them to do this, and they said literally three weeks, and they were able to do this because this side was there already. The other copy was just a select-star insert into the Columnstore, and you’re done, right? And this is where the logic was put, and I’m going to show you an example of how they did it, and this is what it looks like.

For example, if I am searching for a property – the select list, and this is Normandy, or whatever the database was called, some property – their property ID is in this list, and here you have city like this and bedrooms like this; so what they did was, this part was executed inside the Columnstore, and this part was on the row store. So that’s how they did it, and the numbers are insane. The numbers are insane, okay. So imagine what they did – I’m going to go back to this slide – they ran the nested query that I was showing you here, they got the property IDs, and then they fed that into this guy and got their results back.
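
The routing layer and table names were not shared publicly, but the shape of the query being described looks roughly like this, with illustrative names: the selective predicates run against the columnstore copy, and the wide select list is fetched from the rowstore copy by property ID.

    SELECT p.*                                    -- wide attribute list from the rowstore copy
    FROM dbo.Properties_RowStore AS p
    WHERE p.PropertyID IN
    (
        SELECT c.PropertyID                       -- selective filtering on the columnstore copy
        FROM dbo.Properties_CCI AS c
        WHERE c.City = N'Los Angeles'
          AND c.Bedrooms >= 3
          AND c.SquareFeet BETWEEN 1500 AND 2500
    );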

So the row store, which they were using before – the space it took was 560GB, and with CCI, it became 44GB. Now, they moved to this solution on 2014, so their storage is actually 560 plus 44. You agree? Because they have the CCI as well as the row store. Yes, they have to duplicate, but when they move to the NCCI case, the total storage will increase only by about 44GB, because the NCCI will be roughly the same size.

Brent Ozar: It’s affordable, it’s cheap.

Sunil Agarwal: It’s really affordable. Storage was not the issue for them. And the CPU consumption of their queries went down from 23% to 7%, because Columnstore execution is much, much faster. Now, you might say 23% was not a big deal, which is true, but this is the average. They had peaks and valleys, right – I mean, when the fall begins or when the summer begins, people are doing a lot of property buying and selling, right? The peaks of the workload were high, so they had to provision for the max. Now, this is of course a very good example of where they could move to Azure, where you don’t worry about the max, right? But of course, if they have a policy of going to SQL Server only with SP1, they will not go to the Cloud right away, because the Cloud changes every day, right?

Brent Ozar: That’s true.

Sunil Agarwal: So anyway, this is one example where the Cloud could be extremely helpful. So this is all good, right – CPU – but here’s the phenomenal thing. The query performance on average went from two minutes to two seconds; that is 60 times faster. So they were so happy with this – I mean, they are in production, and of course you can imagine it created a wave in their organization to see what else they can do with SQL Server, what other workloads can leverage this kind of technology.

So the point that I want to make here is that this problem we were solving did not look like a Columnstore problem, but it worked out well in this case. And you can see how we improvised the solution of creating two copies of the data to get around not having NCCI, stuff like that, so it was very, very successful for us.

Brent Ozar: While you’re on there, there’s a good question from Ratan. Ratan asks, “Seeing that there’s two different copies of the table makes me think of – do you recommend different storage configurations for Columnstore tables versus row store tables, like should you put them on faster or slower SANs?”

Sunil Agarwal: Right, I would say this – if you look at the total size of the Columnstore, it’s 44GB, right? If you’re running this application on a 256GB machine, chances are your Columnstore will be all in memory, right? It’s not going to take a whole lot of memory. So I would say that if your working set can fit in memory with Columnstore, then you don’t have to worry about the storage aspect as much. Of course, ETL or loading will be whatever it is, so we have not seen a significant guideline that this should be on this storage or that should be on that storage. Having said that, if you have a partitioned Columnstore table, of course the older partitions you may want to put on a cheaper storage device, right, that’s a typical partitioning thing. So in this example, they did not put the Columnstore on different storage. In fact, it was on the same storage.

Brent Ozar: And he also asks, “Is there different maintenance between Columnstore indexes versus regular indexes?” And we touched on it a little, but…

Sunil Agarwal: Let me describe – I think that’s a very important point. So let’s look into what fragmentation is. Why do we do index maintenance, right? What happens is, if you have a B-tree index where data is stored in an ordered way, and I insert rows in between, the pages will get split and then the new page will get allocated from the end, so that index becomes very fragmented on disk. That means range queries will run slow, because I have to run around the whole disk to figure out where the pages are, right? So of course with B-tree indexes, we want them to be physically contiguous, right? The range should be physically contiguous for the best performance. So there’s fragmentation – this has been there since the Sybase days – and you do defragmentation.

Now, in Columnstore the difference is – I’m going to go back to the picture, if you allow me, I want to go to this picture. Okay, here, this is a Columnstore; these red guys are read-only, so when you do a delete or an update, these guys don’t change. They are physically contiguous, or whatever; they are sitting there like that. The fragmentation in Columnstore is the deleted rows. If I delete rows, I have to create another structure on the side, which I have not shown here, which tracks the rows that were deleted, right? When I update a row, the delete goes into that same structure, and the insert happens separately. The question would be, hey, how can you insert into a compressed store? We do not. It’s a very complex data structure if you think about it: we have a row store hidden inside the Columnstore index. That row store is the landing zone for the new rows that are coming in, and once that landing zone reaches a million rows, we try to compress them. So what happens over time – like I was saying, doing ETL, a million rows per day – and you can think of it this way, another example: after one year, let’s say the tax on all the properties changes. That means I have to update all my rows. That means all the red guys that you see here will be marked deleted, because everything has been deleted. So it will be fully fragmented. So what we do is track how many rows are deleted in a given unit of a million rows, right, we track that information, and when you run the reorganize command, we have a sort of policy-based defragmentation, and what it does is remove the rows that are deleted and recompress.

So for example, if you have two of these guys, these vertical guys – a row group of a million rows here and a million rows here – and let’s say, to keep it simple, 500,000 rows are deleted from the first one and 500,000 from the second one, when I do defragmentation, I’m going to combine them into one, and I’m going to throw the deleted rows away. We can combine – I think up to ten of these guys – so by doing that, essentially you are removing the deleted rows, and that is the defragmentation of the Columnstore. So it is quite different from what you have for the row store.

Brent Ozar: So it’s not just making each individual row group smaller, it’ll also combine row groups?

Sunil Agarwal: Combine, yes. It combines them as well. So here’s the interesting part, since you mentioned it. We have this rule of 10%. If I have a million rows, and I delete more than 10% of the rows – that means 100,000 rows – only then will I remove those rows. I will take that row group and recompress it down to 900,000 rows. And if I can combine it with something else, I’ll combine, but the goal is that a row group cannot exceed a million active rows.

Brent Ozar: Then Damian asks, going back forward a couple of slides, he says, “When this company says that they sent some of the queries to the row store table and some to Columnstore, was it just a matter of rewriting these queries so that it has like the sub-query approach or did they actually have an intercept layer in the app that would send different queries in different places?”

Sunil Agarwal: I actually don’t know, because they were very confidential about how they were doing it – they just sent us this example. But think of it this way, right, they have to intercept it somewhere. I mean, the query – the interface, the page, which was form-based, right, you essentially say bedroom size so-and-so – basically comes into some app-tier layer, and then they are generating the SQL query, so it has to be done at that level, but it was not very complex, because they were able to do it with one person in three weeks.

Brent Ozar: Okay.

Sunil Agarwal: Yes, anything else?

Brent Ozar: Nope, that’s all.

Sunil Agarwal: Alright, so far so good, I mean you guys are liking it?

Brent Ozar: Yes, they seem to be paying attention.

Sunil Agarwal: Okay. So the next thing that I want to show you was a very interesting application, and this is a hybrid application – hybrid in the sense that they had both a transactional workload and an analytics workload, alright? So there is a company called FIS – I forget the full name, it’s financial something something – and what they have is – I’m not an expert in the securities market, so I’m going to hand-wave here. Basically, they have these traders who are making trades, and they also need to see the trends. I mean, if I bought Microsoft shares – a thousand shares or ten thousand shares – how the securities are trending in real time, and based on that I can do some day trading or hour trading, whatever it is, and make some decisions.

So from my perspective, from what he explained to me, it was clear they have a transactional workload, where they are buying or selling securities, and they also want to do analytics on the things they’re doing as well as what’s being done in the market, right? And the way they were doing this was by moving the trades every five minutes – they were running an ETL program every five minutes into a data warehouse – and the traders were making the trades in real time, but the trends they were seeing were five minutes old.

Now, according to them, each minute can make a difference of a huge amount of money, so they wanted this analytics done in real time. That means zero ETL time, and the only way that is possible is if you run the analytics directly on the transactional tables. Now, before Columnstore, that was not really possible, because the queries would run slower, they would mess up the transactional workload, and so on, right? So that was not an option, so they were actually in a very hard and rough place. They said, well, we can’t mess up the transactional system, but our traders want real-time analytics – what do we do?

So that was the problem, and I think this is an important point, because HANA, if you know about it, was actually the first – I won’t say the first, but the first big one – that allowed customers to run an OLTP workload and analytics on the same tables. That is what HANA’s big deal is. If you know about Oracle 12c, they came out with an in-memory option – some of you may have heard about it – and what it is, is they create a non-clustered Columnstore index on transactional tables and maintain that index in memory, stuff like that. So every vendor is recognizing that there are scenarios where they cannot have any latency in analytics. They want to do analytics in real time. But that was their problem. So as I was telling you, they were running on 2012, row store, transactional processing, ETL every five minutes.

And the funny part was – and this is something I don’t fully understand – their transactional volume was 1.2 million rows per eight hours of trading. That volume looks low, so they were not really trading Microsoft shares, they were trading something else, but the point was it was 1.2 million rows per day. So that was the extent of their transactional workload: 1.2 million rows, and every day it starts from zero again. So the question really was, if you have 1.2 million rows, the Columnstore index will not really help you. Because if you create a Columnstore index on that table, at least for the first six or seven hours the row count is less than a million rows. It will not be compressed, so you will not get any benefit from Columnstore. So this clearly did not meet our criteria, and again, this is what I’m saying – think outside the box. So I told you what the challenge was: they wanted to do analytics in real time, no Columnstore benefit, 1.2 million rows, starting from zero every day – so what do we do?

That’s our question, and so first thing we said, let’s not worry about 1.2 million rows getting compressed only towards the end, let’s compress the rows and see if that meets the requirements of your dashboard. What they had was a dashboard where they see all these trends, they had eight forms – I would say, what is that called, I don’t know what the term is that, you have a webpage and you have eight…

Brent Ozar: Eight like portal parts or graphs or…

Sunil Agarwal: Yes, they had eight portal parts, and for each portal part they were querying SQL Server to fill in the information there. So they had eight different queries they were running, and that thing has to be refreshed in under three seconds. That was their goal: all eight queries have to run in under three seconds. So what we did was say, well, we will try a non-clustered Columnstore index, even though we know 1.2 million rows over the course of eight hours does not make sense with Columnstore, but we’ll try that, and also we will try in-memory OLTP – Hekaton – with a Columnstore index. So those were the two solutions that came to our mind, and we said let’s do the testing.

So the first thing we did was take the best case, because if everything were compressed, you run your report on the dashboard, and if the dashboard can be rendered in under three seconds, then at least this technology has legs to run on. Otherwise, it’s not even a solution for you, right? So they compressed everything, they ran their dashboard, and everything was running fine, so it was very clear to us that if the Columnstore were compressed from row one onwards – which is not possible – then their solution would work, right? We could do that.

So here’s what we did. We created an NCCI on those row store tables – that part was a no-brainer: you had to create an NCCI because the transactional workload has to run and we had to run analytics. What we did was force compression every hundred thousand rows. Manually, we said every hundred thousand rows – so that means roughly every 45 minutes – just run the forced compression, and what is the forced compression command? It is actually the reorganize command with a force option. This is available in 2016; it was undocumented in 2014, but now it is available and documented. So all you do is ALTER INDEX on the non-clustered Columnstore index – I forget the exact syntax – REORGANIZE with the force option. So what happened was, every hundred thousand rows, the data was getting compressed. So instead of getting two row groups – remember, each row group is up to one million rows, so we’d get at most two – we were getting ten, actually twelve, because we were compressing every hundred thousand rows. And so that’s what we did, and what we found was that the performance was still within the SLA, because clearly the most recent hundred thousand rows were not compressed. The hundred thousand rows I compress, then the next hundred thousand rows are not compressed – those will be slower, the compressed ones will be faster. So the performance will not be as good as when everything is compressed. And these are the numbers they got: what you see highlighted is the improvement with the Columnstore NCCI; this is the performance with a row store, and this is the performance with the NCCI, the non-clustered Columnstore index. And you can see it has improved in a pretty dramatic way – 12x, 18x, whatever – and the queries that I’m showing you on the left side are the queries, I think, that run to fill those forms, those portal parts. And the numbers you see on the right side are the graphical representation: the green window is with NCCI, and you can see the fluctuation, right? Because when the data gets compressed it runs faster, and when the data is not compressed – you know what I’m saying? It slows down.
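
The “forced compression” mentioned here maps to the documented REORGANIZE option in SQL Server 2016; a sketch with illustrative index and table names, which an Agent job could run every 45 minutes or so:

    -- Compress delta-store rows into columnstore row groups without waiting
    -- for the 1,048,576-row threshold
    ALTER INDEX NCCI_Trades ON dbo.Trades
        REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);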

Brent Ozar: And at only 41 batch requests a second, I mean, you can afford to rebuild it in the middle of whenever.

Sunil Agarwal: Which one?

Brent Ozar: It looks like on average they’re only getting 41 queries per second, and if you divide out 1.2 million…

Sunil Agarwal: Yes. And so they’re live in production, and this is the case study they have – I think your attendees will be able to download the deck, right? I’m going to send it to you. And this is from the vice president, Aaron; he was actually one of the people I worked very closely with, and he is the only vice president I know who is this deeply technical. In fact, the funny part, he said, was that he didn’t have to involve the DBAs, and I said that is insane, I would not create an index without the DBA knowing about it – but that’s what a vice president can do…

Brent Ozar: You don’t usually want to see vice presidents adding indexes, but…

Sunil Agarwal: But anyway, the thing was, they were able to run the queries that much faster without changing anything. I mean, they were just impressed. In fact, Aaron will be presenting with us at the Ignite conference in September about this. So let me summarize this part – I’m watching my time, and I have another customer – I’m going to keep going.

So I think the interesting thing about this one was – think of this, right – it seemed obvious that we should use a non-clustered Columnstore index because it was real-time analytics: you have a transactional workload and you want to do analytics on top of it. But it did not look like a fit, because the number of rows was much, much lower than Columnstore normally wants. And we got around that issue by forcing compression every hundred thousand rows – just have a batch job, an agent job, that runs every 45 minutes and compresses the rows – and it worked out well. So they were pretty excited, and they are live in production, so yes, it is very exciting.
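If you want to see whether such an agent job is keeping up, one way is to watch the row group states with the SQL Server 2016 DMV sys.dm_db_column_store_row_group_physical_stats. A minimal sketch – the table name dbo.Trades is the same hypothetical placeholder as above:

-- Show the state and size of every rowgroup in the columnstore index on dbo.Trades
SELECT row_group_id, state_desc, total_rows, deleted_rows
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Trades')
ORDER BY row_group_id;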

So the last one that I want to show you is again real-time operational analytics. It’s a company called M-Files. M-Files is a competitor to SharePoint, and the interesting thing is, the problem that I see with SharePoint is that once you upload a file, it is very hard to view that file by a different property. For example, if I have three departments – department one files, department two files – and I want to see all the files that have a certain property, there is no easy way to find that in SharePoint. So M-Files solved that problem by saying: we will store the files in one place, but we will keep the metadata – the properties of those files – in SQL Server tables, and now we can show you the organization’s files any way you want to see them, and you can just click on those properties and fetch the file from the file store. The files themselves were not part of SQL Server.

So here’s an example. I’m showing you a proposal here, and it has these different properties with the red tags, and those properties can be shown in a directory-explorer-like structure in the navigation bar of their application, and they can do whatever they want with it. Okay, so this is their application. Let’s say I want to upload this document, like in SharePoint – I upload the document, it has all these properties, and what happens is that when you upload a document, they insert approximately 60 rows into different metadata tables that contain the properties of that document.

Now, if I change a property of a document, I have to issue a lot of updates – you see what I’m saying? So there are a lot of inserts, deletes and updates going on; this is a transactional workload on those metadata tables. And with that, you can render the structure of the documents the way I’m showing in the example on the left, or by proposal, by year and month – you see what I’m trying to say? I can see the same document in different ways and forms. This is what they have.

Brent Ozar: Business intelligence for documents basically.

Sunil Agarwal: That’s right, and they actually implemented this solution on SQL Server 2012 or 2014, and it was working fine for them. What they found was that as the number of documents increased beyond 10 million, the metadata manipulation became much, much slower and they could not meet their SLA. Their SLA was that when I upload a document, the insert – that metadata update – should complete within one second, and a mass update of a document, where they had much larger changes, should take two to three seconds. So basically, I upload the document and everything should be done within three seconds.

So those were the requirements. And the listings that they were doing – I’m going to show you a query in a moment – had no aggregates in them; they were saying show me the documents grouped by department, those kinds of properties. So the rendering of the metadata was done using analytics-style queries. Now, they wanted to see those listings in real time, and what they were finding was that as the number of documents kept increasing, they could not render those structures in real time. That was the problem. So if their customer was going to go beyond 10 million, they basically said create two stores and things like that – hacky workarounds, right? It doesn’t work.
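To make the shape concrete, here is a rough sketch of what such a listing query might look like. All table and column names are hypothetical, purely to illustrate the join-heavy, aggregate-free pattern he is describing:

-- List documents by a metadata property; joins and TOP, but no aggregates
SELECT TOP (200) d.DocumentId, d.Title, p.PropertyValue AS Department
FROM dbo.Documents AS d
JOIN dbo.DocumentProperties AS p
  ON p.DocumentId = d.DocumentId
WHERE p.PropertyName = N'Department'
ORDER BY p.PropertyValue, d.ModifiedDate DESC;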

Okay, so what did they do? This is the workaround they had, and their customers were thinking of a hundred million objects in one vault. So the question really was: beyond 10 million they were slowing down, yet their customers were expecting the same performance well past 10 million. That was the problem they had, and in this scenario they evaluated Hadoop, HP Vertica, other SQL solutions – they tried many things because they were in a fix. They said, how do I help my customers? And the challenge they found with those technologies was the lack of support for [inaudible], right? I mean, they want full ACID consistency, the full properties that a transactional system gives you, and those technologies did not have that in a strict way.

And also, they wanted a solution that could run on premises, because some of their customers were on premises and some customers wanted it provided as a service – I just upload my document to your service and you manage everything, I’m just going to do my business. So they were looking for a vendor that could do both, and SQL Server, or Microsoft for that matter, actually had both of these things. So I remember they came to our lab in the summer of 2015, when we were working on NCCI, and we said: because you’re running analytics-style queries and a transactional workload where you’re updating a lot of rows, NCCI is a good solution, so let’s do that. And the beauty was that when you create a non-clustered Columnstore index, there’s no application change – it’s just an index. Another important thing is that a lot of times, in a transactional workload, people have some indexes created only for queries, because you may want to query.
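Creating that index is a single statement and requires no application change. A minimal sketch, using the same hypothetical metadata table as above:

-- Nonclustered columnstore index over the columns the listing queries touch
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_DocumentProperties
ON dbo.DocumentProperties (DocumentId, PropertyName, PropertyValue, ModifiedDate);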

Now, what happens is that when you create an NCCI, some of those indexes might not be needed. In fact, we have one customer who created an NCCI and dropped three other indexes that were not needed anymore, so the index maintenance overhead actually became lower. The point I’m making is that just because you’re adding an NCCI does not mean you simply have one more index; the effective number of indexes can go down.

So anyway, they created this index, this is how it looks like. You have this B3 indexes, which they had for their application, and the metadata table which is a metadata of the documents, and then we created this NCCI, and the results were phenomenal for them, and they had an environment with 77 million rows and the total database tested was 5TB without compression and notice, this database does not have the files. And it is true that you create a NCCI, which is index plus one index, and maintaining NCCI is a lot more expensive than a traditional index, so the impact on the OLTP workload was less than 5%. That was one of the big concerns they had was if I create a NCCI, will my OTLP…

Brent Ozar: Much slower.

Sunil Agarwal: Yes, how much slower – because they still had that SLA of one to three seconds I showed at the beginning. And these are the numbers they got: the 5% OLTP overhead was okay, and for the OLAP queries – rendering the directory structures and so on with different properties – red shows before and green shows after. So from 6.5 seconds to 0.5 seconds, that is 13 times, and from 2.6 seconds to 0.5 seconds, so 52 times. They were able to get this kind of gain without changing their application, so they were pretty excited – hey, I could solve my problem without – and this is the kind of query they were running. This is a query that runs over the NCCI. I don’t expect you to understand what this thing is, but I mean [crosstalk]. You can see this is a mean query, it takes lots of joins, and there is no aggregate here – these are like top, show me, whatever it is, right?

Brent Ozar: Individual rows, yes.

Sunil Agarwal: And this is what they said – this is actually their founder and CTO, he was there – and he said 10x faster. I highlighted a few key items, and I think the most exciting part for them was not having to change the application.

Brent Ozar: I bet the storage overhead was pretty small too, I mean, if you’re talking about…

Sunil Agarwal: Exactly. Think of this, right: if I assume 10x compression, and you’re not necessarily putting the NCCI on every column – you will [inaudible] columns too, right – actually, I don’t know how much storage overhead they had, but I should find out. But you’re right.

Brent Ozar: It’s got to be low.

Sunil Agarwal: Yes, so these are pretty much the four customers that I wanted to talk through. Just to summarize: the first was a traditional data warehouse where they just moved from row store to a Columnstore index, and the numbers they got were much, much better than SQL Server 2014 – we understand somewhat why – and also better than all our competition. So again, of course not every workload will be best on SQL Server, but the important thing is that this was a good production-quality workload, not a benchmark, where we could show that we beat the competition hands down.

The second example was the title company, where it didn’t look like a traditional data warehouse but Columnstore was able to solve their problems. The third one was the 1.2 million rows that did not fit the Columnstore model, but we made it work, and this last one was real-time analytics. There are many more customers – I don’t recall how many we are working with directly, but there are many customers I see through posts and so on who move to a Columnstore index and are just happy. So I hope this gave you some idea of how Columnstore technology can be deployed in real workloads, playing some tricks along the way, to make your customers successful. So Brent, that’s all I have, thank you so much for your time. Any questions, I’ll be happy to answer.

Brent Ozar: Yes, very nice. Several people said they were really impressed by the unusual solutions along the way. Joe asks, “In the example of” – and I forget which example he was looking at, it was the guys who did the forced rebuilds – he said, “Do you know what drove the performance gains with such small row groups? Is it mostly batch mode processing that enabled those kinds of speed ups?”

Sunil Agarwal: That’s right, so yes, I think so. Think of this, right: batch mode processing gives you at least four to six times better query performance than running without batch mode, and the reason I can make that statement is because we have a trace flag – SQL Server has a lot of trace flags – that can turn off batch mode processing. We don’t recommend it, but you can turn it off and compare the query performance before and after, so batch mode is a big deal. Aggregate pushdown is a big deal. So those improvements account for a lot of it. And think of this, right: the unit of processing is the row group. At that level, if you have a thousand row groups versus ten row groups, the query that you’re running across all row groups will be much, much faster with ten row groups – you see what I’m trying to say?
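The before/after comparison he mentions is commonly done in the community with trace flag 9453, which is widely cited as disabling batch mode processing; it is undocumented, so treat this as a test-system-only sketch, again using the hypothetical dbo.Trades table:

-- Normal execution: the columnstore index lets this aggregate run in batch mode
SELECT TradeDate, COUNT(*) AS TradeCount
FROM dbo.Trades
GROUP BY TradeDate;

-- Same query with batch mode disabled via the (undocumented) trace flag, for comparison only
SELECT TradeDate, COUNT(*) AS TradeCount
FROM dbo.Trades
GROUP BY TradeDate
OPTION (QUERYTRACEON 9453);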

Brent Ozar: Eugene asks, “If someone understands how the analysis services tabular engine works, how much of that knowledge transfers to SQL Columnstore and vice versa?”

Sunil Agarwal: Right, so Eugene, that’s a great question. Think of this, right: Analysis Services is a great modeling tool. I have not used the technology at a level where I can say I’m an expert, but you can model a lot of business scenarios, things like that. Now, when you run the query, customers have most commonly used the multi-dimensional model, where you pre-aggregate the data and then just run queries against that, and the reason the pre-aggregation was done is because the native technology inside the SQL engine, or the Oracle engine, or whatever you name, did not have the features to run the query at a level where you get a response in seconds. So they had to pre-aggregate the data, put it in multi-dimensional cubes, and the nightly reports would just run against those cubes, and then you get the results back.

Now, the challenge with that was that if I wanted to drill down, or do something else for which there is no cube, I had to contact my DBAs or whoever owns it, and what I’ve heard – I could be wrong here – is that it can take weeks before your query can be run, because they have to pre-aggregate the data the right way. That was the model – I know in the 90s I used to work for Sybase and it was like that. Now, with Columnstore, people are questioning: do I really need to pre-aggregate if I can run the query a hundred times faster? That’s good enough, right? So Analysis Services in 2016 has the tabular model, you know about that, and it has a direct query mode, and the direct query mode in Analysis Services 2014 did not generate good SQL queries. I would say, I can write a bad SQL query and I can write a good SQL query for the same result. So the generated queries in many cases were not optimal, and of course, if you give a bad query to the SQL optimizer it can do something, but it cannot magically rewrite it, right?

In many cases it can, but the point was that the quality of the generated query was bad. So in Analysis Services 2016 they improved the quality of those queries, and – this is an interesting point, so I hope I get it right – they used the SQL Server 2014 engine with Columnstore and ran the workload with Analysis Services 2014 in direct query mode as well as with Analysis Services 2016 in direct query mode, and they got between 5 and 15 times faster results because the queries were much higher quality. So the point I’m making is that the Analysis Services 2016 direct query mode gives you much, much better query performance because of the quality of the generated queries, and combined with Columnstore, in many cases people are saying: I will move away from storing the data in pre-aggregated form in the Analysis Services cache and go to direct query mode. And the beauty of that – I’m answering in a very long way – is that when you pre-aggregate the data, or you load data from SQL Server into Analysis Services, there is latency in the analytics, because you have to move the data through ETL to the warehouse and then from the warehouse to the cube. With direct query mode you can query the data as soon as it lands in the data warehouse, and you can slice and dice down to the lowest granularity. So I think our recommendation is for people to move to direct query mode. What I’ve heard is that some of that modeling can be a lot more chatty, which does not work well in every case, but some customers I know have done it successfully, and in the case of Daman, which I showed as an example earlier, they are in fact running in that direct fashion: they are not using Analysis Services, they have complex queries generated by their GUI tool, or the application tier, and those run directly against the Columnstore, and the performance is pretty good.

Brent Ozar: It’s a great testament to the work that the team did; so many people have been happy just dumping any kind of data warehouse and going, forget it, let’s just query the data live, and it’s worth it.

Sunil Agarwal: It’s worth it, yes.

Brent Ozar: Well, thank you so much for your time today Sunil, really appreciate it. Excellent presentation, people are welcome to give virtual applause now over on Slack.

Sunil Agarwal: I can hear it, I can hear it, thank you.

Brent Ozar: There we go, thanks everybody for hanging out with us today and we’ll see you guys over at the next GroupBy.


7 Comments

I haven’t used columnstore indexes yet, but am interested in how to use them within SQL Server for analytical work in the OLTP database. We do a lot of that, as we have not taken on implementing SSAS yet. Lots of stored procedures that run every 10 minutes to do aggregations and calculations and persist them in “store” tables, that reports can use as a data source (because those procedures take too long to run in real-time). If columnstore would let us run them in real-time with good performance, that would be a big win.


Sounds like a great session. As Sunil is so modest I’d like to point out that he is the Program Manager at Microsoft who has been responsible for columnstore indexes since SQL 2012. So vote for & watch this session if you want inside insights into columnstore technology and how to implement it in the real world.



I have faced a problem with the SOS_SCHEDULER_YIELD wait type while using a nonclustered columnstore index in a mixed OLTP and reporting environment on the same server, as mentioned in the above post.
Is there any solution for the SOS_SCHEDULER_YIELD wait type? I have tried maxdop changes and so on.
I have described my scenario here.

Operating System: Windows Server 2012 R2 Standard
Processor: Intel(R) CPU E5-2650 v3 @ 2.30GHz (2 processors)
CPU Cores: 20 cores, 40 logical processors
RAM: 64GB
Database Application: SQL Server 2017 Standard

CPU Core Allocation: Automatically set processor & I/O affinity
RAM Allocation for SQL: 56 GB
Cost Threshold for Parallelism: 50 (changed from 5 to 50)
Max Degree of Parallelism: 4 to 8 (different approaches tried)
Database Compatibility: SQL Server 2017 (140)
Table Row Count: 350 million

Tempdb Data Files: 8 to 12 (different approaches tried) with 64MB auto growth

Problems Faced (Locks/Waits):
SOS_SCHEDULER_YIELD: 60 to 70%
PAGEIOLATCH_SH: 5 to 10%

During the execution of any heavy report, all OLTP requests lock or wait, and as a result we get “Execution Timeout Expired”. Most of the waits are SOS_SCHEDULER_YIELD.

Please advise.
Thanks

Mukunda Bhattarai
June 24, 2018 1:02 pm

Excellent presentation sir.

