Saturday, December 3, 2016

Data Channel - Interview 01 - Andreas Wolter on Columnstore

Dear all,

Welcome to the first ever interview on Data Channel. I am honored to have Mr. Andreas Wolter, a German MVP, MCM and MCSM, as the first guest. A detailed profile of Andreas is provided below.

Andreas Wolter has over 16 years of experience with SQL Server and has earned the Master certification on both SQL Server 2008 (MCM) and SQL Server 2012 (MCSM), as one of only 2 experts worldwide. He has also held the MVP award for SQL Server since 2014.

He is the founder of Sarpedon Quality Lab, originally from Germany, specializing in the development and optimization of SQL Server database and data warehouse architectures, with a focus on performance, scalability and security.

Email Address: Andreas.Wolter@sarpedonqualitylab.com
Website: http://andreas-wolter.com/
Facebook: https://www.facebook.com/andreas.wolter.01
LinkedIn: https://www.linkedin.com/in/andreaswolter

The topic of discussion is the columnstore index.

Interview video recording provided below:




Excerpts from the Interview provided below:


What is a columnstore index, and how is it different from a traditional index?

A columnstore index is optimized for accessing large volumes of data, in the range of millions of rows. It is optimized for queries that access individual columns rather than entire rows, which are common in data warehouse style applications. The columnstore design, combined with the underlying compression, lets OLAP style queries fetch large volumes of data much faster. Columnstore indexes have also improved considerably in SQL Server 2016, which makes them much more flexible.


What are the standard gains for applications using a columnstore index?

First, the application workload needs to be suitable for a columnstore index. The major gains would be:

Compression: If the application is a typical data warehouse application with a "star schema" design and 20 million or more rows, one is likely to see around 50% data compression by using a columnstore index. The effectiveness of compression also depends on the data types of the columns.

Speed: Queries can run two or three times faster, or sometimes even 100 times faster, depending on how the query is designed and how it was performing previously.

How does SQL Server make such an improvement possible? What is the architecture of columnstore that makes it possible?

A fundamental change in data storage makes it possible. Columnstore data is heavily compressed, resulting in a massive reduction of I/O. Even today, in the era of SSDs, disk I/O is the slowest part of the system. Compression accounts for about half of the improvement that columnstore provides. And of course, there is more to it.

How does one pick the tables that would benefit from columnstore? How does one design the application to leverage the benefits of columnstore?

Ideally, data warehouse style applications would benefit. Tables running into several million rows and wide tables with many columns (for example, 50 columns) would benefit from a columnstore index.
The table needs to be large in both width and length to benefit. The type of queries commonly executed by the application is also important. Columnstore helps queries that fetch a large number of rows and makes no difference to queries reading 1 or 2 rows. Queries that use aggregate functions with a GROUP BY clause and read lots of rows benefit the most.
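
As a rough illustration of the kind of table and query described above, the sketch below creates a small fact table with a clustered columnstore index and runs an aggregate with GROUP BY. The table, column and index names (dbo.FactSalesDemo and so on) are made up for this example and are not from the interview. It assumes SQL Server 2016 and an edition that supports columnstore indexes.

```sql
-- Hypothetical fact table; all names here are illustrative only.
DROP TABLE IF EXISTS dbo.FactSalesDemo;

CREATE TABLE dbo.FactSalesDemo
(
    SaleID     BIGINT        NOT NULL,
    ProductID  INT           NOT NULL,
    StoreID    INT           NOT NULL,
    SaleDate   DATE          NOT NULL,
    Quantity   INT           NOT NULL,
    Amount     DECIMAL(18,2) NOT NULL
);

-- Store the whole table in columnar format.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSalesDemo
    ON dbo.FactSalesDemo;

-- Typical analytic query: reads many rows but only 3 of the 6 columns,
-- and aggregates them with GROUP BY.
SELECT   ProductID,
         SUM(Quantity) AS TotalQuantity,
         SUM(Amount)   AS TotalAmount
FROM     dbo.FactSalesDemo
GROUP BY ProductID;
```

On a table this small the index makes no practical difference; the benefit described in the interview shows up only once the table holds millions of rows.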

How are columnstore indexes stored on disk? Are they stored differently compared to traditional indexes?

Totally differently. The only thing in common is the 8 KB pages used for storing data. Unlike traditional index pages, which contain data from multiple fields and multiple data types, a columnstore page stores data from only one column (and therefore one data type). These pages are organized as segments, and each segment contains data from just a single column. This is what makes high compression possible.
The second part is that, when fetching data, if the query selects only 5 columns, columnstore reads only the segments that contain those 5 columns. A traditional index would be forced to read all the columns, as each page contains all the columns of the table or index.
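
One way to see this per-column layout is to look at the segment metadata SQL Server exposes. The query below is a minimal sketch, assuming the current database already has at least one columnstore index; it lists one row per segment using the sys.column_store_segments catalog view.

```sql
-- Lists the segments of every columnstore index in the current database.
-- Each row is one segment, i.e. the data of a single column
-- within one row group.
SELECT   OBJECT_NAME(p.object_id) AS table_name,
         i.name                   AS index_name,
         s.column_id,
         s.segment_id,
         s.row_count,
         s.on_disk_size           -- size of the segment in bytes
FROM     sys.column_store_segments AS s
JOIN     sys.partitions AS p
           ON s.hobt_id = p.hobt_id
JOIN     sys.indexes AS i
           ON  p.object_id = i.object_id
           AND p.index_id  = i.index_id
WHERE    i.type IN (5, 6)  -- 5 = clustered columnstore, 6 = nonclustered columnstore
ORDER BY table_name, s.column_id, s.segment_id;
```

Comparing on_disk_size across column_id values gives a feel for how well each column compresses.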

Columnstore Index is one of those technologies which has improved a lot since 2012. Can you walk us through the lifecycle of improvements it has had up to 2016?

Columnstore was actually first introduced in SQL Server 2008, in the Parallel Data Warehouse edition of SQL Server.
In SQL Server 2012, once a nonclustered columnstore index was put on a table, the table became write protected and supported only read-only queries. Due to this limitation, very few people adopted it.

In SQL Server 2014, the clustered columnstore index was introduced. A clustered columnstore index by design contains all columns of the table, and it was updatable. However, the nonclustered columnstore index was still read only.

In SQL Server 2016, both clustered and nonclustered columnstore indexes are updatable. However, one can't mix the two on the same table: a table can have either a clustered columnstore index or a nonclustered columnstore index, but not both.

We understand that clustered columnstore index would contain all the columns of the table. Do I still need traditional indexes?

Yes. Good question. You will still need them if you have queries that pick a small set of data. Traditional nonclustered indexes are required for lookups of a single row or a few rows. Fetching a small number of rows, or specific rows, from compressed segments is harder, and traditional indexes help with those requests.
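
The combination described above can be sketched as follows, assuming SQL Server 2016, where a traditional nonclustered index can sit on top of a clustered columnstore index. The table dbo.FactOrdersDemo and its index names are hypothetical.

```sql
-- Hypothetical table; names are illustrative only.
DROP TABLE IF EXISTS dbo.FactOrdersDemo;

CREATE TABLE dbo.FactOrdersDemo
(
    OrderID     BIGINT        NOT NULL,
    CustomerID  INT           NOT NULL,
    OrderDate   DATE          NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
);

-- Columnar storage for the analytic queries.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactOrdersDemo
    ON dbo.FactOrdersDemo;

-- Traditional index for single-row or few-row lookups.
CREATE NONCLUSTERED INDEX IX_FactOrdersDemo_OrderID
    ON dbo.FactOrdersDemo (OrderID);

-- A point lookup that can use the traditional index
-- instead of scanning compressed segments.
SELECT OrderID, CustomerID, OrderDate, Amount
FROM   dbo.FactOrdersDemo
WHERE  OrderID = 42;
```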

What is Real Time Operational Analytics?

Let’s split this into 2 parts. Analytics is what we have been discussing all along with columnstore. Real-time operational data refers to data that is constantly changed via inserts, updates and deletes.
Real-time operational analytics refers to the same table being used for two kinds of workload: regular operations like inserts, updates and deletes, and also data warehouse style analytic queries. This has been made possible in SQL Server 2016 via the "delta store" in columnstore indexes. The delta store is an area of the columnstore index that holds the hot, recently changed data in an uncompressed format. This combination of static data in compressed segments and hot data in the delta store makes real-time operational analytics possible in SQL Server 2016.
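
A minimal sketch of this setup, assuming SQL Server 2016: a regular OLTP table gets an updatable nonclustered columnstore index, and the row group DMV is then used to look at where freshly inserted rows sit. The table dbo.OrdersOltpDemo and its index name are invented for the example.

```sql
-- Hypothetical OLTP table; names are illustrative only.
DROP TABLE IF EXISTS dbo.OrdersOltpDemo;

CREATE TABLE dbo.OrdersOltpDemo
(
    OrderID      INT IDENTITY(1,1) NOT NULL
                 CONSTRAINT PK_OrdersOltpDemo PRIMARY KEY CLUSTERED,
    CustomerID   INT           NOT NULL,
    OrderStatus  TINYINT       NOT NULL,
    OrderDate    DATETIME2(0)  NOT NULL,
    Amount       DECIMAL(18,2) NOT NULL
);

-- Updatable nonclustered columnstore index for the analytic queries.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_OrdersOltpDemo
    ON dbo.OrdersOltpDemo (CustomerID, OrderStatus, OrderDate, Amount);

-- Regular OLTP activity continues against the same table.
INSERT INTO dbo.OrdersOltpDemo (CustomerID, OrderStatus, OrderDate, Amount)
VALUES (1, 0, SYSDATETIME(), 25.00),
       (2, 1, SYSDATETIME(), 40.00);

-- Row groups of the columnstore index. Delta store row groups are
-- reported with a state such as OPEN or CLOSED, compressed ones
-- as COMPRESSED.
SELECT row_group_id, state_desc, total_rows, deleted_rows
FROM   sys.dm_db_column_store_row_group_physical_stats
WHERE  object_id = OBJECT_ID('dbo.OrdersOltpDemo');
```

With only a couple of rows inserted, one would expect them to show up in a delta store row group rather than a compressed one.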

The other buzzword is In-Memory OLTP. Can it be combined with columnstore? How would applications benefit from these two technologies?

The other much improved technology in SQL Server 2016 is "In Memory". Yes, in SQL Server 2016 one can combine columnstore with in-memory tables, which gives faster access to real-time data. "Real Time Operational Analytics" with "In Memory OLTP" makes access to operational data lock free and latch free, and also lets the analytic workload benefit from columnstore technology.
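
The combination can be sketched as a memory-optimized table that carries a clustered columnstore index. This is only an illustration: the table name dbo.OrdersInMemDemo is hypothetical, and the script assumes SQL Server 2016 and a database that already has a MEMORY_OPTIMIZED_DATA filegroup.

```sql
-- Assumes the current database already contains a
-- MEMORY_OPTIMIZED_DATA filegroup. Names are illustrative only.
DROP TABLE IF EXISTS dbo.OrdersInMemDemo;

CREATE TABLE dbo.OrdersInMemDemo
(
    OrderID     INT           NOT NULL PRIMARY KEY NONCLUSTERED,
    CustomerID  INT           NOT NULL,
    OrderDate   DATETIME2(0)  NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL,
    -- Columnstore index declared inline on the memory-optimized table.
    INDEX CCI_OrdersInMemDemo CLUSTERED COLUMNSTORE
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- OLTP style write against the in-memory table.
INSERT INTO dbo.OrdersInMemDemo (OrderID, CustomerID, OrderDate, Amount)
VALUES (1, 10, SYSDATETIME(), 99.50);

-- Analytic style read over the same table.
SELECT   CustomerID, SUM(Amount) AS TotalAmount
FROM     dbo.OrdersInMemDemo
GROUP BY CustomerID;
```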

What are the warning signs in using columnstore? What are the "don'ts" while using columnstore?

The queries and workload should suit the design of the columnstore index.
1) An application where most queries use "select *" or read many columns may not benefit from a columnstore index.
2) Database schema design also matters. It shouldn't be too complex a design, like a snowflake or galaxy schema. Data warehouse applications using a star schema design are the most suitable.
3) The functions or queries used also matter. For example, functions like MIN / MAX are suitable, while commands like DISTINCT are unsuitable for columnstore.

Tuesday, November 29, 2016

SQL Server 2016 SP1 – Rocking RELEASE!!!

Last week, I gave a presentation titled "SQL Server 2016 SP1 – Rocking RELEASE!!!" to the Singapore SQL User Group. The presentation covered the things I wrote about over here and a few more cool features released with SQL Server 2016 SP1. The best part is that it was recorded :) This is my first presentation that has been recorded and uploaded to a public platform, so I am obviously excited to share it with all. Special thanks to Paul Lorret and the engineers.sg team for recording it.
The presentation was certainly not my best, as I had just 2 days to prepare. But it is still worth sharing, and I had fun doing it :) Link provided below.



Slides and scripts can be downloaded from here

Saturday, November 26, 2016

Data Channel - Launch

Dear All,

I am starting a YouTube channel for SQL Server and Microsoft Data Platform technologies called "Data Channel". The channel is going to contain interviews with SQL Server and Data Platform experts like MVPs, MCMs, community leads and Microsoft professionals. Interviews are generally 15 to 20 minutes long, giving us a quick understanding of the subject discussed. The intention behind starting this channel is that, over a period of time, it will develop into a large volume of useful technical information for our community. I have shot a quick promo for it. Take a look at it below.






If you like this idea, please subscribe to the "Data Channel" over here.  Please share your comments and follow this space for more updates on the channel. 

Regards,
Nagaraj



Sunday, November 20, 2016

SQL Server 2016 - SQL Service Doesn't Start after GDR Security update

Recently, after a security fix GDR (https://support.microsoft.com/en-us/kb/3194716) was applied on SQL Server 2016 RTM, the SQL Server service refused to start. Though the patch completed successfully as per "C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log", the Event Viewer under "System" gave the following error:

"The SQL Server (I2016) service terminated with the following service-specific error: %%945"
The error didn't make much sense.

The Event Viewer under the "Application" section gave a much more meaningful error:

FileMgr::StartLogFiles: Operating system error 2(The system cannot find the file specified.) occurred while creating or opening file 'D:\MSSQL2016Root\MSSQL13.I20161708\MSSQL\Binn\mssqlsystemresource.ldf'. Diagnose and correct the operating system error, and retry the operation.





From the error it was fairly clear that "mssqlsystemresource.mdf" and "mssqlsystemresource.ldf" were missing. A Windows search for those two files showed that they had mysteriously been moved to "C:\Program Files\Microsoft SQL Server\130\LocalDB\Binn".

Even though no LocalDB is installed, the security fix had likely moved "mssqlsystemresource.mdf" and "mssqlsystemresource.ldf" to a different folder, causing SQL Server to fail while starting.

The fix was simple: copy the "mssqlsystemresource.mdf" and "mssqlsystemresource.ldf" files from "C:\Program Files\Microsoft SQL Server\130\LocalDB\Binn" to their original location, which was "D:\MSSQL2016Root\MSSQL13.I20161708\MSSQL\Binn\". After the file copy, the SQL Server service started as usual.

The above issue occurred on a Windows 10 laptop with SQL Server 2016 Developer edition. Please note that the GDR was applied automatically by Windows Update. Yes, we would not schedule automatic updates of SQL Server security patches on production machines, but I suspect the issue would have occurred even had the GDR been installed manually.

I hope this post helps someone trying to troubleshoot a SQL Server service failure after applying a security fix.

Thursday, November 17, 2016

SQL Server 2016 SP1 Release - Thoughts and Comments!!!

SQL Server 2016 SP1 was released last night, and most of the features that were previously part of Enterprise edition have been made available in Standard edition now!!!! By far the biggest change any service pack has brought about in any version of SQL Server!!!

The items which moved from Enterprise to Standard are listed below. Read about the announcement over here

  • Change Data Capture
  • Database Snapshot
  • Columnstore Index
  • In-Memory OLTP
  • Row Level Security
  • Dynamic Data Masking
  • Always Encrypted
  • Fine Grained Auditing
  • Data Compression
  • Multiple FILESTREAM Containers
  • Partitioning
  • Polybase
That's quite a change and I am super excited looking at the list of items. Quick comments on a few of those items.

Column store Index:

A game changer technology, especially in data warehouse applications, now moves to Standard. But the concern is that it is a memory driven concept known to consume a significant amount of memory. The memory limit of 128 GB still holds on Standard edition. In addition, SQL Server 2016 SP1 sets a hard limit of 25% of the maximum memory set for SQL Server on the memory used by columnstore indexes. For example, if 48 GB is allocated to SQL Server, columnstore can use a maximum of 12 GB. This implies that, theoretically, columnstore indexes can take at most 32 GB of RAM (25% of the 128 GB limit). However, the memory used by columnstore is not counted within the buffer pool's quota of memory. So if 48 GB of memory is set for SQL Server, then SQL Server's total memory usage can go up to 60 GB (12 GB for columnstore + 48 GB of buffer pool).

Though the extra memory provided for columnstore does sound useful, it may not be so effective, as a columnstore index is at its best only when it operates on a large volume of data with a sizable amount of memory.
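
To relate these numbers to a real instance, the query below is a rough sketch that reads the configured max server memory, works out the 25% figure quoted above, and shows how much memory the columnstore object pool currently holds. The column aliases are made up for the example, the 25% calculation simply applies the rule as stated in this post, and the query needs VIEW SERVER STATE permission.

```sql
-- Configured "max server memory" for the instance, in MB.
DECLARE @max_server_memory_mb BIGINT =
(
    SELECT CAST(value_in_use AS BIGINT)
    FROM   sys.configurations
    WHERE  name = 'max server memory (MB)'
);

SELECT @max_server_memory_mb     AS max_server_memory_mb,
       -- 25% figure as quoted in this post (an assumption of the example).
       @max_server_memory_mb / 4 AS quoted_25_percent_cap_mb,
       -- Memory currently held by the columnstore object pool, in MB.
       (
           SELECT ISNULL(SUM(pages_kb), 0) / 1024
           FROM   sys.dm_os_memory_clerks
           WHERE  type = 'CACHESTORE_COLUMNSTOREOBJECTPOOL'
       )                         AS columnstore_pool_used_mb;
```

If max server memory has been left at its default value, the first two columns are not meaningful and only the third is worth reading.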

In-Memory OLTP:

In my opinion, one of the best moves of SP1. The reason is that even though "In memory" is ideally suited for systems with a few hundred GBs of RAM, it can prove handy for some of the mid-size ones too. The 25% of max memory hard limit also applies to "In memory" tables (similar to columnstore). For example, let's say we have 64 GB of RAM for SQL Server; then 16 GB would be available for "In memory" tables.

1) Though the memory available may sound small, it can still be useful to let the most important and most accessed tables take advantage of "In memory OLTP". Please note that the most important tables of an application may not be the largest tables by size.

2) "In memory OLTP" can help applications get rid of lock and latch contention almost completely.


3) Remember, "In memory" is not just about placing tables in memory. SQL Server reads and writes to them differently, enhancing the overall performance of the application.


So, this is an enhancement which I will be pushing my application teams to take advantage of for sure.
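
When deciding which tables fit within the available memory, it helps to see what existing memory-optimized tables actually consume. The query below is a small sketch using the sys.dm_db_xtp_table_memory_stats DMV; it assumes it is run in a database that contains memory-optimized tables, and the column aliases are illustrative.

```sql
-- Memory consumed by each memory-optimized table
-- in the current database, largest first.
SELECT   OBJECT_SCHEMA_NAME(object_id) AS schema_name,
         OBJECT_NAME(object_id)        AS table_name,
         memory_allocated_for_table_kb,
         memory_used_by_table_kb,
         memory_allocated_for_indexes_kb,
         memory_used_by_indexes_kb
FROM     sys.dm_db_xtp_table_memory_stats
ORDER BY memory_allocated_for_table_kb DESC;
```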

Security and Encryption Related Features: Row Level Security / Dynamic Data Masking / Always Encrypted / Fine Grained Auditing

It is an excellent move to place security related features in Standard and other editions. My personal take has always been that security related features should be available in all editions of SQL Server. The reason is that an application may not have high performance requirements demanding Enterprise edition, but would still need key security features of SQL Server. For example, a small shopping cart application which needs just SQL Server Express but deals with sensitive data (credit card numbers, account balances, etc.) requires features like "Always Encrypted" or "Row Level Security" to be developed securely.

Partitioning:

Another welcome change, as Standard edition does deal with terabytes of data and partitioning makes a lot of lives easier.

Change Data Capture / Database Snapshot :

These are commonly used features in development or staging environments, and making them available across all editions makes things better for environments which use different editions for development and production.

What remains in Enterprise:

High availability features like "Always On Availability Groups" (barring basic AG, which has been available in Standard since SQL Server 2016 RTM) and "Online Index Rebuild" remain in Enterprise. Capacity limitations of lesser editions (max 128 GB memory, 24 core processor limit on Standard, etc.) still remain. This makes absolute sense, as these features are primarily required by applications with extreme performance requirements, and those should ideally be on Enterprise edition (without capacity limits).

Also notice that the Enterprise-only features are more "infrastructure" related than "development" related. This is also a move in the right direction, as the development and production environments need to offer the same development features for application development to be effective.

Overall, indeed, one of the best announcements in the entire history of SQL Server. No doubt it is going to make several applications move to SQL Server 2016 sooner, and it makes my job of convincing application teams easier :)

Sunday, November 13, 2016

Residual Predicate and "Number of Rows Read" in Execution Plans

What is a Residual Predicate?

Let's say one gets a query to fine tune. While checking the execution plan, if there is an Index Seek, it is common for DBAs to think that the query performance is good and acceptable. Though in most scenarios looking for index seeks is a reasonable idea, there are quite a few scenarios where an index seek simply doesn't mean optimal performance. One such scenario is explained below. Consider the following query:

SELECT *
FROM [Production].[TransactionHistory]
WHERE ProductID = 801
AND TransactionID % 3 = 0;

The table has a nonclustered index on "ProductID" and a clustered index on "TransactionID".

 
 
 
The picture above, as expected, indicates an Index Seek.
 
 

A screenshot of the Index Seek operator's details is provided above. The seek predicate section at the bottom indicates that the "Index Seek" operator was used for the "ProductID = 801" filter alone.

Observe the section marked in red. "Predicate" section shows
"[Production].[TransactionHistory].TransactionID % 3 = 0 ".
 
What it implies is that the index seek filtered only on the "ProductID = 801" condition. Additional filtering had to be done for "TransactionID % 3 = 0" after the seek operation. Such additional filters, applied to the rows extracted by the "Index Seek", are termed "Residual Predicates". If the work done by the residual predicate is too high, it implies that the index is not effective.
 
"Number of Rows Read" and "Residual Predicates"

In the last post, I wrote about "Number of Rows Read". Just to recap, "Number of Rows Read" indicates the number of rows read by the operator. "Actual Number of Rows" is the number of rows returned by the operator. Observe the section highlighted in green in the picture above.

Number of Rows Read: 519
Actual Number of Rows: 171

The above numbers imply that the Index Seek operator's seek predicate ("ProductID = 801") matched 519 rows. The additional filter "TransactionID % 3 = 0" reduced them further to 171 rows.

The difference between "Number of Rows Read" and "Actual Number of Rows" is due to the rows filtered out by the "Residual Predicate". The "Number of Rows Read" information in execution plans has made it much easier to track the additional cost incurred due to "Residual Predicates".
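
The two figures can also be cross-checked without opening the plan. The query below is a small sketch, assuming the same AdventureWorks table as above; the column aliases are made up for the example. It counts the rows that match the seek predicate and, of those, the rows that also pass the residual predicate.

```sql
SELECT COUNT(*) AS RowsMatchingSeekPredicate,          -- compare with "Number of Rows Read"
       SUM(CASE WHEN TransactionID % 3 = 0
                THEN 1 ELSE 0 END) AS RowsAfterResidualPredicate  -- compare with "Actual Number of Rows"
FROM   [Production].[TransactionHistory]
WHERE  ProductID = 801;
```

On the data used for this post the two columns should line up with 519 and 171; the exact values depend on the copy of AdventureWorks in use.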

Thursday, November 10, 2016

SQL Server 2016 - “Number of Rows Read” on Execution plan


One of the coolest things to release with SQL Server 2016 was the “Number of Rows Read” information in the execution plan. If you are wondering what it is, look at the picture below.




What the “Number of Rows Read” property actually reports is the total number of rows read by the query plan operator. This is not the same as the total number of rows returned (as output) by the operator. The total number of rows returned is provided by “Actual Number of Rows” in the query plan.

For example, let’s say we have the following query:


SELECT *
FROM [AdventureWorks2012].[Production].[TransactionHistory]
WHERE Quantity = 2;

Assume the above query does a table scan and filters the data. Then “Number of Rows Read” would be the total number of rows in the table, and “Actual Number of Rows” would be the number of rows that satisfy the "where" clause.
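
To try this out, the sketch below first computes the two row counts directly and then runs the query with the actual execution plan returned as XML. It assumes the AdventureWorks2012 sample database is installed; the column aliases are illustrative.

```sql
-- Expected values for a full scan with a filter on Quantity:
-- total rows in the table vs. rows that satisfy the WHERE clause.
SELECT COUNT(*) AS TotalRowsInTable,
       SUM(CASE WHEN Quantity = 2 THEN 1 ELSE 0 END) AS RowsWithQuantity2
FROM   [AdventureWorks2012].[Production].[TransactionHistory];

-- Return the actual execution plan as XML alongside the results.
SET STATISTICS XML ON;

SELECT *
FROM   [AdventureWorks2012].[Production].[TransactionHistory]
WHERE  Quantity = 2;

SET STATISTICS XML OFF;
```

In the returned plan, the run-time counters of the scan operator should carry the rows read and the rows returned, which can be compared with the two counts from the first query.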

Obviously, operators with significant differences between “Number of Rows Read” and “Actual Number of Rows” are potential areas to watch out for while query tuning. Upcoming posts will cover scenarios where “Number of Rows Read” can be put to good use.