Friday, April 30, 2010

Plan Cache - Part 3 - What gets recompiled


As a continuation of the series of posts on plan caching, let us get started with Part 3.
Earlier posts are given below:

Plan caching - part-1 - Introduction
Plan caching - part-2 - What gets cached

The last post explained how stored procedures and prepared statements are cached. But there are a few scenarios where caching doesn't work, and the object/query plan in the cache gets recompiled. Recompilation is the process by which SQL Server finds that the plan in the cache is no longer valid and creates a new plan for the query submitted. So let's get started. The following scenarios cause recompilation.

* A change in the structure/definition of the object
If the structure of the stored proc/table (add/delete a column, constraint,
or index) has changed since the last execution, then the plan is obviously
invalid and hence it has to be recompiled.

We are using the same stored procedure test_plan_caching defined in the last post. Consider the following script.


DBCC freeproccache --Clear the cache as usual
EXEC Test_plan_caching 'moscow' --Call the SP

GO

ALTER PROCEDURE [dbo].[Test_plan_caching] @city VARCHAR(50)
AS
  SELECT supplier_xid,
         product_xid,
         trans_city,
         qty,
         comments,
         balance_left
  FROM   supplier_product_customer
  WHERE  trans_city = @city
  ORDER  BY supplier_xid DESC
GO 

--Forcing an alter to the sp, though no logical change has been done 

EXEC Test_plan_caching 'moscow'--Call the SP

SELECT TEXT,
       usecounts,
       size_in_bytes,
       cacheobjtype,
       objtype
FROM   sys.dm_exec_cached_plans
       OUTER APPLY sys.Dm_exec_sql_text(plan_handle) AS sqltext
WHERE  TEXT NOT LIKE '%dm_exec_cached_plans%' 




The ALTER on the stored procedure forced the optimizer to recreate the plan.
Similar changes that can cause such recompilations are adding/dropping an index, changing a trigger/proc/view definition, etc.
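
For illustration, here is a minimal sketch of the index case (the index name ix_spc_trans_city is hypothetical):

--Adding an index on the underlying table invalidates cached plans that reference it
CREATE INDEX ix_spc_trans_city
  ON supplier_product_customer (trans_city)
GO

EXEC Test_plan_caching 'moscow' --This call gets a freshly compiled plan
GO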

* A change in the connection parameters using the SET command.

Changing the following connection parameters using the SET keyword can cause recompilation.

ANSI_NULL_DFLT_OFF, ANSI_NULL_DFLT_ON, ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS,
ARITHABORT, CONCAT_NULL_YIELDS_NULL, DATEFIRST, DATEFORMAT, FORCEPLAN,
LANGUAGE, NO_BROWSETABLE, NUMERIC_ROUNDABORT, QUOTED_IDENTIFIER


DBCC freeproccache
GO

EXEC Test_plan_caching 'San Franciso'
GO

SET ansi_null_dflt_off ON
GO

EXEC Test_plan_caching 'San Franciso'
GO 




The cached plans DMV clearly shows two plans: one generated before setting ANSI_NULL_DFLT_OFF and another generated with the new value of ANSI_NULL_DFLT_OFF.

Note that a change to the options ANSI_NULLS/QUOTED_IDENTIFIER doesn't affect a stored proc or a table-valued function, but it does affect adhoc queries and prepared statements.

* A Forced recompilation

When an object is marked for recompilation using the sp_recompile stored procedure, or when a recompile hint is used, the query is recompiled.
Example:

EXEC Sp_recompile 'test_plan_caching' 


The statement above ensures that the next call made to test_plan_caching results in a recompile.

When a procedure is created with the WITH RECOMPILE option, every call made to the stored proc
is recompiled.

Usage of the OPTION (RECOMPILE) hint causes a statement-level recompilation.
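
For illustration, here is a minimal sketch of both options against the procedure and table used earlier:

--Statement-level hint: only this statement is recompiled on every execution
SELECT supplier_xid,
       trans_city
FROM   supplier_product_customer
WHERE  trans_city = 'moscow'
OPTION (RECOMPILE)

--Call-level: this particular execution gets a fresh plan that is not cached
EXEC Test_plan_caching 'moscow' WITH RECOMPILE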
The scenarios where these options are useful are potential posts on their own. Will cover them soon.

* A change in statistics, caused by changes made to a large share of the rows in the table.
A rough outline of how SQL Server decides whether the table needs updated statistics is given below. The decision is based on the RT (recompilation threshold) factor, which depends on the number of changes made to the table.
If a table contains fewer than 500 rows, then at least 500 changes are required to cause a recompilation. If a table has more than 500 rows, then the number of changes required is 500 + 20% of the row count.

As the algorithm changes with version and service pack releases, treat it as approximate.
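
As a rough illustration of the rule above (a sketch of the approximation, not the engine's exact logic):

--Approximate recompilation threshold (RT) for a given table size
DECLARE @rowcount BIGINT

SET @rowcount = 110000

SELECT CASE
         WHEN @rowcount < 500 THEN 500
         ELSE 500 + ( 0.20 * @rowcount )
       END AS approximate_rt
--For a 110,000-row table this returns 22500 changes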

Saturday, April 24, 2010

Plan cache - part 2 - What gets cached



In general, we can classify submitted queries into three categories.

They are
1) Adhoc queries -> the ones submitted through Query Analyzer, or queries from applications that are not prepared statements (executed as plain statements without setting any parameters).
2) SQL objects -> calls made to stored procs, triggers, UDFs, and system and extended procs.
3) Prepared statements -> sp_executesql with parameters, and prepared statements from applications.

Calls to SQL objects and prepared statements are more likely to be cached. Let us see this in detail. Consider the following adhoc query:

DBCC freeproccache -- Just to free the cache before we start the experiments.


SELECT supplier_xid,
       product_xid,
       trans_city,
       qty,
       comments,
       balance_left
FROM   supplier_product_customer
WHERE  trans_city = 'San Franciso'
ORDER  BY supplier_xid DESC 


Let us change the parameter and execute again


SELECT supplier_xid,
       product_xid,
       trans_city,
       qty,
       comments,
       balance_left
FROM   supplier_product_customer
WHERE  trans_city = 'Perth'
ORDER  BY supplier_xid DESC 


To check whether the query was cached or not, let us query the cached plans DMV.


SELECT TEXT,
       usecounts,
       size_in_bytes,
       cacheobjtype,
       objtype
FROM   sys.dm_exec_cached_plans
       OUTER APPLY sys.Dm_exec_sql_text(plan_handle) AS sqltext
WHERE  TEXT NOT LIKE '%dm_exec_cached_plans%' 




The usecounts column has a value of 1 for each plan, which clearly indicates that the plan was not reused. By default, adhoc query plans are reused only on an exact textual match. A change in a parameter value, even a change in case or an additional space in the query, results in SQL Server not reusing the query plan prepared earlier.
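
For instance, the following pair of statements, identical except for letter case, produces two separate cache entries (assuming default settings):

SELECT trans_city FROM supplier_product_customer WHERE trans_city = 'Perth'
GO

select trans_city from supplier_product_customer where trans_city = 'Perth'
GO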

Let us make it a Stored proc and try the query again.

CREATE PROCEDURE Test_plan_caching @city VARCHAR(50)
AS
  SELECT supplier_xid,
         product_xid,
         trans_city,
         qty,
         comments,
         balance_left
  FROM   supplier_product_customer
  WHERE  trans_city = @city
  ORDER  BY supplier_xid DESC

GO

EXEC Test_plan_caching 'San Franciso'

GO

EXEC Test_plan_caching 'Perth'

GO 


A query to the cached plans DMV shows that the query plan has been reused.



Now let us try the sp_executesql method. Consider the following script.


DECLARE @SQLString NVARCHAR(1000)
DECLARE @paramstring NVARCHAR(100)
DECLARE @city VARCHAR(100)

SET @SQLString = 'Select supplier_xid,product_xid,trans_city,qty,comments,balance_left from ' + 'Supplier_Product_Customer where trans_city = @variable order by supplier_xid desc '
SET @paramstring = '@variable varchar(100)'
SET @city = 'San Franciso'

EXEC Sp_executesql
  @SQLString,
  @paramstring,
  @variable = @City

SET @city = 'Perth'

EXEC Sp_executesql
  @SQLString,
  @paramstring,
  @variable = @City 


Note that we have changed the parameters by setting a new value to input variable and executed the same query.


The cached plans DMV shows that the plan is getting reused. Bear in mind that mere usage of sp_executesql without input parameters doesn't reuse the plan.
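
To illustrate, here is a minimal sketch of that anti-pattern, with the literal embedded in the statement text instead of being passed as a parameter:

--The city is concatenated into the statement text, so each distinct city
--yields its own cache entry instead of one shared plan
DECLARE @SQLString NVARCHAR(1000)

SET @SQLString = 'Select supplier_xid,product_xid,trans_city from ' + 'Supplier_Product_Customer where trans_city = ''Perth'''

EXEC Sp_executesql @SQLString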

Note that the objtype column of the cached plans DMV indicates the kind of object that has been cached: Proc refers to a procedure, Prepared refers to prepared plans created by sp_executesql and by application queries, and Adhoc refers to adhoc queries.

There are still a few exception scenarios where a stored procedure plan doesn't get effectively reused. We will explore them more in the next post.

Friday, April 16, 2010

Plan cache - Introduction


When a query is submitted to SQL Server, the query optimiser checks the syntax, verifies the objects, and prepares a query plan. The prepared query plan is stored in an area of memory called the plan cache. A plan placed in the plan cache will be reused when the same query or a 'similar query' is submitted again. Preparing a plan every time can be an expensive and time-consuming task,
and hence the query optimiser attempts to reuse a plan when it is supposed to.

A quick example.

DECLARE @dt DATETIME

SET @dt = Getdate()

SELECT supplier_xid,
       product_xid,
       trans_city,
       qty,
       comments,
       balance_left
FROM   supplier_product_customer
WHERE  trans_city IN( 'San Franciso', 'Perth' )
ORDER  BY supplier_xid DESC

SELECT Datediff(ms, @dt, Getdate()) 

First, the above query is executed. The table contains about 110,000 rows and the query returns about 45,000 rows. The query returned the results the first time in 1217 ms.

Let us execute the same script again. The script completes in 626 ms, which is about half of the first execution's time. When executing for the second time, the query's plan is already in the cache and hence it is reused. We can check whether our query is cached or not using the DMV sys.dm_exec_cached_plans. The following query helps us find the same.

SELECT TEXT,
       usecounts,
       size_in_bytes,
       cacheobjtype,
       objtype
FROM   sys.dm_exec_cached_plans
       OUTER APPLY sys.Dm_exec_sql_text(plan_handle) AS sqltext
WHERE  TEXT NOT LIKE '%dm_exec_cached_plans%' 





Observe the result provided above. The text column provides the query, and usecounts indicates how many times the plan has been used. For the above script, the usecounts value is 2, which indicates that the plan has been reused.

In general, the queries that are 'likely' to be cached and used are as follows.

* Stored procedures, triggers, and table-valued functions.
* Prepared statements used from applications (from Java/.NET/C# etc.).
* Queries executed using sp_executesql with input parameters.

As always there are a few exceptions, and that's the reason the word 'likely' is in place.

What is written here is just the tip of the iceberg, and we will explore more on caching, the pros and cons of caching, and the way the plan cache works in the next few posts.

Monday, April 12, 2010

Job still runs despite being disabled


When one has to disable a job or a group of jobs, the common way is to do it from
Enterprise Manager or the graphical user interface. At times, when updating a
group of jobs, one way people attempt to do it is by updating the sysjobs
system table directly, i.e.,

UPDATE sysjobs
SET    enabled = 0
WHERE  name IN ( 'job a', 'job b', 'job c' ) 


This is a very dangerous way of doing it, as it doesn't disable the job completely.
Worse, when you update sysjobs directly, the job appears to be disabled
in the GUI, but it still continues to run as scheduled.

For example consider the following job Test.


The job is currently enabled.

UPDATE sysjobs
SET    enabled = 0
WHERE  name = 'test' 


After executing the query provided above, the job appears to be disabled as shown below.


However, the job continues to run as scheduled. The reason is that the properties
and other attributes of the job are kept in memory, and an update to sysjobs
doesn't notify SQL Agent. The correct way to enable/disable a job
or update any property of a job is to use the system stored procedure sp_update_job.

Sample call for disabling a job is shown below.

EXEC msdb.dbo.Sp_update_job
  @job_name = 'test',
  @enabled = 0 

For updating a job schedule, use sp_update_schedule/sp_update_jobschedule.
For more details refer to Books Online here.
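
A sample call for disabling a schedule by name is sketched below (the schedule name 'test_schedule' is a placeholder):

EXEC msdb.dbo.Sp_update_schedule
  @name = 'test_schedule',
  @enabled = 0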

Also, here is a short script which generates the statements to disable all
jobs on your server.

SELECT 'EXEC msdb.dbo.sp_update_job @job_name = ''' + name + ''',@enabled = 0 ' + CHAR(10) + ' GO ' + CHAR(10)
FROM   msdb.dbo.sysjobs 

View the results of the script in text mode (instead of grid)
to get the script formatted correctly.

Wednesday, April 7, 2010

Sparse column - Maximum size of a row


Just to clarify: by including a sparse column, the maximum size of a row DOES NOT reduce to 8018 bytes. Quite a few sites have mentioned that a sparse column reduces the max row size to 8018, but that's not true. What is true is that the total size occupied by the sparse columns alone shouldn't exceed 8019 bytes. The row size (sparse columns' size + normal columns' size) can be greater than 8018 bytes and has the normal row limitation of 8060 bytes. The script provided below illustrates this.


DROP TABLE [sparse_col]
GO
CREATE TABLE [dbo].[sparse_col]
  (
     [dt]        [DATETIME] NOT NULL,
     [value]     [INT] NULL,
     data        CHAR(500) NULL,
     sparse_data CHAR(7500) SPARSE NULL,
  )
GO
INSERT INTO [sparse_col]
SELECT Getdate(),
       0,
       'sparse_data',
       'sparse data'
GO 


The insert is successful without any errors.

Note that the sparse column 'sparse_data' has a size of 7500 bytes.
The total size of the row when all columns have non-null values will exceed 8018 bytes.
The actual size of the row can be checked using the DMV dm_db_index_physical_stats, as shown below.


DECLARE @dbid INT;
SELECT @dbid = Db_id(); 


SELECT Object_name(object_id) AS [table_name],
       record_count,
       min_record_size_in_bytes,
       max_record_size_in_bytes,
       avg_record_size_in_bytes
FROM   sys.Dm_db_index_physical_stats(@dbid, NULL, NULL, NULL, 'Detailed')
WHERE  Object_name(object_id) = 'sparse_col' 




The size of the row is 8045 bytes, which is above 8018 bytes. Thus it's clear that adding a sparse column doesn't reduce the maximum size of a row to 8018 bytes.

Now let us see an example which generates the actual error.
Run the following script.
The script used earlier is slightly modified by changing the 'data' column to sparse. The rest of the structure remains the same, with no changes made to the lengths of the columns. So, the sparse columns on the table are 'data' and 'sparse_data', whose combined size is 8000 bytes plus a few bytes used internally for storing sparse data.


DROP TABLE [sparse_col]
GO
CREATE TABLE [dbo].[sparse_col]
  (
     [dt]        [DATETIME] NOT NULL,
     [value]     [INT] NULL,
     data        CHAR(500) SPARSE NULL,
     sparse_data CHAR(7500) SPARSE NULL,
  )
GO 


INSERT INTO [sparse_col]
SELECT Getdate(),
       0,
       'some data',
       'sparse data' 

The insert fails with the error
Msg 576, Level 16, State 5, Line 1
Cannot create a row that has sparse data of size 8031 which is greater than the allowable maximum sparse data size of 8019.


The reason is that the total size consumed by the sparse_data and data columns is 8000 bytes, plus 31 bytes used internally (8031 in total), which exceeds the total size allowed (8019 bytes) for sparse columns.

Reference: SQL Server 2008 Internals by Kalen Delaney :)

Friday, April 2, 2010

Sparse Columns


Sparse columns are a new column property introduced in SQL Server 2008 to improve the storage of NULL values in columns. When a column is defined as SPARSE and a NULL is inserted into the column, the column doesn't occupy any space at all.

Some facts:

* A non-varying column with a NULL value occupies the entire length of the column (a datetime column with NULL occupies 8 bytes).
* A varying column (varchar) with a NULL value occupies a minimum of two bytes.

So, by defining a column as sparse, one can save the space that is wasted when NULL values are inserted. But sparse columns come at a cost: when a sparse column
contains a non-null/valid value, it consumes 4 extra bytes of space. So,
one is expected to use sparse columns only when about 90% of the values in the sparse column are expected to be NULL.

Example:

Consider the following table.


CREATE TABLE [dbo].[sparse_col]
  (
     [dt]    [DATETIME] NOT NULL,
     [value] [INT] NULL,
     data    CHAR(500) NULL
  ) 


The data column doesn't have the sparse property defined on it. Let me insert 10,000 rows using the following script.


DECLARE @id INT

SET @id = 1

WHILE @id <= 10000
  BEGIN
      INSERT INTO [sparse_col]
      SELECT Getdate(),
             @id,
             NULL

      SET @id = @id + 1
  END 


The space occupied by the table is 6280 KB.
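
The number can be verified with sp_spaceused:

EXEC Sp_spaceused 'sparse_col'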



Let us recreate the table with 'data' column defined as a sparse column.

CREATE TABLE [dbo].[sparse_col]
  (
     [dt]    [DATETIME] NOT NULL,
     [value] [INT] NULL,
     data    CHAR(500)  Sparse NULL
  ) 

The same script provided above is used to insert the rows. The space occupied is just 264 KB after switching to a sparse column.



Size of a row:

The size of each row in SQL Server 2008 is limited to 8060 bytes. Sparse columns don't allow one to exceed this limitation. Additionally, the total size of all non-null sparse columns in a row is limited to 8019 bytes.

Example:

Let us change the table definition by adding an additional column.

ALTER TABLE sparse_col ADD data2 CHAR(8000) Sparse NULL

INSERT INTO sparse_col
SELECT Getdate(),
       5,
       NULL,
       NULL 

The above insert works. But the one below doesn't.

INSERT INTO sparse_col
SELECT Getdate(),
       5,
       'x',
       'x' 

The insert fails with an error indicating that the size of the row exceeded the 8060-byte limitation.

Adding more than 1024 columns:
By default SQL Server 2008 allows 1024 columns per table. But by using sparse columns one can increase the number of columns to 30,000. However, there can be a maximum of 1024 non-sparse columns in the table, and the rest have to be sparse columns. The sparse columns defined have to be grouped using the column set feature introduced in SQL Server 2008. A column set is an untyped XML column which groups all the sparse columns of the table. The column set column is a virtual column which doesn't get stored in the table. For more details refer here.

A table with more than 1024 columns is called a wide table and it can be defined with the following script.


DROP TABLE [sparse_col]

GO

CREATE TABLE [dbo].[sparse_col]
  (
     [dt]                  [DATETIME] NOT NULL,
     [value]               [INT] NULL,
     data                  CHAR(500) Sparse NULL,
     specialpurposecolumns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
  ) 


A table with the column set feature is created using the above script. SpecialPurposeColumns is a column set which groups all the sparse columns in the table. The script provided below adds more sparse columns to the table.


DECLARE @id INT

SET @id = 1

DECLARE @sql NVARCHAR(100)

WHILE @id < 25000
  BEGIN
      SET @sql = 'ALTER TABLE [sparse_col] ADD Col' + CONVERT(VARCHAR, @id) + '  int sparse null '

      EXEC Sp_executesql @sql

      SET @id = @id + 1
  END 

The above script adds close to 25,000 sparse columns to the table. One doubts whether that is ever necessary.
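
As a quick sketch of how the column set is used (Col1 and Col5 were created by the loop above), sparse values can be inserted and read back through the XML column:

--Insert through the column set: sparse column values are supplied as XML
INSERT INTO sparse_col
            (dt, value, specialpurposecolumns)
VALUES      (Getdate(), 1, '<Col1>10</Col1><Col5>20</Col5>')

--The sparse columns come back folded into the column set XML column
SELECT dt, value, specialpurposecolumns
FROM   sparse_col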


Sparse columns do come with many restrictions; for example, sparse columns can't participate in primary keys and can't have default values. For the complete set of restrictions refer here.