Monday, December 7, 2009

Temp table vs Table Variable - Part ii


Before reading this one,please read my previous post here which speaks about basic differences in temp table and table variable.

As mentioned last week, let me run thro the tests I conducted for understanding the performance of temp table against table variable.

Consider the following code:

DBCC freeproccache
DBCC dropcleanbuffers
SET statistics  io  ON
DECLARE  @dt DATETIME
SELECT @dt = Getdate()
CREATE TABLE #tb (
  supplier_id INT,
  product_id  INT,
  trans_city  VARCHAR(100),
  trans_date  DATETIME,
  qty         INT)
INSERT INTO #tb
SELECT DISTINCT supplier_xid,
                product_xid,
                trans_city,
                trans_date,
                qty
FROM   supplier_product_customer
CREATE CLUSTERED INDEX [CIX_supplier_product_customer] ON #tb (
      supplier_id,
      trans_city)
WITH (statistics_norecompute = OFF,
      sort_in_tempdb = OFF,
      ignore_dup_key = OFF,
      drop_existing = OFF,
      online = OFF,
      allow_row_locks = ON,
      allow_page_locks = OFF)
ON [PRIMARY]
SELECT *
FROM   #tb
WHERE  supplier_id = 1721
       AND trans_city = 'Mumbai'
SELECT Datediff(ms,@dt,Getdate())


As you may note, I am insterting a few rows on the temp table. And then I create a clustered index. Fire a straight forward select query.What matters in the code is the section after the insert into the temp table. ie is Creation of the index and the select query.I have noted down a few relevant parmeters for different row counts on the temp table. The query plan obtained is provided below.


Its as expected, a Clustered Index seek which means that the index is getting used.

Let me execute a similar query against table variable. Code posted below.


DBCC freeproccache
DBCC dropcleanbuffers
SET statistics  io  ON
DECLARE  @dt DATETIME
SELECT @dt = Getdate()
DECLARE  @tb  TABLE(
                    supplier_id INT,
                    product_id  INT,
                    trans_city  VARCHAR(100),
                    trans_date  DATETIME,
                    qty         INT
                    )
INSERT INTO @tb
SELECT supplier_xid,
       product_xid,
       trans_city,
       trans_date,
       qty
FROM   supplier_product_customer
SELECT *
FROM   @tb
WHERE  supplier_id = 1721
       AND trans_city = 'Mumbai'
SELECT Datediff(ms,@dt,Getdate())


Query plan obtained for the same is provided below. A straight table scan on table variable as there was no index.



Querycost/IO/time taken for 100/1000/10000/20k/100k rows have been recorded and tabulated below.Please click on the image to view results clearly.




The table compares Query cost, IO, time taken parameters of temp table and table variable.While calulating IOs, one need to take into account the IOs incurred by creating the index as well.So IOs for temp table are represented as 1 + 2, where the number before the '+' sign indicates the IOs incurred by creating the index.

For creating the index, one needs to scan the entire table once. So the above code on using temp table would incur IOs for scanning the table once + IOs for fetching the required row(s). On the other hand, as the query is simple,a table variable can obtain the required row(s) by scanning the table once. The above code using table variable would always incurr the IOs required to scan the entire table once,iresspective of number of in the table.So on IOs front, for a query which as simple as the above one, table variables would always consume less when compared to Indexed temp table.

For time consumed, table variable is faster even at 100k rows. Query cost,which is a number telling you how hard your machine needs to work to execute the query also favours table variables in this context.So when the query is simple and striaghtforward, table variables provide better performance.

Now for the another scenario where a slightly complex query is executed against temp table / table variable.The query attempts to find the city which obtained maximum sales for each supplier.The query against temp table is provided below.


DBCC freeproccache
DBCC dropcleanbuffers
SET statistics  io  ON
DECLARE  @dt DATETIME
SELECT @dt = Getdate()
CREATE TABLE #tb (
  supplier_id INT,
  product_id  INT,
  trans_city  VARCHAR(100),
  trans_date  DATETIME,
  qty         INT)
INSERT INTO #tb
SELECT supplier_xid,
       product_xid,
       trans_city,
       trans_date,
       qty
FROM   supplier_product_customer
CREATE CLUSTERED INDEX [CIX_supplier_product_customer] ON #tb (
      supplier_id,
      trans_city)
WITH (statistics_norecompute = OFF,
      sort_in_tempdb = OFF,
      ignore_dup_key = OFF,
      drop_existing = OFF,
      online = OFF,
      allow_row_locks = ON,
      allow_page_locks = OFF)
ON [PRIMARY]
SELECT   supplier_id,
         trans_city,
         Sum(qty)
FROM     #tb outer_qry
GROUP BY supplier_id,
         trans_city
HAVING   Sum(qty) = (SELECT Max(sum_of_qty)
                     FROM   (SELECT Sum(qty) AS sum_of_qty
                             FROM   #tb sub_query
                             WHERE  sub_query.supplier_id = outer_qry.supplier_id
                                    AND sub_query.trans_city = outer_qry.trans_city) x)
SELECT Datediff(ms,@dt,Getdate())
DROP TABLE #tb


The query plan for the same query is provided. The query plan uses a Nested loop join to achieve the resultset.For each supplierId, the clustered index seek is used to fetch the corressponding rows from the inner loop.Clustered index scan is used at outer loop, so that Supplier IDs enter the inner loop in sorted order.So, in
short the clustered index is used effectively.



Similarly the query against table variable is also provided below.


DBCC freeproccache
DBCC dropcleanbuffers
SET statistics  io  ON
DECLARE  @dt DATETIME
SELECT @dt = Getdate()
DECLARE  @tb  TABLE(
                    supplier_id INT,
                    product_id  INT,
                    trans_city  VARCHAR(100),
                    trans_date  DATETIME,
                    qty         INT
                    )
INSERT INTO @tb
SELECT supplier_xid,
       product_xid,
       trans_city,
       trans_date,
       qty
FROM   supplier_product_customer
SELECT   supplier_id,
         trans_city,
         Sum(qty)
FROM     @tb outer_qry
GROUP BY supplier_id,
         trans_city
HAVING   Sum(qty) = (SELECT Max(sum_of_qty)
                     FROM   (SELECT Sum(qty) sum_of_qty
                             FROM   @tb sub_query
                             WHERE  sub_query.supplier_id = outer_qry.supplier_id
                                    AND sub_query.trans_city = outer_qry.trans_city) x)
SELECT Datediff(ms,@dt,Getdate())
SELECT Getdate()


The query plan is also shown below.Query plan is similar to the one generated for temp table but for few differences.One difference is a table scan at both inner and outer sections of Nested loop operator.Another major difference is additional sort operator required have the rows sorted before passing them to the
inner loop. sort operator can be extremely expensive when run against huge tables.



The results for both the scenarios are tabulated.The results now heavily favour temp tables.Please click on the image to view the results clearly.



On the IOs front temp table always requires lesser IOs as the index proves to be handy. The table variable can only scan and hence the number of IOs increases when the query engine has to read the table multiple times. There is not much of a difference in time taken between the two till 10k rows. But once the
row count reaches 20k the performance of table variable dips badly. Observe the results marked in Red, which show a major performance difference when the table grows bigger. At 1 million rows temp table takes a little less than
3 seconds but table variable takes over 75 seconds which is 25 times slower.

To Summarize, one can go for a table variable when

* Table is really small with a max of 1000 odd rows.
* No complex queries are involved and the table needs
to be read only once/twice.


A temp table is a ideal choice when

* Number of rows are more then 10k
* Queries are complicated and would require multiple passes.
* Query looks like an ideal candidate for indexing.

Anyways, one is strongly encouraged to test with both the options when scenario is not as clear as the examples posted above.

There are few more things to be written. Will do a part-3 as well.

No comments: