Sunday, June 6, 2010

Scalar functions - on where clause


Scalar functions as discussed in the last post can be a performance bottleneck.Another place where Scalar functions ( both user defined and system ) can become a performance bottleneck is when used on a where clause of the query. Scalar functions when used on a where clause of a query can make the optimizer pick a table scan, even if the index is present, making the index useless.

/****** Object:  Index [ncix_Supplier_product_customer_trans_date]   ******/
CREATE NONCLUSTERED INDEX [ncix_Supplier_product_customer_trans_date]
  ON [dbo].[Supplier_Product_Customer] ( [trans_date] ASC )
  ON [PRIMARY] 


A non clustered index on trans_date column of supplier_product_customer table has been created.When one needs to find the list of transactions for a particular day ( cutting the time part of the , the most common method of writing the query would be

SELECT *
FROM   [Supplier_Product_customer]
WHERE  Datediff(dd, [trans_date], '11/19/2009') = 0 


But,the problem with such a query is that it takes table scan instead of using the Non clustered index created on trans_date. The reason is that scalar function datediff used on trans_date column stops the query analyzer from using the index.



A better way of writing it would be

SELECT *
FROM   supplier_product_customer
WHERE  trans_date >= '20091119'
       AND trans_date < '20091120' 




The picture above shows a non clustered index seek instead of table scan. There are few other common scenarios where functions can be avoided. Let me briefly list them here

* Left() string fuction can be effectively replaced by like
'str%'

* Usage of upper/lower string functions can be avoided
when the database has a case insensitive collation.
* isnull(col1,'xyz') = 'xyz' can be replaced by col1 = 'xyz'
or col1 is null

Note that OR conditions do use indexes and but at times they don't.
Please check before use.
* Getting data older than 60/n days query.
Standard way of doing it would be

Where datediff(dd, trans_date, getdate()) > 60 

Replaced by

Where trans_date < CONVERT(VARCHAR, getdate()-60, 112) 


To Conclude, one should try the best to avoid having scalar functions in where clauses.

4 comments:

Anonymous said...

Thanks for the post!

Could where clauses possibly return the wrong information?


select m.id
from table m
where m.course in
('05','06','07','08','09','10')
and m.type = 'AuditOnly'
and m.gender = 'M'


returns zero, but:


select m.id
from table m
where m.course in
('05','06','07','08','09','10')

intersect

select m.id
from table m
where m.type = 'AuditOnly'

intersect

select m.id
from table m
where m.gender = 'M'


returns a few hundred. Why the difference?

Nagaraj Venkatesan said...

Interesting comment Anony. Thanks. The difference is because the way in which intersect operates. There is no problem with where clause.

Interesct executes each part of the query individually and then prints the results like a set interesection ie prints the rows that have the same value on the select column.But interesect doesnt check whether all the 3 conditions are satisfied on the same row. Meaning interesect doesnt check whether the your m.id at all the three queries comes from the same row. As where clause applies all the conditions at one go it checks whether all the conditions are satisfied for the same row. Grab a look at the sample query posted by me below..
create table #temp(id int, c1 varchar(5),c2 varchar(5))

Insert into #temp Select 1,'A','Z'
Insert into #temp Select 1,'B','Y'
Insert into #temp Select 1,'C','X'

Select * from #temp
where id = 1 and c1 = 'B' and c2 = 'X'

Select id
from #temp where id = 1
intersect
Select id
from #temp where c1 = 'B'
intersect
Select id
from #temp where c2 = 'X'

Please post if still not clear or if you have any more doubts.

Anonymous said...

Thanks! However, what if the ID is unique?

Nagaraj Venkatesan said...

If the ID is unique, then the results for the case explained should be same.