.NET Buzz Forum - New INTERSECT and EXCEPT operators in Sql Server 2005 T-SQL

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Sponsored Link •

.NET Buzz Forum
New INTERSECT and EXCEPT operators in Sql Server 2005 T-SQL

0 replies on 1 page.

Welcome Guest
Sign In

Back to Topic List

Reply to this Topic

Search Forum

Threaded View


Previous Topic		Next Topic

Flat View: This topic has 0 replies on 1 page

Raymond Lewallen

Posts: 312
Nickname: rlewallen
Registered: Apr, 2005

Raymond Lewallen is a .Net developer and Sql Server DBA

New INTERSECT and EXCEPT operators in Sql Server 2005 T-SQL

Posted: Aug 3, 2005 1:00 PM

This post originated from an RSS feed registered with .NET Buzz by Raymond Lewallen.
Original Post: New INTERSECT and EXCEPT operators in Sql Server 2005 T-SQL Feed Title: Raymond Lewallen Feed URL: /error.htm?aspxerrorpath=/blogs/raymond.lewallen/rss.aspx Feed Description: Patterns and Practices, OOP, .Net and Sql	Latest .NET Buzz Posts Latest .NET Buzz Posts by Raymond Lewallen Latest Posts From Raymond Lewallen

This is the first post of many to come that talk about new enhancements in the Sql Server 2005 T-Sql language.

INTERSECT

Let say you have a products table. Iâm going to use the products table from the AdventureWorks database. Now letâs say you imported a list of products into the database into a table called productduplicates. Just for a reason, letâs say you manufacturer sent you a full list of products, and you need to see which products you already have in your database.

Normally, if you didnât have matching primary keys on both tables, like a productId, you would do an inner join on some columns that uniquely identify the rows, as such:

INNER JOINselect a.[name], a.productnumber
from production.product a
inner join production.productduplicates b
on a.[name] = b.[name]
and a.productnumber = b.productnumber

Execution plan for inner joinHash Match(Inner Join, HASH:([b].[Name], [b].[ProductNumber])=([a].[Name], [a].[ProductNumber]), RESIDUAL:([AdventureWorks].[Production].[Product].[Name] as [a].[Name]=[AdventureWorks].[Production].[productduplicates].[Name] as [b].[Name] AND [AdventureWorks].[Production].[Product].[ProductNumber] as [a].[ProductNumber]=[AdventureWorks].[Production].[productduplicates].[ProductNumber] as [b].[ProductNumber]))
|--Table Scan(OBJECT:([AdventureWorks].[Production].[productduplicates] AS [b]))
|--Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID] AS [a]))

Now you can replace this with a simple INTERSECT statement like the following:

INTERSECTselect [name], productnumber from production.product
intersect
select [name], productnumber from production.productduplicates

Execution plan for INTERSECTHash Match(Right Semi Join, HASH:([AdventureWorks].[Production].[productduplicates].[Name], [AdventureWorks].[Production].[productduplicates].[ProductNumber])=([AdventureWorks].[Production].[Product].[Name], [AdventureWorks].[Production].[Product].[ProductNumber]), RESIDUAL:([AdventureWorks].[Production].[Product].[Name]=[AdventureWorks].[Production].[productduplicates].[Name] AND [AdventureWorks].[Production].[Product].[ProductNumber]=[AdventureWorks].[Production].[productduplicates].[ProductNumber]))
|--Table Scan(OBJECT:([AdventureWorks].[Production].[productduplicates]))
|--Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID]))

Just like if you were using a UNION for two select statements, the same number of columns must exist in both select statements, and they must be of the same datatype. In both above examples, you get a list of products that exist in both tables, so you know which products from the manufacturer you already have listed.

I havenât seen any performance difference in the two, but Iâve only played with this on small tables. The execution plan does show that while you are doing a hash match inner join on the first query, the second query performs a hash match right semi join. Both hash matches perform 50% of the work in the query. The other 50% is split evenly between the table scan and index scan in both queries. So what does the fact that a right semi match is performed when using INTERSECT? I tells me its doing the same thing as if we were to use the EXISTS, like so:

EXISTSselect a.[name], a.productnumber
from production.product a
where exists(select b.[name], b.productnumber from production.productduplicates b
where a.[name] = b.[name]
and a.productnumber = b.productnumber)

EXCEPT

Same scenario, if you wanted to know which products you have that do not exist in the duplicates table, you would normally do a NOT EXISTS like so:

NOT EXISTSselect a.[name], a.productnumber
from production.product a
where not exists(select b.[name], b.productnumber from production.productduplicates b
where a.[name] = b.[name]
and a.productnumber = b.productnumber)

Execution plan for NOT EXISTSHash Match(Right Anti Semi Join, HASH:([b].[Name], [b].[ProductNumber])=([a].[Name], [a].[ProductNumber]), RESIDUAL:([AdventureWorks].[Production].[Product].[Name] as [a].[Name]=[AdventureWorks].[Production].[productduplicates].[Name] as [b].[Name] AND [AdventureWorks].[Production].[Product].[ProductNumber] as [a].[ProductNumber]=[AdventureWorks].[Production].[productduplicates].[ProductNumber] as [b].[ProductNumber]))
|--Table Scan(OBJECT:([AdventureWorks].[Production].[productduplicates] AS [b]))
|--Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID] AS [a]))

And you get back the list of products that you have that arenât in the duplicates list. You can flip this around to figure out which products are in the duplicates that you donât have in your products.

Letâs simply the code a little bit by using the EXCEPT operator. This looks identical to the INTERSECTS query:

EXCEPTselect [name], productnumber from production.product
except
select [name], productnumber from production.productduplicates

Execution plan for exceptHash Match(Right Anti Semi Join, HASH:([AdventureWorks].[Production].[productduplicates].[Name], [AdventureWorks].[Production].[productduplicates].[ProductNumber])=([AdventureWorks].[Production].[Product].[Name], [AdventureWorks].[Production].[Product].[ProductNumber]), RESIDUAL:([AdventureWorks].[Production].[Product].[Name]=[AdventureWorks].[Production].[productduplicates].[Name] AND [AdventureWorks].[Production].[Product].[ProductNumber]=[AdventureWorks].[Production].[productduplicates].[ProductNumber]))
|--Table Scan(OBJECT:([AdventureWorks].[Production].[productduplicates]))
|--Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID]))

Again, canât find a performance difference. In all queries shows, the reads and duration all average out to be virtually identical. Youâd have to test this on some large tables (high number of rows) to figure out if there is a performance difference at all.

Looking at the execution plans for everything, it doesnât seem that there is a any gain from using INTERSECT and EXCEPT over JOINS or EXISTS/NOT EXISTS except for one big thing: Much simpler code!

Read: New INTERSECT and EXCEPT operators in Sql Server 2005 T-SQL

Previous Topic

Next Topic


	Web Artima.com