The Artima Developer Community
Sponsored Link

.NET Buzz Forum
New INTERSECT and EXCEPT operators in Sql Server 2005 T-SQL

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Raymond Lewallen

Posts: 312
Nickname: rlewallen
Registered: Apr, 2005

Raymond Lewallen is a .Net developer and Sql Server DBA
New INTERSECT and EXCEPT operators in Sql Server 2005 T-SQL Posted: Aug 3, 2005 1:00 PM
Reply to this message Reply

This post originated from an RSS feed registered with .NET Buzz by Raymond Lewallen.
Original Post: New INTERSECT and EXCEPT operators in Sql Server 2005 T-SQL
Feed Title: Raymond Lewallen
Feed URL: /error.htm?aspxerrorpath=/blogs/raymond.lewallen/rss.aspx
Feed Description: Patterns and Practices, OOP, .Net and Sql
Latest .NET Buzz Posts
Latest .NET Buzz Posts by Raymond Lewallen
Latest Posts From Raymond Lewallen

Advertisement

This is the first post of many to come that talk about new enhancements in the Sql Server 2005 T-Sql language.

INTERSECT

Let say you have a products table.  I’m going to use the products table from the AdventureWorks database.  Now let’s say you imported a list of products into the database into a table called productduplicates.  Just for a reason, let’s say you manufacturer sent you a full list of products, and you need to see which products you already have in your database.

Normally, if you didn’t have matching primary keys on both tables, like a productId, you would do an inner join on some columns that uniquely identify the rows, as such:

INNER JOINselect a.[name], a.productnumber
 from production.product a
inner join production.productduplicates b
 on a.[name] = b.[name]
 and a.productnumber = b.productnumber

Execution plan for inner joinHash Match(Inner Join, HASH:([b].[Name], [b].[ProductNumber])=([a].[Name], [a].[ProductNumber]), RESIDUAL:([AdventureWorks].[Production].[Product].[Name] as [a].[Name]=[AdventureWorks].[Production].[productduplicates].[Name] as [b].[Name] AND [AdventureWorks].[Production].[Product].[ProductNumber] as [a].[ProductNumber]=[AdventureWorks].[Production].[productduplicates].[ProductNumber] as [b].[ProductNumber]))
  |--Table Scan(OBJECT:([AdventureWorks].[Production].[productduplicates] AS [b]))       
  |--Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID] AS [a]))     

Now you can replace this with a simple INTERSECT statement like the following:

INTERSECTselect [name], productnumber from production.product
intersect
select [name], productnumber from production.productduplicates

Execution plan for INTERSECTHash Match(Right Semi Join, HASH:([AdventureWorks].[Production].[productduplicates].[Name], [AdventureWorks].[Production].[productduplicates].[ProductNumber])=([AdventureWorks].[Production].[Product].[Name], [AdventureWorks].[Production].[Product].[ProductNumber]), RESIDUAL:([AdventureWorks].[Production].[Product].[Name]=[AdventureWorks].[Production].[productduplicates].[Name] AND [AdventureWorks].[Production].[Product].[ProductNumber]=[AdventureWorks].[Production].[productduplicates].[ProductNumber]))
  |--Table Scan(OBJECT:([AdventureWorks].[Production].[productduplicates])) 
  |--Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID]))  

Just like if you were using a UNION for two select statements, the same number of columns must exist in both select statements, and they must be of the same datatype.  In both above examples, you get a list of products that exist in both tables, so you know which products from the manufacturer you already have listed.

I haven’t seen any performance difference in the two, but I’ve only played with this on small tables.  The execution plan does show that while you are doing a hash match inner join on the first query, the second query performs a hash match right semi join.  Both hash matches perform 50% of the work in the query.  The other 50% is split evenly between the table scan and index scan in both queries.  So what does the fact that a right semi match is performed when using INTERSECT?  I tells me its doing the same thing as if we were to use the EXISTS, like so:

EXISTSselect a.[name], a.productnumber
 from production.product a
where exists(select b.[name], b.productnumber from production.productduplicates b
 where a.[name] = b.[name]
 and a.productnumber = b.productnumber)

EXCEPT

Same scenario, if you wanted to know which products you have that do not exist in the duplicates table, you would normally do a NOT EXISTS like so:

NOT EXISTSselect a.[name], a.productnumber
 from production.product a
where not exists(select b.[name], b.productnumber from production.productduplicates b
 where a.[name] = b.[name]
 and a.productnumber = b.productnumber)

Execution plan for NOT EXISTSHash Match(Right Anti Semi Join, HASH:([b].[Name], [b].[ProductNumber])=([a].[Name], [a].[ProductNumber]), RESIDUAL:([AdventureWorks].[Production].[Product].[Name] as [a].[Name]=[AdventureWorks].[Production].[productduplicates].[Name] as [b].[Name] AND [AdventureWorks].[Production].[Product].[ProductNumber] as [a].[ProductNumber]=[AdventureWorks].[Production].[productduplicates].[ProductNumber] as [b].[ProductNumber]))
  |--Table Scan(OBJECT:([AdventureWorks].[Production].[productduplicates] AS [b]))
  |--Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID] AS [a]))

And you get back the list of products that you have that aren’t in the duplicates list.  You can flip this around to figure out which products are in the duplicates that you don’t have in your products.

Let’s simply the code a little bit by using the EXCEPT operator.  This looks identical to the INTERSECTS query:

EXCEPTselect [name], productnumber from production.product
except
select [name], productnumber from production.productduplicates

Execution plan for exceptHash Match(Right Anti Semi Join, HASH:([AdventureWorks].[Production].[productduplicates].[Name], [AdventureWorks].[Production].[productduplicates].[ProductNumber])=([AdventureWorks].[Production].[Product].[Name], [AdventureWorks].[Production].[Product].[ProductNumber]), RESIDUAL:([AdventureWorks].[Production].[Product].[Name]=[AdventureWorks].[Production].[productduplicates].[Name] AND [AdventureWorks].[Production].[Product].[ProductNumber]=[AdventureWorks].[Production].[productduplicates].[ProductNumber]))
  |--Table Scan(OBJECT:([AdventureWorks].[Production].[productduplicates]))
  |--Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID]))

Again, can’t find a performance difference.  In all queries shows, the reads and duration all average out to be virtually identical.  You’d have to test this on some large tables (high number of rows) to figure out if there is a performance difference at all.

Looking at the execution plans for everything, it doesn’t seem that there is a any gain from using INTERSECT and EXCEPT over JOINS or EXISTS/NOT EXISTS except for one big thing: Much simpler code!

Read: New INTERSECT and EXCEPT operators in Sql Server 2005 T-SQL

Topic: Observation on CreaTEAvity Previous Topic   Next Topic Topic: Flashbacks (Warning: Political Post)

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use