
Difference between EXISTS and IN in SQL?

What is the difference between the EXISTS and IN clause in SQL?

When should we use EXISTS, and when should we use IN?


Keith

The exists keyword can be used in that way, but really it's intended as a way to avoid counting:

--this statement needs to check the entire table
select count(*) from [table] where ...

--this statement is true as soon as one match is found
exists ( select * from [table] where ... )

This is most useful where you have if conditional statements, as exists can be a lot quicker than count.
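To make the count-versus-exists point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table and its status column are made up for illustration, and it only shows that the two forms answer different questions; it does not measure speed.

```python
import sqlite3

# Hypothetical table, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("open",), ("closed",), ("open",)],
)

# COUNT(*) answers "how many rows match?"
open_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'open'"
).fetchone()[0]

# EXISTS answers "is there at least one matching row?" (SQLite returns 1 or 0)
has_open = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM orders WHERE status = 'open')"
).fetchone()[0]
has_void = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM orders WHERE status = 'void')"
).fetchone()[0]

# The conditional only needs the yes/no answer, not the count.
if has_open:
    print("there is at least one open order")
print(open_count, has_open, has_void)
```

Whether exists is actually quicker than count depends on the engine and its plan, so treat this as a demonstration of semantics only.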

The in is best used where you have a static list to pass:

 select * from [table]
 where [field] in (1, 2, 3)

When you have a table in an in statement it makes more sense to use a join, but mostly it shouldn't matter. The query optimiser should return the same plan either way. In some implementations (mostly older, such as Microsoft SQL Server 2000) in queries will always get a nested join plan, while join queries will use nested, merge or hash as appropriate. More modern implementations are smarter and can adjust the plan even when in is used.


Could you elaborate on "When you have a table in an in statement it makes more sense to use a join, but it doesn't really matter. The query optimiser will return the same plan either way."? Not the query optimiser part, the part where you can use a JOIN as a replacement for IN.
select * from [table] where [field] in (select [field] from [table2]) returns the same results (and query plan) as select * from [table] join [table2] on [table2].[field] = [table].[field].
@Sander it doesn't: the first query returns all the columns from table, while the second returns everything from table and table2. In some (mostly older) SQL databases the in query will get implemented as a nested join, while the join query can be nested, merged, hashed, etc - whatever's quickest.
Okay, I should have specified columns in the select clause, but you should update your answer because it clearly states that the queries "will return the same plan either way".
exists can be used within a case statement, so they can be handy that way also i.e. select case when exists (select 1 from emp where salary > 1000) then 1 else 0 end as sal_over_1000
shA.t

EXISTS will tell you whether a query returned any results. e.g.:

SELECT * 
FROM Orders o 
WHERE EXISTS (
    SELECT * 
    FROM Products p 
    WHERE p.ProductNumber = o.ProductNumber)

IN is used to compare one value to several, and can use literal values, like this:

SELECT * 
FROM Orders 
WHERE ProductNumber IN (1, 10, 100)

You can also use query results with the IN clause, like this:

SELECT * 
FROM Orders 
WHERE ProductNumber IN (
    SELECT ProductNumber 
    FROM Products 
    WHERE ProductInventoryQuantity > 0)

Last query is dangerous because it might fail in the case subquery doesn't return any results. 'in' clause requires at least 1 argument...
@user2054927 Last query will correctly return no rows if the subquery returns no rows - nothing dangerous about that!
Michael Currie

Based on rule optimizer:

EXISTS is much faster than IN when the sub-query result is very large.

IN is faster than EXISTS when the sub-query result is very small.

Based on cost optimizer:

There is no difference.


Proof of your argument? I don't think IN would be faster than EXISTS ever!
@Nawaz How about the proof why IN is always slower than EXISTS?
Badly implemented query optimizer? I've seen something like this (though not exactly this situation) happen in a certain RDBMS...
EXISTS returns purely Boolean values, which is always faster than having to compare strings or values larger than a BIT/Boolean type. IN may or may not be a Boolean comparison. Since programming prefers EXPLICIT usage for stability (part of ACID), EXISTS is preferred generally.
Why was this upvoted so many times? There's absolutely no reason why this assumption-based statement should be generally true.
Alvin Thompson

I'm assuming you know what they do, and thus are used differently, so I'm going to understand your question as: When would it be a good idea to rewrite the SQL to use IN instead of EXISTS, or vice versa.

Is that a fair assumption?

Edit: The reason I'm asking is that in many cases you can rewrite an SQL based on IN to use an EXISTS instead, and vice versa, and for some database engines, the query optimizer will treat the two differently.

For instance:

SELECT *
FROM Customers
WHERE EXISTS (
    SELECT *
    FROM Orders
    WHERE Orders.CustomerID = Customers.ID
)

can be rewritten to:

SELECT *
FROM Customers
WHERE ID IN (
    SELECT CustomerID
    FROM Orders
)

or with a join:

SELECT Customers.*
FROM Customers
    INNER JOIN Orders ON Customers.ID = Orders.CustomerID

So my question still stands: is the original poster wondering what IN and EXISTS do, and thus how to use them, or is he asking whether rewriting an SQL statement that uses IN to use EXISTS instead, or vice versa, would be a good idea?
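One thing worth checking before swapping these three forms is whether they really return the same rows. The sketch below runs them with Python's sqlite3 module against made-up Customers and Orders data (the names and values are assumptions, not from the question), where one customer has two orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);  -- customer 1 has two orders
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

exists_ids = ids("""
    SELECT ID FROM Customers
    WHERE EXISTS (SELECT * FROM Orders WHERE Orders.CustomerID = Customers.ID)
    ORDER BY ID""")

in_ids = ids("""
    SELECT ID FROM Customers
    WHERE ID IN (SELECT CustomerID FROM Orders)
    ORDER BY ID""")

# The plain join repeats a customer once per matching order.
join_ids = ids("""
    SELECT Customers.ID FROM Customers
    INNER JOIN Orders ON Customers.ID = Orders.CustomerID
    ORDER BY Customers.ID""")

# DISTINCT collapses the repeats so the join matches the other two forms.
distinct_join_ids = ids("""
    SELECT DISTINCT Customers.ID FROM Customers
    INNER JOIN Orders ON Customers.ID = Orders.CustomerID
    ORDER BY Customers.ID""")

print(exists_ids, in_ids, join_ids, distinct_join_ids)
```

With this data the EXISTS and IN forms list each customer once, while the join lists customer 1 twice unless DISTINCT is added.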


I don't know about the OP, but I would like the answer to this question! When should I use EXISTS instead of IN with a subquery that returns IDs?
in the JOIN, you will need a DISTINCT
Great demonstration, but it pretty much leaves the question unanswered.
@RoyTinker the answer is an opinionated mix between "use X when it makes the query easier to understand than Y for the maintaining developer" and "use X when doing it makes it acceptably faster/less resource intensive than Y, which is causing a performance issue". Engineering is a compromise
@CaiusJard Right, agreed. The system I was building in 2010 was an in-house JSON <=> SQL ORM, so performance was the primary concern over how "readable" the generated queries would be.
nhahtdh

EXISTS is much faster than IN when the subquery result is very large. IN is faster than EXISTS when the subquery result is very small.

CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
GO
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
GO

INSERT INTO t1
SELECT 1, 'title 1', 5 UNION ALL
SELECT 2, 'title 2', 5 UNION ALL
SELECT 3, 'title 3', 5 UNION ALL
SELECT 4, 'title 4', 5 UNION ALL
SELECT null, 'title 5', 5 UNION ALL
SELECT null, 'title 6', 5

INSERT INTO t2
SELECT 1, 1, 'data 1' UNION ALL
SELECT 2, 1, 'data 2' UNION ALL
SELECT 3, 2, 'data 3' UNION ALL
SELECT 4, 3, 'data 4' UNION ALL
SELECT 5, 3, 'data 5' UNION ALL
SELECT 6, 3, 'data 6' UNION ALL
SELECT 7, 4, 'data 7' UNION ALL
SELECT 8, null, 'data 8' UNION ALL
SELECT 9, 6, 'data 9' UNION ALL
SELECT 10, 6, 'data 10' UNION ALL
SELECT 11, 8, 'data 11'

Query 1

SELECT *
FROM t1
WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)

Query 2

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t2.t1id FROM t2)

If id in t1 contains NULL values, Query 1 will return those rows, but Query 2 will not. IN cannot compare anything with NULL, so it never produces a match for a NULL, whereas EXISTS only tests whether the subquery returns a row, so a NULL does not stop it from giving a result.
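The NULL behaviour described above can be checked directly. This is a sketch that loads the same t1 and t2 rows into an in-memory SQLite database through Python's sqlite3 module; the third query, which filters NULLs out of the subquery, is an extra added for illustration. Other engines should follow the same three-valued logic, but that is an assumption worth verifying on your own system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT);
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20));
INSERT INTO t1 VALUES
  (1, 'title 1', 5), (2, 'title 2', 5), (3, 'title 3', 5),
  (4, 'title 4', 5), (NULL, 'title 5', 5), (NULL, 'title 6', 5);
INSERT INTO t2 VALUES
  (1, 1, 'data 1'), (2, 1, 'data 2'), (3, 2, 'data 3'), (4, 3, 'data 4'),
  (5, 3, 'data 5'), (6, 3, 'data 6'), (7, 4, 'data 7'), (8, NULL, 'data 8'),
  (9, 6, 'data 9'), (10, 6, 'data 10'), (11, 8, 'data 11');
""")

def titles(sql):
    """Return the title column of every row the query produces."""
    return [row[0] for row in conn.execute(sql)]

# Query 1: t1.id = t2.t1id is never true for a NULL id, so the subquery is
# empty for those rows and NOT EXISTS lets them through.
not_exists = titles("""
    SELECT t1.title FROM t1
    WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)
    ORDER BY t1.title""")

# Query 2: ids 1-4 all have matches, and NULL NOT IN (...) is unknown,
# so no row qualifies.
not_in = titles("""
    SELECT t1.title FROM t1
    WHERE t1.id NOT IN (SELECT t2.t1id FROM t2)
    ORDER BY t1.title""")

# Removing NULLs from the subquery still does not bring back the NULL-id
# rows, because the left-hand side of NOT IN is itself NULL for them.
not_in_filtered = titles("""
    SELECT t1.title FROM t1
    WHERE t1.id NOT IN (SELECT t2.t1id FROM t2 WHERE t2.t1id IS NOT NULL)
    ORDER BY t1.title""")

print(not_exists, not_in, not_in_filtered)
```

So with this data only the NOT EXISTS form returns the two rows whose id is NULL.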


This answer is a reasonable synopsis of Tom Kyte's sentiment (asktom.oracle.com/pls/asktom/…)
I think this answer is based on intuition, which is fair enough. But it cannot be universally true. For example, it is almost certainly not true of Ingres, which would parse both the equivalent SQL queries to be the same QUEL query, which lacks SQL's - ahem - 'richness' when it comes to writing the same thing multiple ways.
These 2 queries are logically equivalent if and only if t2.id is defined as "NOT NULL". To guarantee the equivalence with no dependency on the table definition, the 2nd query should be "SELECT t1.* FROM t1 WHERE t1.id not in (SELECT t2.id FROM t2 where t2.id is not null)"
Michael

If you are using the IN operator, the SQL engine will scan all records fetched from the inner query. On the other hand, if you are using EXISTS, the SQL engine will stop scanning as soon as it finds a match.


Community

IN supports only equality relations (or inequality when preceded by NOT). It is a synonym to =any / =some, e.g

select    * 
from      t1 
where     x in (select x from t2)
;

EXISTS supports various types of relations that cannot be expressed using IN, e.g. -

select    * 
from      t1 
where     exists (select    null 
                  from      t2 
                  where     t2.x=t1.x 
                        and t2.y>t1.y 
                        and t2.z like '%' || t1.z || '%'
                  )
;
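As a rough check of the two queries above, here is a sketch that runs both in SQLite through Python's sqlite3 module. The rows in t1 and t2 are invented so that only one row of t1 satisfies all three correlated conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (x INT, y INT, z TEXT);
CREATE TABLE t2 (x INT, y INT, z TEXT);
INSERT INTO t1 VALUES (1, 10, 'ab'), (2, 10, 'cd'), (3, 10, 'ef');
INSERT INTO t2 VALUES (1, 20, 'xabx'),  -- same x, larger y, z contains 'ab'
                      (2, 5,  'cd'),    -- same x, but y is not larger
                      (3, 20, 'zz');    -- same x, larger y, z does not contain 'ef'
""")

# IN can only express the equality on x.
in_xs = [r[0] for r in conn.execute("""
    select x from t1
    where  x in (select x from t2)
    order by x""")]

# EXISTS can combine equality, inequality and LIKE in one correlated test.
exists_xs = [r[0] for r in conn.execute("""
    select x from t1
    where  exists (select null
                   from   t2
                   where  t2.x = t1.x
                      and t2.y > t1.y
                      and t2.z like '%' || t1.z || '%')
    order by x""")]

print(in_xs, exists_xs)
```

With this data the IN query keeps every x that appears in t2, while the EXISTS query keeps only the row that passes all three conditions.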

And on a different note -

The alleged performance and technical differences between EXISTS and IN may result from a specific vendor's implementations/limitations/bugs, but many times they are nothing but myths created by a lack of understanding of database internals.

The table definitions, the accuracy of the statistics, the database configuration and the optimizer version all have an impact on the execution plan and therefore on the performance metrics.


Upvote for your comment on performance: without focusing on a specific DBMS, we should assume that it is up to the optimizer to work out what works best.
Pete Carter

The EXISTS keyword evaluates to true or false, whereas the IN keyword compares against all the values in the corresponding subquery column. Also, SELECT 1 can be used with EXISTS. Example:

SELECT * FROM Temp1 where exists(select 1 from Temp2 where conditions...)

But IN is less efficient, so EXISTS is faster.


Adriano Carneiro

I think,

EXISTS is for when you need to match the results of a query against another subquery: the results of query #1 are retrieved where the subquery results match. Kind of a join. E.g. select the customers (table #1) who have also placed orders (table #2).

IN is for retrieving rows where the value of a specific column lies in a list (1,2,3,4,5). E.g. select the customers who live in the following zip codes, i.e. zip_code values that lie in a (....) list.

When to use one over the other... when you feel it reads appropriately (Communicates intent better).


Michal

As far as I know, when a subquery returns a NULL value, the whole statement becomes NULL. In those cases we use the EXISTS keyword. If we want to compare particular values in subqueries, we use the IN keyword.


nhahtdh

Which one is faster depends on the number of rows fetched by the inner query:

When your inner query fetches thousands of rows, EXISTS would be the better choice.

When your inner query fetches only a few rows, IN will be faster.

EXISTS evaluates to true or false, but IN compares multiple values. When you don't know whether the record exists or not, you should choose EXISTS.


rogue lad

Difference lies here:

select * 
from abcTable
where exists (select null)

The query above will return all the records, while the one below will return an empty result.

select *
from abcTable
where abcTable_ID in (select null)

Give it a try and observe the output.
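For anyone who wants to try it without a database server, here is a sketch using Python's sqlite3 module. It assumes an engine that accepts a SELECT without a FROM clause, which SQLite does; as the comment below notes, not every RDBMS will. The three rows in abcTable are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abcTable (abcTable_ID INTEGER)")
conn.executemany("INSERT INTO abcTable VALUES (?)", [(1,), (2,), (3,)])

# The subquery returns one row (containing NULL), so EXISTS is true for every row.
exists_rows = conn.execute(
    "select * from abcTable where exists (select null)"
).fetchall()

# abcTable_ID IN (NULL) is never true, so no row qualifies.
in_rows = conn.execute(
    "select * from abcTable where abcTable_ID in (select null)"
).fetchall()

print(len(exists_rows), len(in_rows))
```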


Hmmm... Error: [SQL0104] Token ) was not valid. In both cases. Are you assuming a particular RDBMS?
Vipin Jain

The reason is that the EXISTS operator works on the "at least one found" principle. It returns true and stops scanning the table once at least one matching row is found.

On the other hand, when the IN operator is combined with a subquery, MySQL must process the subquery first, and then use the result of the subquery to process the whole query.

The general rule of thumb is that if the subquery contains a large volume of data, the EXISTS operator provides better performance. However, the query that uses the IN operator will perform faster if the result set returned from the subquery is very small.


TOUZENE Mohamed Wassim

In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.

https://docs.oracle.com/cd/B19306_01/server.102/b14211/sql_1016.htm#i28403


It should be noted that even at the time you posted this answer in 2017, you were referring to an oracle product that was released 12 years prior and was already well past its end of life
Ranjeeth

My understanding is that both should be the same as long as we are not dealing with NULL values.

It is the same reason why a query does not return the value for = NULL but does for IS NULL. http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

As far as the boolean vs comparator argument goes, to generate a boolean both values need to be compared, and that is how any if condition works. So I fail to understand how IN and EXISTS behave differently.


Sean O'Toole

If a subquery returns more than one value, you may want the outer query to return rows whose value in the column named in the condition matches any value in the subquery's result set. To do this, use the in keyword.

You can use a subquery to check whether a set of records exists. For this, use the exists clause with a subquery. The exists keyword always returns a true or false value.


Pang

I believe this has a straightforward answer. Why don't you check it from the people who developed that function in their systems?

If you are a MS SQL developer, here is the answer directly from Microsoft.

IN:

Determines whether a specified value matches any value in a subquery or a list.

EXISTS:

Specifies a subquery to test for the existence of rows.


Axel Der

I found that using EXISTS keyword is often really slow (that is very true in Microsoft Access). I instead use the join operator in this manner : should-i-use-the-keyword-exists-in-sql


Adam

If you can use where in instead of where exists, then where in is probably faster.

Using where in or where exists will go through all the rows of your parent result. The difference is that where exists will cause a lot of dependent subqueries. If you can avoid dependent subqueries, then where in will be the better choice.

Example

Assume we have 10,000 companies, each has 10 users (thus our users table has 100,000 entries). Now assume you want to find a user by his name or his company name.

The following query using where exists has an execution time of 141 ms:

select * from `users` 
where `first_name` ='gates' 
or exists 
(
  select * from `companies` 
  where `users`.`company_id` = `companies`.`id`
  and `name` = 'gates'
)

https://i.stack.imgur.com/mpcAK.png

However, if we avoid the exists query and write it using:

select * from `users` 
where `first_name` ='gates' 
or users.company_id in  
(
    select id from `companies` 
    where  `name` = 'gates'
)

Then dependent subqueries are avoided and the query runs in 0,012 ms.

https://i.stack.imgur.com/FHt2d.png
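Before swapping one form for the other it is worth confirming that both return the same rows. The sketch below does that in SQLite through Python's sqlite3 module with a tiny invented data set (three users, two companies). It says nothing about the timings quoted above, which come from MySQL and a much larger table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, company_id INTEGER);
INSERT INTO companies VALUES (1, 'gates'), (2, 'acme');
INSERT INTO users VALUES (1, 'bill', 1), (2, 'gates', 2), (3, 'ann', 2);
""")

# Correlated form: the subquery refers to the outer users row.
exists_sql = """
SELECT users.id FROM users
WHERE first_name = 'gates'
   OR EXISTS (SELECT * FROM companies
              WHERE users.company_id = companies.id
                AND companies.name = 'gates')
ORDER BY users.id
"""

# Uncorrelated form: the subquery can be evaluated once on its own.
in_sql = """
SELECT users.id FROM users
WHERE first_name = 'gates'
   OR users.company_id IN (SELECT id FROM companies
                           WHERE companies.name = 'gates')
ORDER BY users.id
"""

exists_ids = [r[0] for r in conn.execute(exists_sql)]
in_ids = [r[0] for r in conn.execute(in_sql)]
print(exists_ids, in_ids)
```

Both forms pick up user 1 (whose company is named gates) and user 2 (whose first name is gates). To compare speed on your own system, inspect the plans with your engine's EXPLAIN facility rather than relying on these numbers.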


Deva

EXISTS is faster than IN. If most of the filter criteria are in the subquery, it is better to use IN; if most of the filter criteria are in the main query, it is better to use EXISTS.


That claim is really not backed by any evidence, is it?
Ben

If you are using the IN operator, the SQL engine will scan all records fetched from the inner query. On the other hand, if you are using EXISTS, the SQL engine will stop scanning as soon as it finds a match.


@ziggy explain? This is pretty much what the accepted answer also says. IN MUST check every single record; EXISTS can stop as soon as it finds just one.
Nope, not correct. IN and EXISTS can be equivalent and transformed into each other.