UNION
removes duplicate records (where all columns in the results are the same), UNION ALL
does not.
There is a performance hit when using UNION
instead of UNION ALL
, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).
To identify duplicates, records must be comparable types as well as compatible types. This will depend on the SQL system. For example the system may truncate all long text fields to make short text fields for comparison (MS Jet), or may refuse to compare binary fields (ORACLE)
UNION Example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)
Both UNION and UNION ALL concatenate the result of two different SQLs. They differ in the way they handle duplicates.
UNION performs a DISTINCT on the result set, eliminating any duplicate rows.
UNION ALL does not remove duplicates, and it therefore faster than UNION.
Note: While using this commands all selected columns need to be of the same data type.
Example: If we have two tables, 1) Employee and 2) Customer
Employee table data:
https://i.stack.imgur.com/huYEL.png
Customer table data:
https://i.stack.imgur.com/FEaKe.png
UNION Example (It removes all duplicate records):
https://i.stack.imgur.com/lLiS1.png
UNION ALL Example (It just concatenate records, not eliminate duplicates, so it is faster than UNION):
https://i.stack.imgur.com/n5gvq.png
UNION
removes duplicates, whereas UNION ALL
does not.
In order to remove duplicates the result set must be sorted, and this may have an impact on the performance of the UNION, depending on the volume of data being sorted, and the settings of various RDBMS parameters ( For Oracle PGA_AGGREGATE_TARGET
with WORKAREA_SIZE_POLICY=AUTO
or SORT_AREA_SIZE
and SOR_AREA_RETAINED_SIZE
if WORKAREA_SIZE_POLICY=MANUAL
).
Basically, the sort is faster if it can be carried out in memory, but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then you must use UNION, depending on the source of your data.
I would have commented on the first post to qualify the "is much less performant" comment, but have insufficient reputation (points) to do so.
In ORACLE: UNION does not support BLOB (or CLOB) column types, UNION ALL does.
The basic difference between UNION and UNION ALL is union operation eliminates the duplicated rows from the result set but union all returns all rows after joining.
from http://zengin.wordpress.com/2007/07/31/union-vs-union-all/
UNION
The UNION
command is used to select related information from two tables, much like the JOIN
command. However, when using the UNION
command all selected columns need to be of the same data type. With UNION
, only distinct values are selected.
UNION ALL
The UNION ALL
command is equal to the UNION
command, except that UNION ALL
selects all values.
The difference between Union
and Union all
is that Union all
will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.
A UNION
statement effectively does a SELECT DISTINCT
on the results set. If you know that all the records returned are unique from your union, use UNION ALL
instead, it gives faster results.
You can avoid duplicates and still run much faster than UNION DISTINCT (which is actually same as UNION) by running query like this:
SELECT * FROM mytable WHERE a=X UNION ALL SELECT * FROM mytable WHERE b=Y AND a!=X
Notice the AND a!=X
part. This is much faster then UNION.
UNION
- UNION
also removes duplicates that are returned by the subqueries, whereas your approach won't.
Just to add my two cents to the discussion here: one could understand the UNION
operator as a pure, SET-oriented UNION - e.g. set A={2,4,6,8}, set B={1,2,3,4}, A UNION B = {1,2,3,4,6,8}
When dealing with sets, you would not want numbers 2 and 4 appearing twice, as an element either is or is not in a set.
In the world of SQL, though, you might want to see all the elements from the two sets together in one "bag" {2,4,6,8,1,2,3,4}. And for this purpose T-SQL offers the operator UNION ALL
.
UNION ALL
isn't "offered" by T-SQL. UNION ALL
is part of the ANSI SQL standard and not specific to MS SQL Server.
UNION - results in distinct records while UNION ALL - results in all the records including duplicates.
Both are blocking operators and hence I personally prefer using JOINS over Blocking Operators(UNION, INTERSECT, UNION ALL etc. ) anytime.
To illustrate why Union operation performs poorly in comparison to Union All checkout the following example.
CREATE TABLE #T1 (data VARCHAR(10))
INSERT INTO #T1
SELECT 'abc'
UNION ALL
SELECT 'bcd'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'def'
UNION ALL
SELECT 'efg'
CREATE TABLE #T2 (data VARCHAR(10))
INSERT INTO #T2
SELECT 'abc'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'efg'
https://i.stack.imgur.com/kdD4p.jpg
Following are results of UNION ALL and UNION operations.
https://i.stack.imgur.com/LpjWD.jpg
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
Using UNION results in Distinct Sort operations in the Execution Plan. Proof to prove this statement is shown below:
https://i.stack.imgur.com/QmQlL.jpg
UNION
/UNION ALL
).
union
using a combination of join
s and some really nasty case
s, but it makes the query darn-near impossible to read and maintain, and in my experience it is also terrible for performance. Compare: select foo.bar from foo union select fizz.buzz from fizz
against select case when foo.bar is null then fizz.buzz else foo.bar end from foo join fizz where foo.bar is null or fizz.buzz is null
Not sure that it matters which database
UNION
and UNION ALL
should work on all SQL Servers.
You should avoid of unnecessary UNION
s they are huge performance leak. As a rule of thumb use UNION ALL
if you are not sure which to use.
It is good to understand with a Venn diagramm.
here is the link to the source. There is a good description.
https://i.stack.imgur.com/M0gvT.png
()
shown a second time. Actually, on second thoughts, because the union all
result is not a set, you should make no attempt to draw it using a Venn diagram!
union is used to select distinct values from two tables where as union all is used to select all values including duplicates from the tables
(From Microsoft SQL Server Book Online)
UNION [ALL]
Specifies that multiple result sets are to be combined and returned as a single result set.
ALL
Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
UNION
will take too long as a duplicate rows finding like DISTINCT
is applied on the results.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
is equivalent of:
SELECT DISTINCT * FROM (
SELECT * FROM Table1
UNION ALL
SELECT * FROM Table2) DT
A side effect of applying DISTINCT over results is a sorting operation on results.
UNION ALL
results will be shown as arbitrary order on results But UNION
results will be shown as ORDER BY 1, 2, 3, ..., n (n = column number of Tables)
applied on results. You can see this side effect when you don't have any duplicate row.
I add an example,
UNION, it is merging with distinct --> slower, because it need comparing (In Oracle SQL developer, choose query, press F10 to see cost analysis).
UNION ALL, it is merging without distinct --> faster.
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;
and
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION ALL
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;
UNION
merges the contents of two structurally-compatible tables into a single combined table.
Difference:
The difference between UNION
and UNION ALL
is that UNION will
omit duplicate records whereas UNION ALL
will include duplicate records.
Union
Result set is sorted in ascending order whereas UNION ALL
Result set is not sorted
UNION
performs a DISTINCT
on its Result set so it will eliminate any duplicate rows. Whereas UNION ALL
won't remove duplicates and therefore it is faster than UNION
.*
Note: The performance of UNION ALL
will typically be better than UNION
, since UNION
requires the server to do the additional work of removing any duplicates. So, in cases where it is certain that there will not be any duplicates, or where having duplicates is not a problem, use of UNION ALL
would be recommended for performance reasons.
ORDER BY
, sorted results are not guaranteed. Maybe you have a particular SQL vendor in mind (even then, ascending order what exactly...?) but this question has no vendor=specific tags.
Suppose that you have two table Teacher & Student
Both have 4 Column with different Name like this
Teacher - ID(int), Name(varchar(50)), Address(varchar(50)), PositionID(varchar(50))
https://i.stack.imgur.com/8uYB7.png
Student- ID(int), Name(varchar(50)), Email(varchar(50)), PositionID(int)
https://i.stack.imgur.com/lyCoB.png
You can apply UNION or UNION ALL for those two table which have same number of columns. But they have different name or data type.
When you apply UNION
operation on 2 tables, it neglects all duplicate entries(all columns value of row in a table is same of another table). Like this
SELECT * FROM Student
UNION
SELECT * FROM Teacher
the result will be
https://i.stack.imgur.com/FNzdi.png
When you apply UNION ALL
operation on 2 tables, it returns all entries with duplicate(if there is any difference between any column value of a row in 2 tables). Like this
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher
https://i.stack.imgur.com/DTBLt.png
Performance:
Obviously UNION ALL performance is better that UNION as they do additional task to remove the duplicate values. You can check that from Execution Estimated Time by press ctrl+L at MSSQL
UNION
to convey intent (i.e. no duplicates) because UNION ALL
is unlikely to given any real life performance gain in absolute terms.
One more thing i would like to add-
Union:- Result set is sorted in ascending order.
Union All:- Result set is not sorted. two Query output just gets appended.
UNION
will NOT sort the result in ascending order. Any ordering you see in a result without using order by
is pure coincidence. The DBMS is free to use any strategy it thinks is efficient to remove the duplicates. This might be sorting, but it could also be a hashing algorithm or something entirely different - and the strategy will change with the number of rows. A union
that appears sorted with 100 rows might not be with 100.000 rows
ORDER BY
clause.
UNION removes duplicate records in other hand UNION ALL does not. But one need to check the bulk of data that is going to be processed and the column and data type must be same.
since union internally uses "distinct" behavior to select the rows hence it is more costly in terms of time and performance. like
select project_id from t_project
union
select project_id from t_project_contact
this gives me 2020 records
on other hand
select project_id from t_project
union all
select project_id from t_project_contact
gives me more than 17402 rows
on precedence perspective both has same precedence.
If there is no ORDER BY
, a UNION ALL
may bring rows back as it goes, whereas a UNION
would make you wait until the very end of the query before giving you the whole result set at once. This can make a difference in a time-out situation - a UNION ALL
keeps the connection alive, as it were.
So if you have a time-out issue, and there's no sorting, and duplicates aren't an issue, UNION ALL
may be rather helpful.
UNION and UNION ALL used to combine two or more query results.
UNION command selects distinct and related information from two tables which will eliminates duplicate rows.
On the other hand, UNION ALL command selects all the values from both the tables, which displays all rows.
Important! Difference between Oracle and Mysql: Let's say that t1 t2 don't have duplicate rows between them but they have duplicate rows individual. Example: t1 has sales from 2017 and t2 from 2018
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION ALL
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE UNION ALL fetches all rows from both tables. The same will occur in MySQL.
However:
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE, UNION fetches all rows from both tables because there are no duplicate values between t1 and t2. On the other hand in MySQL the resultset will have fewer rows because there will be duplicate rows within table t1 and also within table t2!
x union y
is select distinct * from (x union all y)
. select 1 from dual union select 1 from dual
& (select 1 from dual union all select 1 from dual) union select 1 from dual
both return 1 row. PS I don't know whether by t1 & t2 you mean T1 & T1, but what matters is what's in the selects. PS For the UNION (distinct) example you don't clearly say, in terms of duplicates, what is input & for each DBMS what it returns or why. Use enough words & sentences & references to parts of examples to be clear.
UNION ALL
also works on more data types as well. For example when trying to union spatial data types. For example:
select a.SHAPE from tableA a
union
select b.SHAPE from tableB b
will throw
The data type geometry cannot be used as an operand to the UNION, INTERSECT or EXCEPT operators because it is not comparable.
However union all
will not.
Success story sharing