ChatGPT解决这个技术问题 Extra ChatGPT

Real life example, when to use OUTER / CROSS APPLY in SQL

I have been looking at CROSS / OUTER APPLY with a colleague and we're struggling to find real life examples of where to use them.

I've spent quite a lot of time looking at When should I use Cross Apply over Inner Join? and googling but the main (only) example seems pretty bizarre (using the rowcount from a table to determine how many rows to select from another table).

I thought this scenario may benefit from OUTER APPLY:

Contacts Table (contains 1 record for each contact) Communication Entries Table (can contain n phone, fax, email fro each contact)

But using subqueries, common table expressions, OUTER JOIN with RANK() and OUTER APPLY all seem to perform equally. I'm guessing this means the scenario isn't applicable to APPLY.

Please share some real life examples and help explain the feature!


M
Martin Smith

Some uses for APPLY are...

1) Top N per group queries (can be more efficient for some cardinalities)

SELECT pr.name,
       pa.name
FROM   sys.procedures pr
       OUTER APPLY (SELECT TOP 2 *
                    FROM   sys.parameters pa
                    WHERE  pa.object_id = pr.object_id
                    ORDER  BY pr.name) pa
ORDER  BY pr.name,
          pa.name 

2) Calling a Table Valued Function for each row in the outer query

SELECT *
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)

3) Reusing a column alias

SELECT number,
       doubled_number,
       doubled_number_plus_one
FROM master..spt_values
CROSS APPLY (SELECT 2 * CAST(number AS BIGINT)) CA1(doubled_number)  
CROSS APPLY (SELECT doubled_number + 1) CA2(doubled_number_plus_one)  

4) Unpivoting more than one group of columns

Assumes 1NF violating table structure....

CREATE TABLE T
  (
     Id   INT PRIMARY KEY,

     Foo1 INT, Foo2 INT, Foo3 INT,
     Bar1 INT, Bar2 INT, Bar3 INT
  ); 

Example using 2008+ VALUES syntax.

SELECT Id,
       Foo,
       Bar
FROM   T
       CROSS APPLY (VALUES(Foo1, Bar1),
                          (Foo2, Bar2),
                          (Foo3, Bar3)) V(Foo, Bar); 

In 2005 UNION ALL can be used instead.

SELECT Id,
       Foo,
       Bar
FROM   T
       CROSS APPLY (SELECT Foo1, Bar1 
                    UNION ALL
                    SELECT Foo2, Bar2 
                    UNION ALL
                    SELECT Foo3, Bar3) V(Foo, Bar);

A nice list of uses there but the key is the real life examples- i'd love to see one for each.
For #1 this can be achieved equally using rank, subqueries or common table expressions? Can you provide an example when this isn't true?
@LeeTickett - Please read the link. It has a 4 page discussion about when you would prefer one to another.
Make sure to visit the link included in example #1. I've used both of these approaches (ROW OVER and CROSS APPLY) with both performing well in various scenarios, but I've never understood why they perform differently. That article was sent from the heavens!! The focus on proper indexing matching the order by directions helped in a big way for queries that have "proper" structure but performance issues when queried. Thank you for including it!!
@mr_eclair looks like it is now at itprotoday.com/software-development/…
S
Sarath Subramanian

There are various situations where you cannot avoid CROSS APPLY or OUTER APPLY.

Consider you have two tables.

MASTER TABLE

x------x--------------------x
| Id   |        Name        |
x------x--------------------x
|  1   |          A         |
|  2   |          B         |
|  3   |          C         |
x------x--------------------x

DETAILS TABLE

x------x--------------------x-------x
| Id   |      PERIOD        |   QTY |
x------x--------------------x-------x
|  1   |   2014-01-13       |   10  |
|  1   |   2014-01-11       |   15  |
|  1   |   2014-01-12       |   20  |
|  2   |   2014-01-06       |   30  |
|  2   |   2014-01-08       |   40  |
x------x--------------------x-------x                                       

CROSS APPLY

There are many situation where we need to replace INNER JOIN with CROSS APPLY.

1. If we want to join 2 tables on TOP n results with INNER JOIN functionality

Consider if we need to select Id and Name from Master and last two dates for each Id from Details table.

SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
INNER JOIN
(
    SELECT TOP 2 ID, PERIOD,QTY 
    FROM DETAILS D      
    ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID

The above query generates the following result.

x------x---------x--------------x-------x
|  Id  |   Name  |   PERIOD     |  QTY  |
x------x---------x--------------x-------x
|   1  |   A     | 2014-01-13   |  10   |
|   1  |   A     | 2014-01-12   |  20   |
x------x---------x--------------x-------x

See, it generated results for last two dates with last two date's Id and then joined these records only in outer query on Id, which is wrong. To accomplish this, we need to use CROSS APPLY.

SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
CROSS APPLY
(
    SELECT TOP 2 ID, PERIOD,QTY 
    FROM DETAILS D  
    WHERE M.ID=D.ID
    ORDER BY CAST(PERIOD AS DATE)DESC
)D

and forms he following result.

x------x---------x--------------x-------x
|  Id  |   Name  |   PERIOD     |  QTY  |
x------x---------x--------------x-------x
|   1  |   A     | 2014-01-13   |  10   |
|   1  |   A     | 2014-01-12   |  20   |
|   2  |   B     | 2014-01-08   |  40   |
|   2  |   B     | 2014-01-06   |  30   |
x------x---------x--------------x-------x

Here is the working. The query inside CROSS APPLY can reference the outer table, where INNER JOIN cannot do this(throws compile error). When finding the last two dates, joining is done inside CROSS APPLY ie, WHERE M.ID=D.ID.

2. When we need INNER JOIN functionality using functions.

CROSS APPLY can be used as a replacement with INNER JOIN when we need to get result from Master table and a function.

SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
CROSS APPLY dbo.FnGetQty(M.ID) C

And here is the function

CREATE FUNCTION FnGetQty 
(   
    @Id INT 
)
RETURNS TABLE 
AS
RETURN 
(
    SELECT ID,PERIOD,QTY 
    FROM DETAILS
    WHERE ID=@Id
)

which generated the following result

x------x---------x--------------x-------x
|  Id  |   Name  |   PERIOD     |  QTY  |
x------x---------x--------------x-------x
|   1  |   A     | 2014-01-13   |  10   |
|   1  |   A     | 2014-01-11   |  15   |
|   1  |   A     | 2014-01-12   |  20   |
|   2  |   B     | 2014-01-06   |  30   |
|   2  |   B     | 2014-01-08   |  40   |
x------x---------x--------------x-------x

OUTER APPLY

1. If we want to join 2 tables on TOP n results with LEFT JOIN functionality

Consider if we need to select Id and Name from Master and last two dates for each Id from Details table.

SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
LEFT JOIN
(
    SELECT TOP 2 ID, PERIOD,QTY 
    FROM DETAILS D  
    ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID

which forms the following result

x------x---------x--------------x-------x
|  Id  |   Name  |   PERIOD     |  QTY  |
x------x---------x--------------x-------x
|   1  |   A     | 2014-01-13   |  10   |
|   1  |   A     | 2014-01-12   |  20   |
|   2  |   B     |   NULL       |  NULL |
|   3  |   C     |   NULL       |  NULL |
x------x---------x--------------x-------x

This will bring wrong results ie, it will bring only latest two dates data from Details table irrespective of Id even though we join with Id. So the proper solution is using OUTER APPLY.

SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
OUTER APPLY
(
    SELECT TOP 2 ID, PERIOD,QTY 
    FROM DETAILS D  
    WHERE M.ID=D.ID
    ORDER BY CAST(PERIOD AS DATE)DESC
)D

which forms the following desired result

x------x---------x--------------x-------x
|  Id  |   Name  |   PERIOD     |  QTY  |
x------x---------x--------------x-------x
|   1  |   A     | 2014-01-13   |  10   |
|   1  |   A     | 2014-01-12   |  20   |
|   2  |   B     | 2014-01-08   |  40   |
|   2  |   B     | 2014-01-06   |  30   |
|   3  |   C     |   NULL       |  NULL |
x------x---------x--------------x-------x

2. When we need LEFT JOIN functionality using functions.

OUTER APPLY can be used as a replacement with LEFT JOIN when we need to get result from Master table and a function.

SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
OUTER APPLY dbo.FnGetQty(M.ID) C

And the function goes here.

CREATE FUNCTION FnGetQty 
(   
    @Id INT 
)
RETURNS TABLE 
AS
RETURN 
(
    SELECT ID,PERIOD,QTY 
    FROM DETAILS
    WHERE ID=@Id
)

which generated the following result

x------x---------x--------------x-------x
|  Id  |   Name  |   PERIOD     |  QTY  |
x------x---------x--------------x-------x
|   1  |   A     | 2014-01-13   |  10   |
|   1  |   A     | 2014-01-11   |  15   |
|   1  |   A     | 2014-01-12   |  20   |
|   2  |   B     | 2014-01-06   |  30   |
|   2  |   B     | 2014-01-08   |  40   |
|   3  |   C     |   NULL       |  NULL |
x------x---------x--------------x-------x

Common feature of CROSS APPLY and OUTER APPLY

CROSS APPLY or OUTER APPLY can be used to retain NULL values when unpivoting, which are interchangeable.

Consider you have the below table

x------x-------------x--------------x
|  Id  |   FROMDATE  |   TODATE     |
x------x-------------x--------------x
|   1  |  2014-01-11 | 2014-01-13   | 
|   1  |  2014-02-23 | 2014-02-27   | 
|   2  |  2014-05-06 | 2014-05-30   |    
|   3  |   NULL      |   NULL       | 
x------x-------------x--------------x

When you use UNPIVOT to bring FROMDATE AND TODATE to one column, it will eliminate NULL values by default.

SELECT ID,DATES
FROM MYTABLE
UNPIVOT (DATES FOR COLS IN (FROMDATE,TODATE)) P

which generates the below result. Note that we have missed the record of Id number 3

  x------x-------------x
  | Id   |    DATES    |
  x------x-------------x
  |  1   |  2014-01-11 |
  |  1   |  2014-01-13 |
  |  1   |  2014-02-23 |
  |  1   |  2014-02-27 |
  |  2   |  2014-05-06 |
  |  2   |  2014-05-30 |
  x------x-------------x

In such cases a CROSS APPLY or OUTER APPLY will be useful

SELECT DISTINCT ID,DATES
FROM MYTABLE 
OUTER APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)

which forms the following result and retains Id where its value is 3

  x------x-------------x
  | Id   |    DATES    |
  x------x-------------x
  |  1   |  2014-01-11 |
  |  1   |  2014-01-13 |
  |  1   |  2014-02-23 |
  |  1   |  2014-02-27 |
  |  2   |  2014-05-06 |
  |  2   |  2014-05-30 |
  |  3   |     NULL    |
  x------x-------------x

Instead of posting the exact same answer on two questions, why not flag one as a duplicate?
I find this answer to be more applicable to answering the original question. Its examples show 'real-life' scenarios.
Great answer overall! For sure this is a better answer than the accepted one, because it is: simple, with handy visual examples, and explanations.
Thank you very much for the detailed and comprehensible answer!
I think this should be the accepted answer. Its shows 'real-life' example which title says.
B
BJury

One real life example would be if you had a scheduler and wanted to see what the most recent log entry was for each scheduled task.

select t.taskName, lg.logResult, lg.lastUpdateDate
from task t
cross apply (select top 1 taskID, logResult, lastUpdateDate
             from taskLog l
             where l.taskID = t.taskID
             order by lastUpdateDate desc) lg

in our tests we always found join with window function the most efficient for top n (i thought this would always be true as apply and subquery are both cursive/require nested loops). although i think i may have now cracked it... thanks to Martin's link which suggests if you're not returning the entire table and there are not optimal indexes on the table then the number of reads would be much smaller using cross apply (or a subquery if top n where n = 1)
I've got essentially that query right here and its certainly not performing any subquery with nested loops. Given the log table has a PK of taskID and lastUpdateDate, its a very quick operation. How would you reform that query to use a window function?
select * from task t inner join (select taskid, logresult, lastupdatedate, rank() over(partition by taskid order by lastupdatedate desc) _rank) lg on lg.taskid = t.taskid and lg._rank = 1
B
BJury

To answer the point above knock up an example:

create table #task (taskID int identity primary key not null, taskName varchar(50) not null)
create table #log (taskID int not null, reportDate datetime not null, result varchar(50) not null, primary key(reportDate, taskId))

insert #task select 'Task 1'
insert #task select 'Task 2'
insert #task select 'Task 3'
insert #task select 'Task 4'
insert #task select 'Task 5'
insert #task select 'Task 6'

insert  #log
select  taskID, 39951 + number, 'Result text...'
from    #task
        cross join (
            select top 1000 row_number() over (order by a.id) as number from syscolumns a cross join syscolumns b cross join syscolumns c) n

And now run the two queries with a execution plan.

select  t.taskID, t.taskName, lg.reportDate, lg.result
from    #task t
        left join (select taskID, reportDate, result, rank() over (partition by taskID order by reportDate desc) rnk from #log) lg
            on lg.taskID = t.taskID and lg.rnk = 1

select  t.taskID, t.taskName, lg.reportDate, lg.result
from    #task t
        outer apply (   select  top 1 l.*
                        from    #log l
                        where   l.taskID = t.taskID
                        order   by reportDate desc) lg

You can see that the outer apply query is more efficient. (Couldn't attach the plan as I'm a new user... Doh.)


the execution plan interests me- do you know why the rank() solution does an index scan and an expensive sort as opposed to outer apply which does an index seek and doesn't appear to do a sort (although it must because you can't do a top without a sort?)
The outer apply doesn't need to perform a sort, as it can use the index on the underlying table. Presumably the query with the rank() function needs to process the entire table to ensure its rankings are correct.
you can't do a top without a sort. although your point about processing the whole table COULD be true it would surprise me (i know the sql optimizer/compiler can disappoint from time to time but this would be crazy behaviour)
You can top a top without a sort when the data your grouping by is against an index, as the optimiser knows its already sorted so literally just needs to pull the first (or last) entry off from the index.