
How can I SELECT rows with MAX(Column value), PARTITION by another column in MySQL?

I have a table of player performance:

CREATE TABLE TopTen (
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  home INT UNSIGNED NOT NULL,
  `datetime` DATETIME NOT NULL,
  player VARCHAR(6) NOT NULL,
  resource INT NOT NULL
);

What query will return the rows for each distinct home holding its maximum value of datetime? In other words, how can I filter by the maximum datetime (grouped by home) and still include other non-grouped, non-aggregate columns (such as player) in the result?

For this sample data:

INSERT INTO TopTen
  (id, home, `datetime`, player, resource)
VALUES
  (1, 10, '04/03/2009', 'john', 399),
  (2, 11, '04/03/2009', 'juliet', 244),
  (5, 12, '04/03/2009', 'borat', 555),
  (3, 10, '03/03/2009', 'john', 300),
  (4, 11, '03/03/2009', 'juliet', 200),
  (6, 12, '03/03/2009', 'borat', 500),
  (7, 13, '24/12/2008', 'borat', 600),
  (8, 13, '01/01/2009', 'borat', 700)
;

the result should be:

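+----+------+------------+--------+----------+
| id | home | datetime   | player | resource |
+----+------+------------+--------+----------+
|  1 |   10 | 04/03/2009 | john   |      399 |
|  2 |   11 | 04/03/2009 | juliet |      244 |
|  5 |   12 | 04/03/2009 | borat  |      555 |
|  8 |   13 | 01/01/2009 | borat  |      700 |
+----+------+------------+--------+----------+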

I tried a subquery getting the maximum datetime for each home:

-- 1 ..by the MySQL manual: 

SELECT DISTINCT
  home,
  id,
  datetime AS dt,
  player,
  resource
FROM TopTen t1
WHERE `datetime` = (SELECT
  MAX(t2.datetime)
FROM TopTen t2
GROUP BY home)
GROUP BY `datetime`
ORDER BY `datetime` DESC

The result-set has 130 rows although database holds 187, indicating the result includes some duplicates of home.

Then I tried joining to a subquery that gets the maximum datetime for each row id:

-- 2 ..join

SELECT
  s1.id,
  s1.home,
  s1.datetime,
  s1.player,
  s1.resource
FROM TopTen s1
JOIN (SELECT
  id,
  MAX(`datetime`) AS dt
FROM TopTen
GROUP BY id) AS s2
  ON s1.id = s2.id
ORDER BY `datetime`

Nope. Gives all the records.

I tried various exotic queries, each with various results, but nothing that got me any closer to solving this problem.


Andreas Rejbrand

You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten table on BOTH fields:

SELECT tt.*
FROM topten tt
INNER JOIN
    (SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home) groupedtt 
ON tt.home = groupedtt.home 
AND tt.datetime = groupedtt.MaxDateTime

Test it for distinctness: if two rows in the same home share the maximum datetime (with different players), both rows are returned.
I think the classic way to do this is with a natural join: "SELECT tt.* FROM topten tt NATURAL JOIN ( SELECT home, MAX(datetime) AS datetime FROM topten GROUP BY home ) mostrecent;" Exactly the same query, but arguably more readable.
What if there are two rows with the same 'home' and 'datetime' values?
@Young the problem with your query is that it may return the id, player and resource of a non-max row for a given home, i.e. for home = 10 you may get: 3 | 10 | 04/03/2009 | john | 300. In other words, it doesn't guarantee that all columns of a row in the result set belong to the row with max(datetime) for the given home.
Regarding @KemalDuran's comment above: if there are two rows with the same home and datetime values, take Michael La Voie's solution, add MAX(id) AS MaxID to the inner SELECT statement, and then add another line, AND tt.id = groupedtt.MaxID, at the end.
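The tie question raised in these comments can be checked with a small experiment. What follows is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, rewrites the sample dates as ISO strings, and the helper names (make_db, latest_ids) and the extra tied row with id 9 are made up for illustration.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an assumption).
ROWS = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
]

# The join on (home, max datetime), reduced to the id column for brevity.
JOIN_ON_MAX = """
SELECT tt.id
FROM topten tt
INNER JOIN (SELECT home, MAX(datetime) AS MaxDateTime
            FROM topten
            GROUP BY home) groupedtt
        ON tt.home = groupedtt.home
       AND tt.datetime = groupedtt.MaxDateTime
ORDER BY tt.id
"""


def make_db(extra_rows=()):
    """Build an in-memory SQLite copy of topten (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
        "datetime TEXT, player TEXT, resource INTEGER)"
    )
    conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", [*ROWS, *extra_rows])
    return conn


def latest_ids(conn):
    """Return the ids selected by the join, in id order."""
    return [row[0] for row in conn.execute(JOIN_ON_MAX)]


no_ties = latest_ids(make_db())
# Hypothetical extra row: same home and same maximum datetime as id 1.
with_tie = latest_ids(make_db([(9, 10, "2009-03-04", "borat", 700)]))
print(no_ties)
print(with_tie)
```

With the original data each home yields one row; once the tied row is added, home 10 contributes both id 1 and id 9, which is the behaviour the comments warn about.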
fedorqui

The fastest MySQL solution, without inner queries and without GROUP BY:

SELECT m.*                    -- get the row that contains the max value
FROM topten m                 -- "m" from "max"
    LEFT JOIN topten b        -- "b" from "bigger"
        ON m.home = b.home    -- match "max" row with "bigger" row by `home`
        AND m.datetime < b.datetime           -- want "bigger" than "max"
WHERE b.datetime IS NULL      -- keep only if there is no bigger than max

Explanation:

Join the table with itself using the home column. The use of LEFT JOIN ensures all the rows from table m appear in the result set. Those that don't have a match in table b will have NULLs for the columns of b.

The other condition on the JOIN asks to match only the rows from b that have bigger value on the datetime column than the row from m.

Using the data posted in the question, the LEFT JOIN will produce these pairs:

+------------------------------------------+--------------------------------+
|              the row from `m`            |    the matching row from `b`   |
|------------------------------------------|--------------------------------|
| id  home  datetime     player   resource | id    home   datetime      ... |
|----|-----|------------|--------|---------|------|------|------------|-----|
| 1  | 10  | 04/03/2009 | john   | 399     | NULL | NULL | NULL       | ... | *
| 2  | 11  | 04/03/2009 | juliet | 244     | NULL | NULL | NULL       | ... | *
| 5  | 12  | 04/03/2009 | borat  | 555     | NULL | NULL | NULL       | ... | *
| 3  | 10  | 03/03/2009 | john   | 300     | 1    | 10   | 04/03/2009 | ... |
| 4  | 11  | 03/03/2009 | juliet | 200     | 2    | 11   | 04/03/2009 | ... |
| 6  | 12  | 03/03/2009 | borat  | 500     | 5    | 12   | 04/03/2009 | ... |
| 7  | 13  | 24/12/2008 | borat  | 600     | 8    | 13   | 01/01/2009 | ... |
| 8  | 13  | 01/01/2009 | borat  | 700     | NULL | NULL | NULL       | ... | *
+------------------------------------------+--------------------------------+

Finally, the WHERE clause keeps only the pairs that have NULLs in the columns of b (they are marked with * in the table above); this means, due to the second condition from the JOIN clause, the row selected from m has the biggest value in column datetime.

Read the SQL Antipatterns: Avoiding the Pitfalls of Database Programming book for other SQL tips.
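The pairing described in the explanation can be reproduced with a short script. This is a sketch rather than a MySQL session: it assumes Python's built-in sqlite3 module as a stand-in engine, stores the dates as ISO strings so that string comparison orders them correctly, and uses a hypothetical make_db helper.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an assumption).
ROWS = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
]

# Step 1: every row of m paired with the rows of b that are "bigger" in the same home.
PAIRS = """
SELECT m.id, b.id
FROM topten m
LEFT JOIN topten b
       ON m.home = b.home AND m.datetime < b.datetime
ORDER BY m.id
"""

# Step 2: keep only the rows of m for which no bigger row exists.
LATEST = """
SELECT m.id, m.home, m.datetime, m.player, m.resource
FROM topten m
LEFT JOIN topten b
       ON m.home = b.home AND m.datetime < b.datetime
WHERE b.datetime IS NULL
ORDER BY m.home
"""


def make_db():
    """Build an in-memory SQLite copy of topten (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
        "datetime TEXT, player TEXT, resource INTEGER)"
    )
    conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db()
pairs = conn.execute(PAIRS).fetchall()    # (m.id, b.id); b.id is None when unmatched
latest = conn.execute(LATEST).fetchall()  # one row per home
for row in latest:
    print(row)
```

The pairs with None on the right-hand side correspond to the rows marked with * in the table, and those are the rows the WHERE clause keeps.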


With SQLite, the first one is much much slower than La Voie's version when there is no index on the matched column (i.e. "home"). (Tested with 24k rows resulting in 13k rows)
This is the best answer; if you show the execution plan you will see one step fewer with this query.
What happens if 2 rows have the same home and datetime, and that datetime is the maximum for that particular home?
@AjaxLeung an index on the columns home and datetime. As a general rule, an index helps if it contains the columns used in the ON, WHERE or ORDER BY clauses. However, it depends on how the columns are used. An index is useless if the column is used in an expression. Put EXPLAIN in front of the query to find out what indexes are used (and how).
This idea worked for me. It helped simplify my subquery.
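The indexing advice in the comments can be tried out roughly as follows. This is a sketch under assumptions: SQLite (via Python's sqlite3) stands in for MySQL, EXPLAIN QUERY PLAN is SQLite's counterpart to MySQL's EXPLAIN and its output format differs, and the index name idx_home_datetime and the make_db helper are made up for illustration.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an assumption).
ROWS = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
]

LATEST = """
SELECT m.id
FROM topten m
LEFT JOIN topten b
       ON m.home = b.home AND m.datetime < b.datetime
WHERE b.datetime IS NULL
ORDER BY m.id
"""


def make_db():
    """Build an in-memory SQLite copy of topten (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
        "datetime TEXT, player TEXT, resource INTEGER)"
    )
    conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db()
# Composite index on the two columns used in the ON clause.
conn.execute("CREATE INDEX idx_home_datetime ON topten (home, datetime)")

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + LATEST)]
ids = [row[0] for row in conn.execute(LATEST)]
for step in plan:
    print(step)
```

The printed plan steps show whether the lookup on b goes through the index; the query result itself is the same with or without it.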
Maksym Gontar

Here is a T-SQL version:

-- Test data
DECLARE @TestTable TABLE (id INT, home INT, date DATETIME, 
  player VARCHAR(20), resource INT)
INSERT INTO @TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700

-- Answer
SELECT id, home, date, player, resource 
FROM (SELECT id, home, date, player, resource, 
    RANK() OVER (PARTITION BY home ORDER BY date DESC) N
    FROM @TestTable
)M WHERE N = 1

-- and if you really want only home with max date
SELECT T.id, T.home, T.date, T.player, T.resource 
    FROM @TestTable T
INNER JOIN 
(   SELECT TI.id, TI.home, TI.date, 
        RANK() OVER (PARTITION BY TI.home ORDER BY TI.date) N
    FROM @TestTable TI
    WHERE TI.date IN (SELECT MAX(TM.date) FROM @TestTable TM)
)TJ ON TJ.N = 1 AND T.id = TJ.id

EDIT
Unfortunately, MySQL had no RANK() OVER function when this was written (window functions only arrived in MySQL 8.0).
But it can be emulated; see Emulating Analytic (AKA Ranking) Functions with MySQL.
So this is the MySQL version:

SELECT id, home, date, player, resource 
FROM TestTable AS t1 
WHERE 
    (SELECT COUNT(*) 
            FROM TestTable AS t2 
            WHERE t2.home = t1.home AND t2.date > t1.date
    ) = 0

@MaxGontar, your MySQL solution rocks, thanks. What if you remove row #1 from your @TestTable (SELECT 1, 10, '2009-03-04', 'john', 399), that is, what if you have a single row for a given home value? Thanks.
BUG: Replace "RANK()" with "ROW_NUMBER()". If you have a tie (caused by a duplicate date value) you will have two records with "1" for N.
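The difference between RANK() and ROW_NUMBER() on ties, pointed out in the comment above, can be seen in a small experiment. This is a sketch under assumptions: Python's built-in sqlite3 module stands in for the database (window functions need SQLite 3.25 or later), the dates are stored as ISO strings, and the tied row with id 9 and the helper names are made up for illustration.

```python
import sqlite3

# Sample rows from the question plus one hypothetical row (id 9) that ties
# with id 1 on the maximum datetime of home 10.
ROWS = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
    (9, 10, "2009-03-04", "borat", 700),
]

# {func} is filled in with RANK or ROW_NUMBER below.
TEMPLATE = """
SELECT id, home
FROM (SELECT id, home,
             {func}() OVER (PARTITION BY home ORDER BY datetime DESC) AS n
      FROM topten) ranked
WHERE n = 1
ORDER BY home, id
"""


def top_rows(func):
    """Run the ranking query with the given window function name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
        "datetime TEXT, player TEXT, resource INTEGER)"
    )
    conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn.execute(TEMPLATE.format(func=func)).fetchall()


rank_rows = top_rows("RANK")              # tied rows both get n = 1
row_number_rows = top_rows("ROW_NUMBER")  # exactly one row per home
print(rank_rows)
print(row_number_rows)
```

With ROW_NUMBER() the choice between the tied rows is arbitrary unless the ORDER BY inside OVER also names a tie-breaker such as id.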
Quassnoi

This will work even if you have two or more rows for each home with equal DATETIMEs:

SELECT id, home, datetime, player, resource
FROM   (
       SELECT (
              SELECT  id
              FROM    topten ti
              WHERE   ti.home = t1.home
              ORDER BY
                      ti.datetime DESC
              LIMIT 1
              ) lid
       FROM   (
              SELECT  DISTINCT home
              FROM    topten
              ) t1
       ) ro, topten t2
WHERE  t2.id = ro.lid

added lid field in table, No Good
This one didn't execute on PHPMyAdmin. Page refreshes but there's no result nor error..?
WHERE ti.home = t1.home - can you explain the syntax?
@IstiaqueAhmed: what exactly is it that you don't understand here? It's a correlated query, and the expression you mention is a correlation condition.
@Quassnoi, the SELECT query that has the line WHERE ti.home = t1.home does not need the FROM clause that defines t1. So how is it used?
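One way to picture the correlation discussed in these comments: the inner SELECT is re-evaluated once per row of t1, with t1.home acting like a parameter. The sketch below runs the correlated form and an explicit per-home loop side by side. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, ISO-formatted dates, a made-up tied row (id 9), and an extra ORDER BY term on id (not in the answer's query) so that the tie is resolved deterministically.

```python
import sqlite3

# Sample rows from the question plus one hypothetical tied row (id 9).
ROWS = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
    (9, 10, "2009-03-04", "borat", 700),
]

CORRELATED = """
SELECT t2.id
FROM (
       SELECT (
              SELECT ti.id
              FROM   topten ti
              WHERE  ti.home = t1.home   -- t1 is defined by the enclosing query
              ORDER BY ti.datetime DESC, ti.id DESC
              LIMIT 1
              ) AS lid
       FROM (SELECT DISTINCT home FROM topten) t1
     ) ro
JOIN topten t2 ON t2.id = ro.lid
ORDER BY t2.home
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
    "datetime TEXT, player TEXT, resource INTEGER)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", ROWS)

by_subquery = [row[0] for row in conn.execute(CORRELATED)]

# The same logic spelled out: one inner query per distinct home.
homes = [h for (h,) in conn.execute("SELECT DISTINCT home FROM topten ORDER BY home")]
by_loop = []
for home in homes:
    (lid,) = conn.execute(
        "SELECT id FROM topten WHERE home = ? "
        "ORDER BY datetime DESC, id DESC LIMIT 1",
        (home,),
    ).fetchone()
    by_loop.append(lid)

print(by_subquery)
print(by_loop)
```

Both approaches pick one id per home; the correlated subquery simply lets the database do the per-home lookup inside a single statement.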
sactiw

I think this will give you the desired result:

SELECT   home, MAX(datetime)
FROM     my_table
GROUP BY home

BUT if you need other columns as well, just join with the original table (see Michael La Voie's answer).

Best regards.


He needs other columns also.
id, home, datetime, player, resource
MJB

Since people seem to keep running into this thread (the comment dates span 1.5 years), isn't this much simpler:

SELECT * FROM (SELECT * FROM topten ORDER BY datetime DESC) tmp GROUP BY home

No aggregation functions needed...

Cheers.


This doesn't seem to work. Error Message: Column 'x' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
This definitely won't work in SQL Server or Oracle, though it looks like it might work in MySQL.
This is really beautiful! How does this work? By using DESC and the default group return column? So if I changed it to datetime ASC, it would return the earliest row for each home?
This straight-up doesn't work if you have nonaggregated columns (in MySQL).
Shiva

You can also try this one; for large tables query performance will be better. It works when there are no more than two records for each home and their dates are different. A better general MySQL query is the one from Michael La Voie above.

SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
FROM   t_scores_1 t1 
INNER JOIN t_scores_1 t2
   ON t1.home = t2.home
WHERE t1.date > t2.date

Or, in the case of Postgres or other databases that provide analytic functions, try:

SELECT t.* FROM 
(SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
  , row_number() over (partition by t1.home order by t1.date desc) rw
 FROM   topten t1 
 INNER JOIN topten t2
   ON t1.home = t2.home
 WHERE t1.date > t2.date 
) t
WHERE t.rw = 1

Is this answer correct? I tried to use it, but it seems not to select the record with the newest date for 'home'; it only removes the record with the oldest date. Here's an example: SQLfiddle
@kidOfDeath - Updated my reply with context and Postgres query
With SQLite, the first one is much much slower than La Voie's version when there is no index on the matched column (i.e. "home").
Quassnoi
SELECT  tt.*
FROM    TestTable tt 
INNER JOIN 
        (
        SELECT  coord, MAX(datetime) AS MaxDateTime 
        FROM    TestTable 
        GROUP BY
                coord 
        ) groupedtt
ON      tt.coord = groupedtt.coord
        AND tt.datetime = groupedtt.MaxDateTime

FerranB

This works on Oracle:

with table_max as(
  select id
       , home
       , datetime
       , player
       , resource
       , max(home) over (partition by home) maxhome
    from table  
)
select id
     , home
     , datetime
     , player
     , resource
  from table_max
 where home = maxhome

How does this pick the max datetime? He asked to group by home and select the max datetime. I don't see how this does that.
SysDragon

Try this for SQL Server:

WITH cte AS (
   SELECT home, MAX(year) AS year FROM Table1 GROUP BY home
)
SELECT * FROM Table1 a INNER JOIN cte ON a.home = cte.home AND a.year = cte.year

Jason Heo

Here is a MySQL version that prints only one entry when there are duplicate MAX(datetime) values in a group.

You can test it here: http://www.sqlfiddle.com/#!2/0a4ae/1

Sample Data

mysql> SELECT * from topten;
+------+------+---------------------+--------+----------+
| id   | home | datetime            | player | resource |
+------+------+---------------------+--------+----------+
|    1 |   10 | 2009-04-03 00:00:00 | john   |      399 |
|    2 |   11 | 2009-04-03 00:00:00 | juliet |      244 |
|    3 |   10 | 2009-03-03 00:00:00 | john   |      300 |
|    4 |   11 | 2009-03-03 00:00:00 | juliet |      200 |
|    5 |   12 | 2009-04-03 00:00:00 | borat  |      555 |
|    6 |   12 | 2009-03-03 00:00:00 | borat  |      500 |
|    7 |   13 | 2008-12-24 00:00:00 | borat  |      600 |
|    8 |   13 | 2009-01-01 00:00:00 | borat  |      700 |
|    9 |   10 | 2009-04-03 00:00:00 | borat  |      700 |
|   10 |   11 | 2009-04-03 00:00:00 | borat  |      700 |
|   12 |   12 | 2009-04-03 00:00:00 | borat  |      700 |
+------+------+---------------------+--------+----------+

MySQL Version with User variable

SELECT *
FROM (
    SELECT ord.*,
        IF (@prev_home = ord.home, 0, 1) AS is_first_appear,
        @prev_home := ord.home
    FROM (
        SELECT t1.id, t1.home, t1.player, t1.resource
        FROM topten t1
        INNER JOIN (
            SELECT home, MAX(datetime) AS mx_dt
            FROM topten
            GROUP BY home
          ) x ON t1.home = x.home AND t1.datetime = x.mx_dt
        ORDER BY home
    ) ord, (SELECT @prev_home := 0, @seq := 0) init
) y
WHERE is_first_appear = 1;
+------+------+--------+----------+-----------------+------------------------+
| id   | home | player | resource | is_first_appear | @prev_home := ord.home |
+------+------+--------+----------+-----------------+------------------------+
|    9 |   10 | borat  |      700 |               1 |                     10 |
|   10 |   11 | borat  |      700 |               1 |                     11 |
|   12 |   12 | borat  |      700 |               1 |                     12 |
|    8 |   13 | borat  |      700 |               1 |                     13 |
+------+------+--------+----------+-----------------+------------------------+
4 rows in set (0.00 sec)

Accepted answer's output

SELECT tt.*
FROM topten tt
INNER JOIN
    (
    SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home
) groupedtt ON tt.home = groupedtt.home AND tt.datetime = groupedtt.MaxDateTime
+------+------+---------------------+--------+----------+
| id   | home | datetime            | player | resource |
+------+------+---------------------+--------+----------+
|    1 |   10 | 2009-04-03 00:00:00 | john   |      399 |
|    2 |   11 | 2009-04-03 00:00:00 | juliet |      244 |
|    5 |   12 | 2009-04-03 00:00:00 | borat  |      555 |
|    8 |   13 | 2009-01-01 00:00:00 | borat  |      700 |
|    9 |   10 | 2009-04-03 00:00:00 | borat  |      700 |
|   10 |   11 | 2009-04-03 00:00:00 | borat  |      700 |
|   12 |   12 | 2009-04-03 00:00:00 | borat  |      700 |
+------+------+---------------------+--------+----------+
7 rows in set (0.00 sec)

Although I love this answer, as it is helping me so much, I have to point out one major flaw: it depends on the MySQL system used. Basically, this solution relies on the ORDER BY clause in the subselect. This MIGHT or MIGHT NOT work in various MySQL environments. I haven't tried it on pure MySQL, but it certainly doesn't work RELIABLY on MariaDB 10.1, as explained here stackoverflow.com/questions/26372511/… but the very same code does work fine on Percona Server. To be precise, you MIGHT or MIGHT NOT get the same results, depending on the number of t1 columns.
An example of this: on MariaDB 10.1 it worked when I used 5 columns from the t1 table. As soon as I added a sixth column, obviously disturbing the "natural" sort order of the data in the original table, it stopped working. The reason is that the data in the subselect became unordered, so the "is_first_appear=1" condition was met several times. The very same code, with the same data, worked fine on Percona.
Woot4Moo
SELECT c1, c2, c3, c4, c5 FROM table1 WHERE c3 = (select max(c3) from table1)

SELECT * FROM table1 WHERE c3 = (select max(c3) from table1)

M Khalid Junaid

Another way to get the most recent row per group is to use a subquery that calculates a rank for each row within its group, and then keep only the rows with rank = 1:

select a.*
from topten a
where (
  select count(*)
  from topten b
  where a.home = b.home
  and a.`datetime` < b.`datetime`
) +1 = 1

Some comments ask: what if there are two rows with the same 'home' and 'datetime' values?

In that situation the query above returns more than one row for that home. To handle it, another criterion/parameter/column is needed to decide which of the tied rows should be taken. Looking at the sample data set, I assume there is a primary key column id that is set to auto-increment. So we can use this column to pick the most recent row by tweaking the same query with a CASE expression:

select a.*
from topten a
where (
  select count(*)
  from topten b
  where a.home = b.home
  and  case 
       when a.`datetime` = b.`datetime`
       then a.id < b.id
       else a.`datetime` < b.`datetime`
       end
) + 1 = 1

The query above picks the row with the highest id among rows with the same datetime value.
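The rank number that the counting subquery assigns to each row can be printed directly. The sketch below does that for the tie-breaking variant. It makes several assumptions: Python's built-in sqlite3 module stands in for MySQL (SQLite also evaluates a comparison inside CASE to 0 or 1), the dates are stored as ISO strings, and the tied row with id 9 is made up for illustration.

```python
import sqlite3

# Sample rows from the question plus one hypothetical tied row (id 9).
ROWS = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
    (9, 10, "2009-03-04", "borat", 700),
]

# rank = 1 + number of rows in the same home that beat this row,
# where a later datetime wins and, on equal datetimes, the higher id wins.
RANKS = """
SELECT a.id,
       (SELECT COUNT(*)
        FROM topten b
        WHERE a.home = b.home
          AND CASE
                WHEN a.datetime = b.datetime THEN a.id < b.id
                ELSE a.datetime < b.datetime
              END) + 1 AS rnk
FROM topten a
ORDER BY a.id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
    "datetime TEXT, player TEXT, resource INTEGER)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", ROWS)

ranks = dict(conn.execute(RANKS).fetchall())          # id -> rank within its home
winners = sorted(i for i, r in ranks.items() if r == 1)
print(ranks)
print(winners)
```

In home 10 the tied rows with ids 1 and 9 get ranks 2 and 1 respectively, so only id 9 survives the rank = 1 filter.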


Roland

Why not use: SELECT home, MAX(datetime) AS MaxDateTime, player, resource FROM topten GROUP BY home? Did I miss something?


That would only be valid with MySQL, and only versions before 5.7 (?) or after 5.7 with ONLY_FULL_GROUP_BY disabled, since it is SELECTing columns that have not been aggregated/GROUPed (player, resource) which means MySQL will provide randomly chosen values for those two result fields. It would not be a problem for the player column since that correlates to the home column, but the resource column would not correlate with the home or datetime column and you could not guarantee which resource value you'd receive.
+1 for the explanation, BUT w.r.t the asked question this query won't return the expected output in MySQL version 5.6 and before and I highly doubt it to behave otherwise in MySQL version 5.7 and after.
@simpleuser, "It would not be a problem for the player column since that correlates to the home column" - can you explain more?
@IstiaqueAhmed as I look at it again, that statement is incorrect. I had thought each player always had the same home value, but I see now that they do not, so the same random selection issue will occur for that column as well.
Kazi Mohammad Ali Nur

In MySQL 8.0 this can be achieved efficiently by using the row_number() window function with a common table expression.

(Here row_number() generates a unique sequence number for each row within every home, starting with 1, in descending order of date. So, for every home, the row with sequence number 1 has the latest date. All we then need to do is select the row with sequence number 1 for each home. This could be done by writing an outer query around this query, but we used a common table expression instead since it's more readable.)

Schema:

 create  TABLE TestTable(id INT, home INT, date DATETIME, 
   player VARCHAR(20), resource INT);
 INSERT INTO TestTable
 SELECT 1, 10, '2009-03-04', 'john', 399 UNION
 SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
 SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
 SELECT 3, 10, '2009-03-03', 'john', 300 UNION
 SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
 SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
 SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
 SELECT 8, 13, '2009-01-01', 'borat', 700

Query:

 WITH cte AS
 (
     SELECT id, home, date, player, resource,
            ROW_NUMBER() OVER (PARTITION BY home ORDER BY date DESC) AS rownumber
     FROM TestTable
 )
 SELECT id, home, date, player, resource FROM cte WHERE rownumber = 1

Output:

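+----+------+---------------------+--------+----------+
| id | home | date                | player | resource |
+----+------+---------------------+--------+----------+
|  1 |   10 | 2009-03-04 00:00:00 | john   |      399 |
|  2 |   11 | 2009-03-04 00:00:00 | juliet |      244 |
|  5 |   12 | 2009-03-04 00:00:00 | borat  |      555 |
|  8 |   13 | 2009-01-01 00:00:00 | borat  |      700 |
+----+------+---------------------+--------+----------+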



Manoj Kargeti

@Michae The accepted answer works fine in most cases, but it fails in the case below.

If there are 2 rows with the same HomeID and Datetime, the query returns both rows rather than a distinct HomeID as required. To handle that, add DISTINCT to the query as below.

SELECT DISTINCT tt.home  , tt.MaxDateTime
FROM topten tt
INNER JOIN
    (SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home) groupedtt 
ON tt.home = groupedtt.home 
AND tt.datetime = groupedtt.MaxDateTime

The result shows: "#1054 - Unknown column 'tt.MaxDateTime' in 'field list'"
@IstiaqueAhmed do you have a MaxDateTime field, i.e. any column with a name like that?
No, the table in the OP does not have any such column.
The error says the same thing. What exactly do you want to do? Can you send the table structure and your query?
Khb

Try this

select * from mytable a join
(select home, max(datetime) datetime
from mytable
group by home) b
 on a.home = b.home and a.datetime = b.datetime

Regards K


Test it for distinctness: if two rows in the same home share the maximum datetime (with different players), both rows are returned.
The alias for max(datetime) is datetime. Won't that cause a problem?
How is the highest datetime selected?
Simon

This is the query you need:

 SELECT b.id, a.home,b.[datetime],b.player,a.resource FROM
 (SELECT home,MAX(resource) AS resource FROM tbl_1 GROUP BY home) AS a

 LEFT JOIN

 (SELECT id,home,[datetime],player,resource FROM tbl_1) AS b
 ON  a.resource = b.resource WHERE a.home =b.home;

Can you explain your answer?
סטנלי גרונן

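Hopefully the query below gives the desired output (the row number is computed in a derived table, because a window-function alias cannot be referenced in the WHERE clause of the same query level):

SELECT id, home, datetime, player, resource
FROM (SELECT id, home, datetime, player, resource, ROW_NUMBER() OVER (PARTITION BY home ORDER BY datetime DESC) AS rownum FROM tablename) ranked
WHERE rownum = 1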

Moradnejad

(NOTE: The answer of Michael is perfect for a situation where the target column datetime cannot have duplicate values for each distinct home.)

If your table has duplicate rows for a home × datetime combination and you need to select only one row for each distinct home, here is my solution:

Your table needs one unique column (like id). If it doesn't, create a view and add a random column to it.

Use this query to select a single row for each unique home value. Selects the lowest id in case of duplicate datetime.

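SELECT tt.*
FROM topten tt
INNER JOIN
    (
    -- lowest id among the rows that carry the maximum datetime of each home
    SELECT MIN(tt2.id) AS min_id, tt2.home
    FROM topten tt2
    INNER JOIN
        (
        SELECT home, MAX(datetime) AS MaxDateTime
        FROM topten
        GROUP BY home) groupedtt2
    ON tt2.home = groupedtt2.home
    AND tt2.datetime = groupedtt2.MaxDateTime
    GROUP BY tt2.home
    ) AS groupedtt
ON tt.id = groupedtt.min_id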
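The "lowest id among the tied rows" idea can be exercised on a small data set. The following is a sketch, not the author's tested code: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, ISO-formatted dates, a made-up tied row (id 9), and derived-table aliases (g, picked) chosen for illustration.

```python
import sqlite3

# Sample rows from the question plus one hypothetical tied row (id 9).
ROWS = [
    (1, 10, "2009-03-04", "john", 399),
    (2, 11, "2009-03-04", "juliet", 244),
    (5, 12, "2009-03-04", "borat", 555),
    (3, 10, "2009-03-03", "john", 300),
    (4, 11, "2009-03-03", "juliet", 200),
    (6, 12, "2009-03-03", "borat", 500),
    (7, 13, "2008-12-24", "borat", 600),
    (8, 13, "2009-01-01", "borat", 700),
    (9, 10, "2009-03-04", "borat", 700),
]

# Inner level: maximum datetime per home.
# Middle level: lowest id among the rows carrying that maximum.
# Outer level: fetch the full row for each chosen id.
ONE_PER_HOME = """
SELECT tt.id, tt.home, tt.player
FROM topten tt
INNER JOIN (SELECT MIN(tt2.id) AS min_id
            FROM topten tt2
            INNER JOIN (SELECT home, MAX(datetime) AS MaxDateTime
                        FROM topten
                        GROUP BY home) g
                    ON tt2.home = g.home
                   AND tt2.datetime = g.MaxDateTime
            GROUP BY tt2.home) picked
        ON tt.id = picked.min_id
ORDER BY tt.home
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER, "
    "datetime TEXT, player TEXT, resource INTEGER)"
)
conn.executemany("INSERT INTO topten VALUES (?, ?, ?, ?, ?)", ROWS)

one_per_home = conn.execute(ONE_PER_HOME).fetchall()
for row in one_per_home:
    print(row)
```

Home 10 has two rows with the maximum datetime (ids 1 and 9), and MIN(id) settles on id 1; swapping MIN for MAX would pick id 9 instead.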

Jeka Developer

The accepted answer doesn't work for me if there are 2 records with the same date and home: it returns 2 records after the join, while I need to select any one (random) of them. This query is used as a joined subquery, so just using LIMIT 1 is not possible there. Here is how I reached the desired result. I don't know about the performance, however.

select SUBSTRING_INDEX(GROUP_CONCAT(id order by datetime desc separator ','),',',1) as id, home, MAX(datetime) as 'datetime'
 from topten
 group by (home)

Neon Tetra

Because this hasn't been posted: this works in SQL Server, and is the only solution I've seen that doesn't require subqueries or CTEs. I think this is the most elegant way to solve this kind of problem.

  SELECT TOP 1 WITH TIES *
    FROM TopTen
ORDER BY ROW_NUMBER() OVER (PARTITION BY home
                                ORDER BY [datetime] DESC)

Some notes on how it works - The Window Function in the Order By clause applies a counter to each group of home values, such that the one with the highest [datetime] value receives 1.

By SELECTing TOP 1 WITH TIES , you're selecting the record with the first ROW_NUMBER value (which is 1), as well as all other records with the same 'tying' ROW_NUMBER value of 1.

As a consequence, you retrieve all data for each of the 1st ranked records.