I'm trying to port some old MySQL queries to PostgreSQL, but I'm having trouble with this one:
DELETE FROM logtable ORDER BY timestamp LIMIT 10;
PostgreSQL doesn't allow ordering or limits in its delete syntax, and the table doesn't have a primary key so I can't use a subquery. Additionally, I want to preserve the behavior where the query deletes exactly the given number or records -- for example, if the table contains 30 rows but they all have the same timestamp, I still want to delete 10, although it doesn't matter which 10.
So; how do I delete a fixed number of rows with sorting in PostgreSQL?
Edit: No primary key means there's no log_id
column or similar. Ah, the joys of legacy systems!
alter table foo add column id serial primary key
.
You could try using the ctid
:
DELETE FROM logtable
WHERE ctid IN (
SELECT ctid
FROM logtable
ORDER BY timestamp
LIMIT 10
)
The ctid
is:
The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier.
There's also oid
but that only exists if you specifically ask for it when you create the table.
Postgres docs recommend to use array instead of IN and subquery. This should work much faster
DELETE FROM logtable
WHERE id = any (array(SELECT id FROM logtable ORDER BY timestamp LIMIT 10));
This and some other tricks can be found here
any (array( ... ));
is faster than in ( ... )
that sounds like a bug in the query optimiser - it should be able to spot that transformation and do the same thing with the data itself.
IN
on an UPDATE
(which might be the difference).
delete from logtable where log_id in (
select log_id from logtable order by timestamp limit 10);
If you don't have a primary key you can use the array Where IN syntax with a composite key.
delete from table1 where (schema,id,lac,cid) in (select schema,id,lac,cid from table1 where lac = 0 limit 1000);
This worked for me.
Assuming you want to delete ANY 10 records (without the ordering) you could do this:
DELETE FROM logtable as t1 WHERE t1.ctid < (select t2.ctid from logtable as t2 where (Select count(*) from logtable t3 where t3.ctid < t2.ctid ) = 10 LIMIT 1);
For my use case, deleting 10M records, this turned out to be faster.
delete from "table" where ("table_ID") in (select "table_ID" from "table" limit 10)
You could write a procedure which loops over the delete for individual lines, the procedure could take a parameter to specify the number of items you want to delete. But that's a bit overkill compared to MySQL.
Success story sharing
VACUUM FULL
or autovacuum to cause problems if they change thectid
values in the table while the query is running?ctid
documentation is thatctid
is stable enough to make this DELETE work okay but not stable enough to, for example, put in another table as a ghetto-FK. Presumably you don't UPDATE thelogtable
so you don't have to worry about that changingctid
s andVACUUM FULL
does lock the table (postgresql.org/docs/current/static/routine-vacuuming.html) so you don't have to worry about the other way thatctid
s can change. @araqnid's PostgreSQL-Fu is pretty strong and the docs agree with him to boot.