Fastest check if row exists in PostgreSQL

sql postgresql

I have a bunch of rows that I need to insert into table, but these inserts are always done in batches. So I want to check if a single row from the batch exists in the table because then I know they all were inserted.

So its not a primary key check, but shouldn't matter too much. I would like to only check single row so count(*) probably isn't good, so its something like exists I guess.

But since I'm fairly new to PostgreSQL I'd rather ask people who know.

My batch contains rows with following structure:

userid | rightid | remaining_count

So if table contains any rows with provided userid it means they all are present there.

You want to see if the table has ANY rows, or any rows from your batch?

any rows from my batch yes. they all share same field ill edit a little.

Please clarify your question. You want to add a batch of records, all or nothing? Is there something special about count ? (BTW a reserved word, impractical as a column name)

okay, I was trying to simplify actual situation a little but we are getting closer and closer to real implementation. Once those rows are inserted (theres another field for_date) I begin decrementing rights for specified user as they use specific rights, once rights become 0 they cannot perform those actions anymore for that date. thats the real story

Just show (the relevant part of) the table definitions, and tell what you intend to do.

jdhao

Use the EXISTS key word for TRUE / FALSE return:

select exists(select 1 from contact where id=12)

Extension on this, you can name the returned column for easy reference. Eg select exists(select 1 from contact where id=12) AS "exists"

This is better, because it will always return a value (true or false) instead of sometimes None (depending on your programing language) which might not expand the way you expect.

I have Seq Scan with using this method. I do something wrong?

@Michael.M I have DB table with 30 millions rows and when I use exists or limit 1 I have strong performance drop because Postgres uses Seq Scan instead of Index Scan. And analyze doesn't help.

Would limit 1 in subquery here help or slowen the query?

NPE

How about simply:

select 1 from tbl where userid = 123 limit 1;

where 123 is the userid of the batch that you're about to insert.

The above query will return either an empty set or a single row, depending on whether there are records with the given userid.

If this turns out to be too slow, you could look into creating an index on tbl.userid.

if even a single row from batch exists in table, in that case I don't have to insert my rows because I know for sure they all were inserted.

For this to remain true even if your program gets interrupted mid-batch, I'd recommend that you make sure you manage database transactions appropriately (i.e. that the entire batch gets inserted within a single transaction).

It might be sometimes be programatically easier to "select count(*) from (select 1 ... limit 1)" as it's guaranteed to always return a row with a value of count(*) of 0 or 1.

@DavidAldridge count(*) still means that all the rows have to be read, whereas limit 1 stops at the first record and returns

@Imraan I think you've misinterpreted the query. The COUNT acts on a nested SELECT that has at most 1 row (because the LIMIT is in the subquery).

wildplasser

INSERT INTO target( userid, rightid, count )
  SELECT userid, rightid, count 
  FROM batch
  WHERE NOT EXISTS (
    SELECT * FROM target t2, batch b2
    WHERE t2.userid = b2.userid
    -- ... other keyfields ...
    )       
    ;

BTW: if you want the whole batch to fail in case of a duplicate, then (given a primary key constraint)

INSERT INTO target( userid, rightid, count )
SELECT userid, rightid, count 
FROM batch
    ;

will do exactly what you want: either it succeeds, or it fails.

This will check each row. He wants to do a single check.

No, it does a single check. The subquery is uncorrelated. It will bail out once one matching pair is found.

Right you are, I thought it referred to the outer query. +1 to you

BTW: since the query is inside a transaction, nothing will happen if a duplicate id were to be inserted, hence the subquery can be omitted.

hmm I am not sure I understand. After rights are inserted, I begin to decrement count column. (just some details for picture) If rows already exist and subquery is omitted I think ill get errors with duplicate unique key thrown or? (userid&right form that unique key)

hcnak

as @MikeM pointed out.

select exists(select 1 from contact where id=12)

with index on contact, it can usually reduce time cost to 1 ms.

CREATE INDEX index_contact on contact(id);

Cost of 1ms is huge - can only do 1000 such checks per second. Something around 10M checks per second should be aimed.

aleroot

If you think about the performace ,may be you can use "PERFORM" in a function just like this:

 PERFORM 1 FROM skytf.test_2 WHERE id=i LIMIT 1;
  IF FOUND THEN
      RAISE NOTICE ' found record id=%', i;  
  ELSE
      RAISE NOTICE ' not found record id=%', i;  
 END IF;

doesn't work with me : I get a syntax error near perform

that's pl/pgsql, not SQL, hence the syntax error for "PERFORM" if trying to run it as SQL

Fabian Barney

SELECT 1 FROM user_right where userid = ? LIMIT 1

If your resultset contains a row then you do not have to insert. Otherwise insert your records.

if bunch contains 100 rows it will return me 100 rows, you think thats good?

You can limit it to 1 row. Should perform better. Have a look at edited answer from @aix for that.

Royce

select true from tablename where condition limit 1;

I believe that this is the query that postgres uses for checking foreign keys.

In your case, you could do this in one go too:

insert into yourtable select $userid, $rightid, $count where not (select true from yourtable where userid = $userid limit 1);

Fastest check if row exists in PostgreSQL

Follow WeChat

Want to stay one step ahead of the latest teleworks?

相似问题

Platform

Support

Links

Contact US